Lab 4: Data Representation and Compression
This lab addresses representation of digital data as binary sequences. The first two pages introduce bits, bytes, and binary sequences and how they are used to store text, images, and other data. The following two pages explain how a binary sequence can represent numbers—both integers and non-integers—and in particular, explain floating point as an approximation for real numbers. These are followed by a page covering binary number representation. Binary is a place value system just like decimal representation; binary uses only two digits (0 and 1) instead of ten (0 through 9). The last page addresses data compression and the difference between lossy and lossless data compression formats, which lose and preserve original information respectively.
Note that there's no such thing as a "binary number." The way we represent a number—as XXXIRoman or 3110 or 111112 or 1F16 or "thirty-one" or "treinta y uno"—is language or notation. The number itself is the same regardless of how we express it. It helps the distinction to say "binary numerals" or "binary representation" instead of "binary numbers." Students understand the conversions better when they remain clear that only the notation, not the number, changes. Explicitly making the analogy with Roman numerals may help; students are likely to understand and be able to extend from the idea that XXXI and 31 are the same number written different ways.
Pacing
The 6 lab pages could be split across 4–7 days (
150–300 minutes). Expected times to complete follow:
Lab Pages
-
Page 1: Bits.
-
Learning Goals:
- Understand that data is represented using bits.
- Understand that n bits are needed to represent 2n values.
-
Tips:
- Don't talk about "binary" on this page. Students should not be thinking about place value but only about questions such as "how many bits do you need to represent the colors of the rainbow?" or "how many bits do you need to represent the months of the year?"
-
Be sure students understand the tip in the orange box before moving on to exercise 7, "how many values can be represented in a 64-bit word?" Every ten bits of multiplies the number of possible values by about 1000; so 60 bits would offer a thousand thousand thousand thousand thousand thousand (six of them) possible values, and the remaining four bits represent another factor of 16. If your students are already proficient with exponentials, they may not need this trick and can instead see that doubling from 32 to 64 bits squares the number of values, so the answer is about 4 billion squared or 16 billion billion (16 quintillion).
- Instead of trying to make sense of thousand thousand thousand thousand thousand thousand, they can just count off six thousand-group names: thousand, million, billion, trillion, quadrillion, quintillion.
- The student page says that 8-bit bytes are used mainly to represent text characters, and that's true in countries with European languages, but for representing emoji, mathematics, and languages with accents or ideograms, more than one byte per character may be required. These details are presented as optional reading hidden in a yellow box between problems 9 and 10.
-
Page 2: Binary Sequences.
-
Learning Goals:
- Understand that the meaning of a string of bits depends on the context in which it is used.
-
Tips:
- This page uses the phrase "binary sequences" as a technical term but otherwise doesn't talk about binary.
- Students may reasonably ask, "Is a kilobyte 1000 bytes, or 1024 bytes?" This turns out to be a complicated question: If we're talking about the RAM, a kilobyte is 1024 bytes, but if we're talking about a disk, it's 1000 bytes. In an attempt to resolve this ambiguity, the International Electrotechnical Commission has introduced new metric-ish names for the powers of two wherein 1024 bytes a "kibibyte" or KiB. These new power-of-two names all have "bi" replacing the second syllable of the metric name: kibi, mebi, gibi, tebi, pebi, etc.
-
Page 3: Representing Whole Numbers.
-
Learning Goals:
- Understand that the choice of representation has implications for the kinds of numbers that can be stored and how precisely they can be stored.
- Understand that the standard mechanism of storing numbers, fixed width representation, limits the size of a number.
- Understand that fixed width representation has killed people and that better representations (such as bignums) are available.
-
Tips:
- We discuss only nonnegative integers, except to say that half the integers are negative so the maximum value in a single word is 263 rather than 264.
- After students work through the lab page, you might review by asking, "Why did Snap! display
20!
in ordinary, whole number representation but 30!
in scientific notation (with bignums turned off)?"
-
Page 4: Floating Point.
-
Learning Goals:
- Understand that floating point is a representation for storing numbers that is analogous to scientific notation.
- Understand that, in many languages, if a whole number is above a certain size, it will be stored as a floating point value and therefore be rounded.
-
Tips:
- This page includes rational numbers. In case it comes up, note that any rational number can be represented exactly as a ratio, a fraction (a pair of integers, e.g., 1/3), but irrational numbers cannot.
- You don't have to say this to students, but knowing it may help you answer their questions: Since the floating point format has a finite number of bits, and since irrational numbers require infinitely many digits, every representable number is rational. Irrational numbers can only be approximated in floating point. (Note that the converse isn't true: not every rational number is representable.)
- This page explains that other data types, including lists, are represented as binary sequences but not how they are represented (e.g., pointers to memory addresses).
-
Page 5: Binary Representation.
-
Learning Goals:
- Understand how numbers are stored in binary notation.
- Translate numbers between binary and decimal notations.
-
Tips:
- Again, don't say "binary number" or "decimal number." Binary and decimal are notations, and the instances of those notations are numerals, not numbers. It's the same number no matter how it's represented.
- You may be tempted to teach students to do binary arithmetic, but the goal is for students to learn to translate between representations. Binary arithmetic is not required for the AP exam and may confuse students learning translation, which is likely to appear on the exam.
- Technically, what's happening in these algorithms is not "translating binary to decimal" or "translating decimal to binary." It's translating binary representation to and from a number, which has no base. To see this, go through the algorithms and note that they never consider the digits of a "decimal" number. They just do arithmetic on the numbers. People say "binary to decimal" and "decimal to binary" because we have to write down the numbers some way, and the way everyone knows is decimal notation. By contrast, the algorithms do pay attention to the individual digits and place value of the binary notation. Some students who are confused by the usual language might do better with this explanation.
-
Page 6: Data Compression.
-
Learning Goals:
- Understand lossless vs. lossy compression and when each is appropriate.
-
Tips:
- We use run-length encoding as an example of lossless compression because this algorithm, unlike the one used for PNG files, is easy for students to understand. If you want to present a similarly simple algorithm for lossy compression, ask students to consider reducing the set of possible colors by restricting each of red, green, and blue to the (hexadecimal) values 00, 11, 22, ... dd, ee, and ff. For example, the color "cinnamon" would be rounded to dd6622, and then stored in files as d62. (Browsers actually do accept these three-hex-digit color codes.) Each color number would then take two bytes (including transparency) instead of four. You can see that that d62 is a little different from the real "cinnamon" (d2691e), but this restriction is hardly noticeable in real pictures:

BJC Videos from UC Berkeley
No YouTube access at your school?
-
Abstraction: Numbers
- The first 3 minutes and 14 seconds covers converting binary to decimal (most related to page 5). The remainder of the video covers hexadecimal, which is not covered in this lab, and compares these notations and covers examples of their uses.
- If you share this video with students...
- You may wish to validate students' mathematical experience by remarking that not everyone learns base 10 place values in kindergarten, as stated at 1:07.
- There is a mistake in the video at 2:51. You might ask students to watch carefully to find it.
-
Abstraction: Base Conversion
- The first 3 minutes and 14 seconds covers converting decimal to binary (most related to page 5). The remainder of the video covers hexadecimal, which is not covered in this lab.
- At this point in the course, this is not your student's first algorithm, as stated at 0:13
Solutions
Correlation with 2020 AP CS Principles Framework
Computational Thinking Practices: Skills
- 1.D: Evaluate solution options.
- 2.B: Implement an algorithm in a program.
- 3.C: Explain how abstraction manages complexity.
Learning Objectives:
- DAT-1.A: Explain how data can be represented using bits. (3.C)
- DAT-1.B: Explain the consequences of using bits to represent data. (1.D)
-
DAT-1.C: For binary numbers:
- Calculate the binary (base 2) equivalent of a positive integer (base 10) and vice versa. (2.B)
- Compare and order binary numbers. (2.B)
- DAT-1.D: Compare data compression algorithms to determine which is best in a particular context. (1.D)
Essential Knowledge:
- DAT-1.A.2: Computing devices represent data digitally, meaning that the lowest-level components of any value are bits.
- DAT-1.A.3: Bit is shorthand for binary digit and is either 0 or 1.
- DAT-1.A.4: A byte is 8 bits.
- DAT-1.A.6: Bits are grouped to represent abstractions. These abstractions include, but are not limited to, numbers, characters, and color.
- DAT-1.A.7: The same sequence of bits may represent different types of data in different contexts.
- DAT-1.A.8: Analog data have values that change smoothly, rather than in discrete intervals, over time. Some examples of analog data include pitch and volume of music, colors of a painting, or position of a sprinter during a race.
- DAT-1.A.9: The use of digital data to approximate real-world analog data is an example of abstraction.
- DAT-1.A.10: Analog data can be closely approximated digitally using a sampling technique, which means measuring values of the analog signal at regular intervals called samples. The samples are measured to figure out the exact bits required to store each sample.
- DAT-1.B.1: In many programming languages, integers are represented by a fixed number of bits, which limits the range of integer values and mathematical operations on those values. This limitation can result in overflow or other errors.
- DAT-1.B.2: Other programming languages provide an abstraction through which the size of representable integers is limited only by the size of the computer's memory; this is the case for the language defined in the exam reference sheet.
- DAT-1.B.3: In programming languages, the fixed number of bits used to represent real numbers limits the range and mathematical operations on these values; this limitation can result in round-off and other errors. Some real numbers are represented as approximations in computer storage.
- DAT-1.C.1: Number bases, including binary and decimal, are used to represent data.
- DAT-1.C.2: Binary (base 2) uses only combinations of the digits zero and one.
- DAT-1.C.3: Decimal (base 10) uses only combinations of the digits 0 – 9.
- DAT-1.C.4: As with decimal, a digit’s position in the binary sequence determines its numeric value. The numeric value is equal to the bit's value (0 or 1) multiplied by the place value of its position.
- DAT-1.C.5: The place value of each position is determined by the base raised to the power of the position. Positions are numbered starting at the rightmost position with 0 and increasing by 1 for each subsequent position to the left.
- DAT-1.D.1: Data compression can reduce the size (number of bits) of transmitted or stored data.
- DAT-1.D.2: Fewer bits does not necessarily mean less information.
- DAT-1.D.3: The amount of size reduction from compression depends on both the amount of redundancy in the original data representation and the compression algorithm applied.
- DAT-1.D.4: Lossless data compression algorithms can usually reduce the number of bits stored or transmitted while guaranteeing complete reconstruction of the original data.
- DAT-1.D.5: Lossy data compression algorithms can significantly reduce the number of bits stored or transmitted but only allow reconstruction of an approximation of the original data.
- DAT-1.D.6: Lossy data compression algorithms can usually reduce the number of bits stored or transmitted more than lossless compression algorithms.
- DAT-1.D.7: In situations where quality or ability to reconstruct the original is maximally important, lossless compression algorithms are typically chosen.
- DAT-1.D.8: In situations where minimizing data size or transmission time is maximally important, lossy compression algorithms are typically chosen.
- AAP-1.A.3: Some programming languages provide types to represent data, which are referenced using variables. These types include numbers, Booleans, lists, and strings.
- AAP-1.A.4: Some values are better suited to representation using one type of data rather than another.