Unit 4 Lab 4: Data Representation and Compression, Page 2

So far we've been working with small chunks of data, from Boolean values (one bit) to characters (eight bits). But of course some information in your computer or smartphone is much bigger than that. For starters, characters aren't generally used one at a time; they're used in text strings such as "Welcome to the Beauty and Joy of Computing." These 43 characters occupy 43 bytes of computer memory. But the real champion users of space are media files: pictures, sounds (mostly music), and video.

If we could see inside the memory's bits, a section of the memory might look something like this:

That shows just 449 bits. A 16GB cell phone has 16 gigabytes (about 16 billion bytes) of storage with each byte containing 8 bits. That's 128,000,000,000 bits. Printed on paper as ones and zeros, the 16GB phone's memory would take nearly 40,000,000 pages. The information in storage—whether it is a text message, a photograph, a song, a computer program, or a list of phone numbers—all looks the same, like a sequence of bits that are either On or Off (one or zero), a binary sequence.
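
The arithmetic above can be checked with a few lines of Python. This is only a rough sketch: the variable names are made up for illustration, and it uses the round figure of 16 billion bytes from the text rather than an exact capacity.

```python
# Rough storage arithmetic for a 16GB phone, using the round
# figure of 16 billion bytes from the text.
BITS_PER_BYTE = 8

phone_bytes = 16 * 10**9                  # about 16 billion bytes
phone_bits = phone_bytes * BITS_PER_BYTE

print(f"{phone_bits:,} bits")             # 128,000,000,000 bits

# The text estimates nearly 40,000,000 printed pages, which
# implies roughly this many ones and zeros on each page:
pages = 40_000_000
print(phone_bits // pages, "bits per page")   # 3200 bits per page
```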

How much information fits in a gigabyte?

Here are a few rough examples of what kind of data would fit in how much memory:

name	amount	example
bit	either a 1 or a 0	1
byte	8 bits	11011001
kilobyte	2¹⁰ (1,024) bytes	a couple of paragraphs
megabyte	2²⁰ (1,048,576) bytes	about 1 book
gigabyte	2³⁰ (1,073,741,824) bytes	a little more than 1 CD
terabyte	2⁴⁰ (1,099,511,627,776) bytes	about 1,500 CDs
petabyte	2⁵⁰ (1,125,899,906,842,624) bytes	about 20 million filing cabinets of text
exabyte	2⁶⁰ (1,152,921,504,606,846,976) bytes	about 20% of all the words ever spoken by humankind
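
Each row of the table is 2¹⁰ (1,024) times the row before it. As a small sketch of that pattern (the function and list names here are hypothetical, not part of the lab), the byte counts can be regenerated like this:

```python
# Each storage unit in the table is 2**10 (1,024) times the previous one.
units = ["kilobyte", "megabyte", "gigabyte", "terabyte", "petabyte", "exabyte"]

def unit_sizes():
    """Return (name, size in bytes) pairs; the k-th unit is 2**(10*k) bytes."""
    return [(name, 2 ** (10 * k)) for k, name in enumerate(units, start=1)]

for name, size in unit_sizes():
    print(f"{name:<9} {size:>29,} bytes")
```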

As we write this in 2017, it's common to have a terabyte disk drive on your desk. Web services deal with petabytes or exabytes of data.

Where do these prefixes like "tera-" and "peta-" come from?

When we write big numbers, we put commas every three digits (counting from the right). Each group of three has a name: thousand, million, billion, and so on. So, the number 1,234,567,890 is pronounced "one billion, 234 million, 567 thousand, 890." Those group names ("thousand" and so on) also have prefix names used in metric measurements:

prefix	amount	amount as numeral
kilo-	thousand	1,000
mega-	million	1,000,000
giga-	billion	1,000,000,000
tera-	trillion (a million million)	1,000,000,000,000
peta-	quadrillion	1,000,000,000,000,000
exa-	quintillion (a billion billion)	1,000,000,000,000,000,000

Digits for groupings smaller than one (fractions) have metric prefixes too:

prefix	amount	amount as fraction
milli-	thousandth	1/1,000
micro-	millionth	1/1,000,000
nano-	billionth	1/1,000,000,000
pico-	trillionth	1/1,000,000,000,000
femto-	quadrillionth	1/1,000,000,000,000,000
atto-	quintillionth	1/1,000,000,000,000,000,000

The fractional names are used to measure times in the computer, such as a nanosecond memory access time, or distances between wires on a chip, which are measured in nanometers.

"Binary sequence" is a very broad category, and often, several layers of abstraction are built on it. For example, you can include a picture in an email or text message, in which case, the message includes a picture, which is a kind of file, which is a kind of binary sequence.

Take a look at these 3 custom blocks that you will use to explore binary sequences:
- A reporter that accepts a string of text as input and translates that text into a binary sequence:
- A reporter that accepts a binary sequence as input and translates it into text:
- A command block that accepts a binary sequence as input and draws a black and white image on the stage where each 0 in the sequence becomes a white "pixel" and each 1 becomes a black "pixel."
  You can use the second and third inputs to control where the block breaks the sequence to start a new line and also how large the image is drawn.
Translate a short text string into a binary sequence.
- Find the set (output) to... instruction and change the input text to a short text string of your choosing. The reported binary sequence will be stored in the output variable with quotes around it.
- Access the output by right-clicking (or control-clicking) on the OUTPUT watcher on the stage and choosing "export..." The binary sequence will download as a text file. Copy the binary sequence out of the file, but not the quotes.

Even Snap! has bugs. When you paste this data into Snap!, it may extend beyond the edges of the box. Developers are working to fix this.
image of output pasted into block and extending past the edges of the block

Paste the outputted binary sequence into the translate binary sequence to text block and run it. (It may take a moment to report.)
- Is your original text reported back? (If not, you may have included the quotes or lost a bit or two while copying.)
- Once you've gotten your original text to report back, try making some changes.
  - What happens if you change one bit?
  - What happens if you add a bit somewhere in the middle of the sequence?
  - What happens if you add a bit at the beginning?

DAT-1.A

Go back to the exported output.txt file and copy your original binary sequence again (without the quotes). Paste it into the translate binary sequence to B&W image block and run the block. You are not likely to see anything meaningful. Why not?
Try this binary sequence in translate binary sequence to B&W image with the second input set to 14 pixels wide:

00000110000000000001000110000000010000000000001100100110000011111111000001100111100000010010110011000111001111100000100110110000000001000000000000110000000000111000000011000100011000010000000100000110000110000000111111000000
You should see something like the BJC logo:

DAT-1.A

What do you get if you translate that same binary sequence into text? Why?
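
The idea behind the image block can be sketched in Python by printing characters instead of drawing pixels. This is an illustration under assumptions, not the block's actual code: `render_bw` is a hypothetical name, and the 5-by-5 bit pattern is made up for this example.

```python
def render_bw(bits, width):
    """Split bits into rows of `width`; show 1 as '#' (black), 0 as '.' (white)."""
    rows = [bits[i:i + width] for i in range(0, len(bits), width)]
    return [row.replace("1", "#").replace("0", ".") for row in rows]

# A made-up 5x5 pattern (not the sequence from the exercise).
bits = "01110" "10001" "10001" "10001" "01110"

print("\n".join(render_bw(bits, 5)))   # correct width: a ring shape
print()
print("\n".join(render_bw(bits, 7)))   # wrong width: the rows no longer line up
```

The same bits give a recognizable picture only when the width matches the one the sequence was made for. The bits themselves carry no record of whether they are meant to be text or an image.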

Not all data are naturally digital. (That is, they may not be individual values that can be represented in the form of binary sequences.) Some real-world values (such as the pitch and volume of music, the colors of a painting, or the position of a sprinter during a race) change smoothly over time or position; they are analog. When analog data are encoded digitally (as bits on a computer), their values are approximated. This is an example of abstraction. The continuously changing air pressure of a sound, for example, is sampled (measured) thousands of times a second, and the samples are stored as bits.
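
Sampling can be sketched in a few lines of Python. The example below is a simplified illustration, not how any particular audio format works: the 440 Hz tone, the rate of 8,000 samples per second, the 8 bits per sample, and the function names are all assumptions chosen for the example.

```python
import math

def sample_wave(frequency_hz, sample_rate_hz, count):
    """Measure a sine wave `count` times, `sample_rate_hz` times per second."""
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(count)]

def quantize(value, bits=8):
    """Map a value in [-1, 1] to the nearest of 2**bits integer levels."""
    levels = 2 ** bits - 1
    return round((value + 1) / 2 * levels)

samples = sample_wave(frequency_hz=440, sample_rate_hz=8000, count=8)
for s in samples:
    level = quantize(s)
    # analog value -> approximated level -> the bits that get stored
    print(f"{s:+.4f} -> {level:3d} -> {level:08b}")
```

Each smooth value is rounded to one of only 256 levels, so the stored bits approximate the original signal rather than match it exactly.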

Different languages use data types differently. In high-level languages, the data type is attached to the value itself. In lower-level languages, when you make a variable, you have to say what type of value it will contain, and the data type is attached to the variable. As a result, a single variable can't both give exact answers when its values are integers and also handle non-integer values. So instead of seeing
script variables (foo)

you see things like
integer (foo)

Snap! has strengths that many programming languages do not, and it's very likely that your next year's computer science class will use one of those other languages. If that's the case, you'll have to make sure that the data type you declare for a variable matches what you are going to put in it.
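
Python is one text-based language in which, as in Snap!, the type travels with the value. The short sketch below shows one variable holding values of different types over time. In a language with declared types, such as C or Java, you would instead write something like `int foo` and could then store only integers in `foo`.

```python
# The type belongs to the value, not to the variable,
# so the same variable can hold different types over time.
foo = 7
print(type(foo).__name__)    # int

foo = 7.5
print(type(foo).__name__)    # float

foo = "seven"
print(type(foo).__name__)    # str

# Integer values stay exact, however large they get.
big = 2 ** 100
print(big)
```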

This question is similar to those you will see on the AP CSP exam.

A particular online retail company uses 9-bit binary sequences to identify each unique product for sale. Expecting to increase the number of products it sells, the company is planning to switch to 10-bit binary sequences. Which of the statements below best describes the consequence of using 10-bit sequences instead of 9-bit sequences?

Two more products can be identified uniquely.

Compute how many products can be identified before and after the change.

Ten more products can be identified uniquely.

Compute how many products can be identified before and after the change.

Twice as many products can be identified uniquely.

Correct. Before, 2⁹ = 512 products could be identified, and now 2¹⁰ = 1024 products can be identified.

Ten times as many products can be identified uniquely.

Compute how many products can be identified before and after the change.
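
To check the computation the feedback asks for, here is a tiny Python sketch (the function name `count_ids` is made up for this example). A sequence of n bits can take 2ⁿ different values, so each added bit doubles the count.

```python
def count_ids(bits):
    """Number of distinct binary sequences of the given length."""
    return 2 ** bits

before = count_ids(9)
after = count_ids(10)
print(before, after, after // before)   # 512 1024 2
```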

Look inside the translate text to binary sequence and translate binary sequence to text reporters. Describe how these two reporters work. There are several custom blocks inside:
- pack 8-bit byte takes a binary sequence of 8 bits or fewer and adds enough zeros to the front to make a whole byte. How is this used?
- translate text to Unicode list takes a text string and outputs a list of each character's Unicode value. Why is a list output helpful here?
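
After you have looked inside the Snap! blocks yourself, you might compare them with this rough Python analogue. The function names are borrowed from the block names, but the code is an assumption about how such helpers could work, not a copy of the Snap! definitions.

```python
def text_to_unicode_list(text):
    """Return a list with the Unicode value of each character."""
    return [ord(ch) for ch in text]

def pack_8_bit_byte(bits):
    """Add zeros to the front of a short bit string to make 8 bits."""
    return bits.zfill(8)

codes = text_to_unicode_list("Hi")
print(codes)                            # [72, 105]

# Converting each value to binary gives fewer than 8 bits here.
unpadded = [format(code, "b") for code in codes]
print(unpadded)                         # ['1001000', '1101001']

# Packing makes every character the same length, so the joined
# sequence can later be split back into characters every 8 bits.
packed = [pack_8_bit_byte(b) for b in unpadded]
print("".join(packed))                  # 0100100001101001
```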

Binary Sequences