Data Compression
On this page, you learn about different data compression algorithms.
DAT-1.D.1
The size of data (the number of bits required to store it) affects the time it takes to send that data across the Internet. So, people use data compression algorithms to reduce the size of images, sounds, movies and some other kinds of data.
DAT-1.D.3
The amount of size reduction depends on two things:
- the amount of redundancy in the original data
- the compression algorithm applied
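To see how redundancy affects the result, here is a small sketch. It uses Python's built-in `zlib` module (DEFLATE, a general-purpose lossless compressor) as a stand-in, not one of the specific algorithms on this page, and the two byte strings are made-up examples. The repetitive string shrinks dramatically. The pseudo-random one barely shrinks at all, and may even grow slightly.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Return the number of bytes zlib needs to store `data`."""
    return len(zlib.compress(data))

# Highly redundant data: the same byte repeated 10,000 times.
redundant = b"A" * 10_000

# Data with little redundancy: 10,000 pseudo-random bytes (seeded so runs are repeatable).
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(10_000))

print("redundant:", len(redundant), "->", compressed_size(redundant), "bytes")
print("noisy:    ", len(noisy), "->", compressed_size(noisy), "bytes")

# zlib is lossless: decompressing gives back exactly the original bytes.
assert zlib.decompress(zlib.compress(noisy)) == noisy
```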
There are two broad categories of data compression algorithms: lossless and lossy, depending on whether information is lost.
Lossless Compression
Lossless data compression algorithms (such as the one used in PNG images) are reversible (there is no loss in quality); you can reconstruct the original data exactly.
DAT-1.D.4
Lossless compression works by removing redundant data. These algorithms can usually reduce the number of bits required to store or transmit the data while guaranteeing that the original data can be perfectly reconstructed.
Run-length encoding is an example of lossless compression. Consider the 158 pixels in the top row of the BJC logo (at right). The first 60 pixels are white. Then come five pixels of yellowish orange (the top slice of the "b"). And the rest of that row is white.
Instead of storing all 158 pixels individually, we could compress them with run-length encoding and just store six values (three numbers and three colors):
pixel count | color code
60 | FFFFFF
5 | E5A84A
93 | FFFFFF
DAT-1.D.2
Those six values (60, FFFFFF, 5, E5A84A, 93, FFFFFF) can be reconstructed into that whole first row of the image (158 pixels). So, fewer bits does not necessarily mean less information.
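The run-length encoding of that row can be sketched in a few lines of Python. The function names `rle_encode` and `rle_decode` are hypothetical, and representing each pixel by its color-code string is an assumption made to keep the example short.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse runs of identical colors into (count, color) pairs."""
    return [(len(list(run)), color) for color, run in groupby(pixels)]

def rle_decode(pairs):
    """Expand (count, color) pairs back into the full list of pixels."""
    pixels = []
    for count, color in pairs:
        pixels.extend([color] * count)
    return pixels

# The top row of the logo as described above: 60 white, 5 orange, 93 white.
row = ["FFFFFF"] * 60 + ["E5A84A"] * 5 + ["FFFFFF"] * 93

encoded = rle_encode(row)
print(encoded)  # [(60, 'FFFFFF'), (5, 'E5A84A'), (93, 'FFFFFF')]
assert rle_decode(encoded) == row  # lossless: the exact row comes back
```

Decoding reproduces every one of the 158 pixels, which is what makes this technique lossless.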
Lossy Compression
Lossy data compression algorithms are not fully reversible; you can reconstruct only an approximation of the original data.
DAT-1.D.5
Lossy compression works by removing details that people aren't likely to notice. The most commonly used lossy compression algorithm for pictures is called JPEG (or JPG, both pronounced "jay peg" for "Joint Photographic Experts Group," the committee that invented it). JPEG works by preserving most of the brightness information for each pixel (since human eyes are sensitive to that) and applying a kind of averaging to the color information (because human eyes aren't as good at distinguishing color, especially colors close to white).
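The "keep brightness, average color" idea can be illustrated with a toy sketch. This is not JPEG itself (real JPEG is considerably more involved); it only shows why averaging is lossy. The row of pixels, the split into one brightness number and one color number per pixel, and the function names are all hypothetical.

```python
def compress_color(values):
    """Lossy step: replace each pair of neighboring color values with their average.

    Stores half as many numbers. Assumes len(values) is even.
    """
    return [(values[i] + values[i + 1]) // 2 for i in range(0, len(values), 2)]

def decompress_color(averages):
    """Rebuild an approximation: both pixels in a pair get the pair's average."""
    restored = []
    for avg in averages:
        restored.extend([avg, avg])
    return restored

# Hypothetical row of four pixels, each split into a brightness and a color number.
brightness = [200, 198, 60, 62]  # stored exactly, since eyes are sensitive to it
color = [120, 124, 30, 40]       # averaged, since eyes are less sensitive to it

stored_color = compress_color(color)        # [122, 35]
restored_color = decompress_color(stored_color)

print(restored_color)           # [122, 122, 35, 35]
print(restored_color == color)  # False: the exact original values are gone
```

Once two different values have been averaged, nothing in the stored data says what they were, so only an approximation can be reconstructed.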
Below are an original, uncompressed picture of pebbles in a pond and a highly compressed JPEG of the same image. Can you tell which is which?
You probably can tell which is which, especially if you looked for sharp edges or very shiny spots. But the compressed file uses 1/30th of the space used by the original, and you could still tell that it's a picture of rocks. So, for many purposes the compressed version would be good enough. Lossy algorithms usually let you control the degree of precision, and people generally choose much less extreme settings than this example, so the compressed file looks much more like the original.
What size is this file when encoded in different formats?
Here are the sizes of the pond pebbles picture in four different formats:
format | size
BMP encoding every pixel individually (shown above) | 148 kB
PNG | 106 kB
JPEG with least compression | 94 kB
JPEG with most compression (shown above) | 5 kB
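A quick calculation using only the numbers in the table above shows where the "1/30th" figure comes from. The dictionary labels are just shorthand for the table rows.

```python
# File sizes from the table above, in kilobytes.
sizes_kb = {
    "BMP (every pixel)": 148,
    "PNG": 106,
    "JPEG, least compression": 94,
    "JPEG, most compression": 5,
}

original = sizes_kb["BMP (every pixel)"]
for fmt, size in sizes_kb.items():
    # Fraction of the original size, and how many times smaller the file is.
    print(f"{fmt}: {size / original:.0%} of the original, {original / size:.1f}x smaller")
```

The most compressed JPEG comes out about 29.6 times smaller than the BMP, or roughly 1/30th of its size.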
The MP3 format, widely used for portable music files, is a lossy compression format. It tends to emphasize high frequencies, so people accustomed to MP3 music find uncompressed versions of the same music boomy (bassy).
Which is best?
Both types of data compression exist because each is useful in certain circumstances:
DAT-1.D.7
- Lossless compression is a good choice when there are very sharp transitions between colors (such as in logos) or when it's essential to be able to recreate original data precisely (such as the code for a program or the text of a book).
DAT-1.D.6, DAT-1.D.8
- Lossy compression is a good choice when the data does not require precision (such as images, sound, or movies, which people may not even notice have been compressed) and when reducing the number of bits stored or transmitted is most important.
These questions are similar to those you will see on the AP CSP exam.
A film student records a movie on his smartphone and then saves a copy on his computer. He notices that the saved copy is of much lower image quality than the original. Which of the following could NOT be a possible explanation for the lower image quality?
The movie was saved using fewer bits per second (a lower bit rate) than the original movie.
This is likely what happened. Which one could NOT be a possible explanation?
The copy of the movie file was somehow corrupted in the process of saving.
This is possible; however, if the file were corrupted, it would be unlikely to have a consistent negative impact on image quality.
The movie was saved using a lossy compression technique.
This is very likely. Which one could NOT be a possible explanation?
Whenever a file is saved from one computer to another, some information is always lost.
Correct. It is possible to make exact duplicates of digital information without any loss.
A visual artist is processing a digital image. Which of the following describe a lossless transformation from which the original image can be recovered? Choose two answers.
Creating the negative of an image, where colors are reversed (dark areas appear light).
Correct. This transformation is reversible and is an example of a lossless transformation.
Blurring the edges of an image.
The blurring blends colors at the edges of the image and once colors have blended it is impossible to retrieve the original RGB values of the pixels involved.
Creating a grayscale copy of an image.
A grayscale copy replaces each pixel's red, green, and blue values with their average, and once those three amounts have been averaged together, it is impossible to retrieve the original RGB values of the pixels.
Creating a vertically flipped copy of the image.
Correct. This transformation is reversible and is an example of a lossless transformation.
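The difference between these transformations can be checked with a small sketch. The 2×2 image and the function names are hypothetical; the grayscale function uses the simple integer average described in the feedback above. Applying the negative or the flip a second time restores the original image, but grayscale can turn two different colors into the same gray, so there is no way to tell them apart afterward.

```python
def negative(pixel):
    """Reverse each RGB channel (0..255); applying it twice restores the pixel."""
    r, g, b = pixel
    return (255 - r, 255 - g, 255 - b)

def grayscale(pixel):
    """Replace all three channels with their (integer) average."""
    avg = sum(pixel) // 3
    return (avg, avg, avg)

def flip_vertically(image):
    """Reverse the order of the rows; flipping again restores the image."""
    return image[::-1]

# A tiny hypothetical 2x2 image: a list of rows, each row a list of (R, G, B) pixels.
image = [
    [(255, 0, 0), (0, 128, 255)],
    [(10, 20, 30), (30, 20, 10)],
]

neg = [[negative(p) for p in row] for row in image]
undo_neg = [[negative(p) for p in row] for row in neg]
print(undo_neg == image)                                  # True: lossless
print(flip_vertically(flip_vertically(image)) == image)   # True: lossless

gray = [[grayscale(p) for p in row] for row in image]
print(gray[1])  # [(20, 20, 20), (20, 20, 20)]: two different colors became identical
```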
DAT-1.D
For which of the following kinds of data would lossy compression be okay? Check as many as apply.
The HTML code for this web page.
Would you be happy if some of the words on the page disappeared?
Your computer's desktop picture.
Correct. The picture could have a few wrong pixels and would still look okay.
A live-action movie on Netflix.
Correct. The movie could have a few corrupted frames and would still look okay.
A cartoon on Netflix.
Actually, corrupted frames are more noticeable in a cartoon, which has solid areas separated by sharp edges. (Fortunately, using techniques such as run-length encoding, it's relatively easy to get a lossless, highly compressed version of a cartoon.)
A digital book, to be read on a computer.
Digital books aren't stored as pictures, but as text. Any error will be noticeable, as gibberish characters on the page.