This picture of the BJC logo (shown right) is 158 pixels wide and 186 pixels tall, for a total of 29,388 pixels. The BMP (bitmap) format includes each of those pixels in the picture file, at four bytes per pixel, so the file size is about 120kB.
That is an inefficient way to store the information. Think about the 158 pixels in the top row. The first 60 or so are white. Then come five pixels of yellowish orange (the top slice of the "b"). And the rest of that row is white.
......
Instead of storing all 158 pixels individually, we could compress them with run-length encoding and just store six values (three numbers and three colors):
pixel count | color code |
---|---|
60 | FFFFFF |
5 | E5A84A |
93 | FFFFFF |
These days, the size of one picture isn't so significant, but think about every frame of a movie, and think about the time required to send the information over the Internet. Compression makes it easier to stream that movie to you.
Lossless compression means that no information is lost.
Run-length encoding is a lossless compression format; it doesn't lose any information. The original picture can be reconstructed with every pixel exactly correct. But run-length encoding doesn't do well if the picture is a photograph where every pixel may be (at least slightly) different in color from its neighbors. If the length of each color run is just one pixel, both run length and color will take twice as much space as just storing the color of each pixel. Another lossless image format you may have heard of is PNG (Portable Network Graphics, pronounced "ping").
Lossy means that some information is lost.
Lossy compression algorithms let file sizes be even smaller, but the original picture can't be perfectly reconstructed; information is lost. This would be terrible if these algorithms were used to compress a computer program or a novel, but people's perception of images do not require extreme precision. Similarly, sounds and movies can survive lossy compression without most people noticing.
The most commonly used lossy compression algorithm for pictures is called JPEG (or JPG, both pronounced "jay peg" for "Joint Photographic Experts Group," the committee that invented it). Lossy algorithms usually let you control the degree of precision.
Below are an original, uncompressed BMP and a highly compressed JPG of a picture measuring 256×192 pixels. Can you tell which is which?
Here are the sizes of this picture in four different formats:
format | size |
---|---|
BMP encoding every pixel individually (shown above) | 148 kB |
PNG | 106 kB |
JPEG with least compression | 94 kB |
JPEG with most compression (shown above) | 5 kB |
Lossless compression (like PNG) is reversible (no loss in quality); you can reconstruct the original data. It works by removing redundant data.
Lossy compression (like JPG) is not fully reversible; you can only reconstruct an approximation of the original data. It works by removing details that people aren't likely to notice.