In this lab, you will explore how different kinds of information are represented in a computer.
On this page, you will learn about bits, the basic units of data in computing.
A bit is a single unit of data that can only have one of two values. We usually represent the two values as 0 (off) and 1 (on).
As you probably know, information travels over wires inside the computer, and each wire is either on or off, with no intermediate states allowed. This small piece of information is called a bit, the smallest possible unit of information in the digital domain.
What does the value of a bit mean? By convention, the two states of a bit are interpreted as 0 and 1, but that doesn't mean they have to represent numbers. A single bit can represent any two-valued information: on/off, yes/no, true/false, or, say, the red and green states of a traffic light.
But what if the traffic light also needs a yellow value? It's tempting to say that, for example, 0 volts on the wire means red, 1 volt means yellow, and 2 volts means green. Long ago, there were computers that worked that way, but there are good reasons to stick with two possible values per wire.
The fundamental building block of computer circuitry is the transistor. In a digital computer, the input to a transistor is either zero or whatever voltage represents one. But electrical circuits aren't perfect; the input may be a little larger or smaller than it should be.
This is a rough graph of the actual input-output behavior of a transistor. Don't worry about the details; just notice the two blue flat parts of the graph. Within the "cutoff" region, small changes to the input voltage do not change the output voltage at all; the output is always zero volts. Likewise within the "saturation" region, small input changes don't affect the output voltage; this output is interpreted as a one. This is how transistors are used as switches in a computer. If there were three flat parts of the curve, maybe we would have three possible values for each wire.
Transistors are versatile devices. When used in the middle, linear (pink) part of the graph, they're amplifiers; a small variation in input voltage produces a large variation in output voltage. That's how they're used to play music in a stereo.
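If you'd like to see the switch idea in code, here is a toy model of that curve in Python. It is not real transistor physics, and the voltage numbers are made up for illustration; the point is just that, in the two flat regions, small wobbles in the input don't change the output at all.

```python
V_MAX = 3.3        # hypothetical voltage that represents a "one"
CUTOFF = 0.8       # below this input, the output is flat at 0 volts
SATURATION = 2.0   # above this input, the output is flat at V_MAX

def transistor_output(v_in):
    """Toy model of the input-output curve: two flat regions
    with a linear (amplifier) region in between."""
    if v_in <= CUTOFF:
        return 0.0        # "cutoff" region: output reads as a zero
    if v_in >= SATURATION:
        return V_MAX      # "saturation" region: output reads as a one
    # linear region: small input changes produce large output changes
    return V_MAX * (v_in - CUTOFF) / (SATURATION - CUTOFF)

# Noise tolerance: slightly different inputs give identical outputs.
print(transistor_output(0.0), transistor_output(0.3))   # 0.0 0.0
print(transistor_output(2.0), transistor_output(2.4))   # 3.3 3.3
```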
| first bit | second bit | meaning  |
| --- | --- | --- |
| 0 | 0 | red |
| 0 | 1 | yellow |
| 1 | 0 | green |
| 1 | 1 | (unused) |
There are four possible combinations of two bits, so with two bits we can represent up to four different values, even though we only need three for the traffic light.
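Here is a minimal sketch in Python (using only the standard library) that lists all the two-bit patterns with the traffic-light meanings from the table above, and confirms the counting rule: n bits give 2**n patterns.

```python
from itertools import product

# All patterns of two bits, paired with the traffic-light meanings above.
meanings = {(0, 0): "red", (0, 1): "yellow", (1, 0): "green", (1, 1): "(unused)"}
for bits in product([0, 1], repeat=2):
    print(bits, "->", meanings[bits])

# Each added bit doubles the count: n bits give 2**n patterns.
for n in range(1, 5):
    print(n, "bit(s):", len(list(product([0, 1], repeat=n))), "==", 2**n)
```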
Each added bit doubles the number of values you can represent. This means that representing complex situations doesn't cost a lot of hardware; ten bits is enough to represent over 1000 distinct values.
A byte is eight bits.
A word is a sequence of however many bits the CPU processes at a time. As of 2017, words are 32 or 64 bits.
Bits aren't expensive, but what is expensive is the circuitry to let the programmer use exactly the smallest number of bits for a particular problem.
How many distinct values can be represented in 32 bits? You don't have to memorize the answer, because you can quickly approximate it using the fact that 2¹⁰ = 1024, which is about 1000. This means that every ten bits of width multiplies the number of values that can be represented by about 1000. So, 10 bits allows about a thousand values, 20 bits is about a million values, 30 bits is about a billion, and 32 bits allows over four billion values (because we double the billion two more times for the difference between 30 and 32).
You might find this trick helpful on the AP exam.
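If you want to check the trick, the exact values are easy to compute; this is plain Python, with nothing assumed beyond the arithmetic above:

```python
print(2**10)   # 1024, "about a thousand"
print(2**20)   # 1048576, "about a million"
print(2**30)   # 1073741824, "about a billion"
print(2**32)   # 4294967296, over four billion: double the billion twice
```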
Four billion values sounds like it ought to be enough, but it's not if you're an astronomer or a banker (or Google or Facebook). That's why we now have 64-bit computers, which as of 2019 are the standard. (Apple has just removed support for 32-bit programs from macOS.)
The main use of eight-bit bytes is to represent characters of text.
Computers used six-bit-wide character codes for many years, but having both UPPER CASE and lower case letters plus punctuation requires seven bits. The first officially recognized character encoding was the seven-bit ASCII (American Standard Code for Information Interchange) character set. It included an optional eighth bit for error detection, which was later repurposed to encode the accented characters used in Spanish, French, German, and some other European languages. For example, there is an accented character in the name of the main developer of Snap!, Jens Mönig, who is German. (The closest English sound is the "u" in "lunch.")
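You can explore these codes yourself. In Python, ord gives a character's numeric code and chr goes the other way:

```python
print(ord("A"), ord("a"), ord("0"))   # 65 97 48: all fit in 7 bits
print(format(ord("A"), "07b"))        # 1000001, the 7-bit pattern for "A"
print(chr(65))                        # "A"
print(ord("ö"))                       # 246: needs the eighth bit
```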
As the use of computers and the Internet spread around the world, people wanted to be able to write Chinese, Japanese, Arabic, Kabyle, Russian, Tamil, etc. The Unicode character set supports about 1900 languages, using 32 modern alphabets and 107 historical alphabets that are no longer in living use. The complete Unicode character set includes 136,755 characters.
The most straightforward representation of Unicode uses one 32-bit word per character, which is more than enough. But program developers consider that an inefficient use of computer memory, and also, a lot of old software still in use was written when eight bits per character was standard. So Unicode characters are generally represented in a multi-byte representation in which the original 128 ASCII characters occupy one byte, while other characters may require up to four bytes. (It's also possible to use a multi-byte sequence to tell your word processing software that you want to use one-byte or two-byte codes to represent a particular non-Latin alphabet.)
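Python's built-in str.encode makes the variable width visible. This sketch uses UTF-8, the most common multi-byte encoding of the kind described above:

```python
for ch in ["A", "ö", "中", "𝄞"]:
    encoded = ch.encode("utf-8")
    print(ch, "->", len(encoded), "byte(s):", list(encoded))
# "A" (an original ASCII character) takes 1 byte; "ö" takes 2;
# the Chinese character takes 3; the musical G clef takes all 4.
```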