Sound Bit-Depth

As you read this discussion and look at the examples, please always keep two things in mind.  Sound is analog - naturally smooth curves that look like the blue lines below.  Digital recordings are not smooth - they are blocky mathematical representatives of discrete points on the curve, that taken together, try their best to describe the smooth curve.  The better the digitization, the more data points, the better the approximation of the smooth curve will be.

Converting a sound to digital data is the job of a recorder or computer. It samples and measures the sound at blazing fast speeds and records it to a hard drive or other storage device. The sampling rate or sampling frequency defines the number of samples per second taken from the continuous audio signal. Samples per second are sometimes referred to as Hertz (Hz). Music for a CD is sampled 44,100 times every second, or 44.1 KHz.

1 KHz = 1000 per second

The sample rate limits the highest frequency that can be recorded. Nyquist Theory says that the highest frequency that can be recorded is half the sample rate. The highest-pitched sound most humans can hear is around 20 KHz - so Nyquist Theory says a sample rate of 40 KHz is needed. CDs are recorded at 44.1 KHz to give sounds a little more definition than is needed to satisfy music listeners.
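The Nyquist arithmetic above is simple enough to check in a couple of lines of Python. This is only a sketch; the function names are made up for illustration.

```python
def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency that a given sample rate can record (half the rate)."""
    return sample_rate_hz / 2


def min_sample_rate_hz(highest_frequency_hz):
    """Lowest sample rate needed to record a given highest frequency (twice it)."""
    return 2 * highest_frequency_hz


# CD audio: 44.1 KHz sample rate
print(nyquist_limit_hz(44_100))

# Top of human hearing: about 20 KHz
print(min_sample_rate_hz(20_000))
```

The first line prints the highest pitch a CD can hold, and the second prints the sample rate needed to cover human hearing.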

Digital CD audio recorded with a sample rate of 44.1 KHz actually takes a single data point description of the sound 44,100 times every second.

In digital audio, bit depth describes the number of bits of information recorded for each sample. On the computer, a 16-bit sound is described by a 16-letter 'word' of data for each sample time. Bit depth directly corresponds to the resolution of each sample in a set of digital audio data. Common examples of bit depth include CD audio, which is recorded at 16 bits, and DVD-Audio, which can support up to 24-bit audio.

The best way I can describe what the computer does with sound when it records digital audio is by example. So follow along and we're going to convert an analog sound wave to a digital description, sometimes called ADC (analog-digital-conversion) by hand. The computer is very good at regenerating sound from limited digital data, but for the examples, everything is simplified to help explain this interpretation.

The graph below represents sound. It is air being pushed and pulled up and down graphed as time goes by to the right.

Plucking on a guitar string generates a sound, vibrating up and down as time goes by, the smooth blue wave. Now let's do some analog to digital conversion just like a computer. To do what the computer does, you have to trace the blue line by putting red dots on it. But the only places you can put the dots are at the intersections of the gray grid lines. That's right, trace the line using red dots only at the gray line intersections.

It didn't do a very good job tracing the blue line. 2-bit data doesn't do a very good job measuring the wave exactly. We need more of the intersecting gray lines near the blue wave.

The vertical gray lines.

Each vertical grid line represents an exact moment in time where the sound is sampled. At each of those moments in time, the computer or recorder takes a measurement and records it. The measurement is shown as a red dot. Some of these dots aren't very close to the blue wave of the actual sound. Obviously more of the lines are needed, so more of the dots can be placed more often and nearer to the actual sound. One can easily increase the number of vertical lines by measuring more samples more often. What is really needed is more horizontal possibilities.

The horizontal gray lines.

The horizontal lines are the signal description. The description is recorded by the computer as a 'word.' The 'word' used to describe the wave can only have as many letters in it as the number of 'bits.' We obviously need more horizontal lines too, more letters in the 'word' if we're going to do a better job tracing the sound. Why are there so few gray horizontal lines? Because this sample is 2-bit sound.

In computer language, a 'bit' is the abbreviation for a single 'binary digit.' Binary numbers are base-2; each digit can only be a '0' or a '1'. Counting in binary is simple. With only 2 digits (2 bits), each a 0 or a 1, counting from zero goes 00, 01, 10, 11. That's all the counting one can do in 2-bit binary. It's the same as counting 0, 1, 2, 3 in base 10 decimal. A 2-bit sound description can only have one of those 4 possible descriptions at each sample point. I could have just as easily labeled the horizontal lines 00, 01, 10, and 11, like the computer does, instead of labeling them 0, 1, 2, 3.

Recording 2-bit sound, only two binary digits can be used (only 2 letters in the word). There are only 4 gray lines numbered 0-3. That means the computer must select the intersection of gray lines that is closest to the blue line. That's what it must do. You can see that it's difficult to describe a sound well with only a 2-bit description.
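Here is a rough Python sketch of that "pick the closest gray line" step. It assumes the wave is a plain sine wave swinging between -1 and +1 and that the gray lines are evenly spaced from bottom to top. Those are my simplifications for illustration, and the function names are made up; a real recorder does a lot more than this.

```python
import math


def quantize(value, bits):
    """Snap a value between -1.0 and 1.0 to the nearest of 2**bits gray lines.

    Returns the line number, from 0 (bottom) to 2**bits - 1 (top).
    """
    levels = 2 ** bits
    # Stretch -1..1 onto 0..levels-1, then round to the nearest whole line.
    scaled = (value + 1.0) / 2.0 * (levels - 1)
    line = math.floor(scaled + 0.5)
    # Keep the result on the grid even if the input strays out of range.
    return max(0, min(levels - 1, line))


def sample_wave(bits, samples_per_cycle):
    """Take evenly spaced samples of one cycle of a sine wave and quantize each."""
    return [
        quantize(math.sin(2 * math.pi * i / samples_per_cycle), bits)
        for i in range(samples_per_cycle)
    ]


# One "red dot" per vertical line: the 2-bit word and its decimal line number.
for line in sample_wave(bits=2, samples_per_cycle=8):
    print(format(line, "02b"), line)
```

Each printed row is one red dot. Change `bits` and `samples_per_cycle` to see how the list of dots fills in as the grid gets finer.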

To play the sound, the computer has to reverse the process and convert the set of digital audio samples back to analog, called a digital-analog-conversion or DAC, to regenerate the sound. That means to play the sound, the CD player has to convert just the red dots back into the blue line. Here's what just the 2-bit data file would look like. Converting just those few dots back to the smooth blue wave takes a smart computer.

The data above don't give enough accurate information to recreate the blue line completely. But computers are very good at recreating waves with limited data. According to Nyquist, I already have enough vertical lines and data points, but to do a good job getting exactly the right wave back, my data points would have to be much more accurate. They would have to be right on the blue wave. In these examples, we're going to speed up the process and the conversation by improving both the accuracy of the data and by adding more data points, to clarify the ADC process quickly, in spite of Nyquist.

Let's increase the computer data's resolution to 3-bit, double the sampling rate, and try tracing the blue wave again.

Digital Signal - The white blocks show the digital representation of the sound. The "blocks" look much closer to the wave in 3-bit.

With 3-bit resolution there are 8 possible descriptions, numbered 0-7. Why are there 8?

Again the counting in binary, just for practice. Counting in 3-bit binary (3-letter word) from zero - 000, 001, 010, 011, 100, 101, 110, 111. That's just like counting in base 10 decimal - 0, 1, 2, 3, 4, 5, 6, 7. There are only 8 possible numbers. So the 3-bit sound has 8 possible data descriptions, numbered 0-7 shown in the example by the 8 gray horizontal lines.
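If you'd rather let the computer do the counting, this short Python sketch lists every binary 'word' for a given number of bits. The function name is just for illustration.

```python
def count_in_binary(bits):
    """List every binary word that fits in the given number of bits, in order."""
    return [format(n, f"0{bits}b") for n in range(2 ** bits)]


print(count_in_binary(2))  # the four 2-bit words
print(count_in_binary(3))  # the eight 3-bit words
```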

Again with the red dots, trying to get them as close to the blue sound wave as possible. Still not a very good result, but if you just had the red dots, could you guess the shape of the original blue wave? The computer would.

On to the 4-bit example and again doubling the sample rate.

This represents a Discrete Sampled Signal - The red arrows depict the sampling of a single data point at each of the sampling times by the computer. The computer stores the point as a time increment and a 4-bit number in the sound's data set.

This time, you begin to really see the original wave just by looking at the digital dots. Calculating the possible levels again by counting, this time in 4-bit binary, 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111. That's 16 possible levels. I didn't number the lines for you this time. See the pattern? Notice that the number of possible descriptions doubles every time we add a 'bit.' The formula is: # of levels = 2^n, where n = # of bits
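The doubling pattern is easy to check with a few lines of Python. This is only a sketch, and the function name is made up.

```python
def levels_for_bits(bits):
    """Number of possible descriptions (horizontal lines) for a given bit depth."""
    return 2 ** bits


# The bit depths used in the examples in this post.
for bits in (2, 3, 4, 5):
    print(bits, "bits ->", levels_for_bits(bits), "levels")
```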

Now here's a 5-bit sound, again with the sampling rate doubled to speed up the process, my last example.

Now you really could regenerate the sound just from our red digital dots even without a computer. There are 32 possible descriptions and it does a pretty good job of tracing the blue wave.
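To put a number on "a pretty good job," the sketch below measures how far the worst red dot lands from the blue wave at each bit depth. As before, it assumes a plain sine wave between -1 and +1 and evenly spaced gray lines, which are my simplifications for illustration, and the function name is made up.

```python
import math


def max_tracing_error(bits, samples_per_cycle=64):
    """Largest gap between the true wave and its nearest gray line over one cycle."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)  # spacing between gray lines on a -1..1 scale
    worst = 0.0
    for i in range(samples_per_cycle):
        true_value = math.sin(2 * math.pi * i / samples_per_cycle)
        # Nearest gray line, counted up from the bottom of the grid.
        line = math.floor((true_value + 1.0) / step + 0.5)
        line = max(0, min(levels - 1, line))
        dot = line * step - 1.0  # height of the red dot on the -1..1 scale
        worst = max(worst, abs(dot - true_value))
    return worst


for bits in (2, 3, 4, 5):
    print(bits, "bits: worst miss =", round(max_tracing_error(bits), 3))
```

The worst miss can never be more than half the spacing between gray lines, so it shrinks quickly as bits are added.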

So what does it mean when the WT - Foxpro argument gets going and one guy starts slinging numbers like 32 KHz and 16-bit? That means that the sound recorder took 32,000 samples of the sound in a second. Each of those 32,000 samples is described using a 16-letter word. And doing the binary - decimal conversion for you without counting, a 16-bit sound has 65,536 possible levels. I can't draw 66,000 horizontal lines in the example.

Or someone else says that their sounds are 44.1 KHz and 24-bit. That's 44,100 samples every second, or one every .000023 seconds and using a '24-letter binary word,' to a resolution of one in 16,777,216.
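Both of those sets of numbers can be worked out the same way. Here's a minimal Python sketch; the function name and the dictionary layout are just for illustration.

```python
def describe_format(sample_rate_hz, bit_depth):
    """Work out the time between samples and the number of levels per sample."""
    return {
        "seconds_between_samples": 1 / sample_rate_hz,
        "levels": 2 ** bit_depth,
    }


for rate, bits in ((32_000, 16), (44_100, 24)):
    info = describe_format(rate, bits)
    print(
        f"{rate} Hz, {bits}-bit: one sample every "
        f"{info['seconds_between_samples']:.6f} s, "
        f"{info['levels']:,} possible levels"
    )
```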

That's the actual difference between 16-bit (one in 65 thousand) and 24-bit (one in 17 million). They are both very fine increments. It means cutting the same sound into 65,000 measurable increments or into 17 million increments. Do you think the human ear can hear the difference? The human ear is much more sensitive to small changes than the eye, but these slices of resolution are now very, very small. Are they useful?

After reading quite a bit about 24-bit and 16-bit audio, it is not totally clear to me that a human can detect a difference between the 16-bit and 24-bit sound. Highly touted sound qualities like dynamic range and signal-to-noise ratio are directly linked to bit depth and those product specifications help sell products, from e-calls to stereos to very expensive sound studio recording electronics. Some audiophiles and music producers argue that under very strict circumstances, certain kinds of sounds, with the best of equipment, stereo gear costing mid-5-figures, some differences can be detected between 16-bit and 24-bit audio. The listener also needs pristine ears not ruined by gunfire or dulled by age. However, most sound experts agree that higher sampling rates and greater bit-depth add some overhead for editing and are much preferred. Manipulating the sounds is just better when there are more data to manipulate. It has been a recent development that modern computers, storage media, and software are finally available to handle crunching all the numbers in recording and editing very quickly and cheaply.
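The link between bit depth and dynamic range can be sketched with the usual rule of thumb, which is not something I covered above: the theoretical dynamic range in decibels is 20 x log10(number of levels), or roughly 6 dB for every bit. Real gear never quite reaches the theoretical figure, and the function name here is made up.

```python
import math


def theoretical_dynamic_range_db(bit_depth):
    """Ratio of the largest value to the smallest step, in decibels.

    This is the ideal figure only: 20 * log10(2 ** bit_depth).
    """
    return 20 * math.log10(2 ** bit_depth)


for bits in (16, 24):
    print(f"{bits}-bit: about {theoretical_dynamic_range_db(bits):.1f} dB")
```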

As an analogy, digital photography has gone through similar explosive growth in capability. High resolution cameras, blazing fast storage media, huge disk drives and flash cards, and amazing software make capturing and editing photos, operations that would have taken hours a decade ago, possible in seconds. Cameras and computers, unimagined a decade ago, cost only a few hundred dollars at Wal-Mart. Comparing photos to our high-bit sounds again, a 10 megapixel photo squeezed into a 4 x 6 print is about 650 dots per inch (dpi). But my eye can't tell the difference between 180 dpi and 300 dpi in the printed photo. Most computer monitors are only 70-90 dots per inch. Even HD smart phones won't go beyond 200 dpi. But given a choice, I'd much rather check the focus and adjust the color in the bigger file and then reduce it to the smaller size we're eventually going to use. I don't need 10 megapixels to post a photo on the web, but it's nice to have them for editing overhead.

Some sound editing software upsamples less well resolved sounds to 32-bit when they're input. The software I use does exactly that. In the constant battle of numbers where more is better, there's even a commercial sound editing software suite called Sonar that says it's already capable of 500 KHz sample rates and 64-bit sound. 64-bit sound translates to a resolution of 2^64 = 1.8 x 10^19, or one in 18,000,000,000,000,000,000. That is resolution equal to the thickness of a piece of paper compared to 8 trips to the Sun and back. I'm not sure my ears need that much audio resolution. LOL - I can't see 72dpi without my reading glasses.

Do you remember Nyquist Theory from the beginning of this post? Nyquist Theory said it takes at least twice as many samples per second as the highest frequency in the sound. Nyquist Theory says that given enough letters in the word, the computer can accurately describe the sound waves shown in the examples above with as few as 2 dots per cycle! Your computer or digital recorder does much much more than what I described here. I simplified the examples greatly to illustrate the basics of sampling rates and bit depth. I hope it explains some of the 'Hertz' and 'Bits' mumbo-jumbo that some folks throw around without really understanding it, and that I've left you with a basic understanding of what the hell they're talking about both here and at other predator web sites.