
An Introduction to Digital Audio Recording
Converting Sound into Numbers

In a digital recording system, sound is represented as a series of numbers, each representing the voltage, or amplitude, of a sound wave at a particular moment in time. The numbers are generated by an analog-to-digital converter, or ADC, which converts the signal from an analog audio source connected to its input (such as a guitar or a microphone) into numbers. The ADC measures the input signal tens of thousands of times a second and outputs a number for each measurement. Each of these numbers is called a sample, and the number of samples taken per second is called the sample rate.

On playback, the process happens in reverse: the series of numbers is played back through a digital-to-analog converter, or DAC, which converts the numbers back into an analog signal. This signal can then be sent to an amplifier and speakers for listening.

In computers, binary numbers are used to store the values that make up the samples. Only two digits, 1 and 0, are used, and the value of each digit depends on its position in the number, just as in the familiar decimal system. A few binary/decimal equivalents are shown in Figure A.

Figure A. Binary numbers and their decimal equivalents
Each digit in the number is called a bit. The binary numbers shown in Figure A are sixteen bits long and have a maximum value of 65,535. The more bits that are used to store each sampled value, the more closely it can represent the source signal. In a 16-bit system there are 65,536 possible combinations of zeroes and ones (the values 0 through 65,535), so 65,536 different voltages can be digitally represented (see Figure A above).
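To make the sampling and binary ideas concrete, here is a minimal Python sketch of an idealized 16-bit ADC. It assumes an input range of –1.0 to +1.0 mapped onto the unsigned values 0 through 65,535 (real converters generally use signed two's-complement codes and their own voltage ranges), and the names `quantize`, `SAMPLE_RATE` and `FREQ` are illustrative only.

```python
import math

BITS = 16
LEVELS = 2 ** BITS        # 65,536 possible combinations of 16 bits
MAX_CODE = LEVELS - 1     # the largest 16-bit value: 65,535

def quantize(x: float, bits: int = BITS) -> int:
    """Idealized ADC: map an input in [-1.0, 1.0] to an unsigned code 0..2**bits - 1."""
    max_code = 2 ** bits - 1
    x = max(-1.0, min(1.0, x))                 # inputs beyond the range are clipped
    return round((x + 1.0) / 2.0 * max_code)

# Sample a 1 kHz sine wave 48,000 times per second; each sample becomes one number.
SAMPLE_RATE = 48_000
FREQ = 1_000
samples = [quantize(math.sin(2 * math.pi * FREQ * n / SAMPLE_RATE)) for n in range(6)]

for code in samples:
    print(f"{code:016b}  {code:>6}")           # binary word and its decimal equivalent
```

Each printed line pairs a 16-bit binary word with its decimal value, in the spirit of Figure A; passing `bits=24` to `quantize` gives 16,777,216 possible values instead of 65,536.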
Figure B. The more bits there are available, the more accurate the representation of the signal and the greater the dynamic range.
Your Echo audio interface’s analog inputs use 24-bit ADCs, which means that the incoming signal can be represented by any of over 16 million (2^24 = 16,777,216) possible values. The output DACs are also 24-bit, so again over 16 million values are possible. The S/PDIF inputs and outputs also support signals with up to 24-bit resolution. Your Echo audio interface processes signals internally with 24-bit resolution to ensure that there is no degradation to the audio signal as it passes through the system.

The number of bits available also determines the potential dynamic range of the device. Shifting a binary number one place to the left multiplies its value by two, so each additional bit doubles the number of possible values that can be represented. Each doubling provides about 6dB of additional dynamic range (see the Decibels section below). A 24-bit system can therefore theoretically provide 144dB of dynamic range (6dB × 24 bits = 144dB), compared with only 96dB (6dB × 16 bits) for a 16-bit system.

Also important to the quality of a digital recording is how often the samples are taken, called the sample rate. For a waveform to be faithfully digitized, it must be sampled at more than twice the highest frequency to be stored. Sampling too slowly results in a kind of distortion called aliasing, in which frequencies above half the sample rate reappear as false, lower frequencies. (If you like technical issues, look up the Nyquist sampling theorem, which explains why this happens.) In addition to aliasing, sampling too slowly reduces high-frequency reproduction. Your Echo audio interface allows you to sample sound up to 96,000 times per second (96kHz).

Once the waveform has been transformed into digital bits, it must be stored. When sampling in stereo at 96kHz with a 24-bit word size, the system has to handle 4,608,000 bits per second (96,000 samples × 24 bits × 2 channels). In the past, storing this vast amount of data was problematic.
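The figures above can be checked with a few lines of Python. This is an illustrative sketch with hypothetical function names, not anything the interface runs: the "6dB per bit" rule is a rounding of 20·log10(2) ≈ 6.02dB, 20kHz is used only as an example upper frequency, and the alias calculation assumes an ideal sampler with no anti-aliasing filter.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of an ideal converter: 20*log10(2**bits), ~6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)

def nyquist_rate(highest_freq_hz: float) -> float:
    """Twice the highest frequency; the sample rate must be greater than this."""
    return 2 * highest_freq_hz

def alias_frequency(freq_hz: float, sample_rate_hz: float) -> float:
    """Where a tone lands after ideal sampling with no anti-aliasing filter.

    Frequencies fold around half the sample rate, so anything above
    sample_rate/2 reappears as a lower, false frequency.
    """
    f = freq_hz % sample_rate_hz
    return min(f, sample_rate_hz - f)

def pcm_bits_per_second(sample_rate_hz: int, bits: int, channels: int) -> int:
    """Uncompressed data rate of PCM audio."""
    return sample_rate_hz * bits * channels

print(round(dynamic_range_db(16), 1))         # 96.3
print(round(dynamic_range_db(24), 1))         # 144.5
print(nyquist_rate(20_000))                   # 40000
print(alias_frequency(30_000, 48_000))        # 18000
print(pcm_bits_per_second(96_000, 24, 2))     # 4608000
```

The 96.3dB and 144.5dB results are the unrounded versions of the 96dB and 144dB figures, and a 30kHz tone sampled at 48kHz without filtering would come out as an 18kHz tone.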
Today, computer-based digital recording systems record the data directly to the computer’s hard disk. Today’s hard disks can store large amounts of data, though their performance varies substantially. The speed and size of your hard drive will be a major factor in how many tracks of audio you can simultaneously record and play back.

Decibels

Audio signal levels are generally expressed in units called “decibels”, abbreviated “dB”. This is a “logarithmic” scale on which each doubling of signal level is represented by an increase of 6dB. Therefore a signal of 6dB is twice as big as a 0dB signal, and a signal of 12dB is four times as big as a 0dB signal. Since digital audio signals are represented by binary data, each bit of audio information represents 6dB: a 16-bit number can represent a total range of 96dB and a 24-bit number a total range of 144dB (6 times the number of bits).

It’s much easier to say that one signal is 72dB less than another than to say it is 1/4096 the size of the other one (72dB is twelve doublings, and 2^12 = 4096). Decibels also more accurately represent the way we hear sounds, since the smaller signal in this example will still be audible and will not sound only 1/4096 as loud when we listen to it.

Just as there are different scales used to represent temperature (Fahrenheit, Celsius, etc.), there are different types of decibels used to represent the level of analog audio signals. The most common are dBu and dBV. Both represent voltage levels, and both double for every increase of 6dB; only the reference point, or 0dB level, is different. A 0dBV signal has a voltage level of 1.0 volt. A 0dBu signal has a voltage level of 0.775 volts. Since 0.775 volts is approximately 2dB less than 1.0 volt, converting a dBV level into dBu is as simple as adding 2dB (2.21dB, more precisely); converting dBu into dBV means subtracting the same amount. Signals are also occasionally represented with units of dBm.
This is an older unit that measures power instead of voltage, with 0dBm representing 1 milliwatt. Earlier tube-based audio equipment used standardized input and output impedances of 600 ohms, so a 0dBm signal corresponded to a voltage of 0.775 volts. Since most of today’s equipment uses impedances other than 600 ohms, it is more useful to represent signals by voltage rather than power, and the dBu unit was introduced. A level of 0dBu is the same voltage as 0dBm into 600 ohms.

Digital signals, once recorded, no longer directly represent any physical quantity such as voltage or power, so 0dB is generally used to represent a “full-scale” or maximum signal level. All other signal levels are lower and are expressed as negative decibels. Most meters on digital equipment have 0dB at the top and range downward from there. A signal that is 30dB below full scale is simply referred to as a –30dB signal.

Nominal Signal Levels and Headroom

Today’s equipment is generally referred to as +4dBu equipment (professional) or –10dBV equipment (consumer). These are the typical, or “nominal”, signal levels you can expect to see with professional (studio) equipment such as mixers, or with consumer equipment such as home stereos and CD players. A +4dBu signal has a voltage level of 1.23 volts, and a –10dBV signal has a voltage level of 0.316 volts.

These nominal levels are typical or average levels that are often exceeded when recording loud signals such as drum beats. The difference between the nominal level and the loudest signal that can be recorded without clipping is called “headroom”. Your Echo card provides approximately 14dB of headroom above the +4dBu nominal level, allowing signals up to +18dBu to be recorded.

Audio Meters

Audio meters provide an objective, visual way of monitoring your audio levels and making sure that you have sufficient headroom and dynamic range.

Analog and Digital Meter Calibration

The way you set levels against a digital meter is different from the way you set them against an analog meter.
Digital meters display dBFS (decibels relative to full scale), which has a maximum level of 0dBFS. Analog meters display dBu. On an analog meter, 0dB is the optimal recording or output level of a device: if the voltage is much higher, the signal may distort; if it is much lower, the signal may be lost in the noise inherent in the device. The calibration between analog and digital devices varies between countries and media. Most European countries specify +18dBu at 0dBFS.
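The level arithmetic in the last few sections can be checked with a short Python sketch. It assumes the reference values quoted above (0dBu = 0.775 volts, 0dBV = 1.0 volt, 0dBm = 1 milliwatt into 600 ohms) and, for the meter alignment, uses the European +18dBu = 0dBFS convention as a default parameter; the function names are hypothetical, and other alignments can be passed in.

```python
import math

DBU_REF_VOLTS = 0.775   # 0 dBu as given above (more exactly sqrt(0.6), about 0.7746 V)
DBV_REF_VOLTS = 1.0     # 0 dBV

def ratio_to_db(voltage_ratio: float) -> float:
    """Voltage ratio to decibels: each doubling adds 20*log10(2), about 6.02 dB."""
    return 20 * math.log10(voltage_ratio)

def dbu_to_volts(dbu: float) -> float:
    return DBU_REF_VOLTS * 10 ** (dbu / 20)

def dbv_to_volts(dbv: float) -> float:
    return DBV_REF_VOLTS * 10 ** (dbv / 20)

def dbv_to_dbu(dbv: float) -> float:
    """Same voltage, different reference: add about 2.21 dB."""
    return dbv + ratio_to_db(DBV_REF_VOLTS / DBU_REF_VOLTS)

def dbm_to_volts(dbm: float, impedance_ohms: float = 600.0) -> float:
    """RMS voltage that delivers the given power (re 1 mW) into a resistive load."""
    watts = 0.001 * 10 ** (dbm / 10)
    return math.sqrt(watts * impedance_ohms)

def dbu_to_dbfs(dbu: float, dbu_at_0dbfs: float = 18.0) -> float:
    """Digital meter reading for an analog level, given the converter alignment."""
    return dbu - dbu_at_0dbfs

print(round(ratio_to_db(4096), 1))         # 72.2  (twelve doublings)
print(round(dbv_to_dbu(0.0), 2))           # 2.21
print(round(dbm_to_volts(0.0), 3))         # 0.775 (0 dBm into 600 ohms)
print(round(dbu_to_volts(4.0), 2))         # 1.23  (+4 dBu nominal)
print(round(dbv_to_volts(-10.0), 3))       # 0.316 (-10 dBV nominal)
print(round(dbu_to_volts(4.0 + 14.0), 2))  # 6.16  (+18 dBu: nominal plus 14 dB headroom)
print(dbu_to_dbfs(4.0))                    # -14.0
```

On a meter aligned to +18dBu = 0dBFS, the +4dBu nominal level reads –14dBFS, leaving the 14dB of headroom mentioned above before 0dBFS.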
Average and Peak Audio Level

Take a closer look at an audio waveform to better understand how it corresponds to what you hear during playback.
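Two common ways to measure a waveform's level are its peak (the largest instantaneous sample) and its average, usually computed as RMS (root mean square). A minimal Python sketch, assuming floating-point samples where 1.0 is full scale; the function names are hypothetical.

```python
import math

def peak_dbfs(samples: list[float]) -> float:
    """Peak level: the largest instantaneous magnitude, relative to full scale (1.0)."""
    return 20 * math.log10(max(abs(s) for s in samples))

def rms_dbfs(samples: list[float]) -> float:
    """Average level as RMS (root mean square), relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 20 * math.log10(math.sqrt(mean_square))

# One second of a full-scale 1 kHz sine wave sampled at 48 kHz.
sine = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(48000)]
print(f"peak {peak_dbfs(sine):.2f} dBFS, average (RMS) {rms_dbfs(sine):.2f} dBFS")
```

For a steady sine wave the average (RMS) level sits about 3dB below the peak; percussive material such as drum beats typically shows a much larger gap between peak and average, which is one reason headroom matters.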
Unbalanced and Balanced Inputs and Outputs

An unbalanced signal, commonly used for guitars and consumer electronics, contains two components: a ground signal and a “hot” or active signal. The ground is the barrel of a ¼” connector and the shell of an “RCA” style connector. A balanced signal contains two active signals instead of one, in addition to the ground. These are referred to as the “plus” and “minus” signals. A balanced input amplifier amplifies the difference between these two signals. Any extraneous noise picked up from power lines or other sources will appear equally on both the plus and minus inputs. This is called “common mode” noise because it is common to both signals, and the input amplifier subtracts the noise on the minus input from the noise on the plus input. If the input amplifier is perfectly balanced and the noise on plus and minus is precisely equal, the noise cancels out completely. In the real world this is not the case, and some of the common mode noise still makes it through, although at a much reduced level. How well an input amplifier rejects common mode noise is called the “common mode rejection ratio” (abbreviated as CMRR), which is expressed in dB.
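A toy Python model of a balanced input makes the cancellation visible. It assumes an idealized difference amplifier whose two inputs may have slightly mismatched gains; `plus_gain` and `minus_gain` are hypothetical parameters, not specifications of any Echo product. With matched gains the common mode noise cancels exactly, while a 0.1% mismatch lets a little through and corresponds to a CMRR of about 60dB.

```python
import math

def balanced_receiver(plus, minus, plus_gain=1.0, minus_gain=1.0):
    """Idealized difference amplifier: output = plus_gain*plus - minus_gain*minus."""
    return [plus_gain * p - minus_gain * m for p, m in zip(plus, minus)]

def cmrr_db(plus_gain, minus_gain):
    """Common mode rejection ratio in dB: differential gain over common-mode gain.

    With plus = s + n and minus = -s + n, the output is
    (plus_gain + minus_gain) * s + (plus_gain - minus_gain) * n.
    """
    differential_gain = (plus_gain + minus_gain) / 2   # per volt of (plus - minus)
    common_mode_gain = abs(plus_gain - minus_gain)     # per volt of shared noise
    if common_mode_gain == 0:
        return math.inf                                # perfect balance: noise cancels
    return 20 * math.log10(differential_gain / common_mode_gain)

signal = [0.5, -0.25, 0.0]
hum = [0.1, 0.1, 0.1]                                  # picked up equally on both wires
plus = [s + h for s, h in zip(signal, hum)]
minus = [-s + h for s, h in zip(signal, hum)]

print(balanced_receiver(plus, minus))                    # hum cancels: twice the signal
print(balanced_receiver(plus, minus, minus_gain=0.999))  # a little hum leaks through
print(round(cmrr_db(1.0, 0.999), 1))                     # 60.0
```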
Balanced signals connect with either XLR connectors or TRS (tip, ring, sleeve) connectors. Your Echo card uses TRS connectors for balanced line-level signals. The three sections of a TRS connector carry the three components of a balanced signal (T = plus, R = minus, S = ground). Gina24 will also accept a two-conductor unbalanced connector.

Dynamic Range

Dynamic range is the difference between the maximum signal that can be recorded and the “noise floor”, the level of noise with no signal present. A system with a high dynamic range will be quieter than one with a lower dynamic range. Dynamic range is a very important specification, and your Echo Digital Audio interface uses converters with very high dynamic range. Theoretically, a 24-bit system has a dynamic range of 144dB and a 16-bit system has a dynamic range of 96dB. Two questions immediately come to mind: Why does my Echo Digital Audio interface have a dynamic range of only 114dB? And why do I need more than the 96dB that a 16-bit system provides?

First, today’s analog-to-digital converters typically reach full scale with an input of about +7dBu. To have 144dB of dynamic range, they would have to be capable of resolving signals as small as –137dBu (+7dBu minus 144dB), or approximately 0.1 microvolts (about 110 nanovolts). That’s roughly one ten-millionth of a volt! Transistors and resistors produce noise in this range just from the motion of electrons due to heat. Even if the converters themselves could be designed to read such levels, the low-noise requirements for the surrounding circuitry, such as power supplies and amplifiers, would be so stringent that it would be either impossible or too expensive to build.

In answering the second question, consider that music is often compressed or amplified after it is recorded, and that some headroom is necessary when recording to avoid clipping. The only way that 96dB would be adequate is if all music were recorded so that the peaks were just under full scale and no compression or amplification were applied afterward.

AES/EBU and S/PDIF

The digital audio standard called AES/EBU, or AES3, is used for carrying digital audio signals between various devices. It was developed by the Audio Engineering Society (AES) and the European Broadcasting Union (EBU). Both AES and EBU versions of the standard exist, and several different physical connectors are defined as part of the overall group of standards. A related system, S/PDIF, was developed and standardized as IEC 60958, essentially as a consumer version of AES/EBU using connectors more commonly found in the consumer market. However, S/PDIF is now also used in professional situations where cost or limited space is a concern.

AES3 and AES3id: Short and Long Distances

AES3 uses 110-ohm shielded twisted pair (STP) cable with XLR connectors for distances up to 100 meters. AES3id uses 75-ohm coaxial cable and BNC connectors for distances up to 1,000 meters. “Unbalanced” coax is better for long distances than “balanced” twisted pairs.

S/PDIF

S/PDIF is the consumer version of AES/EBU and uses a lower signal voltage. Both carry the same audio data, with slight differences in the channel status bits that accompany it. Conversion between these interfaces must be handled with electronic circuits, not by adapting one connector to another.

AES/EBU vs. S/PDIF

AES3 uses shielded twisted pair cables, while the AES3id variation uses the same type of 75-ohm coaxial cable as the consumer-oriented S/PDIF interface.

ADAT and S/MUX

ADAT (Alesis Digital Audio Tape) was introduced in 1991 and was used to record eight tracks of digital audio simultaneously onto Super VHS magnetic tape, a tape format similar to that used by consumer VCRs. “ADAT” is also used as shorthand for the ADAT light-pipe protocol, which carries eight channels over a single fiber-optic cable. The ADAT light-pipe standard is no longer tied to ADAT tape machines and is now used by a wide range of devices, including digital audio interfaces, synthesizers and digital mixers.
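The S/MUX in the ADAT heading stands for sample multiplexing, the usual way higher sample rates are carried over light-pipe: the link keeps running at its base 44.1/48kHz rate, so each 96kHz channel is spread across two light-pipe channels (alternate samples on each) and each 192kHz channel across four. The Python sketch below models that bookkeeping; the function names are hypothetical, and which physical slots a given device pairs together is an assumption that varies by device.

```python
def smux_factor(sample_rate_hz: int) -> int:
    """Light-pipe slots needed per audio channel at a given sample rate."""
    if sample_rate_hz <= 48_000:
        return 1          # normal ADAT: one channel per slot
    if sample_rate_hz <= 96_000:
        return 2          # S/MUX2: 88.2/96 kHz
    if sample_rate_hz <= 192_000:
        return 4          # S/MUX4: 176.4/192 kHz
    raise ValueError("sample rate not supported over light-pipe")

def lightpipe_channels(sample_rate_hz: int, slots: int = 8) -> int:
    """Audio channels one 8-slot light-pipe cable can carry at this sample rate."""
    return slots // smux_factor(sample_rate_hz)

def smux_split(samples: list[int], factor: int) -> list[list[int]]:
    """Deal one high-rate channel into `factor` base-rate streams, sample by sample."""
    return [samples[i::factor] for i in range(factor)]

def smux_join(streams: list[list[int]]) -> list[int]:
    """Interleave the base-rate streams back into the original high-rate channel."""
    factor = len(streams)
    out = [0] * sum(len(s) for s in streams)
    for i, stream in enumerate(streams):
        out[i::factor] = stream
    return out

print([lightpipe_channels(r) for r in (48_000, 96_000, 192_000)])  # [8, 4, 2]
print(smux_split([10, 11, 12, 13, 14, 15], 2))  # [[10, 12, 14], [11, 13, 15]]
```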
One of the original benefits of using ADAT rather than AES/EBU or S/PDIF was that a single cable could carry up to eight channels of audio. Higher sample rates can be used with a proportionately reduced number of channels: four at 96kHz or two at 192kHz.

WordClock

WordClock is used to synchronize devices that interconnect via digital audio, such as multiple audio recording interfaces, digital audio tape machines and compact disc players. S/PDIF, AES/EBU, ADAT and other formats use a WordClock. Word clock should not be confused with time code: word clock is used only to keep a perfectly timed, constant bit rate and so avoid data errors. The WordClock generator, usually built into analog-to-digital converters, creates digital pulses that contain no other data. Things that should remain consistent are a 75-ohm output impedance, 75-ohm cables and a 75-ohm terminating resistor at the end of a chain or cable. All our audio interfaces have self-terminating WordClock. The phase of the digital WordClock signal is another important factor: the wrong phase can cause the left and right inputs of a stereo pair, or paired input channels, to be swapped.

MTC

MIDI time code (MTC) embeds the same timing information as standard SMPTE time code as a series of small 'quarter-frame' MIDI messages. MTC allows a Digital Audio Workstation (DAW) to synchronize with other devices that can follow MTC, or allows these devices to 'slave' to a tape machine that is striped with SMPTE; for this to happen, a SMPTE-to-MTC converter is needed. It is also possible for a tape machine to synchronize to an MTC signal (once converted to SMPTE), but only if the tape machine can 'slave' to incoming time code via motor control, which is a rare feature.

FireWire 1394a and 1394b

FireWire is a high-speed digital interface that comes in two varieties: 1394a (or just 1394) and 1394b.
The 1394a standard (FireWire 400) supports data transfer rates up to 400 Mbps and uses a 4- or 6-pin connector. The 1394b standard (FireWire 800) can transfer data at up to 800 Mbps and uses a 9-pin connector. A 6-pin-to-9-pin adapter can be used to plug a 1394a cable into a 1394b port, but the speed will be limited to 400 Mbps. All our AudioFire interfaces are compatible with Apple computers that have either a 1394a or a 1394b connection. With Windows operating systems, 1394a is recommended.
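As a rough sense of scale, the Python arithmetic below compares the nominal bus rates with the 2,304,000 bits per second that one 24-bit, 96kHz channel needs (96,000 × 24). It is a deliberately naive upper bound that assumes the entire bus carries nothing but audio samples; real channel counts are far lower because of packet and protocol overhead, other traffic on the bus and the design of the interface itself, so it says nothing about any particular AudioFire model. The function names are hypothetical.

```python
def pcm_bits_per_second(sample_rate_hz: int, bits: int, channels: int) -> int:
    """Raw audio payload, ignoring all protocol overhead."""
    return sample_rate_hz * bits * channels

def naive_channel_ceiling(bus_bits_per_second: int, sample_rate_hz: int, bits: int) -> int:
    """Channels that would fit if the whole bus carried nothing but raw samples."""
    return bus_bits_per_second // pcm_bits_per_second(sample_rate_hz, bits, 1)

for name, rate in (("FireWire 400", 400_000_000), ("FireWire 800", 800_000_000)):
    print(name, naive_channel_ceiling(rate, 96_000, 24))   # 173, then 347
```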
COPYRIGHT 2011 ECHO DIGITAL AUDIO CORPORATION