Introduction to Audio Codecs

Getting that perfect sound quality in a call is the ultimate VoIP Nirvana. In this post I’ll (try to) explain in layman’s terms how audio is captured and transmitted from what is spoken to what is heard.

Speech is produced in analogue form, as illustrated in the image below.

Analogue speech/audio can be transmitted across a pair of copper wires, but this is limited in that you need a pair of wires for each conversation/connection. Of course this becomes impractical when you need to scale up, as the cable capacity has to grow with every extra call.

To increase the number of conversations across copper, technologies such as PCM (Pulse Code Modulation) are used. PCM is a method used to sample analogue audio and represent it in a digital format. Once the audio is digital, multiple channels can share the same copper wires by taking turns in separate time slots. This is known as TDM, or Time Division Multiplexing.
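To make the time-slot idea concrete, here is a minimal sketch of TDM, assuming each channel has already been digitized into a list of integer samples. The function names `tdm_multiplex` and `tdm_demultiplex` are hypothetical, chosen for illustration; real TDM trunks also reserve slots for framing and signalling, which this sketch omits.

```python
def tdm_multiplex(channels: list[list[int]]) -> list[int]:
    """Interleave one sample from each channel per frame (round-robin time slots)."""
    if len({len(ch) for ch in channels}) > 1:
        raise ValueError("all channels must carry the same number of samples")
    frames = zip(*channels)  # frame i holds sample i from every channel
    return [sample for frame in frames for sample in frame]


def tdm_demultiplex(stream: list[int], num_channels: int) -> list[list[int]]:
    """Recover each channel by taking every num_channels-th slot from the stream."""
    return [stream[slot::num_channels] for slot in range(num_channels)]


calls = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]  # three made-up calls
line = tdm_multiplex(calls)
print(line)  # [10, 20, 30, 11, 21, 31, 12, 22, 32]
assert tdm_demultiplex(line, 3) == calls
```

Each channel always occupies the same slot position within a frame, which is why the receiving end can separate the calls again simply by counting slots.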

PCM is often treated as a codec in its own right, used to digitize analogue signals. PCM can be broken down into 3 steps:

1. Sampling – collecting representative samples from the analogue input
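2. Quantization – rounding each sample off to the nearest of a fixed set of levels so that it can be represented by a number (this introduces a small rounding error)
3. Coding – converting each quantized level into a binary code word (8 bits per sample in G.711)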

Interestingly, PCM is often referred to as G.711 and vice versa; G.711 is the ITU-T standard that defines PCM for voice frequencies.


First, we start with the input signal.

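To convert audio to digital format, it is sampled thousands of times per second. The most widely used sampling method is called PAM (Pulse Amplitude Modulation): the signal is measured at regular intervals, producing a train of pulses whose heights match the signal’s amplitude at each instant. These PAM samples are still analogue values; the quantization and coding steps that follow turn each one into an 8-bit string of 1’s and 0’s.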
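The sampling step can be sketched in a few lines. This is a minimal illustration, assuming a pure sine tone stands in for speech; `sample_tone` is a hypothetical helper. Each returned value is the signal’s amplitude at one sampling instant (a PAM sample), still a real number rather than a binary code.

```python
import math

SAMPLE_RATE_HZ = 8000  # G.711 narrowband sampling rate


def sample_tone(freq_hz: float, duration_s: float, amplitude: float = 1.0,
                sample_rate_hz: int = SAMPLE_RATE_HZ) -> list[float]:
    """Return PAM-style samples: the tone's amplitude at each sampling instant."""
    num_samples = round(duration_s * sample_rate_hz)
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]


pam = sample_tone(1000, 0.001)  # 1 ms of a 1 kHz tone
print(len(pam))  # 8
```

One millisecond at 8,000 samples per second yields 8 samples, one every 125 microseconds.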

The image to the right shows a standard analogue signal, the initial input.

The G.711 codec samples audio 8,000 times per second and encodes each sample as 8 bits. So 1 second of analogue audio produces a total of 64,000 bits (8,000 Hz sampling frequency x 8 bits per sample = 64,000 bits), or 64 kbit/s.
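The arithmetic above applies to any uncompressed PCM stream: bit rate equals samples per second multiplied by bits per sample. A minimal sketch, with `bit_rate_bps` as a hypothetical helper name:

```python
def bit_rate_bps(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bits per second for uncompressed PCM: samples/s x bits/sample."""
    return sample_rate_hz * bits_per_sample


g711 = bit_rate_bps(8000, 8)
print(g711, "bit/s =", g711 // 1000, "kbit/s")  # 64000 bit/s = 64 kbit/s
```

This formula only holds for plain PCM; compressing codecs deliberately send fewer bits than it predicts.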

Once the signal has gone through the sampling process, it is represented by a series of PAM samples (see the PAM Sample image).


Quantization is officially defined as the systematic method of providing standard binary numbering to PAM samples for PCM conversion.

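In layman’s terms, quantization rounds the height of each PAM sample to the nearest of a fixed set of levels, so that every sample can be written down as a whole number. With 8 bits there are 256 (2^8) possible levels. The rounding introduces a small error, heard as quantization noise, so G.711 spaces its levels logarithmically (μ-law in North America and Japan, A-law in Europe and most other regions): quiet sounds get finer steps than loud ones, which keeps speech sounding natural. The chosen level is what each 8-bit sample carries.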
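The rounding idea can be shown with a simplified uniform (linear) quantizer. This is a sketch under stated assumptions: amplitudes are normalized to the range -1.0 to 1.0, and the 256 levels are evenly spaced, which is NOT how G.711 spaces them (G.711 uses logarithmic levels). `quantize_uniform` and `dequantize_uniform` are hypothetical names.

```python
def quantize_uniform(x: float, bits: int = 8) -> int:
    """Map an amplitude in [-1.0, 1.0] to the nearest of 2**bits evenly spaced levels.

    Returns a level index in 0 .. 2**bits - 1. Simplified linear quantizer;
    G.711 itself spaces its levels logarithmically.
    """
    levels = 2 ** bits
    step = 2.0 / (levels - 1)       # spacing between adjacent levels
    x = max(-1.0, min(1.0, x))      # clip out-of-range input
    return round((x + 1.0) / step)


def dequantize_uniform(index: int, bits: int = 8) -> float:
    """Return the amplitude that a level index stands for."""
    step = 2.0 / (2 ** bits - 1)
    return index * step - 1.0


level = quantize_uniform(0.3)
error = dequantize_uniform(level) - 0.3  # never more than half a step
```

With evenly spaced levels the worst-case error is the same for quiet and loud samples, so quiet speech suffers proportionally more; that is the problem logarithmic spacing addresses.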


Coding is the process of compiling the 8 bits from the sampling and quantization inputs. Simply put, it comes up with the 1’s and 0’s for each 8-bit sample based on a set of rules.
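As an example of such a rule set, here is a sketch of G.711 μ-law coding, modelled on the widely circulated reference routine that combines logarithmic quantization and coding into one 8-bit code word (1 sign bit, 3 segment bits, 4 step bits). It assumes 16-bit signed linear input samples, clipped before encoding; A-law follows different rules, and the output should be checked against the ITU-T G.711 tables before being relied on.

```python
BIAS = 0x84    # 132, added so each segment boundary falls on a power of two
CLIP = 32635   # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode a 16-bit signed linear sample as one 8-bit mu-law code word."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = magnitude.bit_length() - 8             # segment number, 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F   # 4-bit step within the segment
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # mu-law bytes are sent inverted


def ulaw_to_linear(code: int) -> int:
    """Decode an 8-bit mu-law code word back to a linear sample (middle of its step)."""
    code = ~code & 0xFF
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


code = linear_to_ulaw(1000)
print(hex(code), ulaw_to_linear(code))  # 0xce 988
```

The decoded value differs slightly from the input (988 instead of 1000): that gap is the quantization error, and it grows with each segment, so loud samples are coded more coarsely than quiet ones.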

The rule set that defines this process is called a codec (coder-decoder). Of course there are a number of different codecs, each with its own set of rules (and features).

The primary differences between codecs are generally:
1. Bit rate – the number of bits transmitted per second (for example, 64 kbit/s for G.711)
2. MOS (Mean Opinion Score) – the average of listeners’ quality ratings on a scale of 1 to 5, 5 being the best
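A MOS is simply an average of listener ratings. The sketch below assumes the absolute category rating scale of ITU-T P.800 (5 excellent, 4 good, 3 fair, 2 poor, 1 bad); `mean_opinion_score` is a hypothetical helper and the ratings are made-up example data.

```python
def mean_opinion_score(ratings: list[int]) -> float:
    """Average listener ratings given on the 1 (bad) to 5 (excellent) scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    return sum(ratings) / len(ratings)


print(mean_opinion_score([4, 5, 4, 3, 4]))  # 4.0
```

Because a score comes from human listeners, a MOS quoted for a codec depends on the test conditions; it is a comparison aid rather than an exact property of the codec.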

For example, there are different bit rates in VoIP depending on the codec being used, the most common being:

64,000 bits per second (64 kbit/s)
32,000 bits per second (32 kbit/s)
8,000 bits per second (8 kbit/s)

The G.729 codec, one of the most commonly used codecs in VoIP, has a bit rate of 8 kbit/s, whereas G.711 has a bit rate of 64 kbit/s. Both sample the audio 8,000 times per second; G.729 simply compresses each second of speech into far fewer bits.

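Of course, the higher the sampling rate, the more accurate the decoded audio can be, as there are smaller gaps between the audio samples. In particular, a signal sampled 8,000 times per second can only represent frequencies below 4 kHz (half the sampling rate), which covers the traditional telephone voice band of roughly 300 to 3,400 Hz.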
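A quick numerical check shows why the sampling rate limits which frequencies can be captured: sampled 8,000 times per second, a tone above 4 kHz (half the sampling rate) produces exactly the same samples as a lower tone, so the two cannot be told apart after decoding. Cosine tones are assumed here because they make the match exact; `sampled_cosine` is a hypothetical helper.

```python
import math

FS = 8000  # samples per second, as in G.711


def sampled_cosine(freq_hz: float, num_samples: int, fs: int = FS) -> list[float]:
    """Sample a unit-amplitude cosine tone at fs samples per second."""
    return [math.cos(2 * math.pi * freq_hz * n / fs) for n in range(num_samples)]


low = sampled_cosine(3000, 16)   # inside the 0-4 kHz band
high = sampled_cosine(5000, 16)  # above fs / 2 = 4 kHz
# At 8 kHz the 5 kHz tone yields the same samples as a 3 kHz tone (aliasing)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(low, high)))  # True
```

This effect, called aliasing, is why telephony equipment filters out frequencies above the voice band before sampling.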

Generally speaking, the higher the bit rate, the better the audio sounds (MOS goes up), and since there are more bits to transmit, more bandwidth is used.
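To put the bandwidth trade-off in numbers, the sketch below computes the voice payload one minute of a call produces at each common bit rate. It assumes G.726 as the example codec for the 32 kbit/s figure and ignores packet headers, which add to the real totals; `payload_bytes` is a hypothetical helper.

```python
CODEC_BIT_RATES_BPS = {
    "G.711": 64_000,
    "G.726 (32 kbit/s mode)": 32_000,  # assumed example for 32 kbit/s
    "G.729": 8_000,
}


def payload_bytes(bit_rate_bps: int, seconds: float) -> float:
    """Voice payload produced in `seconds` at a given bit rate (headers excluded)."""
    return bit_rate_bps * seconds / 8


for codec, rate in CODEC_BIT_RATES_BPS.items():
    print(f"{codec}: {payload_bytes(rate, 60) / 1000:.0f} kB per minute")
# G.711: 480 kB per minute
# G.726 (32 kbit/s mode): 240 kB per minute
# G.729: 60 kB per minute
```

G.711 uses eight times the bandwidth of G.729 for the same call, which is the price paid for its higher audio quality.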

About Paul B

My name is Paul Bloem and I am employed at Lexel Systems in New Zealand as a Principal Consultant for Unified Communications. I have been working on enterprise voice solutions for over 20 years. My first 10 years were spent working for a Telco in South Africa (Telcom SA). This is where all the groundwork happened, as I was exposed to just about every aspect of telecommunication you could imagine. I developed an interest in PBX technologies and eventually became the go-to guy. Next, I had a 10-year run at Siemens South Africa, where I spent most of my time as a Technical Trainer. During this time VoIP hit the world stage, and I had the privilege of introducing VoIP, both as H.323 and later SIP, across the Siemens HiPath 4000 solution stack. In 2008 I immigrated to New Zealand with my newly attained MCSE, ready to go where no PBX Techie had gone before. I was employed to explore OCS 2007, and that was pretty much the beginning of the end for me. I have been working on OCS and Lync ever since. My current role focuses exclusively on Lync and associated technologies. That includes pre-sales, consulting, architecture and design, training and support. I even get to play in the development space from time to time; the focus is on play ;-) I was nominated as a Microsoft VTSP for Lync early in 2013 and was also awarded Microsoft's MVP award for Lync in 2014.
This entry was posted in Audio Coding, Audio to digital, PCM.
