Getting that perfect sound quality in a call is the ultimate VoIP Nirvana. In this post I’ll (try to) explain in layman’s terms how audio is captured and transmitted from what is spoken to what is heard.

Speech is produced in analogue form, as illustrated in the image below.

Analogue speech/audio can be transmitted across a pair of copper wires, but this is limited in that you need a separate pair of wires for each conversation/connection. Of course, this becomes impractical when you need to scale up, as the cable capacity has to grow with the number of calls.

To increase the number of conversations across copper, technologies such as PCM (Pulse Code Modulation) are used. PCM is a method used to sample analogue audio and represent it in a digital format. Once the audio is digital, multiple channels can share the same copper wires by giving each channel its own time-slot. This is known as TDM, or Time Division Multiplexing.
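To make the time-slot idea a little more concrete, here is a minimal sketch of round-robin interleaving. The function names and the sample values are made up for illustration, and a real TDM link also carries framing and signalling that are left out here.

```python
from itertools import chain

def tdm_multiplex(channels):
    """Interleave one sample per channel per frame (round-robin time-slots)."""
    # zip(*channels) groups the n-th sample of every channel into one frame.
    return list(chain.from_iterable(zip(*channels)))

def tdm_demultiplex(stream, num_channels):
    """Recover each channel by taking every num_channels-th sample."""
    return [stream[slot::num_channels] for slot in range(num_channels)]

# Three hypothetical calls, each already digitized into sample values.
calls = [
    [10, 11, 12],  # channel 0
    [20, 21, 22],  # channel 1
    [30, 31, 32],  # channel 2
]

line = tdm_multiplex(calls)
print(line)                      # [10, 20, 30, 11, 21, 31, 12, 22, 32]
print(tdm_demultiplex(line, 3))  # [[10, 11, 12], [20, 21, 22], [30, 31, 32]]
```

Each channel only gets the wire for its own slot, so three conversations travel over one pair of wires.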

PCM is considered a codec used to sample analogue signals. PCM can be broken down into 3 steps:

1. Sampling – collecting representative samples from the analogue input
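2. Quantization – rounding each sample off to the nearest of a fixed set of levels so that it can be represented by a number
3. Coding – turning each quantized sample into an 8-bit string of 1’s and 0’s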

Interestingly, PCM is often referred to as G.711 and vice versa, because G.711 is the ITU-T standard that defines PCM for voice.


First we start with the input signal

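To code audio into a digital format, it is sampled thousands of times per second. The most widely used sampling method is called PAM (Pulse Amplitude Modulation), which produces a series of pulses whose heights match the amplitude of the audio at each sampling instant. Each PAM sample is then turned into an 8-bit string of 1’s and 0’s representing a digitized sample of the audio.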

The image to the right indicates a standard analog signal – the initial input.

The G.711 codec samples audio at 8,000 samples per second and converts each tiny PAM sample into 8 bits, so 1 second of analog audio produces a total of 64,000 bits (8,000 Hz sampling frequency x 8 bits per sample = 64,000 bits), or 64 kbit/s.
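The arithmetic above can be checked with a short sketch. The 8,000 Hz rate and the 8 bits per sample come from the text; the 440 Hz test tone and the function name are just assumptions for the example.

```python
import math

SAMPLE_RATE_HZ = 8000  # G.711 sampling rate, from the text
BITS_PER_SAMPLE = 8

def sample_tone(freq_hz, duration_s, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return amplitudes in [-1.0, 1.0] taken at regular intervals."""
    count = int(duration_s * sample_rate_hz)
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(count)
    ]

# One second of a hypothetical 440 Hz tone standing in for speech.
samples = sample_tone(freq_hz=440, duration_s=1.0)
bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE

print(len(samples))  # 8000 samples in one second
print(bit_rate)      # 64000 bits per second, i.e. 64 kbit/s
```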

Once this signal has gone through the sampling process, it is represented by multiple PAM samples (see the PAM Sample image).


The official definition of Quantization is that it is the systematic method of assigning standard binary numbers to PAM samples for PCM conversion.

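In layman’s terms, Quantization rounds each sample to the nearest of a fixed set of levels so that it fits into 8 bits. G.711 spaces those levels more closely for quiet sounds than for loud ones, which helps speech sound natural when the samples are reassembled. The Quantization information is included in the 8-bit samples.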
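As a rough illustration of the rounding involved, here is a sketch of a plain uniform quantizer with 256 levels. This is a simplification on my part: G.711 itself uses non-uniform (companded) levels, and the function names below are hypothetical.

```python
LEVELS = 256  # 2**8 possible values for an 8-bit sample

def quantize(amplitude, levels=LEVELS):
    """Map an amplitude in [-1.0, 1.0] to the nearest of `levels` steps."""
    clipped = max(-1.0, min(1.0, amplitude))
    # Scale [-1, 1] onto [0, levels - 1] and round to the nearest step.
    return round((clipped + 1.0) / 2.0 * (levels - 1))

def dequantize(level, levels=LEVELS):
    """Map a step number back to an amplitude in [-1.0, 1.0]."""
    return level / (levels - 1) * 2.0 - 1.0

original = 0.3
level = quantize(original)
restored = dequantize(level)

print(level)                     # 166
print(abs(restored - original))  # small rounding error, under half a step
```

The difference between `original` and `restored` is the quantization error, which is the price paid for squeezing every sample into 8 bits.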


Coding is the process of compiling the 8 bits from the sampling and Quantization inputs. Simply put, it means coming up with the 1’s and 0’s for each 8-bit sample based on a set of rules.
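A minimal sketch of this step, assuming the simplest possible rule: write the quantized level as a plain 8-bit binary number. G.711's actual bit layout is different (it uses a sign bit, a segment and a step within the segment), so treat the helper names and the rule here as illustrative only.

```python
def encode_sample(level):
    """Turn a quantized level (0-255) into an 8-character bit string."""
    if not 0 <= level <= 255:
        raise ValueError("level must fit in 8 bits")
    return format(level, "08b")

def decode_sample(bits):
    """Turn an 8-character bit string back into a quantized level."""
    return int(bits, 2)

bits = encode_sample(166)
print(bits)                 # 10100110
print(decode_sample(bits))  # 166
```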

The rule set that defines this process is called a Codec (Coder-Decoder). Of course, there are a number of different codecs, each with its own set of rules (and features).

The primary differences between codecs are generally:
1. Bit Rate – the number of bits produced per second of audio
2. MOS (Mean Opinion Score) – a quality score from 1 to 5, 5 being perfect

For example, there are different bit rates in VoIP depending on the codec being used, the most common being:

64,000 bits per second
32,000 bits per second
8,000 bits per second

A G.729 codec has a bit rate of 8,000 bits per second and is one of the most commonly used codecs in VoIP, whereas G.711 has a bit rate of 64,000 bits per second. Both sample the audio 8,000 times per second; G.729 simply compresses the result far more heavily.

Of course, the higher the sampling rate, the more accurate the decoded audio will be, as there are smaller gaps between the audio samples.

Generally speaking, the higher the bit rate, the better the audio sounds (MOS goes up) and, since there are more bits to transmit, the more bandwidth is used.
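To put some numbers on the bandwidth side, here is a small sketch that works out how many bytes of audio each packet carries at a given bit rate. The 20 ms packet interval is an assumption (a common choice, not something stated above), G.726 is included as an example of a 32 kbit/s codec, and packet headers are ignored.

```python
# Nominal bit rates in bits per second.
CODEC_BIT_RATES = {"G.711": 64000, "G.726": 32000, "G.729": 8000}

PACKET_MS = 20  # assumed packetization interval in milliseconds

def payload_bytes(bit_rate, packet_ms=PACKET_MS):
    """Audio bytes carried per packet, excluding any packet headers."""
    # bits per second * milliseconds / 1000 gives bits; / 8 gives bytes.
    return bit_rate * packet_ms // 8000

for codec, rate in CODEC_BIT_RATES.items():
    print(f"{codec}: {rate // 1000} kbit/s, "
          f"{payload_bytes(rate)} bytes per {PACKET_MS} ms packet")
```

Under these assumptions G.711 needs 160 bytes of audio per packet, while G.729 needs only 20.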