Getting that perfect sound quality in a call is the ultimate VoIP Nirvana. In this post I’ll (try to) explain in layman’s terms how audio is captured and transmitted from what is spoken to what is heard.
Speech is produced in analogue form, as illustrated in the image below.
Analogue speech or audio can be transmitted across a pair of copper wires, but this is limited in that you need a separate pair of wires for each conversation or connection. Of course, this becomes impractical when you need to scale up, as the cable capacity has to grow with the number of calls.
To increase the number of conversations across copper, technologies such as PCM (Pulse Code Modulation) are used. PCM is a method used to sample analogue audio and represent it in a digital format. PCM allows multiple channels to operate across the same copper wires by separating the channels into time-slots. This is known as TDM, or Time Division Multiplexing.
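To make the time-slot idea concrete, here is a minimal Python sketch. It assumes every call has already been turned into a list of PCM values of equal length; the function names and the sample values are made up for illustration and do not come from any real telephony library.

```python
from itertools import chain


def tdm_multiplex(channels):
    """Interleave equal-length channels: one sample per channel per frame."""
    # zip(*channels) builds one frame per time step (one slot per channel);
    # chain.from_iterable flattens the frames into a single stream.
    return list(chain.from_iterable(zip(*channels)))


def tdm_demultiplex(stream, num_channels):
    """Recover each channel by reading every num_channels-th sample."""
    return [stream[slot::num_channels] for slot in range(num_channels)]


# Three hypothetical calls, each already sampled into three PCM values.
call_a = [10, 11, 12]
call_b = [20, 21, 22]
call_c = [30, 31, 32]

line = tdm_multiplex([call_a, call_b, call_c])
print(line)                      # [10, 20, 30, 11, 21, 31, 12, 22, 32]
print(tdm_demultiplex(line, 3))  # [[10, 11, 12], [20, 21, 22], [30, 31, 32]]
```

Each call only ever occupies its own slot in the repeating frame, which is how one pair of wires can carry several conversations at once.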
PCM is considered a codec used to sample analogue signals. PCM can be broken down into 3 steps:
Sampling

The image to the right shows a standard analogue signal, the initial input.
Once this signal has gone through the sampling process, it is represented by multiple PAM (Pulse Amplitude Modulation) samples; see the PAM sample image.
Quantization
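The official definition of quantization is the systematic method of providing standard binary numbering to PAM samples for PCM conversion. In plain terms, the amplitude of each sample is rounded to the nearest of a fixed set of levels, and each level has its own number.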
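The sampling and quantization steps can be sketched in a few lines of Python. This is a simplified illustration, not how a real codec is implemented: it assumes a 1 kHz test tone, 8,000 samples per second and 8 bits per sample, and it uses evenly spaced (uniform) levels, whereas G.711 actually spaces its levels logarithmically (mu-law or A-law). All names here are hypothetical.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed, narrowband telephony)
BITS = 8            # bits per sample, so 2**8 = 256 possible levels


def sample(signal, duration_s, rate=SAMPLE_RATE):
    """Sampling: read the continuous signal at evenly spaced instants."""
    count = round(duration_s * rate)
    return [signal(n / rate) for n in range(count)]


def quantize(value, bits=BITS):
    """Quantization: map an amplitude in [-1.0, 1.0] to a level 0..2**bits - 1."""
    levels = 2 ** bits
    clipped = max(-1.0, min(1.0, value))
    # Scale [-1, 1] onto [0, levels] and keep the top edge inside the range.
    return min(levels - 1, int((clipped + 1.0) / 2.0 * levels))


def tone(t):
    """A 1 kHz sine wave standing in for the analogue input."""
    return math.sin(2 * math.pi * 1000 * t)


pam = sample(tone, 0.001)               # 1 ms of audio gives 8 PAM samples
levels = [quantize(v) for v in pam]

for amplitude, level in zip(pam, levels):
    # The last column is the binary number assigned to the sample.
    print(f"{amplitude:+.4f} -> level {level:3d} -> {level:08b}")
```

The rounding in quantize is where information is lost: two slightly different amplitudes can land on the same level, and the decoder can only play back the level.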
Coding
Each quantized sample is then encoded as a binary word, ready to be transmitted.
For example, narrowband VoIP codecs sample speech 8,000 times per second, but the bit rate they put on the wire depends on the codec being used, the most common being:
64,000 bits per second (G.711)
32,000 bits per second (for example, G.726)
8,000 bits per second (G.729)
G.711 takes 8,000 samples per second at 8 bits per sample, which gives 64,000 bits per second. G.729 works from the same 8,000 samples per second but compresses the speech down to 8,000 bits per second, which is why it is widely used in VoIP where bandwidth is tight.
Of course, the higher the sampling rate, the more accurate the decoded audio will be, as there are smaller gaps between the audio samples.
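The 64,000 figure quoted for G.711 can be reproduced from the sampling parameters: G.711 takes 8,000 samples per second and stores each one in 8 bits, so its bit rate is the product of the two. The short Python sketch below works this out and compares it with the commonly quoted payload rates of the other codecs. The function name and the table are illustrative, and the figures exclude any packet header overhead.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate of uncompressed PCM = samples per second x bits per sample."""
    return sample_rate_hz * bits_per_sample


# G.711: 8,000 samples per second, 8 bits per sample.
g711_bps = pcm_bit_rate(8000, 8)
print(g711_bps)  # 64000

# Commonly quoted payload bit rates (bits per second), headers not included.
codec_bps = {
    "G.711": g711_bps,
    "G.726 (32 kbit/s mode)": 32_000,
    "G.729": 8_000,
}

for name, bps in codec_bps.items():
    # How many times smaller the stream is than plain G.711 PCM.
    ratio = g711_bps // bps
    print(f"{name}: {bps} bit/s, {ratio}x smaller than G.711")
```

All three codecs start from the same 8,000 samples per second; what differs is how many bits they spend describing each second of speech.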