Compression modes

If you're talking to another user on the same high-speed local area network, or you're one of the lucky few with a high bandwidth connection to the Internet backbone, there's no need to bother compressing audio. The data rate of 8000 bytes per second is modest compared to other Internet applications such as file transfer and accessing graphics-intensive pages on the World-Wide Web.

The rest of us, faced with a bottleneck of anywhere from 14,400 to 65,536 bits per second between our machine and the rest of the world, have to find a way to squeeze 8000 bytes per second into a communications channel with a capacity between 1440 and 6500 bytes per second. Speak Freely provides a variety of compression modes, each with different trade-offs among efficiency of compression, loss of fidelity in the compression process, and the amount of computation required to compress and decompress. Speak Freely's built-in performance benchmark may help you determine which modes are suitable based on the performance of your computer.

Compression options

Compression is selected by checking one or more of the compression items on the Options menu. The chosen compression mode(s) apply to all sound transmitted to open connections: sound files as well as live audio. Compression modes cannot be changed while you're transmitting live audio; click the mouse in each transmitting connection window to pause transmission, change the compression mode, then click or double click to resume transmission.

If no compression is selected, Speak Freely requires your network to reliably transmit 8000 characters per second. If it's slower than that, the person you're talking to will hear pauses in the sound they receive and sound will be lost. Most local area networks, unless extremely heavily loaded, have no difficulty transmitting data at this rate--in fact, most are capable of speeds on the order of a million characters per second. It's when you leave your local network and venture into the worldwide Internet that compression becomes crucial. Even users with broadband Internet connections should generally use compression, both to minimise congestion on the Internet and to avoid problems if the person they're talking to does not have a comparably fast connection.

For asynchronous serial communication, the data rate in bytes per second is about one tenth the speed in bits per second, so it's clear that even a 64 Kb line can't transmit uncompressed sound at 8000 bytes per second. Speak Freely provides several forms of compression, some of which can be combined, to reduce the data rate.

"Simple compression" discards every other sample and thereby halves the data rate to 4000 bytes per second, within the capability of a 64 Kb connection. On the receiving end, the elided samples are synthesised by averaging adjacent samples. Simple compression requires very little CPU time but it substantially degrades sound quality--high frequency components are lost and weird sampling aliasing can occur. Still, voice is generally intelligible and it's certainly better than random pauses and lost sound.

"GSM" compression (the default mode) employs the algorithm GSM (Global System Mobile) telephones use to reduce the data rate by a factor of almost five with little degradation of voice-grade audio. Enabling this option reduces the data rate from 8000 bytes per second to 1650 bytes per second, which renders a connection by 28.8 Kb modem usable. The catch is that GSM encoding is a very complicated process and, if your computer isn't fast enough, it won't be able to keep up with the audio coming in. (Decoding requires only about half the computation as encoding.) To use GSM compression, you'll need a fast 486, Pentium, or later generation processor. Thus, a slower network connection increases the demand on your computer.

"ADPCM" compression uses Adaptive Differential Pulse Code Modulation to halve the data rate to 4000 bytes per second. The compression is identical to that accomplished by Simple compression, but the loss in fidelity is much less; for voice grade audio, it's barely perceptible. ADPCM encoding and decoding requires more computation than Simple compression but enormously less than GSM; if your computer is too slow for GSM and the compression achieved by ADPCM is adequate for your network link, it's the best choice.

You can combine Simple and either GSM or ADPCM compression. The CPU requirement is only slightly greater than for GSM or ADPCM compression alone and the sound quality is about the same as for Simple compression. Simple and GSM compression combined yield a data rate of 825 bytes per second, which a 14.4 Kb network link can handle. Simple and ADPCM compression together yield a data rate of 2000 bytes per second, within the capability of a 28.8 Kb link.

"LPC" compression uses Linear Predictive Coding to reduce the data rate by more than a factor of 12. This achieves the greatest degree of compression of any of the available options but, like GSM, it is extremely computationally intense. LPC requires many calculations to be done in floating point; if your machine does not have a math coprocessor, it will almost certainly be unable to do LPC compression and decompression in real time. LPC compression is extremely sensitive to high frequency noise and clipping caused by setting the audio input level too high. If you hear frequent bursts of loud static, try reducing the gain on the microphone or speaking further away from it. Also, try to avoid the pops that result from talking directly into the mike; they also create bursts of noise. Finally, users with high pitched voices may not be able to use LPC compression at all: it just loses too much high-frequency information. If GSM is a cellular phone, think of LPC as a shortwave radio. It doesn't always work, you have to be careful to get the best results, and even in the best of circumstances there will be some noise and distortion. But, like shortwave, it lets you communicate (or at least try) when nothing else will work. If your network link is so slow that none of the other forms of compression are usable, give it a try.

"CELP" compression employs the United States Department of Defense Federal Standard 1016 CELP (Code-Excited Linear Prediction) algorithm to compress voice grade audio to a data rate of 4800 bits per second--a factor of 13 to one: less than half the bandwidth of GSM compression, yet with comparable fidelity. The CELP algorithm is, however, extremely computationally intense on the compression side (but not to decompress, on machines with fast floating point hardware). A 400 MHz Pentium II is about the minimum hardware required to transmit CELP in real-time.

"LPC-10" compression uses a different form of Linear Predictive Coding, as specified by United States Department of Defense Federal Standard 1015 / NATO-STANAG-4198, republished as Federal Information Processing Standards Publication 137 (FIPS Pub 137). LPC-10 compression encodes real-time audio into a 2400 bit per second stream. Even accounting for the additional information required to transfer audio packets over the network, LPC-10 compresses audio to only 346 bytes per second--a factor of more than 26 to 1. Audio fidelity in LPC-10 compression is less than that of GSM compression, but entirely adequate for voice-grade communications. As with the LPC compression mode described above, try to avoid driving the audio input into clipping with overly-loud signals, and eliminate hum and background noise which can interfere with the compression process. The principal disadvantage of LPC-10 compression is that it is extraordinarily computationally intense, and does most of its calculations in floating point. A math coprocessor (or on-chip floating point unit as found in 486DX and Pentium processors) is absolutely required to run LPC-10 compression in real time, and slower machines may not be able to use LPC-10 even if equipped with a math coprocessor.

Only one of the compression modes GSM, ADPCM, LPC, CELP, and LPC-10 may be selected at once. Choosing any of them turns off a previously-selected mode.

Here's a summary of the various compression options available to you:

                  Bytes per   Bits per    Need fast   Sound
Compression       second      second      CPU?        fidelity
No compression    8000        80000       No          Best
Simple            4000        40000       No          Poor
ADPCM             4000        40000       No          Good
Simple + ADPCM    2000        20000       No          Lousy
GSM               1650        16500       Yes         Good
Simple + GSM      825         8250        Yes         Lousy
LPC               650         6500        Yes         Depends
CELP              600         6000        Extremely   Good
LPC-10            346         3460        Extremely   Okay
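
As a rough aid to reading the table, the following Python sketch lists the modes that fit a given link. It is not part of Speak Freely: the function name is hypothetical, the link capacity uses the one-tenth rule for asynchronous serial lines given earlier, and the "Extremely" entries are simplified to a plain yes for the CPU requirement.

```python
# (mode, bytes per second, needs a fast CPU?) copied from the table above.
# Assumption: "Extremely" is treated the same as "Yes".
MODES = [
    ("No compression", 8000, False),
    ("Simple", 4000, False),
    ("ADPCM", 4000, False),
    ("Simple + ADPCM", 2000, False),
    ("GSM", 1650, True),
    ("Simple + GSM", 825, True),
    ("LPC", 650, True),
    ("CELP", 600, True),
    ("LPC-10", 346, True),
]


def usable_modes(link_bits_per_second, fast_cpu):
    """Return the modes whose data rate fits the link and the CPU."""
    # About ten bits on the wire per byte for asynchronous serial links.
    capacity = link_bits_per_second // 10
    return [
        name
        for name, rate, needs_fast in MODES
        if rate <= capacity and (fast_cpu or not needs_fast)
    ]


print(usable_modes(28800, fast_cpu=True))
print(usable_modes(14400, fast_cpu=False))
```

The second call returns an empty list: on a 14.4 Kb link every mode that fits needs a fast processor. The sketch ignores sound fidelity, which still has to be judged by ear.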

You can experiment to determine which settings work best by connecting to an echo server, which returns any sound you send to it after a 10-second delay.

Robust transmission

The extreme degree of compression achieved by compression modes such as GSM, CELP, LPC, and LPC-10, which encode audio into much less bandwidth than many Internet links provide, allows Speak Freely to offer an optional Robust Transmission mode. By default, Speak Freely sends a single copy of each sound packet to the site you're connected to. In Robust Transmission mode, between two and eight copies of every sound packet are sent, each containing a sequence number that allows the recipient to discard duplicate or out-of-sequence packets. If the Internet link between you and the person you're talking to is congested and you're experiencing drop-outs, Robust Transmission mode may substantially improve the quality of the connection.
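
The idea can be illustrated with a small Python sketch. It is not Speak Freely's packet format or code: the function names are hypothetical, packets are modelled as (sequence number, payload) pairs, and sequence numbers are assumed never to wrap around.

```python
def robust_send(packets, copies=2):
    """Yield (sequence_number, payload) pairs, each repeated `copies` times."""
    if not 2 <= copies <= 8:
        raise ValueError("Robust Transmission sends between two and eight copies")
    for seq, payload in enumerate(packets):
        for _ in range(copies):
            yield seq, payload


def robust_receive(datagrams):
    """Return payloads in arrival order, dropping duplicates and stale packets."""
    last_seq = -1
    played = []
    for seq, payload in datagrams:
        if seq <= last_seq:
            continue  # duplicate or out-of-sequence copy: discard it
        last_seq = seq
        played.append(payload)
    return played


sent = list(robust_send(["a", "b", "c"], copies=3))
# Simulate a congested link: two copies of "b" are lost and a stale
# copy of "a" arrives late, yet every packet still gets through once.
arrived = sent[0:1] + sent[5:6] + sent[2:3] + sent[6:9]
heard = robust_receive(arrived)
```

A packet is lost to the listener only if every one of its copies is dropped, which is why extra copies help on a congested link, at the cost of multiplying the bandwidth used.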