The name of this column, Multimedia Medley, has until now implied only CD-ROM systems and applications. Not so anymore. Over the past few months I have heard impressive audio, backed by impressive audio software applications, played out online on the World Wide Web. More will follow, as the potential of real-time audio is considerable.
Audio has been available for quite some time through the Internet but in an inconvenient fashion. Various audio snippets have been posted on the Internet that you could download if you and your network connection had the stamina. These could then be played back off-line.
The current trend is to serve audio on demand, with minimal delay. Real-time playback usually starts 20 to 30 seconds after you click on an icon to initiate it. While the music plays, you may keep on browsing the Web site. A prominent example is one of the online music stores where I recently listened to a few 30-second AM or higher-quality samples of the latest Beatles album while browsing the liner notes or looking up the album cover and the price.
I should point out that I have a direct Ethernet connection (courtesy of fellow professor Larry Osborne, who, together with his students, wired all the faculty offices and labs in our school a few months ago) and that such a connection is significantly faster than the best dial-up connection this side of (the struggling) ISDN. The bare telecommunications minimum for real-time audio is a 14.4-kilobit/second modem, but I would urge you to try it on a 28.8-kilobit/second modem or a direct connection. I did, and while not even the Ethernet connection would make me ditch any of my audio equipment, I see attractive features in real-time audio online.
A Little Math as a Starter
Let the professor in me take over for a moment. This time don't skip the simple math part. It'll really help. Be careful not to confuse the bits and pieces, er, bytes, as so many professionals and journalists do (including me some time ago in this column, but at least I apologized). I don't mind so much when calculations ignore the start and stop bits needed for transferring each 8-bit byte or when 1 kilobyte is converted to 1,000 bytes instead of 1,024 (though a few bits here and there soon add up to megabytes when it comes to audio and video clip transfer). But it is essential to understand that even first-generation CD-ROM players offer a transfer rate that is roughly 50 times faster than a 28.8-kilobit/second dial-up modem.
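For readers who want to check the bit-versus-byte arithmetic, here is a minimal Python sketch. The function name and the `framing_bits` parameter are illustrative assumptions, not part of any modem specification; setting `framing_bits=2` models one start bit and one stop bit per byte, and 150 KB/second is the single-speed CD-ROM rate.

```python
BITS_PER_BYTE = 8
BYTES_PER_KBYTE = 1024  # the column's convention: 1 kilobyte = 1,024 bytes
CDROM_1X_KBYTES_PER_SEC = 150  # single-speed CD-ROM drive


def modem_kbytes_per_sec(kilobits_per_sec, framing_bits=0):
    """Convert a modem rating in kilobits/second to kilobytes/second.

    framing_bits is the number of extra bits sent with each 8-bit byte
    (2 for one start bit plus one stop bit); 0 ignores framing.
    """
    bits_per_sec = kilobits_per_sec * 1000
    bytes_per_sec = bits_per_sec / (BITS_PER_BYTE + framing_bits)
    return bytes_per_sec / BYTES_PER_KBYTE


ideal = modem_kbytes_per_sec(28.8)                    # framing ignored
framed = modem_kbytes_per_sec(28.8, framing_bits=2)   # start and stop bits counted

print(f"28.8 kilobit/s modem, no framing:      {ideal:.2f} KB/s")
print(f"28.8 kilobit/s modem, start/stop bits: {framed:.2f} KB/s")
print(f"CD-ROM advantage: {CDROM_1X_KBYTES_PER_SEC / ideal:.0f}x to "
      f"{CDROM_1X_KBYTES_PER_SEC / framed:.0f}x")
```

Under these assumptions the modem delivers about 3.52 KB/second without framing and about 2.81 KB/second with it, so the single-speed CD-ROM drive comes out between roughly 43 and 53 times faster.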
CD-audio-quality recording, the standard these days, takes 85.94 kilobytes for each second in mono and twice that for stereo. This translates to more than 5 megabytes in mono and 10 megabytes in stereo per minute. CDs have at best about 65 minutes of music. That requires 640-670 MB of storage capacity. There are numerous combinations for the two key criteria of audio recording, based on sampling the analog sound source for digitization. (See Table 1.)
One criterion is the sampling frequency, which defines how many times in each second a sample is taken. It ranges from a mediocre 4,000 samples per second (4 KHz, the approximate equivalent of a long-distance call from a developing country on a monsoon day) to the professional studio system quality of 48,000 samples per second (48 KHz).
The other is the amount of information to be recorded in each sampling. This is usually 8 or 16 bits (1 or 2 bytes) and is often referred to as sound resolution. It is similar to screen resolution for images. The more information stored and displayed for an image, the higher the fidelity of the image.
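The two criteria multiply directly into storage requirements. The sketch below is one way to compute them; the function name is an illustrative assumption, and the sampling rates are the rounded nominal values used in this column (44 KHz stands in for the 44.1 KHz of actual CD audio).

```python
def storage_requirements(sample_rate_hz, bits_per_sample, channels=1):
    """Return (bits/sec, bytes/sec, Kbytes/sec, Kbytes/min) for uncompressed audio."""
    bits_per_sec = sample_rate_hz * bits_per_sample * channels
    bytes_per_sec = bits_per_sec // 8
    kbytes_per_sec = bytes_per_sec / 1024
    kbytes_per_min = kbytes_per_sec * 60
    return bits_per_sec, bytes_per_sec, kbytes_per_sec, kbytes_per_min


# Nominal rates as rounded in the column, each at 8-bit and 16-bit resolution.
for bits in (8, 16):
    for rate in (8_000, 11_000, 22_000, 44_000):
        _, _, per_sec, per_min = storage_requirements(rate, bits)
        print(f"{rate // 1000:>2} KHz {bits:>2}-bit mono: "
              f"{per_sec:6.2f} KB/s, {per_min:9.2f} KB/min")
```

Passing `channels=2` doubles every figure, which is how 16-bit sound at 44 KHz goes from 5,156.25 kilobytes per minute in mono to 10,312.50 in stereo.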
CD-ROM drives play back high-fidelity CD stereo sound at 150 KB/second. It is no accident that this is the speed of the single-speed CD-ROM drives. Sound is played back at this rate even from multiple-speed (2x, 4x, 6x) drives. This load would make even the fastest modems (28.8 kilobits per second = 3.52 kilobytes per second), based on traditional telephone-line connections, choke for real-time playback. And it would take almost 50 minutes to transfer 1 minute of such sound.
Even the paltry AM-quality mono sound, recorded at the rate of 8,000 samples per second with 8 bits (1 byte) of information per sample, needs almost 8 kilobytes per second. As the speed (throughput) of such modems cannot be further increased because of the quality of the telephone lines, the problem had to be approached from the other side for real-time playback.
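How far "the other side" has to go can be estimated with a short Python sketch. The function names are illustrative assumptions, and the modem rate ignores start and stop bits, so real-world figures would be somewhat worse.

```python
MODEM_KBYTES_PER_SEC = 28_800 / 8 / 1024   # 28.8 kilobits/sec, about 3.52 KB/sec


def transfer_minutes(audio_kbytes_per_sec, seconds_of_audio=60):
    """Minutes needed to send the given duration of audio over the modem."""
    total_kbytes = audio_kbytes_per_sec * seconds_of_audio
    return total_kbytes / MODEM_KBYTES_PER_SEC / 60


def required_compression_ratio(audio_kbytes_per_sec):
    """How much the audio must shrink to arrive as fast as it plays."""
    return audio_kbytes_per_sec / MODEM_KBYTES_PER_SEC


am_mono = 8_000 * 1 / 1024            # 8 KHz, 8-bit (1 byte), mono
cd_stereo = 44_000 * 2 * 2 / 1024     # 44 KHz, 16-bit (2 bytes), stereo

for label, rate in (("AM-quality mono", am_mono),
                    ("CD-quality stereo", cd_stereo)):
    print(f"{label}: {transfer_minutes(rate):.1f} minutes per minute of sound, "
          f"needs about {required_compression_ratio(rate):.1f}:1 compression")
```

With these numbers, AM-quality mono needs a little over 2:1 compression to keep up with a 28.8-kilobit/second modem, while CD-quality stereo would need nearly 49:1.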
Table 1: Storage requirements

Sampling    Sampling                                          Kbytes/min   Kbytes/min
frequency   resolution   bits/sec   bytes/sec   Kbytes/sec    (mono)       (stereo)

 8 KHz       8-bit         64,000      8,000        7.81        468.75       937.50
11 KHz       8-bit         88,000     11,000       10.74        644.53     1,289.06
22 KHz       8-bit        176,000     22,000       21.48      1,289.06     2,578.13
44 KHz       8-bit        352,000     44,000       42.97      2,578.13     5,156.25
 8 KHz      16-bit        128,000     16,000       15.63        937.50     1,875.00
11 KHz      16-bit        176,000     22,000       21.48      1,289.06     2,578.13
22 KHz      16-bit        352,000     44,000       42.97      2,578.13     5,156.25
44 KHz      16-bit        704,000     88,000       85.94      5,156.25    10,312.50
Compressing/Decompressing, and Streaming Audio Files
The key is to compress the audio for transfer and decompress it on the fly for playback without too much loss. This technique has been used for program and image files and is now available for sound files as well. There are five companies that offer the technique. Four of them use proprietary compression technologies and file formats. Only one uses the standard for compressing audio and video, the so-called MPEG compression/decompression specified by the Moving Picture Experts Group.
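To give a feel for what "compress, then decompress without too much loss" means, here is a toy Python sketch of mu-law companding, which squeezes each 16-bit sample into a single byte. This is a swapped-in textbook technique chosen only for illustration: it is not the method used by any of the five products, it uses the continuous mu-law formula rather than the segmented coding of telephone standards, and its 2:1 ratio is far below what real-time audio over a modem requires. The function names are assumptions.

```python
import math

MU = 255  # companding constant of the mu-law curve


def mulaw_encode(sample):
    """Compress one 16-bit signed sample (-32768..32767) into one byte (0..255)."""
    x = max(-1.0, min(1.0, sample / 32768))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1) / 2 * 255))


def mulaw_decode(code):
    """Expand one byte back into an approximate 16-bit signed sample."""
    y = code / 255 * 2 - 1
    x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
    return int(round(x * 32767))


samples = [0, 1000, -1000, 20000, -20000]          # 2 bytes each before encoding
encoded = bytes(mulaw_encode(s) for s in samples)  # 1 byte each after encoding
decoded = [mulaw_decode(c) for c in encoded]

for original, restored in zip(samples, decoded):
    print(f"{original:>7} -> {restored:>7}")
```

The restored samples are close to the originals but not identical, which is the trade-off every lossy scheme makes: fewer bytes on the wire in exchange for a small, ideally inaudible, error.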
The audio recordings are compressed either off-line or online. The latter makes it possible to provide live feeds, as we heard during Clinton's State of the Union address in January. The pricing of this compression software is usually based on the number of simultaneous users to be served, but there are free compression programs as well.
The compressed audio is stored on the hard disk of the server. When a user connected to the server clicks on an audio file, the server starts sending the audio file. This is called "streaming." For 20 to 30 seconds, audio data is pumped into a buffer on the recipient's machine and decompressed; then playback starts. While the sound is played back from the buffer, the server keeps streaming the next packet. If there is heavy traffic anywhere on the network, the new packet may not arrive in time for smooth playback. This is when you hear stuttering audio or a complete pause that may have nothing to do with the compressed audio quality. These are merely signs of a "traffic jam." Think of it as driving up Broadway at 5 p.m. on a Friday and again at 4 a.m. on a Sunday. Same car, same road, grossly different throughput. I once listened to more than 4 minutes of streamed audio of a lovely Scottish ballad (http://www.almac.co.uk/es/tunes) flowing without a hitch; then on another occasion, I heard a 30-second soundbite break apart into stop-and-go 3-second pieces, which destroyed the listening experience.
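The interplay between the buffer and network delays can be sketched as a small simulation. This is a simplified model under stated assumptions, not how any particular player works: every packet is assumed to hold exactly one second of sound, packets are assumed to arrive in order, and the function name and sample arrival times are invented for illustration.

```python
def simulate_playback(arrival_times, prebuffer_packets):
    """Count playback stalls for a stream of one-second audio packets.

    arrival_times[i] is the second at which packet i reaches the listener
    (assumed to be in nondecreasing order). Playback starts once
    prebuffer_packets have arrived, then consumes one packet per second.
    Returns (start_time, total_stall_seconds).
    """
    start_time = arrival_times[prebuffer_packets - 1]
    clock = start_time
    stalled = 0.0
    for arrival in arrival_times:
        if arrival > clock:           # packet is late: playback pauses
            stalled += arrival - clock
            clock = arrival
        clock += 1.0                  # play this packet's one second of sound
    return start_time, stalled


# A quiet network: one packet arrives every second.
smooth = [i + 1.0 for i in range(60)]
# A "traffic jam": the same stream with a 25-second holdup after packet 30.
jam = [i + 1.0 if i < 30 else i + 26.0 for i in range(60)]

print(simulate_playback(smooth, prebuffer_packets=20))
print(simulate_playback(jam, prebuffer_packets=20))
print(simulate_playback(jam, prebuffer_packets=5))
```

In this model the 20-second buffer absorbs most of the 25-second holdup, leaving a 6-second pause, while a 5-second buffer leaves the listener waiting 21 seconds. The buffer trades a longer wait at the start for fewer interruptions later.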
Beyond this and, obviously, the quality of your soundboard and speakers, the quality of the compressed audio will have an impact on your listening pleasure. To thin the datastream, many of the audio recordings are done at 8 KHz and 8-bit sampling. This is perfectly good for speech. Music needs a minimum 11-KHz sampling rate and 16-bit sound resolution. Even then you will hardly mistake it for your Bang & Olufsen deck, but you'll find it adequate for getting a feel for the melody, style, and rhythm.
Play It, Sam, Then Play It Again
To play your audio, you must have a playback program. There are five companies offering the technology for real-time audio: Progressive Networks RealAudio, Xing Technology StreamWorks, DSP Group TrueSpeech, VocalTec Internet Wave, and the newest kid on the block, Voxware ToolVox.
The playback software programs are free, and they are readily available not only from their developers' home pages but also through many sites that serve audio or have audio discussion groups. They must be installed and then configured as helper applications for your browser (Netscape, Spyglass, Internet Explorer). Configuration may be done automatically or manually at the time the helper is installed, or it may take place the first time audio streaming starts, when the browser gives you the option to configure a "viewer." (See Figure 1.) It may seem ironic that these audio helpers are named viewers, but this is a generic term. And after all, the helper applications show nice playback menus while streaming and playing back the sound.
The players differ significantly in such features as playback optimization, progress indication, positioning the commencement of playback, volume control, and amount of information displayed about the sound file. In a future column, I plan to review the major features of these programs. I'll also point out some leading Web sites offering one or more types of streamed audio from interesting archives, plus live feeds of radio broadcasts. Prick up your ears, if you will.