Subject: FAQ: Audio File Formats (part 1 of 2)
Newsgroups: alt.binaries.sounds.misc,alt.binaries.sounds.d,comp.dsp,news.answers,comp.answers
Followup-to: alt.binaries.sounds.d,comp.dsp
Reply-to: guido@cwi.nl
Approved: news-answers-request@MIT.Edu

Archive-name: audio-fmts/part1
Submitted-by: Guido van Rossum
Version: 3.06
Last-modified: 27-Oct-1993

FAQ: Audio File Formats
=======================

Table of contents
-----------------

Introduction
Device characteristics
Popular sampling rates
Compression schemes
Current hardware
File formats
File conversions
Playing audio files on UNIX
Playing audio files on micros
The Sound Site Newsletter
Posting sounds

Appendices (in part 2):
  FTP access for non-internet sites
  AIFF Format (Audio IFF)
  The NeXT/Sun audio file format
  IFF/8SVX Format
  Playing sound on a PC
  The EA-IFF-85 documentation
  US Federal Standard 1016 availability
  Creative Voice (VOC) file format
  RIFF WAVE (.WAV) file format
  U-LAW and A-LAW definitions
  AVR File Format
  The Amiga MOD Format

Introduction
------------

This is version 3 of this FAQ, which I started in November 1991 under the name "The audio formats guide". I bumped the major version number again on the occasion of the split in two parts: part one is the main text and part two consists of the collection of appendices.

I am posting this about once a fortnight, either unchanged (just to inform new readers), or updated (if I learn more or when new hardware or software becomes popular). I post to alt.binaries.sounds.{misc,d} and to comp.dsp, for maximal coverage of people interested in audio, and to {news,comp}.answers, for easy reference.

The entire FAQ is also available by anonymous ftp from ftp.cwi.nl [192.16.184.180], directory pub/audio, files AudioFormats.{part1,part2}.

BTW: All FAQs, including this one, are available for anonymous ftp on the archive site rtfm.mit.edu in directory /pub/usenet/news.answers/.
The name under which a FAQ is archived appears in the "Archive-Name:" line at the top of the article. This FAQ is archived as audio-fmts/part[12].

A companion posting with subject "Changes to: ..." is occasionally posted listing the diffs between a new version and the last. This is not reposted, and it is suppressed when the diffs are bigger than the new version.

Send updates, comments and questions to guido@cwi.nl. I'd like to thank everyone who sent updates in the past.

--Guido van Rossum, CWI, Amsterdam

Device characteristics
----------------------

In this text, I will only use the term "sample" to refer to a single output value from an A/D converter, i.e., a small integer number (usually 8 or 16 bits).

Audio data is characterized by the following parameters, which correspond to settings of the A/D converter when the data was recorded. Naturally, the same settings must be used to play the data.

- sampling rate (in samples per second), e.g. 8000 or 44100
- number of bits per sample, e.g. 8 or 16
- number of channels (1 for mono, 2 for stereo, etc.)

Approximate sampling rates are often quoted in Hz or kHz ([kilo-] Hertz); however, the politically correct term is samples per second (samples/sec). Sampling rates are always measured per channel, so for stereo data recorded at 8000 samples/sec, there are actually 16000 samples in a second. I will sometimes write 8 k as a shorthand for 8000 samples/sec.

Multi-channel samples are generally interleaved on a frame-by-frame basis: if there are N channels, the data is a sequence of frames, where each frame contains N samples, one from each channel. (Thus, the sampling rate is really the number of *frames* per second.) For stereo, the left channel usually comes first.

The specification of the number of bits for U-LAW (pronounced mu-law -- the u really stands for the Greek letter mu) samples is somewhat problematic.
These samples are logarithmically encoded in 8 bits, like a tiny floating point number; however, their dynamic range is that of 12 bit linear data. Source for converting to/from U-LAW (written by Jef Poskanzer) is distributed as part of the SOX package mentioned below; it can easily be ripped apart to serve in other applications. The official definition is the CCITT standard G.711.

There exists another encoding similar to U-LAW, called A-LAW, which is used as a European telephony standard. There is less support for it in UNIX workstations.

(See the Appendix for some formulae describing U-LAW and A-LAW.)

Popular sampling rates
----------------------

Some sampling rates are more popular than others, for various reasons. Some recording hardware is restricted to (approximations of) some of these rates, some playback hardware has direct support for some. The popularity of divisors of common rates can be explained by the simplicity of clock frequency dividing circuits :-).

Samples/sec   Description

5500          One fourth of the Mac sampling rate (rarely seen).

7333          One third of the Mac sampling rate (rarely seen).

8000          Exactly 8000 samples/sec is a telephony standard that
              goes together with U-LAW (and also A-LAW) encoding.
              Some systems use a slightly different rate; in
              particular, the NeXT workstation uses 8012.8210513,
              apparently the rate used by Telco CODECs.

11 k          Either 11025, a quarter of the CD sampling rate,
              or half the Mac sampling rate (perhaps the most
              popular rate on the Mac).

16000         Used by, e.g. the G.722 compression standard.

18.9 k        CD-ROM/XA standard.

22 k          Either 22050, half the CD sampling rate, or the Mac
              rate; the latter is precisely 22254.545454545454 but
              usually misquoted as 22000.  (Historical note:
              22254.5454... was the horizontal scan rate of the
              original 128k Mac.)

32000         Used in digital radio, NICAM (Nearly-Instantaneous
              Companded Audio Multiplex [IBA/BREMA/BBC]) and other
              TV work, at least in the UK; also long play DAT and
              Japanese HDTV.

37.8 k        CD-ROM/XA standard for higher quality.

44056         This weird rate is used by professional audio
              equipment to fit an integral number of samples in a
              video frame.

44100         The CD sampling rate.  (DAT players recording
              digitally from CD also use this rate.)

48000         The DAT (Digital Audio Tape) sampling rate for
              domestic use.

Files sampled on SoundBlaster hardware have sampling rates that are divisors of 1000000.

While professional musicians disagree, most people don't have a problem if recorded sound is played at a slightly different rate, say, 1-2%. On the other hand, if recorded data is being fed into a playback device in real time (say, over a network), even the smallest difference in sampling rate can frustrate the buffering scheme used...

There may be an emerging tendency to standardize on only a few sampling rates and encoding styles, even if the file formats may differ. The suggested rates and styles are:

rate (samp/sec)   style                    mono/stereo

8000              8-bit U-LAW              mono
22050             8-bit linear unsigned    mono and stereo
44100             16-bit linear signed     mono and stereo

Compression schemes
-------------------

Strange though it seems, audio data is remarkably hard to compress effectively. For 8-bit data, a Huffman encoding of the deltas between successive samples is relatively successful. For 16-bit data, companies like Sony and Philips have spent millions to develop proprietary schemes. Information about PASC (Philips' scheme) can be found in Advanced Digital Audio by Ken C. Pohlmann.

Public standards for voice compression are slowly gaining popularity, e.g. CCITT G.721 (ADPCM at 32 kbits/sec) and G.723 (ADPCM at 24 and 40 kbits/sec). (ADPCM == Adaptive Differential Pulse Code Modulation.) Sun Microsystems has placed the source code of a portable implementation of these algorithms (as well as G.711, which defines A-LAW and U-LAW) in the public domain (needless to say, their proprietary implementation distributed in binary form with Solaris is better :-).
One place to ftp this source code from is ftp.cwi.nl:/pub/audio/ccitt-adpcm.tar.Z.

Source for another 32 kbits/sec ADPCM implementation, assumed to be compatible with Intel's DVI audio format, can be ftp'ed from ftp.cwi.nl:/pub/audio/adpcm.shar. (** NOTE: if you are using v1.0, you should get v1.1, released 17-Dec-1992, which fixes a serious bug -- the quality of v1.1 is claimed to be better than U-LAW **)

GSM 06.10 is a speech encoding in use in Europe that compresses 160 13-bit samples into 260 bits (or 33 bytes), i.e. 1650 bytes/sec (at 8000 samples/sec). A free implementation can be ftp'ed from tub.cs.tu-berlin.de, file /pub/tubmik/gsm-1.0.tar.Z.

There are also two US federal standards, 1016 (Code excited linear prediction (CELP), 4800 bits/s) and 1015 (LPC-10E, 2400 bits/s). See also the appendix for 1016.

Tony Robinson has written a good FAST lossless compression program for lots of different audio formats (particularly good for WAV and MOD files). The software is available by anonymous ftp from svr-ftp.eng.cam.ac.uk [129.169.24.20], directory misc, file shorten-1.08.tar.Z.

(Note that U-LAW and silence detection can also be considered compression schemes.)

Here's a note about audio codings by Van Jacobson:

Several people used the words "LPC" and "CELP" interchangeably. They are very different. An LPC (Linear Predictive Coding) coder fits speech to a simple, analytic model of the vocal tract, then throws away the speech & ships the parameters of the best-fit model. An LPC decoder uses those parameters to generate synthetic speech that is usually more-or-less similar to the original. The result is intelligible but sounds like a machine is talking.
A CELP (Code Excited Linear Predictor) coder does the same LPC modeling but then computes the errors between the original speech & the synthetic model and transmits both model parameters and a very compressed representation of the errors (the compressed representation is an index into a 'code book' shared between coders & decoders -- this is why it's called "Code Excited"). A CELP coder does much more work than an LPC coder (usually about an order of magnitude more) but the result is much higher quality speech: The FIPS-1016 CELP we're working on is essentially the same quality as the 32Kb/s ADPCM coder but uses only 4.8Kb/s (the same as the LPC coder).

The comp.compression FAQ has some text on the 6:1 audio compression scheme used by MPEG (a video compression standard-to-be). It's interesting to note that video compression reaches much higher ratios (like 26:1). This FAQ is ftp'able from rtfm.mit.edu [18.72.1.58] in directory /pub/usenet/news.answers/compression-faq, files part1 and part2.

Comp.compression also carries a regular posting "How to uncompress anything" by David Lemson, which (tersely) hints at which program you need to uncompress a file with almost any conceivable filename extension. Ftp'able from ftp.cso.uiuc.edu (128.174.5.59) in the directory /doc/pcnet as the file compression.

Documentation on a digital cellular telephone system by Qualcomm Inc. can be ftp'ed from ftp.qualcomm.com:/pub/cdma; the vocoder is in appendix A.

Apple has an Audio Compression/Expansion scheme called ACE (on the GS) / MACE (on the Macintosh). It's a lossy scheme that attempts to predict where the wave will go on the next sample. There's very little quality change on 8:4 compression, somewhat more for 8:3. It does guarantee exactly 50% or 62.5% compression, though. I believe MACE uses larger ratios/more loss, but I'm unsure of the specific numbers.
(Marc Sira)

Current hardware
----------------

I am aware of the following computer systems that can play back and (sometimes) record audio data, with their characteristics. Note that for most systems you can also buy "professional" sampling hardware, which supports much better quality, e.g. >= 44.1 k 16 bits stereo. The characteristics listed here are a rough estimate of the capabilities of the basic hardware only (and even here I am on thin ice, with systems becoming ever more powerful).

machine                      bits        max sampling rate      #output channels

Mac (all types)              8           22k                    1
Mac (newer ones)             16          64k                    4(128)
Apple IIgs                   8           32k / >70k             16(st)
PC/soundblaster pro          8           ?/(22k st, 44.1k mo)   1(st)
PC/soundblaster 16           16          44.1k                  1(st)
PC/pas                       8           44.1k st, 88.2k mo     1(st)
PC/pas-16                    16          44.1k st, 88.2k mo     1(st)
PC/turtle beach multisound   16          44.1k                  1(st)
PC/cards with aria chipset   16          44.1k                  1(st)
PC/roland rap-10             16          44.1k                  1(st)
PC/gravis ultrasound         8/16        44.1k                  14-32(st)
Atari ST                     8           22k                    1
Atari STE,TT                 8           50k                    2
Atari Falcon 030             16          50k                    8(st)
Amiga                        8           varies above 29k       4(st)
Sun Sparc                    U-LAW       8k                     1
Sun Sparcst. 10              U-LAW,8,16  48k                    1(st)
NeXT                         U-LAW,8,16  44.1k                  1(st)
SGI Indigo                   8,16        48k                    4(st)
SGI Indigo2,Indy             8,16        48k                    16(st,4-channel)
Acorn Archimedes             ~U-LAW      ~180k                  8(st)
Sony NWS-3xxx                U,A,8,16    8-37.8k                1(st)
Sony NWS-5xxx                U,A,8,16    8-48k                  1(st)
VAXstation 4000              U-LAW       8k                     1
DEC 3000/300-500             U-LAW       8k                     1
DEC 5000/20-25               U-LAW       8k                     1
Tandy 1000/*L*               8           22k                    3
Tandy 2500                   8           22k                    3
HP9000/705,710,425e          U,A-LAW,16  8k                     1
HP9000/715,725,735           U,A-LAW,16  48k                    1(st)
HP9000/755 option:           U,A-LAW,16  48k                    1(st)
NCD MCX terminal             U,A,8,16    ?                      1(st)

4(st) means "four voices, stereo"; sampling rates xx/yy are different recording/playback rates; *L* is any type with 'L' in it.
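
To put the numbers in the table in perspective: sampling rate, bits per sample and channel count together fix the raw data rate a machine has to sustain, and multi-channel data is stored frame by frame as described under "Device characteristics". The following is only a minimal Python sketch of that arithmetic; the function names are made up for illustration and do not come from any package mentioned in this FAQ.

```python
def bytes_per_second(rate, bits, channels):
    """Raw (uncompressed) data rate in bytes per second.

    rate is in frames per second, i.e. samples per second per channel.
    """
    return rate * channels * bits // 8


def interleave(channels):
    """Merge per-channel sample lists into one frame-by-frame list."""
    return [sample for frame in zip(*channels) for sample in frame]


def deinterleave(samples, nchannels):
    """Split a frame-by-frame list back into one list per channel."""
    return [samples[i::nchannels] for i in range(nchannels)]


if __name__ == "__main__":
    # 8 k mono U-LAW (telephony) versus 44.1 k 16-bit stereo (CD).
    print(bytes_per_second(8000, 8, 1))
    print(bytes_per_second(44100, 16, 2))
    # Stereo: the left channel usually comes first in each frame.
    left, right = [1, 2, 3], [10, 20, 30]
    print(interleave([left, right]))
```

A machine listed with 8k U-LAW thus moves 8000 bytes per second, while CD-quality stereo needs 176400 bytes per second, which is one reason the cheaper hardware stops at low rates.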
All these machines can play back sound without additional hardware, although the needed software is not always standard; also, some machines need external hardware to record sound (or to record at higher quality, like the NeXT, whose built-in sampling hardware only does 8000 samples/sec in U-LAW). Please don't send me details on optional or 3rd party hardware; there is too much and it is really beyond the scope of this FAQ. In particular, there is a separate newsgroup devoted to PC sound cards: comp.sys.ibm.pc.soundcard, which has a FAQ of its own (also posted to comp.answers and news.answers).

The new VAXstation 4000 (VLC and model 60) series lets you PLAY audio (.au) files, and the package DECsound will let you do the recording. In fact, DECsound is given away free with Motif 1.1 and supports the VAXstation, Sun SPARCstation, DECvoice, and DECaudio devices. Sun sound files work without change. The Alpha systems (DEC 3000 Model 300, 400, 500) also have DECsound bundled with Motif.

Notes for the DECstation 5000/20-25: You need either XMedia tools from DEC ($$$$), or the AudioFile package (which works nicely) from crl.dec.com (see below). The audio device is "/dev/bba". You cannot send ".au" files directly to the device; the XMedia/AF software provides an "audioserver" which must be run to play/record sounds.

The SGI Personal IRIS 4D/30 and 4D/35 have the same capabilities as the Indigo. The audio board was optional on the 4D/30. The Indigo2 and Indy features are a superset of the Indigo features.

The new Apple Macs have more powerful audio hardware; the latest models have built-in microphones.

Software exists for the PC that can play sound on its 1-bit speaker using pulse width modulation (see appendix); the Soundblaster board records at rates up to 13 k and plays back up to 22 k (weird combination, but that's the way it is).

Here's some info about the newest Atari machine, the Falcon030.
This machine has stereo 16 bit CODECs and a 32 MHz Motorola 56001 that can handle 8 channels of 16 bit audio, up to 50 kHz/channel with simultaneous playback and record. The Falcon DMA sound engine is also compatible with the 8 bit stereo DMA used on the STe and TT. All of these systems use signed data.

On the NeXT, the Motorola 56001 DSP chip is programmable and you can (in principle) do what you want. The SGI Indigo uses the same DSP chip but it can't be programmed by users -- SGI prefers to offer it as a shared system resource to multiple applications, thus enabling developers to program audio with their Audio Library and avoid code modifications for execution on future machines with different audio hardware, i.e. a different DSP. For example, the Indigo2 and Indy do not have a DSP chip.

The Amiga also has a 6-bit volume, which can be used to produce something like a 14-bit output for each voice. The hardware can also use one of each voice-pair to modulate the other in FM (period) or AM (volume, 6-bits).

The Acorn Archimedes uses a variation on U-LAW with the bit order reversed and the sign bit in bit 0. Being a 'minority' architecture, Arc owners are quite adept at converting sound/image formats from other machines, and it is unlikely that you'll ever encounter sound in one of the Arc's own formats (there are several).

The NCD MCX terminal has audio integrated with its X server. The NCDAudio server is an extension of the X server, working together with it, with stress on the networking capability of sound transmission. The NCDAudio API provides format handling (ULAW8, Linear Unsig 8, Linear Sig 8, Linear Sig 16 MSB, Linear Unsig 16 MSB), flowing (to the server, from the server, to the i/o, from the i/o), wave form generators (Square, Sine, Saw, Constant) and the capability of area broadcast using UDP. Provision for manipulating data files (SND, WAV, VOC & AU) is also provided.

CD-I machines form a special category.
The following formats are used:

- PCM 44.1 kHz standard CD format
- ADPCM - Adaptive Delta PCM
  - Level A 37.8 kHz 8-bit
  - Level B 37.8 kHz 4-bit
  - Level C 18.9 kHz 4-bit

File formats
------------

Historically, almost every type of machine used its own file format for audio data, but some file formats are more generally applicable, and in general it is possible to define conversions between almost any pair of file formats -- sometimes losing information, however.

File formats are a separate issue from device characteristics. There are two types of file formats: self-describing formats, where the device parameters and encoding are made explicit in some form of header, and "raw" formats, where the device parameters and encoding are fixed.

Self-describing file formats generally define a family of data encodings, where a header field indicates the particular encoding variant used. Headerless formats define a single encoding and usually allow no variation in device parameters (except sometimes sampling rate, which can be a pain to figure out other than by listening to the sample).

The header of self-describing formats contains the parameters of the sampling device and sometimes other information (e.g. a human-readable description of the sound, or a copyright notice). Most headers begin with a simple "magic word". (Some formats do not simply define a header format, but may contain chunks of data intermingled with chunks of encoding info.) The data encoding defines how the actual samples are stored in the file, e.g. signed or unsigned, as bytes or short integers, in little-endian or big-endian byte order, etc. Strictly speaking, channel interleaving is also part of the encoding, although so far I have seen little variation in this area.

Some file formats apply some kind of compression to the data, e.g. Huffman encoding, or simple silence deletion.

Here's an overview of popular file formats.
Self-describing file formats
----------------------------

extension, name     origin         variable parameters (fixed; comments)

.au or .snd         NeXT, Sun      rate, #channels, encoding, info string
.aif(f), AIFF       Apple, SGI     rate, #channels, sample width, lots of info
.aif(f), AIFC       Apple, SGI     same (extension of AIFF with compression)
.iff, IFF/8SVX      Amiga          rate, #channels, instrument info (8 bits)
.voc                Soundblaster   rate (8 bits/1 ch; can use silence deletion)
.wav, WAVE          Microsoft      rate, #channels, sample width, lots of info
.sf                 IRCAM          rate, #channels, encoding, info
none, HCOM          Mac            rate (8 bits/1 ch; uses Huffman compression)
none, MIME          Internet       (see below)
none, NIST SPHERE   DARPA speech community (see below)
.mod or .nst        Amiga          (see below)

Note that the filename extension ".snd" is ambiguous: it can be either the self-describing NeXT format or the headerless Mac/PC format, or even a headerless Amiga format.

I know nothing for sure about the origin of HCOM files, only that there are a lot of them floating around on our system and probably at FTP sites all over the world. The filenames usually don't have a ".hcom" extension, but this is what SOX (see below) uses. The file format recognized by SOX includes a MacBinary header, where the file type field is "FSSD". The data fork begins with the magic word "HCOM" and contains Huffman compressed data; after decompression it is 8 bits unsigned data.

IFF/8SVX allows for amplitude contours for sounds (attack/decay/etc). Compression is optional (and extensible); volume is variable; author, notes and copyright properties; etc.

AIFF, AIFC and WAVE are similar in spirit but allow more freedom in encoding style (other than 8 bit/sample), amongst others.

There are other sound formats in use on Amiga by digitizers and music programs, such as IFF/SMUS.

Appendices describe the NeXT and VOC formats; pointers to more info about AIFF, AIFC, 8SVX and WAVE (which are too complex to describe here) are also in appendices.

DEC systems (e.g.
DECstation 5000) use a variant of the NeXT format that uses little-endian encoding and has a different magic number (0x0064732E in little-endian encoding).

Standard file formats used in the CD-I world are IFF but on the disc they're in realtime files.

An interesting "interchange format" for audio data is described in the proposed Internet Standard "MIME", which describes a family of transport encodings and structuring devices for electronic mail. This is an extensible format, and initially standardizes a type of audio data dubbed "audio/basic", which is 8-bit U-LAW data sampled at 8000 samples/sec.

The "IRCAM" sound file system has now been superseded by the so-called "BICSF" (for Berkeley/IRCAM/CARL Sound File system) software release. More recently, there has been an effort at Princeton (Prof. Paul Lansky) and Stanford (Stephen Travis Pope) to standardize several extensions to BICSF. A description of BICSF and the Princeton/Stanford extensions is available by anonymous ftp from ftp.cwi.nl [192.16.184.180], in directory /pub/audio/BICSF-info. This file contains further ftp pointers to software.

A sound file format popular in the DARPA speech community is the NIST SPHERE standard. The most recent version of the SPHERE package is available via anonymous ftp from jaguar.ncsl.nist.gov [129.6.48.157] in compressed tar form as "sphere-v.tar.Z" (where "v" is the version code). The NIST SPHERE header is an object-oriented, 1024-byte blocked, ASCII structure which is prepended to the waveform data. The header is composed of a fixed-format portion followed by an object-oriented variable portion. I have placed a short description of NIST SPHERE on ftp.cwi.nl:/pub/audio/NIST-SPHERE.

Finally, a somewhat different but popular format is the "MOD" file, usually with extension ".mod" or ".nst" (they can also have a prefix of "mod."). This originated on the Amiga but players now exist for many platforms.
MOD files are music files containing 2 parts: (1) a bank of digitized samples; (2) sequencing information describing how and when to play the samples. See the appendix "The Amiga MOD Format" for a description of this file format (and pointers to ftp'able players and example MOD files).

Headerless file formats
-----------------------

extension      origin         parameters
or name

.snd, .fssd    Mac, PC        variable rate, 1 channel, 8 bits unsigned
.ul            US telephony   8 k, 1 channel, 8 bit "U-LAW" encoding
.snd?          Amiga          variable rate, 1 channel, 8 bits signed

It is usually easy to distinguish 8-bit signed formats from unsigned by looking at the beginning of the data with 'od -b ) SOX/DOS

MAC Sound Format           file ext   type   Mac program to convert to 'snd'
----------------------     --------   ----   -------------------------------
Mac snd                    .snd       sfil   [n/a]
Amiga IFF/8SVX             .iff              AmigaSndConverter, BST
Amiga SoundTracker         .mod       STrk   ModVoicer
Audio IFF                  .aiff      AIFF   SoundExtractor, Sample Editor, UUTool, BST, M5Mac
DSP Designer                          DSPs   SoundHack
IRCAM                      .sf        IRCM   SoundHack
MacMix                                MSND   SoundHack
RIFF WAVE                  .wav              SoundExtractor, BST, Balthazar
SoundBlaster               .voc              SoundExtractor, BST
SoundDesigner/AudioMedia              Sd2f   SoundHack
Sound[Edit|Cap|Wave]       .hcom      FSSD   SoundExtractor, SoundEdit, Wavicle, BST
Sun uLaw/Next .snd         .au/.snd   NxTS   SoundExtractor, SoundHack, au<->snd, UUTool, BST

File conversions
----------------

SOX (UNIX, PC, Amiga)
---------------------

The most versatile tool for converting between various audio formats is SOX ("Sound Exchange"). It can read and write various types of audio files, and optionally applies some special effects (e.g. echo, channel averaging, or rate conversion).

SOX recognizes all filename extensions listed above except ".snd", which would be ambiguous anyway, and ".wav" (but there's a patch, see below). Use type ".au" for NeXT ".snd" files.
Mac and PC ".snd" files are completely described by these parameters: -t raw -b -u -r 11000 (or -r 22000 or -r 7333 or -r 5500; 11000 seems to be the most common rate).

The source for SOX, version 6, patchlevel 8, was posted to alt.sources, and should be widely archived. (Patch 9 was posted later and incorporates some important .wav fixes.) To save you the trouble of hunting it down, it can be gotten by anonymous ftp from wuarchive.wustl.edu, in the directory usenet/alt.sources/articles, files 7288.Z through 7295.Z. (These files are compressed news articles containing shar files, if you hadn't guessed.) I am sure many sites have similar archives; I'm just listing one that I know of and which carries a lot of this kind of stuff. (Also see the appendix if you don't have Internet access.)

A compressed tar file containing the same version of SOX is available by anonymous ftp from ftp.cwi.nl [192.16.184.180], as /pub/audio/sox7.tar.Z. You may be able to locate a nearer version using archie!

Ports of SOX:

- The source as posted should compile on any UNIX and PC system.

- A PC version is available by ftp from ftp.cwi.nl (see above) as pub/audio/sox5dos.zip; also available from the garbo mail server.

- The latest Amiga SOX is available via anonymous ftp to wuarchive.wustl.edu, files systems/amiga/audio/utils/amisox*. (See below for a non-SOX solution.) The final release of r6 will compile as distributed on the Amiga with SAS/C version 6. Binaries (since many Amiga users do not own compilers) will continue to be available for FTP.

SOX usage hints:

- Often, the filename extension of sound files posted on the net is wrong. Don't give up; try a few other possibilities using the "-t" option. Remember that the most common file type is unsigned bytes, which can be indicated with "-t ub". You'll have to guess the proper sampling rate, but often it's 11k or 22k.

- In particular, with SOX version 4 (or earlier), you have to specify "-t 8svx" for files with an .iff extension.

- When converting linear samples to U-LAW using the .au type for the output file, you must specify "-U" for the output file, otherwise you will end up with a file containing a NeXT/Sun header but linear samples -- only the NeXT will play such files correctly. Also, you must explicitly specify an output sampling rate with "-r 8000". (This may seem fixed for most cases in version 5, but it is still occasionally necessary, so I'm keeping this warning in.)

Sun Sparc
---------

On Sun Sparcs, starting at SunOS 4.1, a program "raw2audio" is provided by Sun (in /usr/demo/SOUND -- see below) which takes a raw U-LAW file and turns it into a ".au" file by prefixing it with an appropriate header.

NeXT
----

On NeXTs, you can usually rename .au files to .snd and it'll work like a charm, but some .au files lack header info that the NeXT needs. This can be fixed by using sndconvert:

    sndconvert -c 1 -f 1 -s 8012.8210513 -o nextfile.snd sunfile.au

SGI Indigo, Indigo2, Indy and Personal IRIS
-------------------------------------------

SGI supports "soundfiler" (in /usr/sbin), a program similar in spirit to SOX but with a GUI. Soundfiler plays aiff, aifc, NeXT/Sun and .wav formats. It can do conversions between any of these formats and to and from raw formats including mulaw. It also does sample rate conversions.

Three shell commands are also provided that give the same functionality: "sfplay", "sfconvert", and "aifcresample" (all in /usr/sbin).

Amiga
-----

Mike Cramer's SoundZAP can do no effects except rate change and it only does conversions to IFF, but it is generally much faster than SOX. (Ftp'able from the same directory as amisox above.)

Newer versions of OmniPlay (see below) will also convert to IFF.

Tandy
-----

The Tandy 1000 uses a (proprietary?) compressed format. There is a PD Mac to Tandy conversion program called CONVERT.
Leonard Erickson writes: There is a WAV driver from Tandy if people ask. There also appears to be a program that purports to convert other formats to Tandy, but I haven't tested this one yet.

Apple Macintosh
---------------

Bill Houle sent the following list:

Popular commercial apps are indicated with a [*]. All other programs mentioned are shareware/freeware available from SUMEX and the various mirror sites, or check archie for the nearest FTP location.

MAC SOUND CONVERSION PROGRAMS

SoundHack [Tom Erbe, tom@mills.edu]
  Can read/write Sound Designer II, Audio IFF, IRCAM, DSP Designer and NeXT .snd (or Sun .au); 8-bit uLaw, 8-bit linear, 32-bit floating point and 16-bit linear data encoding. Can read (but not write) raw data files. Implements soundfile convolution, a phase vocoder, a binaural filter and an amplitude analysis & gain change module.

SoundExtractor [Alberto Ricci, FRicci@polito.it]
  Extracts 'snd' resources, AIFF, SoundEdit, VOC, and WAV data from practically anything, converting to 'snd' files.

Balthazar [Craig Marciniak, AOL:TemplarDev]
  Converts WAV files to 'snd'.

Brian's Sound Tool [Brian Scott, bscott@ironbark.ucnv.edu.au]
  Converts 'snd' or SoundEdit to WAV. Can also convert WAV, VOC, AIFF, Amiga 8SVX and uLaw to 'snd'.

AmigaSndConverter [Povl H. Pederson, eco861771@ecostat.aau.dk]
  Converts Amiga IFF/8SVX to Mac 'snd'.

au<->Mac [Victor J. Heinz, vic:wbst128@xerox.com]
  Converts Sun uLaw to Mac 'snd'.

ULAW [Rod Kennedy, rod@faceng.anu.edu.au]
  Converts 'snd' to Sun uLaw.

UUTool [Bernie Wieser, wieser@acs.ucalgary.ca]
  Primarily a uuencode/decode program, but in true Swiss Army Knife fashion can also read/write Sun uLaw, AIFF, and 'snd' files.

ModVoicer [Kip Walker, Kip_Walker@mcimail.com]
  Converts Amiga MOD voices into SoundEdit files or 'snd' resources.

Music 5 Mac [Simone Bettini, space@maya.dei.unipd.it]
  Primarily a Music Synthesis system, but can also convert between 'snd', AIFF, and IBM .DAT(?).
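
Many of the conversions listed in this section (SOX with "-U", SoundHack, ULAW, Brian's Sound Tool) come down to translating between linear samples and U-LAW. As a rough illustration of what that involves, here is a minimal Python sketch of the commonly used bias-and-segment formulation of G.711 U-LAW. It assumes 16-bit signed linear input; the function names and constants are chosen for this example, and it is not the Poskanzer code shipped with SOX.

```python
BIAS = 0x84    # added before taking the "exponent" (segment number)
CLIP = 32635   # largest magnitude that still fits after adding BIAS


def linear_to_ulaw(sample):
    """Encode one 16-bit signed linear sample as an 8-bit U-LAW byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment number: position of the highest set bit, counted from bit 7.
    exponent = magnitude.bit_length() - 8
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # U-LAW bytes are stored with all bits inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(byte):
    """Decode one 8-bit U-LAW byte to a 16-bit signed linear sample."""
    byte = ~byte & 0xFF
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if byte & 0x80 else magnitude


if __name__ == "__main__":
    for value in (0, 100, -100, 1000, 32767, -32768):
        code = linear_to_ulaw(value)
        print(value, hex(code), ulaw_to_linear(code))
```

The "tiny floating point number" character of the encoding shows in the sign, 3-bit exponent and 4-bit mantissa fields: small samples survive almost unchanged, while large ones are quantized in ever coarser steps.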
Playing audio files on UNIX
---------------------------

The commands needed to play an audio file depend on the file format and the available hardware and software. Most systems can only directly play sound in their native format; use a conversion program (see above) to play other formats.

Sun Sparcstation running SunOS 4.x
----------------------------------

Raw U-LAW files can be played using "cat file >/dev/audio".

A whole package for dealing with ".au" files is provided by Sun on an experimental basis, in /usr/demo/SOUND. You may have to compile the programs first. (If you can't find this directory, either you are not running SunOS 4.1 yet, or your system administrator hasn't installed it -- go ask him for it, not me!) The program "play" in this directory recognizes all files in Sun/NeXT format, but a SS 1 or 2 can play only those using U-LAW encoding at 8 k -- the SS 10 hardware plays other encodings, too.

If you can't find "play", you can also cat a ".au" file to /dev/audio, if it uses U-LAW; the header will sound like a short burst of noise but the rest of the data will sound OK (really, the only difference in this case between raw U-LAW and ".au" files is the header; the U-LAW data is exactly the same).

Finally, OpenWindows 3.0 has a full-fledged audio tool. You can drop audio file icons into it, edit them, etc.

Sun Sparcstation running Solaris 2.0
------------------------------------

Under SVR4 (and hence Solaris 2.0), writing to /dev/audio from the shell is a bad idea, because the device driver will flush its queue as soon as the file is closed. Use "audioplay" instead. The supported formats and sampling rates are the same as above.

NeXT
----

On NeXT machines, the standard "sndplay" program can play all NeXT format files (this includes Sun ".au" files). It supports at least U-LAW at 8 k and 16-bit samples at 22 or 44.1 k. It attempts on-the-fly conversions for other formats. Sound files are also played if you double-click on them in the file browser.
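
Both the Sun and the NeXT notes above depend on the same self-describing header, and the "short burst of noise" heard when a ".au" file is sent straight to /dev/audio is that header being played as samples. The Python sketch below shows one way to read the header and strip it off. It assumes the layout of six 32-bit words (magic, data offset, data size, encoding, rate, channels) described in the NeXT/Sun appendix in part 2, and that encoding code 1 denotes 8-bit U-LAW; the function names are invented for this example, and the handling of the little-endian DEC variant is a guess based only on the magic number quoted under "File formats".

```python
import struct

AU_MAGIC = b".snd"       # Sun/NeXT magic word; header words are big-endian
DEC_MAGIC = b".sd\x00"   # DEC variant, 0x0064732E when read little-endian
HEADER_WORDS = 6         # magic, offset, size, encoding, rate, channels


def parse_au_header(data):
    """Return the header fields of a Sun/NeXT sound file as a dict."""
    if len(data) < 4 * HEADER_WORDS:
        raise ValueError("too short to hold a Sun/NeXT header")
    if data[:4] == AU_MAGIC:
        order = ">"
    elif data[:4] == DEC_MAGIC:
        order = "<"   # assumption: all header words are little-endian
    else:
        raise ValueError("magic word not found; probably headerless data")
    offset, size, encoding, rate, channels = struct.unpack(
        order + "5I", data[4:4 * HEADER_WORDS])
    return {"data_offset": offset, "data_size": size,
            "encoding": encoding, "rate": rate, "channels": channels}


def strip_au_header(data):
    """Return only the sample data, e.g. raw U-LAW for encoding 1."""
    return data[parse_au_header(data)["data_offset"]:]


if __name__ == "__main__":
    # Build a tiny example file in memory: 24-byte header, 4-byte info
    # string, then four U-LAW bytes of silence.
    example = struct.pack(">6I", 0x2E736E64, 28, 4, 1, 8000, 1)
    example += b"demo" + b"\xff" * 4
    print(parse_au_header(example))
    print(strip_au_header(example))
```

For a U-LAW file at 8 k, writing the output of strip_au_header to a file gives the same raw data that "cat file >/dev/audio" expects, without the noise burst.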
SGI Indigo, Indigo2, Indy and Personal IRIS
-------------------------------------------

On SGI Indigo, Indigo2, Indy and the 4D/30 and /35 Personal IRIS workstations, "WorkSpace" plays audio files in .aiff, .aifc, .au, and .wav formats if you double-click them and the sampling rate is one of 8000, 11025, 16000, 22050, 32000, 44100, or 48000. On the Personal IRIS, you need to have the audio board installed (check the output from hinv) and you must run IRIX 3.3.2 or 4.0 or higher. These files can also be played with "soundfiler" and "sfplay". ".aiff" and ".aifc" files at the above sampling rates can also be played with "playaifc". (All of these are in /usr/sbin.)

There is no simple /dev/audio interface on these SGI machines. (There was one on 4D/25 machines, reading and writing signed linear 8-bit samples at rates of 8, 16 and 32 k.)

A program "playulaw" was posted as part of the "radio 2.0" release that I posted to several source groups; it plays raw U-LAW files on the Indigo, Indigo2, Indy or Personal IRIS audio hardware.

Sony NEWS
---------

The whole current Sony NEWS line (laptop, desktop, server) has built-in sound capabilities. You can buy an external board for the older NEWS machines. In the default mode (8k/8-bit mulaw), Sun .au files are directly supported (you can 'cat' .au files to /dev/sb0 and have them play). The /usr/sony/bin/sbplay command on NEWS-OS 6.0 also supports Sun .au files.

Others
------

Most other UNIX boxes don't have audio hardware and thus can't play audio data. This is actually rapidly changing and most new hardware that hits the market has some form of audio support. Unfortunately there is no single portable interface for audio that comes near the acceptance and functionality (let alone code size :-) of X11 for graphics.
There are at least two network-transparent packages, both in some way based on the X11 architecture, that attempt to fill the gap:

DEC CRL's AudioFile supports Digital RISC systems running Ultrix, Digital Alpha AXP systems running OSF/1, Sun Sparcs, and SGI AL-capable systems (e.g., Indigo, Indy). The source kit is located at ftp site crl.dec.com [192.58.206.2] in /pub/DEC/AF.

NCD's NetAudio supports NCD's MCX line of X terminals as well as Suns, using the /dev/audio interface (they claim it should be easy to port). The source is located at ftp.x.org [198.112.44.100] in contrib/netaudio.

Playing audio files on the Vaxstation 4000 (VMS)
------------------------------------------------

1) Without DECsound

".au" files can be played by COPYING them to device "SOA0:". This device is set up by enabling the driver SODRIVER. You can use the following command file:

$!---------------- cut here -------------------------------
$! sound_setup.com enable SOUND driver
$ run sys$system:sysgen
connect soa0 /adapter=0 /csr=%x0e00 /vector=%o304 /driver=sodriver
exit
$ exit
$!----------------- cut here ------------------------------------

2) With DECsound (bundled with motif)

Just start DECsound by selecting it from the session manager in the applications menu. (If it is not there, use "@vue$library:sound$vue_startup".) Make sure the right settings are selected: device type (vaxstation 4000) and play settings (headphone jack).

To play files from the DCL prompt (handy if you want to play sounds on a remote workstation), set up a symbol as follows:

    PLAY == "$DECSOUND -VOLUME 50 -PLAY"

Usage:

    DCL> play sound.au

3) Audio port

The external audio port comes with a telephone-jack-like port. For starters, you can plug a telephone RECEIVER right into this port to hear your first sound files. After that, you can use the adapter (that came with the VaxStation), and plug in a small set of stereo speakers or headphones (the kind you'd plug into a WALKMAN, for example), for more volume.
The adapter also has a microphone plug so that you can record sounds if DECsound is installed.

Playing audio files on micros
-----------------------------

Most micros have at least a speaker built in, so theoretically all you need is the right software. Unfortunately most systems don't come bundled with sound-playing software, so there are many public domain or shareware software packages, each with their own bugs and features. Most separate sound recording hardware also comes with playing software, most of which can play sound (in the file format used by that hardware) even on machines that don't have that hardware installed.

PC or compatible
----------------

Chris S. Craig announces the following software for PCs:

ScopeTrax
    This is a complete PC sound player/editor package. Sounds can be played back at ANY rate between 1 kHz and 65 kHz through the PC speaker or the Sound Blaster. It supports several file formats including VOC, IFF/8SVX, raw signed and raw unsigned. A separate executable is provided to convert .au and mu-law to raw format. ScopeTrax requires EGA/VGA graphics for editing and displaying sounds on a REALTIME oscilloscope. The package also includes:

    * An expanded memory player which can play sounds larger than 640K in size.
    * Basic (rough) sound compression/uncompression utilities.
    * Complete documentation.

    The package is FREEWARE! It is available on SIMTEL in the PD1:[MSDOS.SOUND] directory.

One of the appendices below contains a list of more programs to play sound on the PC.

Atari
-----

For sounds on Atari STs, programs are in the atari/sound/players directory on atari.archive.umich.edu (141.211.164.8).

Tandy
-----

On a Tandy 1000, sounds can be played and recorded with DeskMate Sound (SOUND.PDM), or if they are not stored in compressed format, they can also be played by a program called PLAYSND. There is no indication of whether PLAYSND is PD or not. It hasn't been updated since March of 89.
Amiga
-----

On the Amiga, OmniPlay by David Champion plays and converts IFF-8SVX, AIFF, WAV, VOC, .au, .snd, and 8 bit raw (signed, unsigned, u-law) samples. As of version 1.23, OmniPlay will also convert any playable sample to 8SVX.

Files:
    wuarchive.wustl.edu in /systems/amiga/audio/sampleplayers/oplay123.lha (?)
    amiga.physik.unizh.ch in mus/play/oplay123.lha

Apple Macintosh
---------------

Malcolm Slaney from Apple writes:

"We do have tools to play sound back on most of our Unix hosts. We wrote a program called TcpPlay that lets us read a sound file on a Unix host, open a TCP/IP connection to the Mac on my desk, and plays the file. We think of it as X windows for sound (at least a step in that direction.)

This software is available for anonymous FTP from ftp.apple.com [IP address 130.43.2.3 -- Guido]. Look for ~ftp/pub/TcpPlay/TcpPlay.sit.hqx.

Finally, there are MANY tools for working with sound on the Macintosh. Three applications that come to mind immediately are SoundEdit (formerly by Farallon and now by MacroMind/Paracomp), Alchemy and Eric Keller's Signalyze. There are lots of other tools available for sound editing (including some of the QuickTime Movie tools.)"

Bill Houle sent the following lists:

Popular commercial apps are indicated with a [*]. All other programs mentioned are shareware/freeware available from SUMEX and the various mirror sites, or check archie for the nearest FTP location.

MAC SOUND EDITORS

Sample Editor [Garrick McFarlane, McFarlaneGA@Kirk.Vax.Aston.Ac.UK]
    Plays AIFF and 'snd' sounds. Can convert between AIFF and 'snd'. Can record from built-in mic. Can add effects such as fade, normalize, delay, etc.

Wavicle [Lee Fyock]
    Plays SoundEdit files. Can convert to 'snd'. Can record from built-in mic. Can add effects such as fade, filter, reverb, etc.

[*]SoundEdit/SoundEdit Pro [Farallon/MacroMind*Paracomp]
    Plays SoundEdit and 'snd' sounds. Can read/write SoundEdit files and 'snd' sounds. Can record from built-in mic.
    Can add effects such as echo, filter, reverb, etc.

MAC SOUND PLAYERS

Sound-Tracker [Frank Seide]
    Plays Amiga SoundTracker files in foreground or background.

Macintosh Tracker [Thomas R. Lawrance, tomlaw@world.std.com]
    Plays Amiga SoundTracker files in foreground or background. A port of Marc Espie's Unix Tracker version with Frank Seide's core player thrown in for good measure.

The Player [Antoine Rosset & Mike Venturi]
    Plays AIFF, SoundEdit, MOD, and 'snd' files.

SoundMaster (aka [*]Kaboom!) [Bruce Tomlin]
    Associates SoundEdit files to MacOS events.

SndControl [Riccardo Ettore, 72277.1344@compuserve.com]
    Associates 'snd' sounds to MacOS events.

Canon 2 [Glenn Anderson, glenn@otago.ac.nz; Jeff Home, jeff@otago.ac.nz]
    Plays AIFF or 'snd' files in foreground or background.

The Sound Site Newsletter
-------------------------

An electronic publication with lots of info about digitised sound and sound formats, albeit mostly on PCs, is "The Sound Site Newsletter", maintained by David Komatsu. Issue 14 appeared in July 1993. As of that issue, the Sound Site Newsletter has expanded its charter to include commercial products and will appear monthly. There is now also a sound site network of ftp servers, bulletin boards and authors.

The Sound Site Newsletter (once again!) has its own ftp site: sound.usach.cl.

The Sound Newsletter is posted to:
    comp.sys.ibm.pc.soundcard
    comp.sys.ibm.pc.misc
    rec.games.misc

FTP:
    oak.oakland.edu (misc/sound)
    garbo.uwasa.fi (pc/sound)
    sound.usach.cl (pub/Sound/Newsltr) [Home Base]

Posting sounds
--------------

The newsgroup alt.binaries.sounds.misc is dedicated to postings containing sound. (Discussions related to such postings belong in alt.binaries.sounds.d.)

There is no set standard for posting sounds; uuencoded files in most popular formats are welcome, if split in parts under 50 kBytes.
To accommodate automatic decoding software (such as the ":decode" command of the nn newsreader), please place a part indicator of the form (mm/nn) at the end of your subject line, meaning that this is part number mm of a total of nn parts.

It is recommended to post sounds in the format that was used for the original recording; conversions to other formats often lose information and do no favor to people with the same hardware as the poster. For instance, converting 8-bit linear sound to U-LAW loses the lower few bits of the data, and rate changing conversions almost always add noise. Converting from U-LAW to linear requires expansion to 16-bit samples if no information loss is allowed!

U-LAW data is best posted with a NeXT/Sun header.

If you have to post a file in a headerless format (usually 8-bit linear, like ".snd"), please add a description giving at least the sampling rate and whether the bytes are signed (zero at 0) or unsigned (zero at 0200). However, it is highly recommended to add a header that indicates the sampling rate and encoding scheme; if necessary you can use SOX to add a header of your choice to raw data.

Compression of sound files usually isn't worth it; the standard "compress" algorithm doesn't save much when applied to sound data (typically at most 10-20 percent), and compression algorithms specifically designed for sound (e.g. NeXT's) are usually proprietary. (See also the section "Compression schemes" earlier.)
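
Two of the points above (U-LAW needs 16-bit samples when expanded to linear, and 8-bit linear data comes in signed and unsigned flavours) can be illustrated with a small Python sketch. It assumes the commonly used G.711-style expansion with a bias of 0x84, scaled to a 16-bit range; the U-LAW definition in the appendix in part 2 is the authority, and the function names here are invented for this example.

```python
def ulaw_to_linear(u: int) -> int:
    """Expand one 8-bit U-LAW code to a signed linear sample."""
    u = ~u & 0xFF                  # U-LAW bytes are stored bit-inverted
    sign = u & 0x80                # top bit: set for negative samples
    exponent = (u >> 4) & 0x07     # 3-bit segment number
    mantissa = u & 0x0F            # 4-bit position within the segment
    # Assumed bias of 0x84 (132), as in the usual G.711-style code.
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def flip_signedness(data: bytes) -> bytes:
    """Convert 8-bit linear samples between unsigned (zero at 0200)
    and signed (zero at 0); the same operation works both ways."""
    return bytes(b ^ 0x80 for b in data)


# The extreme codes show why 8 bits are not enough after expansion.
loudest = max(abs(ulaw_to_linear(u)) for u in range(256))
```

Under these assumptions the loudest code expands to a magnitude far above 127, which is why the linear result has to be stored in 16-bit samples, while the smallest non-zero step is only a few units. flip_signedness() is the kind of fix-up a reader needs when a headerless file arrives without the signed/unsigned description asked for above.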