Subject: FAQ: Audio File Formats (part 2 of 2)
Newsgroups: alt.binaries.sounds.misc,alt.binaries.sounds.d,comp.dsp,news.answers,comp.answers
Followup-to: alt.binaries.sounds.d,comp.dsp
Reply-to: guido@cwi.nl
Approved: news-answers-request@MIT.Edu

Archive-name: audio-fmts/part2
Submitted-by: Guido van Rossum
Version: 3.05
Last-modified: 27-Sep-1993

Appendices
==========

Here are some more detailed pieces of info that I received by e-mail.
They are reproduced here virtually without editing.

Table of contents
-----------------

FTP access for non-internet sites
AIFF Format (Audio IFF)
The NeXT/Sun audio file format
IFF/8SVX Format
Playing sound on a PC
The EA-IFF-85 documentation
US Federal Standard 1016 availability
Creative Voice (VOC) file format
RIFF WAVE (.WAV) file format
U-LAW and A-LAW definitions
AVR File Format
The Amiga MOD Format

------------------------------------------------------------------------

FTP access for non-internet sites
---------------------------------

From the sci.space FAQ:

Sites not connected to the Internet cannot use FTP directly, but there are a few automated FTP servers which operate via email. Send mail containing only the word HELP to ftpmail@decwrl.dec.com or bitftp@pucc.princeton.edu, and the servers will send you instructions on how to make requests. (The bitftp service is no longer available through UUCP gateways due to complaints about overuse :-( )

Also: FAQ lists are available by anonymous FTP from rtfm.mit.edu and by email from mail-server@rtfm.mit.edu (send a message containing "help" for instructions about the mail server).

------------------------------------------------------------------------

AIFF Format (Audio IFF) and AIFC
--------------------------------

This format was developed by Apple for storing high-quality sampled sound and musical instrument info; it is also used by SGI and several professional audio packages (sorry, I know no names).
An extension, called AIFC or AIFF-C, supports compression (see the last item below).

I've made a BinHex'ed MacWrite version of the AIFF spec (no idea if it's the same text as mentioned below) available by anonymous ftp from ftp.cwi.nl [192.16.184.180]; the file is /pub/audio/AudioIFF1.2.hqx. A newer version is also available: /pub/audio/AudioIFF1.3.hqx. But you may be better off with the AIFF-C specs, see below.

Mike Brindley (brindley@ece.orst.edu) writes:

"The complete AIFF spec by Steve Milne, Matt Deatherage (Apple) is available in 'AMIGA ROM Kernel Reference Manual: Devices (3rd Edition)' 1991 by Commodore-Amiga, Inc.; Addison-Wesley Publishing Co.; ISBN 0-201-56775-X, starting on page 435 (this edition has a charcoal grey cover). It is available in most bookstores, and soon in many good libraries."

According to Mark Callow (msc@sgi.com):

A PostScript version of the AIFF-C specification is available via anonymous ftp on FTP.SGI.COM (192.48.153.1) as /sgi/aiff-c.9.26.91.ps.

Benjamin Denckla writes:

A piece of information that may be of some use to people who want to use AIFF files with their Macintosh Think C programs: AIFF data structures are contained in the file AIFF.h in the "Apple #Includes" folder that comes on the distribution disks. I found this out a little too late: I had already coded my own structures. I assume that this header file comes with Apple programming products like MPW [C|C++] as well.

An important file format for the Mac which is only mentioned once in the FAQ is the Sound Designer II file format. There is also an older Sound Designer I format. I have the SDII format in electronic form but I don't think I'm at liberty to distribute it. It can be obtained by applying to become a 3rd Party Developer for Digidesign. This process is simple (1-page application) and free. Call Digidesign at 415-688-0600 for information. The SDII file format is interesting in that all non-sample data (sample rate, channels, etc.)
is contained in the resource fork and the data fork contains sample data only.

------------------------------------------------------------------------

The NeXT/Sun audio file format
------------------------------

Here's the complete story on the file format, from the NeXT documentation. (Note that the "magic" number is ((int)0x2e736e64), which equals ".snd".) Also, at the end, I've added a little document that someone posted to the net a couple of years ago, that describes the format in a bit-by-bit fashion rather than from C.

I received this from Doug Keislar, NeXT Computer. This is also the Sun format, except that Sun doesn't recognize as many format codes. I added the numeric codes to the table of formats and sorted it.

SNDSoundStruct: How a NeXT Computer Represents Sound

The NeXT sound software defines the SNDSoundStruct structure to represent sound. This structure defines the soundfile and Mach-O sound segment formats and the sound pasteboard type. It's also used to describe sounds in Interface Builder. In addition, each instance of the Sound Kit's Sound class encapsulates a SNDSoundStruct and provides methods to access and modify its attributes.

Basic sound operations, such as playing, recording, and cut-and-paste editing, are most easily performed by a Sound object. In many cases, the Sound Kit obviates the need for in-depth understanding of the SNDSoundStruct architecture. For example, if you simply want to incorporate sound effects into an application, or to provide a simple graphic sound editor (such as the one in the Mail application), you needn't be aware of the details of the SNDSoundStruct. However, if you want to closely examine or manipulate sound data you should be familiar with this structure.

The SNDSoundStruct contains a header, information that describes the attributes of a sound, followed by the data (usually samples) that represents the sound.
The structure is defined (in sound/soundstruct.h) as:

    typedef struct {
        int magic;           /* magic number SND_MAGIC */
        int dataLocation;    /* offset or pointer to the data */
        int dataSize;        /* number of bytes of data */
        int dataFormat;      /* the data format code */
        int samplingRate;    /* the sampling rate */
        int channelCount;    /* the number of channels */
        char info[4];        /* optional text information */
    } SNDSoundStruct;

SNDSoundStruct Fields

magic

magic is a magic number that's used to identify the structure as a SNDSoundStruct. Keep in mind that the structure also defines the soundfile and Mach-O sound segment formats, so the magic number is also used to identify these entities as containing a sound.

dataLocation

It was mentioned above that the SNDSoundStruct contains a header followed by sound data. In reality, the structure only contains the header; the data itself is external to, although usually contiguous with, the structure. (Nonetheless, it's often useful to speak of the SNDSoundStruct as the header and the data.) dataLocation is used to point to the data. Usually, this value is an offset (in bytes) from the beginning of the SNDSoundStruct to the first byte of sound data. The data, in this case, immediately follows the structure, so dataLocation can also be thought of as the size of the structure's header. The other use of dataLocation, as an address that locates data that isn't contiguous with the structure, is described in "Format Codes," below.

dataSize, dataFormat, samplingRate, and channelCount

These fields describe the sound data. dataSize is its size in bytes (not including the size of the SNDSoundStruct). dataFormat is a code that identifies the type of sound. For sampled sounds, this is the quantization format. However, the data can also be instructions for synthesizing a sound on the DSP. The codes are listed and explained in "Format Codes," below. samplingRate is the sampling rate (if the data is samples).
Three sampling rates, represented as integer constants, are supported by the hardware:

    Constant          Sampling Rate (samples/sec)
    SND_RATE_CODEC     8012.821   (CODEC input)
    SND_RATE_LOW      22050.0     (low sampling rate output)
    SND_RATE_HIGH     44100.0     (high sampling rate output)

channelCount is the number of channels of sampled sound.

info

info is a NULL-terminated string that you can supply to provide a textual description of the sound. The size of the info field is set when the structure is created and thereafter can't be enlarged. It's at least four bytes long (even if it's unused).

Format Codes

A sound's format is represented as a positive 32-bit integer. NeXT reserves the integers 0 through 255; you can define your own format and represent it with an integer greater than 255. Most of the formats defined by NeXT describe the amplitude quantization of sampled sound data:

    Value  Code                              Format
     0     SND_FORMAT_UNSPECIFIED            unspecified format
     1     SND_FORMAT_MULAW_8                8-bit mu-law samples
     2     SND_FORMAT_LINEAR_8               8-bit linear samples
     3     SND_FORMAT_LINEAR_16              16-bit linear samples
     4     SND_FORMAT_LINEAR_24              24-bit linear samples
     5     SND_FORMAT_LINEAR_32              32-bit linear samples
     6     SND_FORMAT_FLOAT                  floating-point samples
     7     SND_FORMAT_DOUBLE                 double-precision float samples
     8     SND_FORMAT_INDIRECT               fragmented sampled data
     9     SND_FORMAT_NESTED                 ?
    10     SND_FORMAT_DSP_CORE               DSP program
    11     SND_FORMAT_DSP_DATA_8             8-bit fixed-point samples
    12     SND_FORMAT_DSP_DATA_16            16-bit fixed-point samples
    13     SND_FORMAT_DSP_DATA_24            24-bit fixed-point samples
    14     SND_FORMAT_DSP_DATA_32            32-bit fixed-point samples
    15     ?
    16     SND_FORMAT_DISPLAY                non-audio display data
    17     SND_FORMAT_MULAW_SQUELCH          ?
    18     SND_FORMAT_EMPHASIZED             16-bit linear with emphasis
    19     SND_FORMAT_COMPRESSED             16-bit linear with compression
    20     SND_FORMAT_COMPRESSED_EMPHASIZED  A combination of the two above
    21     SND_FORMAT_DSP_COMMANDS           Music Kit DSP commands
    22     SND_FORMAT_DSP_COMMANDS_SAMPLES   ?

[Some new ones supported by Sun. This is all I currently know.
--GvR] 23 SND_FORMAT_ADPCM_G721 24 SND_FORMAT_ADPCM_G722 25 SND_FORMAT_ADPCM_G723_3 26 SND_FORMAT_ADPCM_G723_5 27 SND_FORMAT_ALAW_8 Most formats identify different sizes and types of sampled data. Some deserve special note: -- SND_FORMAT_DSP_CORE format contains data that represents a loadable DSP core program. Sounds in this format are required by the SNDBootDSP() and SNDRunDSP() functions. You create a SND_FORMAT_DSP_CORE sound by reading a DSP load file (extension ".lod") with the SNDReadDSPfile() function. -- SND_FORMAT_DSP_COMMANDS is used to distinguish sounds that contain DSP commands created by the Music Kit. Sounds in this format can only be created through the Music Kit's Orchestra class, but can be played back through the SNDStartPlaying() function. -- SND_FORMAT_DISPLAY format is used by the Sound Kit's SoundView class. Such sounds can't be played. -- SND_FORMAT_INDIRECT indicates data that has become fragmented, as described in a separate section, below. -- SND_FORMAT_UNSPECIFIED is used for unrecognized formats. Fragmented Sound Data Sound data is usually stored in a contiguous block of memory. However, when sampled sound data is edited (such that a portion of the sound is deleted or a portion inserted), the data may become discontiguous, or fragmented. Each fragment of data is given its own SNDSoundStruct header; thus, each fragment becomes a separate SNDSoundStruct structure. The addresses of these new structures are collected into a contiguous, NULL-terminated block; the dataLocation field of the original SNDSoundStruct is set to the address of this block, while the original format, sampling rate, and channel count are copied into the new SNDSoundStructs. Fragmentation serves one purpose: It avoids the high cost of moving data when the sound is edited. Playback of a fragmented sound is transparent-you never need to know whether the sound is fragmented before playing it. 
However, playback of a heavily fragmented sound is less efficient than that of a contiguous sound. The SNDCompactSamples() C function can be used to compact fragmented sound data. Sampled sound data is naturally unfragmented. A sound that's freshly recorded or retrieved from a soundfile, the Mach-O segment, or the pasteboard won't be fragmented. Keep in mind that only sampled data can become fragmented. _________________________ >From mentor.cc.purdue.edu!purdue!decwrl!ucbvax!ziploc!eps Wed Apr 4 23:56:23 EST 1990 Article 5779 of comp.sys.next: Path: mentor.cc.purdue.edu!purdue!decwrl!ucbvax!ziploc!eps >From: eps@toaster.SFSU.EDU (Eric P. Scott) Newsgroups: comp.sys.next Subject: Re: Format of NeXT sndfile headers? Message-ID: <445@toaster.SFSU.EDU> Date: 31 Mar 90 21:36:17 GMT References: <14978@phoenix.Princeton.EDU> Reply-To: eps@cs.SFSU.EDU (Eric P. Scott) Organization: San Francisco State University Lines: 42 In article <14978@phoenix.Princeton.EDU> bskendig@phoenix.Princeton.EDU (Brian Kendig) writes: >I'd like to take a program I have that converts Macintosh sound files >to NeXT sndfiles and polish it up a bit to go the other direction as >well. Two people have already submitted programs that do this (Christopher Lane and Robert Hood); check the various NeXT archive sites. > Could someone please give me the format of a NeXT sndfile >header? 
"big-endian"
        0       1       2       3
    +-------+-------+-------+-------+
  0 | 0x2e  | 0x73  | 0x6e  | 0x64  |   "magic" number
    +-------+-------+-------+-------+
  4 |                               |   data location
    +-------+-------+-------+-------+
  8 |                               |   data size
    +-------+-------+-------+-------+
 12 |                               |   data format (enum)
    +-------+-------+-------+-------+
 16 |                               |   sampling rate (int)
    +-------+-------+-------+-------+
 20 |                               |   channel count
    +-------+-------+-------+-------+
 24 |       |       |       |       |   (optional) info string

 28 = minimum value for data location

 data format values can be found in /usr/include/sound/soundstruct.h

 Most common combinations:

              sampling  channel    data
                  rate    count  format
 voice file       8012        1       1 =  8-bit mu-law
 system beep     22050        2       3 = 16-bit linear
 CD-quality      44100        2       3 = 16-bit linear

------------------------------------------------------------------------

IFF/8SVX Format
---------------

Newsgroups: alt.binaries.sounds.d,alt.sex.sounds
Subject: Format of the IFF header (Amiga sounds)
Message-ID: <2509@tardis.Tymnet.COM>
From: jms@tardis.Tymnet.COM (Joe Smith)
Date: 23 Oct 91 23:54:38 GMT
Followup-To: alt.binaries.sounds.d
Organization: BT North America (Tymnet)

The first 12 bytes of an IFF file are used to distinguish between an Amiga picture (FORM-ILBM), an Amiga sound sample (FORM-8SVX), or other file conforming to the IFF specification. The middle 4 bytes is the count of bytes that follow the "FORM" and byte count longwords. (Numbers are stored in M68000 form, high order byte first.)

------------------------------------------

FutureSound audio file, 15000 samples at 10.000KHz, file is 15048 bytes long.

0000: 464F524D 00003AC0 38535658 56484452   FORM..:.8SVXVHDR
      F O R M     15040 8 S V X  V H D R
0010: 00000014 00003A98 00000000 00000000   ......:.........
            20    15000        0        0
0020: 27100100 00010000 424F4459 00003A98   '.......BODY..:.
     10000 1 0      1.0 B O D Y     15000

0000000..03 = "FORM", identifies this as an IFF format file.
FORM+00..03 (ULONG) = number of bytes that follow. (Unsigned long int.)
FORM+04..07 = "8SVX", identifies this as an 8-bit sampled voice.
????+00..03 = "VHDR", Voice8Header, describes the parameters for the BODY.
VHDR+00..03 (ULONG) = number of bytes to follow.
VHDR+04..07 (ULONG) = samples in the high octave 1-shot part.
VHDR+08..0B (ULONG) = samples in the high octave repeat part.
VHDR+0C..0F (ULONG) = samples per cycle in high octave (if repeating), else 0.
VHDR+10..11 (UWORD) = samples per second. (Unsigned 16-bit quantity.)
VHDR+12 (UBYTE) = number of octaves of waveforms in sample.
VHDR+13 (UBYTE) = data compression (0=none, 1=Fibonacci-delta encoding).
VHDR+14..17 (FIXED) = volume. (The number 65536 means 1.0 or full volume.)
????+00..03 = "BODY", identifies the start of the audio data.
BODY+00..03 (ULONG) = number of bytes to follow.
BODY+04..NNNNN = Data, signed bytes, from -128 to +127.

0030: 04030201 02030303 04050605 05060605
0040: 06080806 07060505 04020202 01FF0000
0050: 00000000 FF00FFFF FFFEFDFD FDFEFFFF
0060: FDFDFF00 00FFFFFF 00000000 00FFFF00
0070: 00000000 00FF0000 00FFFEFF 00000000
0080: 00010000 000101FF FF0000FE FEFFFFFE
0090: FDFDFEFD FDFFFFFC FDFEFDFD FEFFFEFE
00A0: FFFEFEFE FEFEFEFF FFFFFEFF 00FFFF01

This small section of the audio sample shows values ranging from -4 (0xFC) to +8 (0x08).

Warning: Do not assume that the BODY starts 48 bytes into the file. In addition to "VHDR", chunks labeled "NAME", "AUTH", "ANNO", or "(c) " may be present, and may be in any order. You will have to check the byte count in each chunk to determine how many bytes to skip.

------------------------------------------------------------------------

Playing sound on a PC
---------------------

From: Eric A Rasmussen

Any turbo PC (8088 at 8 MHz or greater)/286/386/486/etc. can produce a quality playback of single channel 8 bit sounds on the internal (1 bit, 1 channel) speaker by utilizing Pulse-Width-Modulation, which toggles the speaker faster than it can physically move to simulate positions between fully on and fully off.
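To make the pulse-width-modulation idea concrete, here is a minimal Python sketch. It is not taken from any of the programs named in this section; the function names and the `period` parameter are illustrative assumptions. It assumes unsigned 8-bit samples and expands each one into a fixed number of on/off speaker states, so that the fraction of "on" states in each period approximates the sample value. Real players do this with timer hardware rather than with lists.

```python
def pwm_bits(samples, period=16):
    """Expand unsigned 8-bit samples into 1-bit speaker states.

    Each sample becomes `period` consecutive states (1 = speaker on,
    0 = speaker off). The number of 'on' states is proportional to the
    sample value, so the average over one period approximates it.
    `period` is an illustrative parameter, not a hardware constant.
    """
    bits = []
    for s in samples:
        if not 0 <= s <= 255:
            raise ValueError("expected unsigned 8-bit sample")
        on = (s * period + 127) // 255   # rounded duty cycle, 0..period
        bits.extend([1] * on + [0] * (period - on))
    return bits


def average_levels(bits, period=16):
    """Recover the approximate 0..255 level of each PWM period."""
    return [sum(bits[i:i + period]) * 255 // period
            for i in range(0, len(bits), period)]
```

With a longer period the duty cycle has finer steps, but the speaker must then be toggled proportionally faster for the same sample rate, which is presumably why faster machines give better results.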
There are several PD programs of this nature that I know of:

REMAC - Plays MAC format sound files. Files on the Macintosh, at least the sound files that I've ripped apart, seem to contain 3 parts. The first two are info like what the file icon looks like and other header type info. The third part contains the raw sample data, and it is this portion of the file which is saved to a separate file, often named with the .snd extension by PC users. Personally, I like to name the files .s1, .s2, .s3, or .s4 to indicate the sampling rate of the file. (-s# is how to specify the playback rate in REMAC.) REMAC provides playback rates of 5550 Hz, 7333 Hz, 11 kHz, & 22 kHz.

REMAC2 - Same as REMAC, but sounds better on higher speed machines.

REPLAY - Basically same as REMAC, but for playback of Atari ST sounds. Apparently, the Atari has two sound formats, one of which sounds like garbage if played by REMAC or REPLAY in the incorrect mode. The other file format works fine with REMAC and so appears to be 'normal' unsigned 8-bit data. REPLAY provides playback rates of 11.5 kHz, 12.5 kHz, 14 kHz, 16 kHz, 18.5 kHz, 22 kHz, & 27 kHz.

These three programs are all by the same author, Richard E. Zobell, who does not have an internet mail address to my knowledge, but does have a GEnie email address of R.ZOBELL.

Additionally, there are various stand-alone demos which use the internal speaker. One of them, called mushroom, plays a 30 second advertising jingle for magic mushroom room deodorizers, which is pretty humorous. I've used this player to play back samples that I ripped out of the commercial game program Mean Streets, which uses something they call RealSound (tm) to play back digital samples on the internal speaker. (Of course, I only do this on my own system, and since I own the game, I see no problems with it.)

For owners of 8 MHz 286's and above, the option to play 4 channel 8 bit sounds (with decent quality) on the internal speaker is also a reality.
Quite a number of PD programs exist to do this, including, but not limited to: ModEdit, ModPlay, ScreamTracker, STM, Star Trekker, Tetra, and probably a few more. All these programs basically make use of various sound formats used by the Amiga line of computers. These include .stm files, .mod files [a.k.a. mod. files], and .nst files [really the same thing]. Also, these programs pretty much all have the option to play back the sound to add-on hardware such as the SoundBlaster card, the Covox series of devices, and also to direct the data to either one or two (for stereo) parallel ports, which you could attach your own D/A's to. (From what I have seen, the Covox is basically a small amplified speaker with a D/A which plugs into the parallel port. This sounds very similar to the Disney Sound System (DSS) which people have been talking about recently.)

------------------------------------------------------------------------

The EA-IFF-85 documentation
---------------------------

From: dgc3@midway.uchicago.edu

As promised, here's an ftp location for the EA-IFF-85 documentation. It's the November 1988 release as revised by Commodore (the last public release), with specifications for IFF FORMs for graphics, sound, formatted text, and more. IFF FORMs now exist for other media, including structured drawing, and new documentation is now available only from Commodore.

The documentation is at grind.isca.uiowa.edu [128.255.19.233], in the directory /amiga/f1/ff185. The complete file list is as follows:

DOCUMENTS.zoo
EXAMPLES.zoo
EXECUTABLE.zoo
INCLUDE.zoo
LINKER_INFO.zoo
OBJECT.zoo
SOURCE.zoo
TP_IFF_Specs.zoo

All files except DOCUMENTS.zoo are Amiga-specific, but may be used as a basis for conversion to other platforms. Well, I take that tentatively back. I don't know what TP_IFF_Specs.zoo contains, so it might be non-Amiga-specific.
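
The IFF/8SVX section above warns that the BODY chunk need not start at a fixed offset, so a reader of any EA-IFF-85 FORM has to walk the chunk list. The following Python sketch shows one way this might be done. The names iff_chunks and find_body are hypothetical helpers, not part of any Commodore distribution. The code assumes the layout described in that section (4-byte chunk ID followed by a 4-byte big-endian length), plus the EA-IFF-85 rule that an odd-length chunk is followed by one pad byte.

```python
import struct


def iff_chunks(data):
    """Yield (chunk_id, payload) for each chunk inside an IFF FORM.

    `data` is the whole file as bytes. Lengths are 32-bit big-endian
    (M68000 order). Odd-sized chunks are followed by one pad byte,
    per the EA-IFF-85 specification.
    """
    if len(data) < 12 or data[0:4] != b"FORM":
        raise ValueError("not an IFF FORM file")
    (form_len,) = struct.unpack(">I", data[4:8])
    end = min(8 + form_len, len(data))
    pos = 12                      # skip "FORM", length, and form type
    while pos + 8 <= end:
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack(">I", data[pos + 4:pos + 8])
        yield chunk_id, data[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)   # step over payload and pad byte


def find_body(data):
    """Return the raw BODY bytes of an 8SVX file, wherever they start."""
    if data[8:12] != b"8SVX":
        raise ValueError("not an 8SVX sound")
    for chunk_id, payload in iff_chunks(data):
        if chunk_id == b"BODY":
            return payload
    raise ValueError("no BODY chunk found")
```

Because the walker only looks at IDs and lengths, chunks such as "NAME" or "ANNO" are skipped without the code needing to understand them.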
------------------------------------------------------------------------ US Federal Standard 1016 availability ------------------------------------- From: jpcampb@afterlife.ncsc.mil (Joe Campbell) The U.S. DoD's Federal-Standard-1016 based 4800 bps code excited linear prediction voice coder version 3.2 (CELP 3.2) Fortran and C simulation source codes are available for worldwide distribution (on DOS diskettes, but configured to compile on Sun SPARC stations) from NTIS and DTIC. Example input and processed speech files are included. A Technical Information Bulletin (TIB), "Details to Assist in Implementation of Federal Standard 1016 CELP," and the official standard, "Federal Standard 1016, Telecommunications: Analog to Digital Conversion of Radio Voice by 4,800 bit/second Code Excited Linear Prediction (CELP)," are also available. This is available through the National Technical Information Service: NTIS U.S. Department of Commerce 5285 Port Royal Road Springfield, VA 22161 USA (703) 487-4650 The "AD" ordering number for the CELP software is AD M000 118 (US$ 90.00) and for the TIB it's AD A256 629 (US$ 17.50). The LPC-10 standard, described below, is FIPS Pub 137 (US$ 12.50). There is a $3.00 shipping charge on all U.S. orders. The telephone number for their automated system is 703-487-4650, or 703-487-4600 if you'd prefer to talk with a real person. (U.S. DoD personnel and contractors can receive the package from the Defense Technical Information Center: DTIC, Building 5, Cameron Station, Alexandria, VA 22304-6145. Their telephone number is 703-274-7633.) The following articles describe the Federal-Standard-1016 4.8-kbps CELP coder (it's unnecessary to read more than one): Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, "The Federal Standard 1016 4800 bps CELP Voice Coder," Digital Signal Processing, Academic Press, 1991, Vol. 1, No. 3, p. 145-155. Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. 
Welch, "The DoD 4.8 kbps Standard (Proposed Federal Standard 1016)," in Advances in Speech Coding, ed. Atal, Cuperman and Gersho, Kluwer Academic Publishers, 1991, Chapter 12, p. 121-133. Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, "The Proposed Federal Standard 1016 4800 bps Voice Coder: CELP," Speech Technology Magazine, April/May 1990, p. 58-64. The U.S. DoD's Federal-Standard-1015/NATO-STANAG-4198 based 2400 bps linear prediction coder (LPC-10) was republished as a Federal Information Processing Standards Publication 137 (FIPS Pub 137). It is described in: Thomas E. Tremain, "The Government Standard Linear Predictive Coding Algorithm: LPC-10," Speech Technology Magazine, April 1982, p. 40-49. There is also a section about FS-1015 in the book: Panos E. Papamichalis, Practical Approaches to Speech Coding, Prentice-Hall, 1987. The voicing classifier used in the enhanced LPC-10 (LPC-10e) is described in: Campbell, Joseph P., Jr. and T. E. Tremain, "Voiced/Unvoiced Classification of Speech with Applications to the U.S. Government LPC-10E Algorithm," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1986, p. 473-6. Copies of the official standard "Federal Standard 1016, Telecommunications: Analog to Digital Conversion of Radio Voice by 4,800 bit/second Code Excited Linear Prediction (CELP)" are available for US$ 5.00 each from: GSA Federal Supply Service Bureau Specification Section, Suite 8100 470 E. L'Enfant Place, S.W. Washington, DC 20407 (202)755-0325 Realtime DSP code for FS-1015 and FS-1016 is sold by: John DellaMorte DSP Software Engineering 165 Middlesex Tpk, Suite 206 Bedford, MA 01730 USA 1-617-275-3733 1-617-275-4323 (fax) dspse.bedford@channel1.com DSP Software Engineering's FS-1016 code can run on a DSP Research's Tiger 30 (a PC board with a TMS320C3x and analog interface suited to development work). DSP Research 1095 E. Duane Ave. 
Sunnyvale, CA 94086 USA (408)773-1042 (408)736-3451 (fax) From: cfreese@super.org (Craig F. Reese) Newsgroups: comp.speech,comp.dsp,comp.compression.research Subject: CELP 3.2a release now available Organization: Supercomputing Research Center (Bowie, MD) Date: Tue, 3 Aug 1993 14:55:25 GMT 3 August 1993 CELP 3.2a Release Dear CELPers, We have placed an updated version of the FS-1016 CELP 3.2 code in the anonymous FTP area on super.org (192.31.192.1). It's in: /pub/celp_3.2a.tar.Z (please be sure to do the ftp in binary mode). This is essentially the PC release that was on fumar, except that we started directly from the PC disks. The value added is that we have made over 69 corrections and fixes. Most of these were necessary because of the 8 character file name limit on DOS, but there are some others, as well. The code (C, FORTRAN, diskio) all has been built and tested on a Sun4 under SunOS4.1.3. If you want to run it somewhere else, then you may have to do a bit of work. (A Solaris 2.x-compatible release is planned soon.) [One note to PCers. The files: [ [ cbsearch.F celp.F csub.F mexcite.F psearch.F [ [are meant to be passed through the C preprocessor (cpp). [We gather that DOS (or whatever it's called) can't distinguish [the .F from a .f. Be careful! Very limited support is available from the authors (Joe, et al.). Please do not send questions or suggestions without first reading the documentation (README files, the Technical Information Bulletin, etc.). The authors would enjoy hearing from you, but they have limited time for support and would like to use it as efficiently as possible. They welcome bug reports, but, again, please read the documentation first. All users of FS-1016 CELP software are strongly encouraged to acquire the latest release (version 3.2a as of this writing). We do not know how long we will be able to leave the software on this site, but it should be _at_least_ through 1 October 1993 (if you find it missing, please drop me (Craig) a note). 
Please try to get the software during off hours (8 p.m. - 7 a.m. Eastern Standard time) or folks here might complain and we'll have to get rid of the code (if that happens, we'll try to pass it on to someone else, who can put it on the net). We would be more than happy for someone to copy it and make it available elsewhere. Good Luck, Craig F. Reese (cfreese@super.org) IDA/Supercomputing Research Center Joe Campbell (jpcampb@afterlife.ncsc.mil) Department of Defense P.S. Just so you all know, I (Craig) am not actually involved in CELP work. I mainly got with Joe to help make the software available on the Internet. In the course of doing so, I cleaned up much of it, but I am not, by any stretch, a CELP expert and will most likely be unable to answer any technical questions concerning it. ;^) From: tobiasr@monolith.lrmsc.loral.com (Richard Tobias) For U.S. FED-STD-1016 (4800 bps CELP) _realtime_ DSP code and information about products using this code using the AT&T DSP32C and AT&T DSP3210, contact: White Eagle Systems Technology, Inc. 1123 Queensbridge Way San Jose, CA 95120 (408) 997-2706 (408) 997-3584 (fax) rjjt@netcom.com From: Cole Erskine [paraphrased] Analogical Systems has a _real-time_ multirate implementation of U.S. Federal Standard 1016 CELP operating at bit rates of 4800, 7200, and 9600 bps on a single 27MHz Motorola DSP56001. Source and object code is available for a one-time license fee. FREE, _real-time_ demonstration software for the Ariel PC-56D is available for those who already have such a board by contacting Analogical Systems. The demo software allows you to record and playback CELP files to and from the PC's hard disk. Analogical Systems 2916 Ramona Street Palo Alto, CA 94306 Tel: +1 (415) 323-3232 FAX: +1 (415) 323-4222 ------------------------------------------------------------------------ Creative Voice (VOC) file format -------------------------------- From: galt@dsd.es.com (byte numbers are hex!) 
HEADER (bytes 00-19)
Series of DATA BLOCKS (bytes 1A+) [Must end w/ Terminator Block]

---------------------------------------------------------------

HEADER:
=======
     byte #     Description
     ------     ------------------------------------------
     00-12      "Creative Voice File"
     13         1A (eof to abort printing of file)
     14-15      Offset of first datablock in .voc file (std 1A 00
                in Intel Notation)
     16-17      Version number (minor,major) (VOC-HDR puts 0A 01)
     18-19      1's Comp of Ver. # + 1234h (VOC-HDR puts 29 11)

---------------------------------------------------------------

DATA BLOCK:
===========

   Data Block:  TYPE(1-byte), SIZE(3-bytes), INFO(0+ bytes)
   NOTE: Terminator Block is an exception -- it has only the TYPE byte.

      TYPE   Description     Size (3-byte int)   Info
      ----   -----------     -----------------   -----------------------
      00     Terminator      (NONE)              (NONE)
      01     Sound data      2+length of data    *
      02     Sound continue  length of data      Voice Data
      03     Silence         3                   **
      04     Marker          2                   Marker# (2 bytes)
      05     ASCII           length of string    null terminated string
      06     Repeat          2                   Count# (2 bytes)
      07     End repeat      0                   (NONE)
      08     Extended        4                   ***

      *Sound Info Format:       **Silence Info Format:
       ---------------------      ----------------------------
       00   Sample Rate           00-01  Length of silence - 1
       01   Compression Type      02     Sample Rate
       02+  Voice Data

    ***Extended Info Format:
       ---------------------
       00-01  Time Constant: Mono: 65536 - (256000000/sample_rate)
                             Stereo: 65536 - (256000000/(2*sample_rate))
       02     Pack
       03     Mode: 0 = mono
                    1 = stereo

  Marker#           -- Driver keeps the most recent marker in a status byte
  Count#            -- Number of repetitions + 1
                         Count# may be 1 to FFFE for 0 - FFFD repetitions
                         or FFFF for endless repetitions
  Sample Rate       -- SR byte = 256-(1000000/sample_rate)
  Length of silence -- in units of sampling cycle
  Compression Type  -- of voice data
                         8-bits    = 0
                         4-bits    = 1
                         2.6-bits  = 2
                         2-bits    = 3
                         Multi DAC = 3+(# of channels) [interesting--
                                       this isn't in the developer's manual]

------------------------------------------------------------------------

RIFF WAVE (.WAV) file format
----------------------------

RIFF is a format by Microsoft and IBM which is similar in spirit and functionality to EA-IFF-85, but not compatible (and it's in little-endian byte order, of course :-). WAVE is RIFF's equivalent of AIFF, and its inclusion in Microsoft Windows 3.1 has suddenly made it important to know about.

Rob Ryan was kind enough to send me a description of the RIFF format. Unfortunately, it is too big to include here (27 k), but I've made it available for anonymous ftp as ftp.cwi.nl:/pub/audio/RIFF-format.

And here's a pointer to the official description from Matt Saettler, Microsoft Multimedia:

"The complete definition of the WAVE file format as defined by IBM/Microsoft is available for anon. FTP from ftp.uu.net in the vendor/microsoft/multimedia directory."

(Rob Ryan's version may actually be an extract from one of the files stored there.)

------------------------------------------------------------------------

U-LAW and A-LAW definitions
---------------------------

[Adapted from information provided by duggan@cc.gatech.edu (Rick Duggan) and davep@zenobia.phys.unsw.EDU.AU (David Perry)]

u-LAW (really mu-LAW) is

    y = \frac{\mathrm{sgn}(m)}{\ln(1+u)} \ln\left(1 + u\left|\frac{m}{m_p}\right|\right),
        \qquad \left|\frac{m}{m_p}\right| \le 1

A-LAW is

    y = \begin{cases}
          \dfrac{A}{1+\ln A}\,\dfrac{m}{m_p},
            & \left|\dfrac{m}{m_p}\right| \le \dfrac{1}{A} \\[2ex]
          \dfrac{\mathrm{sgn}(m)}{1+\ln A}\left(1 + \ln\left(A\left|\dfrac{m}{m_p}\right|\right)\right),
            & \dfrac{1}{A} \le \left|\dfrac{m}{m_p}\right| \le 1
        \end{cases}

Typical values are u=100 and 255, and A=87.6; m_p (mp) is the peak message value and m is the current quantised message value. (The formulae get simpler if you substitute x for m/mp and sgn(x) for sgn(m); then -1 <= x <= 1.)

Converting from u-LAW to A-LAW is in a sense "lossy" since there are quantizing errors introduced in the conversion.

"..the u-LAW used in North America and Japan, and the A-LAW used in Europe and the rest of the world and international routes.."

References:

Modern Digital and Analog Communication Systems, B.P.Lathi., 2nd ed.
ISBN 0-03-027933-X

Transmission Systems for Communications
Fifth Edition
by Members of the Technical Staff at Bell Telephone Laboratories
Bell Telephone Laboratories, Incorporated
Copyright 1959, 1964, 1970, 1982

A note on the resolution of U-LAW by Frank Klemm:

8 bit U-LAW has the same lowest magnitude as 12 bit linear, and 12 bit
U-LAW the same as 16 bit linear.

Device/Coding                      Resolution on  Resolution on
                                   maximal level  low level
                                   (bits)         (bits)
8 bit linear                        8              8
8 bit ulaw                          6             12    (used for digital
                                                         telephone)
12 bit linear                      12             12
12 bit ulaw                        10             16    (used in
                                                         DAT/Longplay)
16 bit linear                      16             16

estimated for some analogue equipment:

tape recorder (HiFi DIN)            8              9    (no problem today)
tape recorder (semiprofessional)   10.5           13.5

------------------------------------------------------------------------

AVR File Format
---------------

From: hyc@hanauma.Jpl.Nasa.Gov (Howard Chu)

A lot of PD software exists to play Mac .snd files on the ST. One other
format that seems pretty popular (used by a number of commercial
packages) is the AVR format (from Audio Visual Research). This format
has a 128 byte header that looks like this:

        char magic[4]="2BIT";
        char name[8];   /* null-padded sample name */
        short mono;     /* 0 = mono, 0xffff = stereo */
        short rez;      /* 8 = 8 bit, 16 = 16 bit */
        short sign;     /* 0 = unsigned, 0xffff = signed */
        short loop;     /* 0 = no loop, 0xffff = looping sample */
        short midi;     /* 0xffff = no MIDI note assigned,
                           0xffXX = single key note assignment
                           0xLLHH = key split, low/hi note */
        long rate;      /* sample frequency in hertz */
        long size;      /* sample length in bytes or words (see rez) */
        long lbeg;      /* offset to start of loop in bytes or words.
                           set to zero if unused. */
        long lend;      /* offset to end of loop in bytes or words.
                           set to sample length if unused. */
        short res1;     /* Reserved, MIDI keyboard split */
        short res2;     /* Reserved, sample compression */
        short res3;     /* Reserved */
        char ext[20];   /* Additional filename space, used
                           if (name[7] != 0) */
        char user[64];  /* User defined. Typically ASCII message.
                        */

-----------------------------------------------------------------------

The Amiga MOD Format
--------------------

From: norlin@mailhost.ecn.uoknor.edu (Norman Lin)

MOD files are music files containing 2 parts:

(1) a bank of digitized samples
(2) sequencing information describing how and when to play the samples

MOD files originated on the Amiga, but because of their flexibility
and the extremely large number of MOD files available, MOD players
are now available for a variety of machines (IBM PC, Mac, Sparc
Station, etc.)

The samples in a MOD file are raw, 8 bit, signed, headerless, linear
digital data.  There may be up to 31 distinct samples in a MOD file,
each with a length of up to 128K (though most are much smaller; say,
10K - 60K).  An older MOD format only allowed for up to 15 samples in
a MOD file; you don't see many of these anymore.  There is no standard
sampling rate for these samples.  [But see below.]

The sequencing information in a MOD file contains 4 tracks of
information describing which, when, for how long, and at what frequency
samples should be played.  This means that a MOD file can have up to
31 distinct (digitized) instrument sounds, with up to 4 playing
simultaneously at any given point.  This allows a wide variety of
orchestrational possibilities, including use of voice samples or
creation of one's own instruments (with appropriate sampling
hardware/software).  The ability to use one's own samples as instruments
is a flexibility that other music files/formats do not share, and is
one of the reasons MOD files are so popular, numerous, and diverse.

15 instrument MODs, as noted above, are somewhat older than 31
instrument MODs and are not (at least not by me) seen very often
anymore.  Their format is identical to that of 31 instrument MODs
except:

(1) Since there are only 15 samples, the information for the last (15th)
    sample starts at byte 440 and goes through byte 469.
(2) The songlength is at byte 470 (contrast with byte 950 in 31
    instrument MOD)
(3) Byte 471 appears to be ignored, but has been observed to be 127.
    (Sorry, this is from observation only)
(4) Byte 472 begins the pattern sequence table (contrast with byte 952
    in a 31 instrument MOD)
(5) Patterns start at byte 600 (contrast with byte 1084 in 31
    instrument MOD)

"ProTracker," an Amiga MOD file creator/editor, is available for ftp
everywhere as pt??.lzh.

From: Apollo Wong

From: M.J.H.Cox@bradford.ac.uk (Mark Cox)
Newsgroups: alt.sb.programmer
Subject: Re: Format for MOD files...
Message-ID: <1992Mar18.103608.4061@bradford.ac.uk>
Date: 18 Mar 92 10:36:08 GMT
Organization: University of Bradford, UK

wdc50@DUTS.ccc.amdahl.com (Winthrop D Chan) writes:
>I'd like to know if anyone has a reference document on the format of the
>Amiga Sound/NoiseTracker (MOD) files. The author of Modplay said he was going
>to release such a document sometime last year, but he never did. If anyone

I found this one, which covers it better than I can explain it - if you
use this in conjunction with the documentation that comes with Norman
Lin's Modedit program it should pretty much cover it.

Mark J Cox

/***********************************************************************

Protracker 1.1B Song/Module Format:
-----------------------------------

Offset  Bytes  Description
------  -----  -----------
   0     20    Songname. Remember to put trailing null bytes at the end...

Information for sample 1-31:

Offset  Bytes  Description
------  -----  -----------
  20     22    Samplename for sample 1. Pad with null bytes.
  42      2    Samplelength for sample 1. Stored as number of words.
               Multiply by two to get real sample length in bytes.
  44      1    Lower four bits are the finetune value, stored as a signed
               four bit number. The upper four bits are not used, and
               should be set to zero.
               Value:  Finetune:
                 0        0
                 1       +1
                 2       +2
                 3       +3
                 4       +4
                 5       +5
                 6       +6
                 7       +7
                 8       -8
                 9       -7
                 A       -6
                 B       -5
                 C       -4
                 D       -3
                 E       -2
                 F       -1

  45      1    Volume for sample 1. Range is $00-$40, or 0-64 decimal.
  46      2    Repeat point for sample 1. Stored as number of words offset
               from start of sample. Multiply by two to get offset in bytes.
  48      2    Repeat Length for sample 1. Stored as number of words in
               loop. Multiply by two to get replen in bytes.

Information for the next 30 samples starts here. It's just like the info for
sample 1.

Offset  Bytes  Description
------  -----  -----------
  50     30    Sample 2...
  80     30    Sample 3...
   .
   .
   .
 890     30    Sample 30...
 920     30    Sample 31...

Offset  Bytes  Description
------  -----  -----------
 950      1    Songlength. Range is 1-128.
 951      1    Well... this little byte here is set to 127, so that old
               trackers will search through all patterns when loading.
               Noisetracker uses this byte for restart, but we don't.
 952    128    Song positions 0-127. Each holds a number from 0-63 that
               tells the tracker what pattern to play at that position.
1080      4    The four letters "M.K." - This is something Mahoney & Kaktus
               inserted when they increased the number of samples from
               15 to 31. If it's not there, the module/song uses 15 samples
               or the text has been removed to make the module harder to
               rip. Startrekker puts "FLT4" or "FLT8" there instead.

Offset  Bytes  Description
------  -----  -----------
1084    1024   Data for pattern 00.
   .
   .
   .
xxxx  Number of patterns stored is equal to the highest patternnumber
      in the song position table (at offset 952-1079) plus one, since
      patterns are numbered from 00.

Each note is stored as 4 bytes, and all four notes at each position in
the pattern are stored after each other.

00 -  chan1  chan2  chan3  chan4
01 -  chan1  chan2  chan3  chan4
02 -  chan1  chan2  chan3  chan4
etc.

Info for each note:

 _____byte 1_____   byte2_    _____byte 3_____   byte4_
/                \ /      \  /                \ /      \
0000          0000-00000000  0000          0000-00000000

Upper four    12 bits for    Lower four    Effect command.
bits of       note period.   bits of
sample                       sample
number.                      number.
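
[Editorial sketch, not part of the original Protracker text: the bit
layout above can be unpacked roughly as follows.  Python is used purely
for illustration, and the function name decode_note is made up here.
The 12-bit effect field is returned whole, because the description above
does not break it down any further.]

```python
def decode_note(raw):
    """Split one 4-byte note into (sample_number, period, effect).

    Follows the diagram above: the sample number is spread over the
    upper nibbles of bytes 1 and 3, the period is the low nibble of
    byte 1 plus byte 2, and the effect command is the low nibble of
    byte 3 plus byte 4.
    """
    if len(raw) != 4:
        raise ValueError("a note is exactly 4 bytes")
    b1, b2, b3, b4 = bytes(raw)
    sample_number = (b1 & 0xF0) | (b3 >> 4)   # upper 4 bits | lower 4 bits
    period = ((b1 & 0x0F) << 8) | b2          # 12-bit note period
    effect = ((b3 & 0x0F) << 8) | b4          # 12-bit effect command
    return sample_number, period, effect


def decode_row(row):
    """Decode one pattern position (4 channels x 4 bytes = 16 bytes)."""
    if len(row) != 16:
        raise ValueError("a pattern position is 16 bytes")
    return [decode_note(row[i:i + 4]) for i in range(0, 16, 4)]
```

A pattern is 1024 bytes, so it holds 64 such 16-byte positions; calling
decode_row on each consecutive 16-byte slice of a pattern gives the notes
for chan1 to chan4 in order.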

Periodtable for Tuning 0, Normal
  C-1 to B-1 : 856,808,762,720,678,640,604,570,538,508,480,453
  C-2 to B-2 : 428,404,381,360,339,320,302,285,269,254,240,226
  C-3 to B-3 : 214,202,190,180,170,160,151,143,135,127,120,113

To determine what note to show, scan through the table until you find
the same period as the one stored in byte 1-2. Use the index to look
up in a notenames table.

This is the data stored in a normal song. A packed song starts with the
four letters "PACK", but I don't know how the song is packed: You can
get the source code for the cruncher/decruncher from us if you need it,
but I don't understand it; I've just ripped it from another tracker...

In a module, all the samples are stored right after the patterndata.
To determine where a sample starts and stops, you use the sampleinfo
structures in the beginning of the file (from offset 20). Take a look
at the mt_init routine in the playroutine, and you'll see just how it
is done.

Lars "ZAP" Hamre/Amiga Freelancers

***********************************************************************/

-- Mark J Cox ----- Bradford, UK ---

PS: A file with even *much* more info on MOD files, compiled by Lars
Hamre, is available from ftp.cwi.nl:/pub/audio/MOD-info.  Enjoy!

FTP sites for MODs and MOD players
----------------------------------

Subject: MODS AND PLAYERS!! **READ** info/where to get them
From: cjohnson@tartarus.uwa.edu.au (Christopher Johnson)
Newsgroups: alt.binaries.sounds.d
Message-ID: <1h32ivINNglu@uniwa.uwa.edu.au>
Date: 21 Dec 92 00:19:43 GMT
Organization: The University of Western Australia

Hello world,

For all those asking, here is where to get those mod players and mods.

SNAKE.MCS.KENT.EDU is the best site for general stuff.
look in /pub/SB-Adlib

Simtel-20 or archie.au (Simtel mirror)

for Windows players: ftp.cica.indiana.edu in pub/pc/win3/sound

here is a short list of players

player          comment                     file / where found
------          -------                     ------------------
mp or modplay   BEST OVERALL                mp219b.zip
                                            simtel and snake
wowii           best for vga/fast machines  wowii12b.zip
                                            simtel and snake
trakblaster     best for compatibility      trak-something
                (two versions, old one      simtel and snake
                for slow machines)
ss              cute display (hifi)         have_sex.arj
                                            found on local BBS (Western
                                            Australia, White Ghost)
superpro player generally good              ssp.zip or similar
                                            found on night owl 7 CD
player?         cute display (hifi)         player.zip or similar
                                            found on night owl 7 CD

WINDOWS
Winmod pro      does protracker             wmp????.zip
                                            cica
winmod          more stable                 winmod12.zip or similar
                                            cica

Hope this helps. E-mail me if you find any more players and I will add
them in for the next time mod player requests get a little out of hand.

For mods, ftp to wuarchive.wustl.edu and go to the amiga music directory
(pub/amiga/music/ntsb ?????); that should do you for a while.

see you soon
Chris.

-----------------------------------------------------------------------
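
[Editorial sketch for checking a module fetched from one of the sites
above against the 31-sample Protracker layout given in the MOD section.
It is an illustration in Python, not a reference loader.  The function
name parse_mod_header and the dictionary keys are invented here.  Two
things are assumed rather than stated in the text: 2-byte values are
read big-endian (the Amiga's byte order), and only the "M.K." tag is
accepted, so 15-sample and Startrekker modules are rejected.]

```python
import struct

HEADER_SIZE = 1084      # everything up to and including the 4-letter tag
PATTERN_SIZE = 1024     # bytes per pattern, as given in the layout


def parse_mod_header(data):
    """Read the fixed-size header of a 31-sample "M.K." module."""
    if len(data) < HEADER_SIZE:
        raise ValueError("too short for a 31-sample module header")
    if bytes(data[1080:1084]) != b"M.K.":
        raise ValueError('no "M.K." tag at offset 1080')

    samples = []
    for i in range(31):
        # 30 bytes per sample: name, length, finetune, volume, repeat info
        name, words, tune, volume, rep, replen = struct.unpack_from(
            ">22sHBBHH", data, 20 + 30 * i)
        finetune = tune & 0x0F          # signed four-bit number
        if finetune > 7:
            finetune -= 16
        samples.append({
            "name": name.split(b"\0", 1)[0].decode("ascii", "replace"),
            "length_bytes": words * 2,
            "finetune": finetune,
            "volume": volume,
            "repeat_start_bytes": rep * 2,
            "repeat_length_bytes": replen * 2,
        })

    positions = data[952:952 + 128]
    n_patterns = max(positions) + 1     # patterns are numbered from 0
    return {
        "title": bytes(data[0:20]).split(b"\0", 1)[0].decode("ascii", "replace"),
        "song_length": data[950],
        "n_patterns": n_patterns,
        "samples": samples,
        # sample data follows the pattern data
        "first_sample_offset": HEADER_SIZE + n_patterns * PATTERN_SIZE,
    }
```

Typical use would be to read the whole file in binary mode and pass the
bytes in; the sample data for sample 1 then starts at
first_sample_offset, with each later sample following the previous one
by its length_bytes.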