[Slightly edited from a mail I received --Guido]

From: Stephen Travis Pope
To: Guido van Rossum
Subject: FAQ: Audio File Formats--The High End

Dear Mr. van Rossum,

Hoe gaat't? (I worked in Amsterdam for a while and hou van Holland.)

INTRODUCTION

I recently came across InternetTalkRadio while working at the Swedish
Institute for Computer Science, and read with great interest your
document on audio file formats. I find this a very valuable service to
the community and have one question and one contribution. Maybe I
should make the contribution first.

I have been involved in computer music and DSP since the early 1970's,
and have used more sound file formats than I care to remember (well,
actually I can't remember several of them). While your document treats
in detail the requirements of, and formats used in, telecommunications
and personal computer-based musical applications, I think it would
profit from more detail about the high-end formats and sound file
systems used in multi-channel computer music production. I will attempt
to provide you with the information I'm aware of below, with the
assumption that you may edit it according to your needs if you choose
to include or mention it in future editions of your FAQ.

HISTORY

In your list of "self-describing file formats" you mention the "IRCAM"
sound file system. This software has now been superseded by the
so-called "BICSF" (for Berkeley/IRCAM/CARL Sound File system) software
release. I include the standard document describing BICSF as an
Appendix to this letter. More recently, there has been an effort at
Princeton (Prof. Paul Lansky) and Stanford (myself) to standardize
several extensions to BICSF, which I'll outline below.

During the late 1970's and early 1980's, several sites developed
UNIX-based sound file systems for use in computer music. These early
systems generally included real changes to the UNIX file system, so
that separate disks or disk partitions were used for sound storage.
(Many still feel this is a good idea.)

The "root" of most of this work is the "csound" file system, developed
by D. Gareth Loy at the Computer Audio Research Lab (CARL) at UC San
Diego and first released around 1980 (not to be confused with the MIT
programming language of the same name, which it predates). It is a
real-time, high-throughput sound file system that ran on DEC VAX and
PDP-11 computers before the advent of the Berkeley file system. Csound
is part of the CARL music software distribution. This package also
includes the cmusic language (a simple C-based Music V descendant
written by F. Richard Moore), and many other tools such as vocoders and
configurable reverberators. The CARL software distribution is still
available for a small license fee, and now runs on Sun, NeXT, SGI, and
various other UNIX hardware. The CARL software is documented profusely
in Dick Moore's book "The Elements of Computer Music" (see references).

Robert Gross (then at UCB, now at Sun) based his cylinder-contiguous
sound file system (CCSS) on this. Robert took it with him when he moved
to Paris to work at IRCAM in the early 1980's, and they extended it
there. Some time in the later 1980's the several strains of csound
spin-offs were merged into BICSF, which is still used in computer music
circles and offers several advantages over simpler systems such as the
NeXT/SPARC or even lower forms of sound file life.

In an effort to offer interoperability between the BICSF and NeXT/SPARC
systems, Paul Lansky at Princeton (the author of the "cmix" tool kit, the
best thing since velcro if you ask me) altered the BICSF header so
that the first 28 bytes "just happen" to be identical to the NeXT/SPARC
header. The "dataLocation" offset is set to 1024 (or a multiple
thereof) to allow a large header. What comes between the "standard"
28-byte header and the sound may then include the additional
information described below. I have further extended this to allow
more detailed annotation of sound files. I needed this because the
computer music compositions I realize typically involve several
thousand source files amounting to several gigabytes and require very
flexible and scalable tools. I interface to these formats with both C-
and Smalltalk-language programs (most of which are in the public
domain).  I will refer to this extended BICSF format as "BICSF" below.
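
The header layout described above can be sketched in a few lines of
Python. This is only an illustration, not code from cmix or MODE: it
assumes the usual NeXT/SPARC layout of six big-endian 32-bit fields
(magic number, dataLocation, dataSize, dataFormat, sampling rate,
channel count) followed by at least four bytes of "info" text, which
together make up the 28 bytes mentioned above. The function names are
made up for this example.

```python
import struct

SND_MAGIC = 0x2E736E64               # the four ASCII bytes ".snd"
FIXED_FIELDS = struct.Struct(">6I")  # six big-endian 32-bit unsigned integers
STANDARD_HEADER_BYTES = 28           # 24 bytes of fields + 4 bytes of info


def read_snd_header(raw):
    """Decode the six fixed fields at the start of a NeXT/SPARC sound file."""
    if len(raw) < FIXED_FIELDS.size:
        raise ValueError("need at least 24 bytes of header")
    magic, location, size, fmt, rate, channels = FIXED_FIELDS.unpack_from(raw)
    if magic != SND_MAGIC:
        raise ValueError("not a NeXT/SPARC sound file")
    return {
        "dataLocation": location,   # byte offset of the first sample
        "dataSize": size,           # number of bytes of sample data
        "dataFormat": fmt,          # e.g. 3 for 16-bit linear samples
        "samplingRate": rate,
        "channelCount": channels,
    }


def extension_area(raw):
    """Return the bytes between the standard header and the samples.

    In the extended format this is where the additional annotation
    fields would live.
    """
    header = read_snd_header(raw)
    return raw[STANDARD_HEADER_BYTES:header["dataLocation"]]
```

A reader that only knows the NeXT/SPARC format can stop after
read_snd_header() and seek to dataLocation; an extended-format reader
would go on to decode the bytes returned by extension_area().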

THE BICSF SOUND FILE SYSTEM

Three topics are of interest in the BICSF system: the sound file header
structure, the sound file storage system, and the utilities the system
provides for sound file manipulation. I will discuss each of these
in turn.

All but the most Neanderthal sound file formats include some sort of
file header describing the sample rate, sample format, and other
relevant data. The flexibility of this data structure can have a large
influence on the power of the tools that one can build to manipulate
sound files. Modern multimedia and high-quality audio applications
really demand an easily extensible, scalable sound file format.

Going beyond the basic fields of a typical sound file header (e.g., the
NeXT/SPARC structure described in Appendix 2 of your FAQ), at least
three types of information should be stored with sound files: (1) an
ASCII text comment describing the sound file's contents; (2) the
maximum amplitude per channel (with the frame index where it appears);
and (3) a collection of named cue points in the sound file. Other
useful items that might be included in the header are the pitch
(scalar or vector), a transcription of the spoken text of a sound, and
the envelope (an array of integer or floating-point values). Further
features specific to a processing method, such as the names of
compression algorithms, noise gate thresholds, or other file names (for
the case of a "virtual" sound file, described next), are also found.

As an example, below is a printout of an extended BICSF sound file
header taken from the Smalltalk-based MODE tool kit (see references).
The lines in this dump correspond to the fields of the C data structure
or the instance variables of a Smalltalk class description. Note that
strings are enclosed in single quotes, and that hash marks (#)
introduce symbolic names in Smalltalk.

 name:      'snd/AllGatesAreOpen/Michi_1/slower_c/4a.snd'
 rate:      44100.0
 channels:  1
 format:    #linear16Bit
 duration:  1.42367 sec
 maxAmp:    Dictionary (#'1'->10700->23345)
 size:      126592 bytes
 modified:  93 Apr 25   5:05:22 pm
 text:      'droem och vaka' 
 comment:   'Transposed down about a minor third and slowed down by 35%'
 cueList:   Dictionary (#droem->(271 to: 29740), 
                  #och->(31815 to: 41035), 
                  #vaka->(41036 to: 62755))
 script:    'pv 44100 1024 8192 128 173 0.82 0 0 -i'
 parent:    'snd/AllGatesAreOpen/Michi_1/src/4a.snd'
 envelope:  (an array of 1024 integers)

The maximum amplitude field(s), which are printed above as Smalltalk
dictionaries, store the channel number, the sample frame at which the
value occurred, and the maximum sample's value (i.e., the file above
has one channel whose maximum is 23345 at sample frame 10700). The cue
fields have symbolic names, and their values are sample intervals,
i.e., the word "och" ("and" in Swedish) begins at sample frame 31815
and ends at 41035. It is possible to have a sound file that has no
samples of its own, but only cue points into another sound file, a
"virtual sound file." The virtual sound file can include either a file
name and sample range, or a file name and a cue name.
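
The numbers in the dump above can be cross-checked with a little
arithmetic. The Python sketch below is only an illustration: it assumes
a 1024-byte header (the smallest dataLocation mentioned earlier) and
two bytes per sample for the #linear16Bit format. The helper names are
made up for this example.

```python
HEADER_BYTES = 1024     # assumption: smallest dataLocation of the extended format
BYTES_PER_SAMPLE = 2    # 16-bit linear samples

# Values copied from the header dump above.
rate = 44100.0
channels = 1
file_size = 126592
cues = {
    "droem": (271, 29740),
    "och": (31815, 41035),
    "vaka": (41036, 62755),
}


def frame_count(size, header_bytes, channels, bytes_per_sample):
    """Number of sample frames stored after the header."""
    return (size - header_bytes) // (channels * bytes_per_sample)


def cue_in_seconds(cue, rate):
    """Convert a (first frame, last frame) cue into seconds."""
    start, end = cue
    return start / rate, end / rate


frames = frame_count(file_size, HEADER_BYTES, channels, BYTES_PER_SAMPLE)
duration = frames / rate

print(frames)                # 62784
print(round(duration, 5))    # 1.42367, matching the duration field
for name, cue in cues.items():
    start, end = cue_in_seconds(cue, rate)
    print(f"{name}: {start:.3f} s to {end:.3f} s")
```

Under these assumptions the file holds 62784 frames, which gives the
duration printed in the dump, and every cue point falls inside the
file.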

[Implementation detail for C hackers] These additional fields can come
in any order and number and have variable lengths, so they are stored
in the header with a key (an integer that is #defined somewhere), a
length, and the data they hold onto.
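
A key/length/data scheme of this kind might be sketched as follows in
Python. The field widths (a 16-bit key and a 16-bit length, both
big-endian), the key numbers, and the use of key 0 to mark the unused
remainder of the header are all assumptions made for this example, not
the actual BICSF definitions.

```python
import struct

FIELD_HEAD = struct.Struct(">HH")  # hypothetical: 16-bit key, 16-bit length

# Hypothetical key numbers; the real ones are #defined in the C sources.
KEY_END = 0
KEY_COMMENT = 1
KEY_MAXAMP = 2


def pack_field(key, payload):
    """Encode one field as key, length, data."""
    return FIELD_HEAD.pack(key, len(payload)) + payload


def parse_fields(area):
    """Decode fields until the end marker or the end of the area."""
    fields = []
    offset = 0
    while offset + FIELD_HEAD.size <= len(area):
        key, length = FIELD_HEAD.unpack_from(area, offset)
        if key == KEY_END:
            break  # the rest of the header is zero padding
        offset += FIELD_HEAD.size
        payload = area[offset:offset + length]
        if len(payload) < length:
            raise ValueError("truncated field")
        fields.append((key, payload))
        offset += length
    return fields
```

Because every field carries its own length, a reader can skip over keys
it does not recognize, which is what allows the fields to appear in any
order and number.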

In the csound and CCSS systems, the header also included disk cylinder
pointers, so that it could be stored separately from the sample data,
such as on a normal UNIX file system. More recent implementations have
the header followed immediately by the contiguous sample data, though
this has both advantages and disadvantages. A non-contiguous,
chunk-oriented format might be more flexible. There is still a debate
in the computer music and audio DSP community as to whether this is
necessary or desirable. On the one hand, the Berkeley file system and
its descendants can support partitions with large block sizes, thereby
enabling the high throughput required for real-time performance of,
e.g., quadrophonic 16-bit files at 48 kHz (a frequently used format).
On the other hand, as mentioned in the BICSF document below, "There are
several reasons to segregate soundfiles from regular UNIX files. [...]
You do not want realtime sound I/O to be in competition with
timesharing I/O. Expect an increase of up to 50% for having a separate
disk and controller for sound."
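
To put a number on "high throughput," the sustained data rate of an
uncompressed sound file is simply channels times bytes per sample times
sample rate. The short Python sketch below works this out for the
four-channel example; the 8K block size is the figure quoted in the
BICSF document, and the function name is made up for this example.

```python
def bytes_per_second(channels, bits_per_sample, sample_rate):
    """Sustained data rate of uncompressed linear samples."""
    return channels * (bits_per_sample // 8) * sample_rate


quad_rate = bytes_per_second(4, 16, 48000)
blocks_per_second = quad_rate / 8192   # 8K file system blocks

print(quad_rate)            # 384000
print(blocks_per_second)    # 46.875
```

The disk must therefore deliver roughly 47 full 8K blocks every second
without interruption, which is why competition from time-sharing I/O
matters.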

There are several other interesting features of extended-BICSF headers,
but this introduction should serve to heighten readers' awareness of
what is possible, and hopefully motivate the development of such
facilities based on other popular formats such as AIFF.

The utilities that are part of BICSF mirror the UNIX file manipulation
shell commands, but generally have "sf" appended to their names. The
user has a "current sound file directory" that is distinct from his or
her UNIX current working directory. In modern versions of BICSF, where
sound files are often stored as regular UNIX files, many of these (such
as "cpsf" or "rmsf") are not needed. Others, such as "lsf," "fromsf,"
and "tosf" (previously called "sndin" and "sndout"), are still
generally needed, and are often given hideous and unnecessarily unclear
names such as "sndinfo." Several utilities exist that accept a variety
of sound file formats, such as the SGI Indigo machine's sound tools
that can process either AIFF or NeXT/SPARC files. (Perhaps we should
build "SOX" into our play programs so we don't have to use it
explicitly.)

AVAILABILITY

For more information on getting the CARL software distribution, contact
the center's director, F. Richard Moore (frm@ucsd.edu) or Susan Fichera
(sfl@sdcarl.ucsd.edu).

Paul Lansky's cmix tools are available via ftp from the directory
pub/music on the server princeton.edu.

The MODE Smalltalk tools are available via ftp from the directory
pub/st80 on the server ccrma-ftp.stanford.edu.

REFERENCES

Anyone performing sound I/O on a time-sharing system (like UNIX) should
be referred to Susan Fichera's excellent discussion of the issues
involved in real-time I/O in these real-time-hostile environments. Her
article is: "Machine Tongues XIII: Real-Time Audio Conversion under a
Time-Sharing Operating System" and appeared in "Computer Music Journal"
15(3):27-40 (Fall, 1991).

F. Richard (Dick) Moore's "The Elements of Computer Music" is highly
recommended as a general introduction to CM and digital audio signal
processing. It teaches his cmusic sound compiler language. It was
published by Prentice-Hall in 1990.

My own MODE (Musical Object Development Environment) was described in
detail in the article "The Interim DynaPiano: An Integrated Computer
Tool and Instrument for Composers" in "Computer Music Journal"
16(3):73-91 (Fall, 1992).

A good introduction to software sound synthesis that also addresses
sound file management issues is "Machine Tongues XV: Three Packages for
Software Sound Synthesis" by yours truly in "Computer Music Journal"
17(2):23-54 (Summer, 1993). This article also introduces and compares
cmusic, csound (the language), and cmix.

====================================================================
====================================================================

Stephen Travis Pope
stp@ccrma.stanford.edu (in Palo Alto), stp@sics.se (in Stockholm)

==============================================================
==============================================================

APPENDIX: BICSF Description 
(written by ? around 1988, included here unedited) (available by ftp
from the file pub/st80/mode/doc/BICSF.t on ccrma-ftp.stanford.edu)


         BICSF Berkeley/IRCAM/CARL Sound Filesystem

                          ABSTRACT

          BICSF is a collection of programs which
     implement a filesystem for digital audio
     applications running under Berkeley UNIX.  This
     document gives an overview and describes the
     installation procedure.

CREDITS

Contributors to this suite of programs are numerous, but the
main outlines of the system are due to the work of
+    Marshall Kirk McKusick, William N. Joy, Samuel J. Leffler,
     Robert S. Fabry for the creation of the Berkeley Fast Filesystem,
+    Gareth Loy at CARL for the prototype CARL csound(1carl)
     filesystem,
+    Rob Gross and Dan Timis at IRCAM for the IRCAM sound filesystem,
+    Brad Garton at Columbia for the Digisound-16 device driver and
     associated play and record programs.

The soundfile system code here is largely that of Rob  Gross
and  Dan  Timis  of  the IRCAM group.  Author ascription has
been appended to the manual pages where known.

The device drivers were written by:
+    DSC-200: Rusty Wright at CARL,
+    Digisound 16: Brad Garton at Columbia-Princeton,
+    Dyaxis: Susan Fichera at CARL.

The Digisound 16 driver was updated for SunOS 4.0 by Susan Fichera.
The integration of these various sources into one package was done by
Gareth Loy and Abe Singer at CARL and CMIL.

LIST OF PROGRAMS AND ALIASES

Following is a list  of  programs  and  aliases,  and  brief
descriptions:
      ALIASES USING STANDARD UNIX COMMANDS
        catsf   - concatenate soundfiles
        chgrpsf - change soundfile group ownership
        chmodsf - change soundfile mode
        chownsf - change soundfile ownership
        cpsf    - copy soundfile
        mkdirsf - make soundfile directory
        mvsf    - move a soundfile
        pwdsf   - print working soundfile directory
        rmdirsf - remove (empty) soundfile directory
        rmsf    - remove soundfile (or directory tree)

      BACKWARD COMPATIBILITY
        sndin   - read from soundfile
        sndout  - write to soundfile

      SPECIAL PROGRAMS
        createsf - prepare soundfile for recording
        fromsf   - read from soundfile
        gainsf   - normalize or adjust gain of soundfile
        lsf      - list sound files
        normsf   - normalize amplitude of soundfile
        pansf    - pan sound file
        peaksf   - compute peak amplitude and record in soundfile header
        querysf  - print out contents of header
        restorsf - restore soundfile from csound dumpsf tape
        retrosf  - retrograde a soundfile
        scalesf  - gain scale a soundfile
        setsf    - set or modify soundfile header parameters
        sndawk   - signal modification language similar to awk for soundfiles
        swabsf   - swap bytes of samples in soundfile
        tarsf    - tape archive of soundfiles
        tosf     - write to soundfile
        transpsf - transpose pitch of soundfile
        xdr      - convert soundfile to Sun external data representation

      PLAYBACK/RECORD/MONITOR PROGRAMS
        monitor - monitor digital output of ADCs
        play    - play soundfile
        record  - record soundfile

NAMES OF PROGRAMS

In the interests of name coherency, some programs have  been
renamed  from  their  original  forms  at  CARL,  IRCAM, and
Columbia-Princeton.
      PROGRAMS:
        ORIGINAL            RENAMED
        sfcreate            createsf
        sndcat              catsf
        sndgain             gainsf
        sndin               fromsf
        sndinfo             querysf
        sndnorm             normsf
        sndout              tosf
        sndpan              pansf
        sndpeak             peaksf
        sndreverse          retrosf
        sndscale            scalesf
        sndset              setsf
        sndtransp           transpsf

      PLAY, RECORD, ETC:
        DigiSound-16:       ai{play,record,monitor,reset}
        Dyaxis:             dy{play,record}
        DSC-200:            ds{play,record}

Aliases have been created for all the  original  names,  and
are   listed   along   with  the  rest  of  the  aliases  in
./bicsf/std.sfaliases.m4.

ORGANIZATION OF SOFTWARE

Software is divided into three groups:
+    device drivers, found in subdirectory ../sys,
+    applications programs which depend upon type of converters, found
     in subdirectories ./{ds,ai,dy}play and ./{ds,ai,dy}record,
+    soundfile manipulation and signal processing programs (found in
     the rest of the directories).

BRIEF THEORY OF OPERATION

Using BICSF, one  is  presented  with  two  current  working
directories:  one's  regular  UNIX current working directory
(cwd), plus the BICSF cwd, which is initialized to point  to
one's  home  soundfile directory.  Soundfiles are ordinarily
partitioned on a separate disk from other  files.   However,
the  BICSF  soundfile  directory  is  really a standard UNIX
filesystem at bottom.  Having soundfiles on  separate  disks
from regular UNIX disks avoids competition for head movement
with regular UNIX processes.  It  is  also  advisable  where
possible  to  have  a separate disk controller for soundfile
disks to improve throughput for high sampling rates.

There are several reasons to segregate soundfiles from regular UNIX
files.
+    Conventional wisdom is that the block/fragment size of the
     soundfile partitions should be set to their maximum (currently 8K
     blocks and 8K fragments).  This is desirable for maximum disk
     throughput.  The bigger the blocks, the more efficient the disk
     I/O can be.  But UNIX files tend to favor smaller granularization,
     since there tend to be more of them, and they tend to be small.
     It is more common to have UNIX partitions set to 4K/512 to allow
     more effective filling of the disk.  Thus, the two types of files
     demand different treatment to optimize space (for UNIX files) and
     speed (for BICSF files).
+    System administration: soundfiles are BIG.  It is better to have
     them separate from regular UNIX files so you don't have to do huge
     system dumps of users' home directory trees.  In fact, at CARL, we
     do not dump soundfile systems, but leave this to the users to do
     as they see fit.
+    Speed of throughput: you do not want realtime sound I/O
     to  be  in competition with timesharing I/O.  Expect an
     increase of up to 50% for having a  separate  disk  and
     controller for sound.
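
The space side of the block/fragment trade-off in the list above can be
illustrated with a small Python calculation. This is a simplified
sketch: it assumes a file occupies a whole number of fragments and
ignores inodes and indirect blocks. The function name is made up for
this example.

```python
def allocated_bytes(file_size, fragment_size):
    """Disk space used by a file, rounded up to whole fragments."""
    fragments = -(-file_size // fragment_size)   # ceiling division
    return fragments * fragment_size


small_file = 1000      # a typical small UNIX file, in bytes
sound_file = 126592    # a short mono sound file, in bytes

for frag in (512, 8192):
    for size in (small_file, sound_file):
        used = allocated_bytes(size, frag)
        print(f"fragment {frag:5d}: {size:7d} bytes occupy {used:7d}")
```

With 8K fragments the 1000-byte file occupies a full 8192 bytes, while
the sound file wastes only a few percent; with 512-byte fragments the
small file occupies 1024 bytes. Many small files therefore favor small
fragments, and large sound files lose little space to large ones.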

The idea of simultaneous working directories  for  UNIX  and
BICSF  filesystems  overcomes  the problem of having to name
long absolute pathnames to get to  one's  soundfiles.   This
implementation (developed by Robert Gross) consists of a set
of aliases listed in ./bicsf/std.sfaliases.m4.  An environment
variable SFDIR contains the current working soundfile directory.  The
UNIX command pwd has a BICSF counterpart with the following
definition:
        alias     pwdsf     '(cd $SFDIR; /bin/pwd \!*)'

Likewise, the UNIX command cat has this counterpart:
        alias     catsf     '(cd $SFDIR; /cmil/bin/catsf \!*)'

cdsf, the BICSF equivalent of cd, sets the SFDIR variable (its
definition repays careful study).  All BICSF programs must
have such an alias as shown above.
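
The effect of these aliases, running a command inside $SFDIR in a
subshell so that the caller's own working directory is untouched, can
be imitated in other languages. The Python sketch below is an
illustration only and is not part of BICSF; the function name is made
up for this example.

```python
import os
import subprocess


def run_in_sfdir(argv, env=None):
    """Run a command with $SFDIR as its working directory.

    Like the '(cd $SFDIR; command args)' aliases, this changes the
    directory only for the child process, not for the caller.
    """
    env = os.environ if env is None else env
    sfdir = env["SFDIR"]   # raises KeyError if SFDIR is not set
    return subprocess.run(
        argv,
        cwd=sfdir,
        env=dict(env),
        capture_output=True,
        text=True,
        check=True,
    )
```

For example, run_in_sfdir(["/bin/pwd"]).stdout would give the same
answer as the pwdsf alias, provided SFDIR is set in the environment.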

ADJUSTING FOR LOCAL CONDITIONS

You should inspect the aliases in std.sfaliases.m4 and std.cshrc.m4
to make sure they agree with local requirements.  In particular,
check the play, record, and monitor aliases in std.sfaliases.m4, and
set them to execute the play/record programs for the converters you
are using.  Also check values of BINSF, ROOT_SFDIR, HOME_SFDIR, and
SFDIR for local conditions.

When the system is installed, these two files are run through the
UNIX macro preprocessor m4 to resolve the location of the programs the
aliases refer to.  m4 macros defining standard pathnames for
executables, manual pages, libraries, alias files, sources, etc. must
be listed in the file config.m4, usually located in
/usr/include/carl/config.m4.  See config.m4(1carl) for details.

SOURCES

Sources may be placed in one  of  several  places  depending
upon   local   conventions.    At   CARL,   this   path   is
/carl/src/carl/src/bicsf.  Elsewhere, a good place to put it
(or find it) is /`hostname`/src/import/carl/src/bicsf, where
`hostname` is the name of your machine.

     The  applications  programs  depend  upon  a   library:
libbicsf.a.   After  creation, this library may be in one of
several places, depending upon local conventions.  At  CARL,
this  path is /carl/lib/libbicsf.a.  Elsewhere, a good place
to put it (or find it) is /`hostname`/lib/libbicsf.a.

     It can also be put in /usr/local/lib/libbicsf.a, but as this area
is usually wiped out across upgrades of UNIX, it is preferable to make
a symbolic link, /usr/local/lib -> /`hostname`/lib.  In this way, the
loader can still find local libraries, allowing the loader's -l flag
convention:
     % cc file.c -lbicsf
to succeed.  Otherwise, a full path to  the  file  could  be
given:
     % cc file.c /`hostname`/lib/libbicsf.a

Include directives in the source code all make generic references to
include files.  The Makefiles in each directory are made from their
Makefile.m4 prototypes in each source directory, and compile the
programs to look in the correct locations for include files.  These
are almost universally relative paths to the directory ./include
(except for device drivers).

HARDWARE INSTALLATION

Besides the installation of your converters, it is important to block
out appropriate partitions for BICSF soundfiles, and give them the
proper block/fragment sizes.  Conventional wisdom is that you want to
set them to 8K/8K block/fragment size.  The larger the block/fragment
size, the more efficient the disk can be in reading/writing data.  If
possible, you do want sound on a separate physical disk, not sharing
any other UNIX function, including swapping, etc.  It's also useful if
sound disks are on separate controllers.  CARL benchmarks are that a
Digisound-16 can run 48,000 Hz stereo from a Fujitsu Eagle with a
single Xylogics 450 controller on a Sun-3 with a little spare
bandwidth.  A second controller helps a lot.  There are some files in
the device driver directories for the ai driver (for the Digisound-16)
which suggest further performance enhancements.

DEVICE DRIVER INSTALLATION

Refer to the appropriate subdirectory in ../sys for the type
of  converter  you  have  and follow the directions you find
there.

SOFTWARE INSTALLATION

The code is installed using standard CARL Software conventions.  If
this code is being installed as part of the CARL Software
Distribution, the process should be mostly automatic, save for the
installation of the device drivers.  Refer to the instructions for the
Distribution, but all that need be done is to first say
                  make
then              make install
and finally       make clean

To install standalone, proceed as follows:

First, you need a copy of libcarl.a from the CARL software
distribution to compile some routines, so don't bother unless you
have one elsewhere, or are willing to write workarounds (which
wouldn't be too difficult) for the missing routines.

Edit the file ./include/config.m4, which contains default and
built-in pathnames for programs.  For standalone installation, the
most important are m4SNDFILESYSTEM, m4INCLUDE, m4DESTDIR, and
m4MANDIR.

Then execute the file ./Makefirst as follows:
                  % make -f Makefirst

This creates the subdirectory /usr/include/carl, and puts the file
./include/config.m4 in it.  It is strongly advised that this
subdirectory be used.  If you want to put it somewhere else, you must
edit all Makefile.m4 files in this directory tree to point to the new
directory, plus change any C program files that make reference to
/usr/include/carl.  There is a script to change the makefiles called
./misc/fixmakefiles that you can use to expedite this process, if
necessary.

Next, say         % make

which does the following steps:
+    remakes all Makefiles with correct paths,
+    installs the  remaining include files in
     /usr/include/carl,
+    builds the library
+    compiles application programs.

Next say          % make install

which  will  install  binaries,  manual  pages,  and  system
aliases.

Lastly, say       % make clean

to remove executables and .o files.

To run off documentation, say

                  % make roffall

SYSTEM ALIASES

The contents of ./bicsf/std.sfaliases must somehow be sourced by all
users when they log in.  Furthermore, it is useful to have users refer
to a master copy, so that as BICSF programs come and go, only a single
file needs to be changed.  At CMIL, for instance, this is done as
follows.

All users have a standard .cshrc file in their home directories which
contains the following line:
        source /`hostname`/lib/std.cshrc
where `hostname` is either the name of the machine, or  some
other  well-known  local  path.   The file std.cshrc in turn
sources  /`hostname`/lib/std.sfaliases,  which   initializes
shell variables and establishes the system aliases for BICSF
commands.

There is a prototype .cshrc file, ./bicsf/dotcshrc, which is provided
for convenience.  It should be the basis of the .cshrc files all users
have.  At CARL, we have an adduser shell script which installs new
users.  Part of its task is to copy dotcshrc to ~newuser/.cshrc.

============================ E N D ===========================
