____ ___________        1  File Compression


                                   **********
                                   * lzcomp *
                                   **********


        NAME:   lzcomp -- File Compression

        SYNOPSIS:

                lzcomp [-options] [infile [outfile]]

        DESCRIPTION:

                lzcomp  implements  the  Lempel-Ziv   file   compression
                algorithm.  (Files compressed by lzcomp are uncompressed
                by lzdcmp.) It operates by finding common substrings and
                replaces  them  with  a  variable-size  code.   This  is
                deterministic, and can be done with a single  pass  over
                the  file.   Thus,  the decompression procedure needs no
                input table, but can track the way the table was built.

                Options may be given in either case.

                -B      Input file  is  "binary",  not  "human  readable
                        text".   This  is  necessary  on  Dec  operating
                        systems, such as VMS  and  RSX-11M,  that  treat
                        these  files  differently.   (Note  that  binary
                        support is rudamentary and probably insufficient
                        as  yet.)  (On  VMS  version  4, this is ignored
                        unless the -x option is specified or  the  input
                        file is record-oriented.)

                -M bits Write using the specified number of bits in  the
                        code  -- necessary for big machines making files
                        for   little   machines.    For   example,    if
                        compressing a file on VMS which is to be read on
                        a PDP-11, you should select -M 12.

                -V [n]  Verbose if specified.  If a value is  specified,
                        it will enable debugging code (if compiled in).

                -X [n]  "Export" -- write a file format that can be read
                        by  other  operating systems.  Only the bytes in
                        the file are copied;  file  attributes  are  not
                        preserved.   If  specified, the value determines
                        the level of compatiblity.  If not specified, or
                        specified  with  an  explicit value of zero, and
                        lzcomp is running on  Vax/VMS  version  4  under
                        VaxC  and  the  input  file is a disk or magtape
                                                                          Page 2
        lzcomp  File Compression


                        file  (block-oriented),  a  VMS-private   output
                        format  is  used  which is incompatible with the
                        Unix compress utility, but which  preserves  VMS
                        file  attributes.   -X may take on the following
                        values:

                         0  Choose VMS private format.  See restrictions
                            below.
                         1  Compatible with Unix compress  version  3.0:
                            this is the default if -x is given without a
                            value.
                         2  As above, but supress "block compression"
                         3  Supress block compression and do not  output
                            a   compress  header  block.   This  is  for
                            compatiblity with a quite early  version  of
                            Unix       compress       (and      requires
                            conditional-compilation to use).

                        Note that the  -B  (binary)  option  is  ignored
                        unless the input file is "record-oriented", such
                        as a terminal or mailbox.

                The  other  two  arguments  are  the  input  and  output
                filenames   respectively.    Redirection  is  supported,
                however, the output must be a disk/tape file.

                The file format is almost identical to the current  Unix
                implementation  of  compress  (V4.0).   Files written by
                Unix compress  should  be  readable  by  lzdcmp.   Files
                written by lzcomp in export (-x) format will be readable
                by Unix compress (except that lzcomp outputs two "clear"
                codes  to  mark  EOF.   A  patch  to  Unix  compress  is
                available.)

        VMS RESTRICTIONS:

                VMS Private mode stores the true name and attributes  of
                the  input  file  into  the  compressed  file and lzdcmp
                restores the attributes  (and  filename  if  requested).
                The  following  restrictions apply -- they may be lifted
                in the future as they are primarily due to the  author's
                lack of understanding of the intricacies of of VMS I/O:

                    All files must be stored on disk.
                    The lzcomp output file must be specified directly.

                Also, for all usage on VMS, the compressed file must  be
                written to, and read from disk.

        LZW COMPRESSION ALGORITHM:

                This section is abstracted from  Terry  Welch's  article
                referenced   below.    The  algorithm  builds  a  string
                translation table that maps substrings in the input into
                                                                          Page 3
        lzcomp  File Compression


                fixed-length  codes.   The  compress  algorithm  may  be
                described as follows:

                  1. Initialize table to contain single-character
                     strings.
                  2. Read the first character.  Set <w> (the prefix
                     string) to that character.
                  3. (step): Read next input character, K.
                  4. If at end of file, output code(<w>); exit.
                  5. If <w>K is in the string table:
                        Set <w> to <w>K; goto step 3.
                  6. Else <w>K is not in the string table.
                        Output code(<w>);
                        Put <w>K into the string table;
                        Set <w> to K; Goto step 3.

                "At each execution of the basic step an acceptable input
                string <w> has been parsed off.  The next character K is
                read and the extended string <w>K is tested to see if it
                exists  in  the  string table.  If it is there, then the
                extended string becomes the parsed string  <w>  and  the
                step  is  repeated.  If <w>K is not in the string table,
                then it is entered, the code for the successfully parsed
                string <w> is put out as comprssed data, the character K
                becomes the beginning of the next string, and  the  step
                is repeated."

                The decompression  algorithm  translates  each  received
                code   into  a  prefix  string  and  extension  [suffix]
                character.  The extension  character  is  stored  (in  a
                push-down stack), and the prefix translated again, until
                the  prefix  is  a  single  character,  which  completes
                decompression  of  this  code.   The entire code is then
                output by popping the stack.

                "An update to the string table is  made  for  each  code
                received  (except  the first one).  When a code has been
                translated, its final character is used as the extension
                character,  combined with the prior string, to add a new
                string to the string table.  This new string is assigned
                a  unique  code  value,  which is the same code that the
                compressor assigned to that string.  In  this  way,  the
                decompressor  incrementally reconstructs the same string
                table that the decompressor used....  Unfortunately  ...
                [the algorithm] does not work for an abnormal case.

                The abnormal case occurs  whenever  an  input  character
                string  contains  the  sequence  K<w>K<w>K,  where  K<w>
                already appears in the compressor string table."

                The decompression algorithm,  augmented  to  handle  the
                abnormal case, is as follows:

                  1. Read first input code;
                                                                          Page 4
        lzcomp  File Compression


                     Store in CODE and OLDcode;
                     With CODE = code(K), output(K);  FINchar = K;
                  2. Read next code to CODE; INcode = CODE;
                     If at end of file, exit;
                  3. If CODE not in string table (special case) then
                        Output(FINchar);
                        CODE = OLDcode;
                        INcode = code(OLDcode, FINchar);

                  4. If CODE == code(<w>K) then
                        Push K onto the stack;
                        CODE == code(<w>);
                        Goto 4.

                  5. If CODE == code(K) then
                        Output K;
                        FINchar = K;

                  6. While stack not empty
                        Output top of stack;
                        Pop stack;

                  7. Put OLDcode,K into the string table.
                     OLDcode = INcode;
                     Goto 2.

                The  algorithm  as  implemented  here   introduces   two
                additional complications.

                The actual codes are transmitted using a variable-length
                encoding.  The lowest-level routines increase the number
                of bits in the code when the largest  possible  code  is
                transmitted.

                Periodically, the algorithm checks that  compression  is
                still increasing.  If the ratio of input bytes to output
                bytes decreases, the entire process is reset.  This  can
                happen if the characteristics of the input file change.

        VMS PRIVATE FILE STRUCTURE:

                In VMS Private mode, the compressed data file contains a
                variable-length  (but  compressed)  file header with the
                file "attributes" needed  by  the  operating  system  to
                construct  the  file.   This  allows  the  decompression
                program to recreate the file  in  its  original  format,
                which is essential if ISAM databases are compressed.

                The overall file format is as follows:

                LZ_SOH  "start of  header"  signal  (this  value  cannot
                        appear in user data).

                        A  variable-length  data  record  (maximum   256
                                                                          Page 5
        lzcomp  File Compression


                        bytes)  containing  the header name, followed by
                        whitespace,    followed    by    header-specific
                        information.  In this case, the name record will
                        contain the string "vms$attributes" followed  by
                        the number of bytes in the attribute data block.
                        (I assume that the name record will consist of a
                        facility  name,  such  as  "vms",  followed by a
                        dollar  sign,  followed  by  a   facility-unique
                        word.)

                LZ_EOR  Signals "end of record".

                        This is followed by a VMS file attributes record
                        (generated by a VMS system library
                              routine).

                LZ_ETX  Signals "end of segment".

                ST_STX  Signals "start of text"  (i.e.,  start  of  data
                        file).

                        This is followed by the user data file.

                LZ_ETX  Signals "end of segment"

                LZ_ETX  Two in a row signals "end of file".

                Note that this format can easily be extended to  include
                trailer  records (with file counts and checksums) and/or
                multiple data files in one compressed file.

                Note also that the LZ_CLEAR code may appear  in  headers
                or  data  files  to  cause  the decompression program to
                "readapt" to the  characteristics  of  the  input  data.
                LZ_STX  and  LZ_SOH  reset  the  compression  algorithm.
                LZ_EOR does not.

        AUTHORS:

                The algorithm is from "A Technique for High  Performance
                Data  Compression."  Terry A.  Welch.  IEEE Computer Vol
                17, No.  6 (June 1984), pp 8-19.

                This revision is by Martin Minow.

                Unix Compress authors are as follows:

                Spencer W. Thomas
                       (decvax!harpo!utah-cs!utah-gr!thomas)
                Jim McKie               (decvax!mcvax!jim)
                Steve Davies            (decvax!vax135!petsd!peora!srd)
                Ken Turkowski           (decvax!decwrl!turtlevax!ken)
                James A. Woods          (decvax!ihnp4!ames!jaw)
                Joe Orost               (decvax!vax135!petsd!joe)


           ____ _____________        1  File Decompression


                                   **********
                                   * lzdcmp *
                                   **********


        NAME:   lzdcmp -- File Decompression

        SYNOPSIS:

                lzdcmp [-options] [infile [outfile]]

        DESCRIPTION:

                lzdcmp decompresses files  compressed  by  lzcomp.   The
                documentation   for  lzcomp  describes  the  process  in
                greater detail.

                Options may be given in either case.

                -B      Output file is "binary", not text.  (Ignored  in
                        VMS private mode.)

                -X 3    To read files compressed by an old Unix  version
                        that doesn't generate header records.

                -V val  Verbose (print  status  messages  and  debugging
                        information).   The  value selects the amount of
                        verbosity.

                AUTHOR:

                        This version by Martin Minow.   See  lzcomp  for
                        more details.