% FILE VMS_PROFILE.WEB % % This program by D. E. Knuth is not copyrighted and can be used freely. % Version 1.0 was completed October 31, 1983. % Version 1.1 added static weight totals (November, 1983). % Provisional VMS version 0.0 by RMD, 1989 % Here is TeX material that gets inserted after \input webmac \def\hang{\hangindent 3em\indent\ignorespaces} \font\ninerm=cmr9 \let\mc=\ninerm % medium caps for names like PASCAL \def\PASCAL{{\mc PASCAL}} \def\PA{{\mc PASCAL}} \def\TA{\.{TANGLE}} \def\WB{\.{WEB}} \def\NP{\.{VMS\_PROFILE}} \def\PR{\.{PROFILE}} \def\v{\.{\char'174}} % vertical (|) in typewriter font \def\({} % kludge for alphabetizing certain module names \def\title{VMS\_PROFILE} \def\topofcontents{\null \vfill \centerline{\ttitlefont VMS\_PROFILE} \vskip 15pt \centerline{(Version 0, 1989)} \vfill} \pageno=\contentspagenumber \advance\pageno by 1 % These macros for verbatim scanning are copied from MANMAC.TEX. But we cant % use the vertical bar for a temporary escape character as WEAVE catches it. % So we will use ! instead and hope for the best \chardef\other=12 \def\ttverbatim{\begingroup \catcode`\\=\other \catcode`\{=\other \catcode`\}=\other \catcode`\$=\other \catcode`\&=\other \catcode`\#=\other \catcode`\%=\other \catcode`\~=\other \catcode`\_=\other \catcode`\^=\other \obeyspaces \obeylines \tt} \outer\def\begintt{$$\let\par=\endgraf \ttverbatim \parskip=0pt \catcode`\!=0 \rightskip-5pc \ttfinish} {\catcode`\!=0 !catcode`!\=\other % ! is temporary escape character !obeylines !obeyspaces % end of line is active !gdef!ttfinish#1^^M#2\endtt{#1!vbox{#2}!endgroup$$}} @* Introduction. The purpose of \.{PROFILE} is to annotate a \PASCAL\ program with data about the estimated computer time ascribed to each line of code. If the \PASCAL\ program has been produced by \.{TANGLE}, the accumulated run time estimates for each section of the \.{WEB} source code will also be determined. Running time is estimated roughly by assigning an integer weight to each feature of \PASCAL. For example, the statement `$A[i]:=x+2/n$' will be weighted by the time to fetch $i$, $x$, 2, and $n$, plus the time to compute the location of $A[i]$ given $i$, plus the time to add, to convert $n$ from integer to real, to divide, and to store in memory. Approximate costs of each primitive operation appear in macro definitions like \\{int\_real\_cost} that are grouped together in one of the sections below; thus, they are not hard to change. The \.{PROFILE} program includes a \PASCAL\ parser, so it can probably be modified to do other operations on \PASCAL\ source code. Indeed, this program intentionally deals with \PASCAL\ in more generality than it needs to, so that it can be used as a starting point for programs that translate subsets of \PASCAL\ to other higher-level languages. Don Knuth wrote the first version of \.{PROFILE} in October, 1983, for the purpose of analyzing bottlenecks in the \TeX\ processor. This version much revised by R.M.Damerell, in an attempt to get it running on VAX-VMS. I have tried to add support for the whole of Standard \PASCAL, also for the easier features of VMS \PASCAL. The ``banner line'' defined here should be changed whenever \.{PROFILE} is modified. @d banner=='This is VMS_PROFILE, Version 0' @ The program begins with a fairly normal header, made up of pieces that will mostly be filled in later. There are three input files: \smallskip \hang |pascal_file| is a syntactically correct \PASCAL\ source program, preferably (but not necessarily) output by \.{TANGLE}. \smallskip \hang |count_file| is a text file containing integer counts, one per line, representing the number of times that key statements of the \PASCAL\ program were executed. There should be one count for the first statement in each block and for the first statement following {\bf do}, {\bf then}, {\bf else}, {\bf repeat}, plus one count for each colon following a label. An artificial count of `{\tt-1}' should also appear at the very end of the file. I have written the companion program \.{PRE-PROFILE} to generate count files. \smallskip\noindent |pool_file| is generated in the usual way by \.{TANGLE}. Also we need to read and write on the user's terminal. \smallskip\noindent The |display| file will receive a ``pretty printed'' version of the \PASCAL\ source code, with frequency counts and weights accompanying each statement. A summary of the total weight for each \.{WEB} section will also be appended to the output, if the source file contains comments like `\.{\{123:\}}' and `\.{\{:123\}}' around the code for section~123. \NP\ now also prints a summary list of procedures and their weights. \smallskip\noindent Comments like `\.{\{+10\}}' or `\.{\{-5\}}' will add 10 or subtract 5 from the weight that \.{PROFILE} ordinarily computes at a particular place in the source file. The special comment `\.{\{\^\}}' in a procedure heading indicates that the procedure in question does not exit. If it is necessary to abort the job because of a fatal error, the program calls the `|jump_out|' procedure, which goes to the label |end_of_PROFILE|. Altered by RMD: for debugging purposes, a fatal error now causes a crash. @p program PROFILE(pascal_file,input,count_file,output,display); const @ @/ type @ @/ var @ @/ @ @/ @ @ @/ @ @/ @ @ @= begin @ @/ @ @/ print_ln(banner); {print a ``banner line''} end; @ Labels are given symbolic names by the following definitions. We insert the label `|exit|:' just before the `\ignorespaces|end|\unskip' of a procedure in which we have used the `|return|' statement defined below; the label `|restart|' is occasionally used at the very beginning of a procedure; and the label `|reswitch|' is occasionally used just prior to a \&{case} statement in which some cases change the conditions and we wish to branch to the newly applicable case. Loops that are set up with the \&{loop} construction defined below are commonly exited by going to `|done|' or to `|found|' or to `|not_found|', and they are sometimes repeated by going to `|continue|'. @d exit=10 {go here to leave a procedure} @d restart=20 {go here to start a procedure again} @d reswitch=21 {go here to start a case statement again} @d continue=22 {go here to resume a loop} @d done=30 {go here to exit a loop} @d done1=31 {go here to exit another loop} @d done2=32 {and another} @d done3=33 {and another} @d done4=34 {in case of five loops in one big procedure} @d found=40 {go here when you've found it} @d not_found=41 {go here when you've found something else} @ Here are some macros for common programming idioms. @d incr(#) == #:=#+1 {increase a variable by unity} @d decr(#) == #:=#-1 {decrease a variable by unity} @d loop == @+ while true do@+ {repeat over and over until a |goto| happens} @d do_nothing == {empty statement} @d return == goto exit {terminate a procedure call} @f return == nil @f loop == xclause @ We assume that |case| statements may include a default case that applies if no matching label is found. Thus, we shall use constructions like @^system dependencies@> $$\vbox{\halign{#\hfil\cr |case x of|\cr 1: $\langle\,$code for $x=1\,\rangle$;\cr 3: $\langle\,$code for $x=3\,\rangle$;\cr |othercases| $\langle\,$code for |x<>1| and |x<>3|$\,\rangle$\cr |endcases|\cr}}$$ since most \PASCAL\ compilers have plugged this hole in the language by incorporating some sort of default mechanism. For example, the compiler used to develop \.{WEB} and \TeX\ allows `|others|:' as a default label, and other \PASCAL s allow syntaxes like `\ignorespaces|else|\unskip' or `\\{otherwise}' or `\\{otherwise}:', etc. The definitions of |othercases| and |endcases| should be changed to agree with local conventions. (Of course, if no default mechanism is available, the |case| statements of this program must be extended by listing all remaining cases.) When the \PASCAL\ source program being processed by \.{PROFILE} has default cases, we assume below that the default case is labeled with some recognizable keyword, and that the |count_file| contains a count for this label just like any other. @d othercases == otherwise {default for cases not listed explicitly} @d endcases == @+end {follows the default case in an extended |case| statement} @f othercases == else @f endcases == end @ The following parameters are set big enough to handle \TeX, so they should be sufficient for most applications of \.{PROFILE}. We will use ``half'' words to address memory etc; these are 16 bits long. |@!max_names| is the maximum number of identifiers and reserved words; it should be a prime number. |@!mem_size| is the size of in the dynamic memory array; the sum of these numbers must be less than |max_halfword|. (BODGE!!!) @d min_halfword==0 @d max_halfword==65535 @d max_names = 4999 @d mem_size=30000 @= @!pool_size=30000; {maximum total length of identifiers and reserved words} @!max_modules=2000; {greater than the total number of \.{WEB} sections} @!buf_size=200; {maximum length of input line} @!line_length=100 ; {lines of unannotated \PASCAL\ output} @!weight_length=6; {columns of annotation for weight data} @!freq_length=10; {columns of annotation for frequency data} @!out_buf_size=1000 ; {output lines before breaking to |line_length|} @!max_comment=20; {comments are truncated to this many characters} @!save_size=500; {number of entries on the save stack} @ A lot of initialization needs to be done, and it's convenient to have an integer variable for iteration. @= @!i :integer; {all-purpose indices} @* Error handling procedures. A global variable called |history| will contain one of four values at the end of every run: |spotless| means that no unusual messages were printed; |harmless_message| means that a message of possible interest was printed but no serious errors were detected; |error_message| means that at least one error was found; |fatal_message| means that the program terminated abnormally. The value of |history| does not influence the behavior of the program; it is simply computed for the convenience of systems that might want to use such information. @d spotless=0 {|history| value for normal jobs} @d harmless_message=1 {|history| value when non-serious info was printed} @d error_message=2 {|history| value when an error was noted} @d fatal_message=3 {|history| value when we had to stop prematurely} @# @d mark_harmless==@t@>@+if history=spotless then history:=harmless_message @d mark_error==history:=error_message @d mark_fatal==history:=fatal_message @ At the end of the run, we report the history to the user. @^system dependencies@> @= case history of spotless: print_nl('(No errors were found.)'); harmless_message: print_nl('(Did you see the warning message above?)'); error_message: print_nl('(Pardon me, but I think I spotted something wrong.)'); fatal_message: print_nl('(That was a fatal error, my friend.)'); end {there are no other cases} @ The command `|err_print('! Error message')|' reports errors and then gives an indication of where the error was spotted in the source file. Nearly all the errors that \NP\ detects are syntax errors; they occur because \NP\ is trying to parse a different dialect of \PA. Since \NP\ is supposedly working from a running \PASCAL\ program, its error messages have not been jazzed up for super error recovery. @d err_print(#)== begin new_line; print(#); error; end @= procedure error; {prints `\..' and location of error message} var@!k,@!l: 0..buf_size; {indices into |buffer|} begin @; mark_error; if debug_count > 0 then decr( debug_count ) else if debugging then debug_help; end; @ @= history:spotless..fatal_message; {how bad was this run?} debug_count: integer; debugging:boolean ; @ @= history:=spotless; debug_count := 0 ; debugging := true ; @ Error locations are indicated by the global variables |loc| and |line|, which mark respectively the first unlooked-at position in |buffer| and the current line number. This routine should be modified on systems whose standard text editor has special line-numbering conventions. @= begin print('. ('); print_ln('l.', line:1, ')'); if loc>=buf_len then l:=buf_len else l:=loc -1 ; for k:=1 to l do print( buffer[k ]); {print the characters already read} new_line; for k:=1 to l do print(' '); {space out the next line} for k:=l +1 to buf_len do print( buffer[k ]); {the part not yet read} new_line; end @ If a fatal error occurs, then the program will force a crash. With the VMS debugger, you can then interrogate variables, etc. I chose the square root of -1 as this is unlikely to figure prominently in \TeX-related programs. An overflow stop occurs if \.{PROFILE}'s tables aren't large enough. @^square root@> @d fatal_error(#)==begin new_line; print(#); crash; end @d overflow(#)==fatal_error('! Sorry, ',#,' capacity exceeded') @.Sorry, x capacity exceeded@> @= procedure crash; var u: real; begin u := -1 ; debugging := true; debug_count := 0; error; u := sqrt(u) ; end; @ The most important of these procedures is |debug_help|. This reads a command from the terminal and tries to exhibit \NP's internal data. @d breakpoint = 1000 @= procedure debug_help; {routine to display various things} label breakpoint ; var n :integer; ddt: char; oo: byte; begin debugging := false; write(term_out,'DEBUG_HELP'); update_terminal; repeat write(term_out, '#'); update_terminal; read(term_in,ddt); oo := zord( ddt) ; if ( high) then oo := oo + "a" - "A" ; if oo = "i" then begin read_ln(term_in, n); print_id(n); end else if oo = "m" then begin read_ln(term_in, n); print_mem(n); end else begin read_ln(term_in); if oo = "r" then error else if oo = "g" then else if oo = "b" then begin goto breakpoint; {go to every label at least once} breakpoint: oo :=0; end else @ end ; until oo = "g" ; debugging := true; end; @ @= begin print_ln( 'The DEBUG_HELP commands are as follows. is an integer > 0'); print_ln( 'I Show identifier n '); print_ln( 'M Show memory cell n '); print_ln( 'R Redisplay input buffer '); print_ln( 'G Go '); print_ln( 'B Go to breakpoint ( label 1000) '); print_ln( 'but you must set a breakpoint at this label before you can use it' ); end; @ @= procedure print_id ; var k: pool_pointer; begin if (u>=id_ptr ) or (u<0) then print_ln('IMPOSSIBLE') else begin for k:=id_start[u] to id_start[u+1]-1 do print( id_pool[ k ]); print_ln( ' CODE = ', id_code( u):1 , ' AUX = ', id_aux( u):1 ); end; end; @# procedure show_id ; var k: pool_pointer; begin if (u>=id_ptr ) or (u<0) then print_ln('IMPOSSIBLE') else begin for k:=id_start[u] to id_start[u+1]-1 do print( id_pool[ k ]); end; end; @# procedure print_mem ; begin print( 'VALUE = ' , val( n):1) ; print( ' INFO = ' , info( n):1) ; print_ln( ' LINK = ' , link( n):1) ; end; @ @= procedure debug_help; forward; procedure print_mem(n : integer ) ; forward; procedure show_id (u:halfword) ; forward; procedure print_id (u:halfword) ; forward; @* The character set. The most important part of \NP\ is the code which parses a \PA\ program. I want to adopt a definite scheme in describing this. So for each entity that might appear, I consider: how is it held in memory? how is it recognised in the input? and how transmitted to the output? Since the structure of \.{WEB} is strongly biassed towards bottom-up programming, I start with characters. Most \.{WEB} programs convert their input to an internal seven-bit code that is essentially standard ASCII, the ``American Standard Code for Information Interchange.'' I have decided to delete all this, for these reasons: 1. It is not necessary. \TeX\ had to do this conversion because characters have to occupy known positions in \TeX\ fonts. 2. It makes debugging harder, because it is much easier to examine an array of |char| than an array of small numbers. 3. Every VMS change file I have seen initialises |xord| incorrectly. The only relic of this code conversion that seems worth preserving is the following pair of macros; in theory, one can restore the |xord..xchr| code if necessary. @d zord( #) == ord ( #) @d zchr( #) == chr ( #) @ Now when we have read a character from the input file, we want to know what sort of character it was. The variable |@!oo| will be its ordinal value, and the following macros will test it. @d min_char = 0 {ASCII null} @d max_char = 126 {ASCII tilde} @d bad == ( oo < min_char ) or ( oo > max_char) {Any such characters will be rejected} @d control == ( oo <= 32 ) {These will all be translated into ASCII space} @d low == (( oo >= "a" ) and ( oo <= "z" )) @d high == (( oo >= "A" ) and ( oo <= "Z" )) @d letter == low or high or ( oo = "$" ) or ( oo = "_" ) @d digit == (( oo >= "0" ) and ( oo <= "9" )) @d quote == ( oo = "'") @ Now lets consider character strings. @= blank, file_name, buffer: packed array[ 1..buf_size] of char ; f_len : integer; @!oo: byte ; {The ordinal value of the current character} @ @= for i := 1 to buf_size do blank[ i] := ' ' ; @* Lowest level of input and output. The |input_ln| procedure is based on the same-named procedures in \TeX\ and \.{TANGLE}. It gets the next line of input from the specified file into the |buffer| and returns the value |true|, unless the file has already been entirely read, in which case it returns |false|. I have tried to write it in a robust fashion, but you can never tell what idiotic things other operating systems will do when reading files. Note that there are four input channels and the superior software must ensure that reads on different channels do not get confused. @d text_file == text @= function input_ln(var f:text_file):boolean; forward; @ @p function input_ln ; {inputs a line or returns |false|} var n: byte; cc:char; bad_char, break: boolean ; begin buffer := blank ; n := 0 ; break:= false ; bad_char:= false ; while ( eoln( f) and not eof( f)) do begin get( f) ; incr( line); end; {Go to the beginning of a non-blank line} if eof(f) then input_ln:=false else begin input_ln:= true; while ( not break) and ( not eoln(f)) do @; if bad_char then err_print ( 'Bad character( s) in input, skipped') ; end; end; @.Input line too long@>@.Bad character...@> @ Now |n| is the length of the line so far, with characters in positions |1..n|. If the line threatens to be too long, I will try to chop it at a space. Unfortunately, this might chop a quoted string in half, but this seems less bad than simply crashing on every long line. @= begin cc:=f^; get(f); oo := zord( cc) ; if bad then bad_char := true else if control and ( n> buf_size - 30) then break := true else begin incr( n) ; buffer[ n ]:= cc ; buf_len := n ; if n >= buf_size -1 then begin break := true ; err_print('! Input line too long'); end; {I failed! So break at a random place } end; end; @ Input from the count file is trivial; we simply ask for a count, whenever it's time to know a new one. If |count_file| has the wrong number of counts, the discrepancy will be reported later; such problems will usually also be caught by the redundancy check that is made at the time of every `\&{else}' count. @p function get_count:integer; begin if last_count<0 then begin decr(last_count); get_count:=0; end else if eof( count_file) then begin last_count:=-1; get_count:=0; end else begin read_ln(count_file,last_count); if last_count < 0 then begin last_count:=-1; get_count:=0; end else get_count:=last_count; end end; @ Here are two global variables associated with |get_count|. @= @!last_count:integer; {the last count read from |count_file|, or $-n$ if that file is $n$ counts short} @!n:integer; {a temporary counter} @ @= last_count:=0; @ The proper number of counts is checked at the end, thusly: @= if last_count<0 then begin err_print('! count file had ', (-last_count):1,' too few counts'); end else begin n:=-1; repeat incr(n); read_ln(count_file,last_count); until eof( count_file) or ( last_count<0) ; if n>0 then begin err_print('! count file had ', n:1, ' too many counts.'); end; if last_count <> -1 then begin err_print('! count file didnt end with -1 '); end; end @.count file...@> @ The |get_line| procedure is called when |loc>buf_len|; it puts the next line of merged input into the buffer and updates the other variables appropriately. A space is placed at the right end of the line. @p procedure get_line; {inputs the next line} begin if in_pascal then begin if not input_ln(pascal_file) then fatal_error('Premature end of PASCAL file') else loc := 1; end else if not input_ln(pool_file) then fatal_error('Premature end of POOL file') else loc := 3 ; buffer[buf_len+1 ]:=' '; end; @.Premature end...@> @* Opening files. The input and output conventions of this program have been copied from those of \.{WEAVE} and \.{TANGLE}. Therefore people who need to make modifications should already know how to do it. Terminal output is done by writing on file |term_out|, which is assumed to consist of characters of type |text_char|: Different systems have different ways of specifying that the output on a certain file will appear on the user's terminal. In VMS this happens automatically. The |update_terminal| procedure is called when we want to make sure that everything we have output to the terminal so far has actually left the computer's internal buffers and been sent. @^system dependencies@> @d term_out == output @d term_in == input @d print(#)==write(term_out,#) {`|print|' means write on the terminal} @d print_ln(#)==write_ln(term_out,#) {`|print|' and then start new line} @d new_line==write_ln(term_out) {start new line} @d print_nl(#)== {print information starting on a new line} begin new_line; print(#); end @d update_terminal == write_ln(term_out) {empty the terminal output buffer} @ The main input comes from |pascal_file| and |count_file|, as explained earlier, and output goes to |display|. @= @!pool_file, pascal_file:text_file; {the \PASCAL\ source code} @!count_file:text_file; {frequency counts} @!display:text_file; {the final result} @!s: integer; {status after an attempted |open|} @ The following code opens these files. In VMS, the best way to do this is to fetch a ``command line'', which may contain the input file name. If this fails, we ask again on the terminal. @^system dependencies@> @d VAX_immed==@= %immed @> @d VAX_descr==@= %descr @> @d VAX_ref==@= %ref @> @d VAX_foreign == @= lib$get_foreign@> @d VAX_continue == @=continue@> @d get_set == begin line := 1; loc := 1 ; buf_len := 0 ; end; @= function VAX_foreign( VAX_descr cmdlin:[volatile] packed array [$l1..$u1:integer] of char := VAX_immed 0; VAX_descr prompt:[volatile] packed array [$l2..$u2:integer] of char := VAX_immed 0; var len : [volatile] halfword := VAX_immed 0; var flag : [volatile] integer := VAX_immed 0) :integer; extern; @ @= VAX_foreign(file_name,,f_len); repeat open( pascal_file, file_name, old,default := '*.PAS', error:= VAX_continue) ; s := status( pascal_file); if s=0 then reset( pascal_file, error:= VAX_continue) ; s := status( pascal_file); if s <> 0 then begin write_ln ( term_out, 'OPEN failed for PASCAL file, try a new file name?') ; if input_ln( term_in) then begin file_name := buffer ; f_len := buf_len ; end; end; until s = 0 ; open( pool_file, 'PROFILE.POO', old); reset ( pool_file) ; open( count_file, '.cou', old, default := file_name) ; reset(count_file); open( display, '.dis', new, default := file_name) ; rewrite( display); @* The dynamic memory. A large array called |mem| is used for general-purpose list manipulations. Each word of this array contains either a single integer or two halfwords. Halfwords are used to hold pointers or numeric data, as is usual in list processing. As in \TeX82, halfwords are assumed to contain values between |min_halfword| and |max_halfword|; these quantities should be defined so that \PASCAL\ packs two halfwords into the space of an |integer|. A null pointer is represented by |null|, which is defined to be |min_halfword|, the index of the smallest word in |mem|. Altered: now |null| is changed, the idea is to force |pointer| and |id_pointer| to be disjoint subranges. @d null = 0 {BODGE!!!} @d mem_top = null+mem_size @= byte = 0..255; @!halfword=min_halfword..max_halfword;@/ @!pointer=null..mem_top;@/ @!two_halves=packed record @!lh,@!rh:halfword; end; @!memory_word=record case boolean of false:(@!int:integer); true:(@!hh:two_halves); end; @ We use the notations |info(p)| and |link(p)| to stand for the halfwords in |mem[p]|; or |val(p)| to stand for the fullword integer value stored there. The largest |mem| index used so far is called |mem_end|. Available locations |<=mem_end| are maintained in a list |avail|, |link(avail)|, |link(link(avail))|, \dots; when this list is empty, we have |avail=null|. @d info(#)==mem[#].hh.lh {the ``left'' halfword in a given node} @d link(#)==mem[#].hh.rh {the ``right'' halfword in a given node} @d val(#)==mem[#].int {the memory word as a (possibly signed) integer} @= @!mem: array[pointer] of memory_word; {dynamic all-purpose memory} @!mem_end:pointer; {we haven't touched |mem[mem_end+1..mem_top]|} @!avail:pointer; {head of available space stack} @ @= mem_end:=null; avail:=null; @ The function |get_avail| returns a pointer to a new word whose |link| field is null. An overflow stop occurs if no room is left. @= function get_avail:pointer; {allocation} var p:pointer; {the new node} begin p:=avail; {get top location in |avail| stack} if p<>null then avail:=link(p) {and pop it off} else if mem_end= temp: pointer; @ Now list-processing programs tend to be very messy. The two main difficulties seem to be: first, it is impossible to specify the contents of a complicated list in words. We need an easily-readable notation. The best I can do is: denotes a cell whose |info| and |link| contain |a| and |b| respectively. means that the |info| is a pointer to cell |a|. is an un-divided cell whose |val| is |m|. Finally is a linked list, terminated with a |null| in the last link. This notation is clearly not adequate, but it is the best I can do. Second, in order to write clean programs, we need a set of macros for simple list operations. When the list will be used last-in-first-out, this is simple. To make an empty list, do |p:=null|. To add something to the front end do |prepend( data)( list)|; to remove it, do |step_wipe|. FIFO lists are more difficult. |new_list| makes a new list; |append( list)( data)| appends to it; then we must do |finish( list)| to terminate it. Note the unexpected feature of FIFO lists: |info( p)| points to the tail of the list and |link( p)| to the head. I have not yet adapted everything to make use of all these macros. @d pre_tail( #) == link( temp) := # ; # := temp @d prepend( #) == temp := get_avail; info( temp) := # ; pre_tail @d new_list( #) == # := get_avail ; info( #) := # ; link( #) := null @d append( #) == link( info( #)):= get_avail ; info( #) := link( info( #)) ; info( info( #)) := @d finish( #) == link( info( #)) := null ; step_wipe( #) @* Identifiers. There's a separate memory for identifier names; such names will never be deleted after they've been inserted. \.{PROFILE}'s name memory is implemented essentially as \TeX's string pool: Name number~|k| consists of characters |id_start[k]| through |id_start[k+1]-1|, inclusive, of the |id_pool| array. Uppercase letters normally get converted to lowercase before being stored in |id_pool|. @= @!id_pointer=0..max_names; @!pool_pointer=0..pool_size; @ The variables |pool_ptr| and |id_ptr| are used to allocate space in |id_pool| and |id_start|, respectively. The ``meaning'' of identifier number |k| is stored in |equiv[k]|; it consists of two halfwords called |id_code(k)| and |id_aux(k)|. @d length(#)==(id_start[#+1]-id_start[#]) @d id_code(#)==equiv[#].lh @d id_aux(#)==equiv[#].rh @= @!id_pool:packed array[pool_pointer] of char ; {the characters} @!id_start:array[id_pointer] of pool_pointer; {the starting pointers} @!pool_ptr:pool_pointer; {the first unused position in |id_pool|} @!id_ptr:id_pointer; {the highest used position in |id_start|} @!equiv:array[id_pointer] of two_halves; {the equivalents} @ The first 128 ``identifiers'' are one character long, and they represent themselves. (This is an exception to the lowercase-only convention stated earlier.) Undefined entries have |id_code=undefined| and |id_aux=0|. @d undefined= 32766 @= id_code(0):=undefined; id_aux(0):=0; {set up |equiv[0]|} for i:=0 to 127 do begin id_start[i]:=i; id_pool[i]:= zchr( i); equiv[i]:=equiv[0]; end; id_start[128]:=128; id_ptr:=128; pool_ptr:=128; @ A hash table is used to help locate identifiers that are more than one character long. If |hash[k]| is nonzero, it points to such an identifier. The variable called |loc| is used to specify buffer locations when an identifier needs to be located. @= @!hash:array[id_pointer] of id_pointer; {hash table for linear probing} @ @= for i:=0 to max_names do hash[i]:=0; @ The |find_id| procedure reads an identifier from the |buffer| and returns a pointer to its index in |id_start|. When this is called, |loc| points to the first letter of the identifier, and |oo| is its ordinal value. It is possible to overflow the pool capacity or the name capacity. But the hash table will never fill up, because the first 128 names have been kept out of that table. @p function find_id:id_pointer; {finds the current identifier} var i:0..buf_size; {index into |buffer|} @!h:id_pointer; {hash code} @!k:pool_pointer; {index into |id_pool|} @!left, l:0..buf_size; {left hand end of the identifier and its length } @!p:id_pointer; {where it is being sought} begin l := 0; left := loc ; p := 0 ; {an illegal value} @; if l > 1 then begin @; find_id := p ; end else find_id:= h ; end; @ A simple hash code is used: If the sequence of characters have ordinal values $c_1c_2\ldots c_m$, its hash value will be $$(2^{n-1}c_1+2^{n-2}c_2+\cdots+c_n)\,\bmod\,|max_names|.$$ (That's why |max_names| is supposed to be prime.) The symtax of identifiers is peculiar. First, VMS \PA has some special words of the form |'%' (letter)*|. @= h:= 0 ; oo := zord( buffer[ loc]); if reserving then while loc <= buf_len do begin h := ( 2*h + oo) mod max_names; next; incr( l) ; end else if ( oo = "%") then begin h := "%" ; next; l := 1 ; while ( letter) do begin @ h := ( 2*h + oo) mod max_names; next; incr( l) ; end; end else while ( letter) or ( digit) do begin @ h := ( 2*h + oo) mod max_names; next; incr( l) ; end; @ We need to be able to insert into the pool file some special identifiers which are guaranteed not to appear in the Pascal file. So we convert letters to lower case, but only when reading the Pascal file. @= if in_pascal and ( high) then begin oo := oo + "a" - "A" ; buffer[loc] := zchr( oo) ; end; @ @= repeat p:=hash[h]; if p=0 then @ else if length(p)=l then @ else p := 0 ; if h=0 then h:=max_names@+else decr(h); until p <> 0 @ @= begin if pool_ptr+l>pool_size then overflow('name pool'); if id_ptr=max_names then overflow('names'); i:=left; while i< loc do begin id_pool[pool_ptr]:=buffer[i]; incr(i); incr(pool_ptr); end; p:=id_ptr; hash[h]:=p; equiv[p]:=equiv[0]; {the new identifier is |undefined|} incr(id_ptr); id_start[id_ptr]:=pool_ptr; end @ @= begin i:=left; k:=id_start[p]; while (i< loc)and(buffer[i]=id_pool[k]) do begin incr(i); incr(k); end; if i <> loc then p := 0 ; {not all characters agree} end @ Identifier interpretation. The equivalent of each identifier consists of two halfwords called the |id_code| and |id_aux|, as we have seen. When an identifier is declared, its equivalent is redefined; and the old value is saved for future restoration if this declaration is not in the outer block. The |id_code| might be |undefined| or it might be one of the codes like | "while" | for a reserved word; or it might be one of the codes for non-reserved words that we are about to discuss. When an identifier is declared, its equivalent is redefined. The old value is saved for future restoration unless this declaration was in the outer block. Identifier equivalents are saved and restored by the `|save_stack|'. The variable |save_ptr| points to the first unused location on this stack; and |save_ptr=save_base+2n|, when |n| equivalents are to be restored at the end of the current block. Location |save_base-1| contains the previous value of |save_base|, if |save_base>0|; and |save_base>0| if and only if definitions are to be saved, i.e., if and only if we are not making a definition at the outermost level. A saved definition occupies two words on the save stack: First comes the old value of |equiv[n]| and then the value of |n|. @= @!save_stack:array[0..save_size] of memory_word; @!save_ptr:0..save_size; {number of locations used in |save_stack|} @!save_base:0..save_size; {bottom of current level} @ @= save_base :=0 ; save_ptr := 0 ; @ The three operations we need for |save_stack| maintenance are really simple. @p procedure save(n:id_pointer); begin if save_base>0 then begin if save_ptr+2>save_size then overflow('save stack'); save_stack[save_ptr].hh:=equiv[n]; save_stack[save_ptr+1].int:=n; save_ptr:=save_ptr+2; end; end; @# procedure push_save_stack; begin if save_ptr+1>save_size then overflow('save stack'); save_stack[save_ptr].int:=save_base; incr(save_ptr); save_base:=save_ptr; end; @# procedure unsave; var n:id_pointer; begin while save_ptr>save_base do begin save_ptr:=save_ptr-2; n:=save_stack[save_ptr+1].int; equiv[n]:=save_stack[save_ptr].hh; end; decr(save_ptr); save_base:=save_stack[save_ptr].int; end; @* Operation costs. Here we define the assumed cost of each operation. The numbers are expressed in terms of a unit that essentially represents a memory access. The numbers aren't extremely precise, because the goal is simply to give a ballpark estimate of running time. On the KL-10 computer, each unit supposedly represents about 400 nanoseconds. @d fetch_cost=1 {time to load a variable or constant} @d store_cost=1 {time to store into a variable} @d jump_cost=1 {time to |goto| a (local) label} @d index_cost=1 {extra time for array-address computation} @d packed_index_cost=10 {ditto, for packed arrays} @d point_cost=1 {cost of |^| in pointers} @# @d add_cost=1 {integer addition or subtraction} @d mult_cost=5 {integer multiplication} @d div_cost=10 {integer |div| and |mod|} @d real_add_cost=5 {floating point addition or subtraction} @d real_mult_cost=6 {floating point multiplication} @d real_div_cost=12 {floating point division} @d compare_cost=1 {equality or inequality tests} @d in_cost=3 {\&{in}} @d set_cost=1 {set union, intersection, difference} @d and_or_cost=1 {\&{and}, \&{or}} @# @d unary_cost=1 {|abs| and |odd| and |pred| and |succ| and \&{not} and negation} @d ord_chr_cost=0 {|ord| or |chr|} @d int_real_cost=25 {|trunc|,|round|, or fix-to-float} @d char_string_cost=15 {read or write a |char|} @d int_string_cost=100 {read or write an integer} @d real_string_cost=300 {read or write a real} @d string_string_cost=5 {setup time for reading and writing a string} @d string_string_tax=5 {ditto, extra cost per character} @d array_string_cost=250 {reading characters into an array} @d transcendental_cost=100 {|sin|, |exp|, etc. (guesstimate)} @d new_cost=50 {|new|, |dispose| (guesstimate)} @d pack_cost=250 {|pack|, |unpack| (guesstimate)} @# @d packed_surcharge=1 {packed record field cost added to |fetch_cost|} @d var_surcharge=1 {extra cost for fetching or storing a \&{var} parameter} @d call_overhead=10 {push or pop stack and transfer to or from procedure or function} @# @d open_cost=3000 {|reset| or |rewrite|} @d close_cost=600 {|close|} @d break_in_cost=100 {|breakin|} @d get_put_cost=10 {|get| or |put| on a file} @d eof_cost=10 {|eof| or |eoln|} @ Besides the costs of basic operations, there are costs associated with changes of control. @d if_cost=1 {setup time for $\&{if}\ldots\&{then}\ldots\&{else}$} @d for_cost=5 {\&{for} loop setup time} @d for_tax=5 {extra charge per iteration} @d while_cost=1 {setup time for \&{while} loop} @d while_tax=1 {extra charge per iteration of \&{while} loop} @d repeat_tax=1 {extra time per iteration of \&{repeat} loop} @d case_cost=5 {setup time for branching in \&{case} statement} @* Frequencies and weights. An operation of weight $w$ that is performed $f$ times contributes a total of $w\cdot f$ to the running time. Both $w$ and $f$ are integers. I also attempt to calculate the explicit cost of each procedure as well. I have not tried to get the implicit cost as I have not thought of any plausible formula for this. If we are currently in \.{WEB} sections $m_1$, \dots, $m_k$ (from the ``outside in''), the product $w\cdot f$ must be added to the explicit cost of section $m_k$ and to the implicit cost of sections $m_1$, \dots, $m_{k-1}$. By convention, $m_1$ is always zero; hence the dummy section~0 will be credited with all accumulated costs. A stack is maintained with |info(sec_ptr)=@t$m_k$@>|, |info(link(sec_ptr))=@t$m_{k-1}$@>|, etc. @= @!impl_cost,@!expl_cost, {accumulated implicit and explicit runtime estimates} @!proc_cost, {cost of procedure or function} @!static_weight: wt_array ; {accumulated weight without regard to frequency} @!proc_names: array [ mod_p ] of id_pointer ; {their names} @!max_section, max_proc, cur_proc: mod_p ; {maximum section number whose weight has been recorded} @!sec_ptr,@!proc_ptr:pointer; {top of the stack of current section numbers and procedures} @ @= for i:=0 to max_modules do begin impl_cost[i]:=0; expl_cost[i]:=0; static_weight[i]:=0; proc_cost[i]:=0; proc_names[0] := "program" ; end; sec_ptr:=get_avail; info(sec_ptr):=0; max_section:=0; proc_ptr:=get_avail; info(proc_ptr):=0; max_proc:=0; @ @= mod_p = 0..max_modules ; wt_array = array[ mod_p ] of integer; @ The stack of section numbers is kept up to date by two fairly simple procedures. Mismatched numbers (e.g., \.{\{123:\}\{:124\}}) are not permitted. @p procedure push_section(@!m:halfword); var p:pointer; begin if m>max_section then begin if m>max_modules then overflow('section'); max_section:=m; end; p:=get_avail; info(p):=m; link(p):=sec_ptr; sec_ptr:=p; end; @# procedure pop_section(@!m:halfword); begin if (m=0)or(info(sec_ptr)<>m) then fatal_error('WEB sections dont match, should be ',info(sec_ptr):1); @.WEB sections don't match@> step_wipe(sec_ptr); end; @ And here is the corresponding code for procedures. @= incr( max_proc) ; cur_proc := max_proc ; proc_names[ max_proc] := name ; z := get_avail ; info( z) := max_proc ; link( z) := proc_ptr ; proc_ptr := z ; @ @= step_wipe( proc_ptr) ; cur_proc := info( proc_ptr) ; @ If \.{PROFILE}'s frequency assumptions are correct, the program component that it is currently processing was executed |cur_freq| times. Another global quantity, |out_wt|, contains the total weight that has not yet been reported in the |output| file. @= @!cur_freq:integer; {the current frequency} @!out_wt: integer; {accumulated weight to show on the next output line} @ @= cur_freq:=0; out_wt:=0; @ Here is the procedure that multiplies the current frequency by a given weight and adds it to all appropriate subtotals. @p procedure add_weight(@!w:integer); var @!p : pointer; {for list manipulation} @!wf: integer; {weight times frequency} begin out_wt:=out_wt+w; static_weight[info(sec_ptr)]:=static_weight[info(sec_ptr)]+w; wf:=w; wf:=wf*cur_freq; expl_cost[info(sec_ptr)]:=expl_cost[info(sec_ptr)]+wf;@/ proc_cost[cur_proc]:=proc_cost[cur_proc]+wf;@/ p:=link(sec_ptr); while p<>null do begin impl_cost[info(p)]:=impl_cost[info(p)]+wf; p:=link(p); end; end; @ After the weights have all been gathered, it's time to massage them a bit and sort them, then publish the output. @= @!nn:0..max_modules; {Number of modules with positive cost} t: integer; @!x:array[0..max_modules] of 0..max_modules; {sections being sorted} @!total, {total weight} @!factor:real; {multiply by this to get percentage of total weight} @ @= begin total :=impl_cost[0]+expl_cost[0]; write_ln( display, 'Total weight was', total ) ; if total =0 then begin print_nl('Total weight was zero.'); mark_harmless; @.Total weight was...@> end else begin factor:=100.0/ total ; page( display); for k:=0 to max_modules do begin impl_cost[k]:=impl_cost[k]+expl_cost[k]; impl_cost[k]:= round( impl_cost[k] * factor ) ; expl_cost[k]:= round( expl_cost[k] * factor ) ; proc_cost[k]:= round( proc_cost[k] * factor ) ; end; write_ln( display, 'Accumulated costs of individual WEB sections:'); write_ln(display); write_ln( display, 'secno','implicit cost( % )':17, 'explicit cost( % )':17 , 'static weight'); write_ln( display); for k:=0 to max_section do if (impl_cost[k]<>0)or( expl_cost [k]<>0) then write_ln( display, k:5,impl_cost[k]:17, expl_cost[k]:17, static_weight[k]:13); print_nl('Sorting...'); update_terminal; shell( impl_cost) ; page( display); write_ln( display, 'WEB sections in order of implicit cost:'); write_ln( display); for k:=1 to nn do write_ln( display, x[k]:5,impl_cost[x[k]]:17 ); shell( expl_cost) ; page( display); write_ln( display, 'WEB sections in order of explicit cost:'); write_ln( display); for k:=1 to nn do write_ln( display, x[k]:5,expl_cost[x[k]]:17 ); shell( proc_cost) ; page( display); write_ln( display, 'Procedures in order of estimated cost:'); write_ln( display); for k:=1 to nn do begin n := proc_names[ x[ k]] ; for t := id_start[n] to id_start[n+1] -1 do write( display, id_pool[t] ) ; write( display, ' ': 20 + id_start[n] - id_start[n+1] ) ; write_ln( display, proc_cost[x[k]]:1 ); end; print('Done.'); end; end @ Standard ``Shellsort'' is used [{\sl Art of Computer Programming}, Algorithm 5.2.1D]. @p procedure shell( ww : wt_array ) ; var @!h,@!j,@!k,@!i,@!m:0..max_modules; {variables to control sorting, etc.} @!c:real; {temporary placeholder for a sort key} begin nn:=0; for k:=0 to max_modules do if ww[k]<>0 then begin incr(nn); x[nn]:=k; end; @; while h>0 do begin for j:=h+1 to nn do begin i:=j-h; m:=x[j]; c:=ww[m]; if (c>ww[x[i]])or((c=ww[x[i]])and(mh then i:=i-h; until (c<=ww[x[i]])and((c<>ww[x[i]])or(m>=x[i])); end; h:=h div 3; end ; end; @ (See 5.2.1-(8) in {\sl The Art of Computer Programming}.) @= h:=1; while h= @!head,@!tail:pointer; {pointers for the main translation list} @ @= head:=get_avail; tail:=head; @ The |out| macro puts a given item on the current translation list. @d out(#)== begin link(tail):=get_avail; tail:=link(tail); info(tail):=#; end @ Here is a list of all the commands that might appear in a translation list. @d bad_com = 0 {An illegal command; inserted for debugging} @d verbatim= 1 {entries up to the next |end_verbatim| should be output without a line break} @d end_verbatim= 2 @d begin_section=3 {output `\.\{$m$\.{:\}}', where $m$ is the next entry} @d end_section=4 {output `\.{\{:}$m$\.\}', where $m$ is the next entry} @d change_weight=5 {output `\.{\{+}$v$\.\}', where the next entry points to $v$} @d backspace=6 {back up one character, if at the left margin} @d indent=7 {increase indentation by one} @d outdent=8 {decrease indentation by one} @d all_caps=9 {add |"A"-"a"| to all characters of the next identifier} @d initial_cap=10 {add |"A"-"a"| to the first character of the next identifier} @d max_command=10 @ A global array |out_buf| is used to accumulate characters before they are printed. Output lines are indented by |cur_indent|. @= @!cur_indent:0..out_buf_size; {indentation} @!out_buf:array[0..out_buf_size] of char ; {characters to be output} @!out_line:integer; {serial number of the next line of output} @ @= cur_indent:=0; out_line:=1; @ The |flush_out| procedure outputs the current translation list and removes it from memory. If the list is empty, nothing happens. Otherwise one or more lines are output, with weight and frequency information placed on the final line (if |out_wt| and/or |cur_freq| are nonzero). @p procedure flush_out; var @!p : pointer; {for list manipulation} @!i,@!j,@!k:0..out_buf_size; {indices into |out_buf|} @!t:pool_pointer; {index into |id_pool|} @!offset, @!next_offset:integer; {0 or |"A"-"a"|} @!n:halfword; {command or |id_pointer|} @!v,@!w:integer; {registers used in decimal output conversion} begin if tail<>head then begin p:=link(head); for k:=1 to cur_indent do out_buf[k-1]:=' '; k:=cur_indent; offset:=0; next_offset:=0; while p<>null do begin @; step_wipe(p); end; @; tail:=head; end; end; @ Variable |k| represents the end of the buffer contents before a command is processed; variable~|j| represents the buffer's end after the command. @= n:=info(p); j:=k; if n>max_command then @ else case n of bad_com: err_print('bad command in translation list'); end_verbatim: err_print ( 'end_verbatim out of context'); verbatim: @; begin_section, end_section, change_weight: @; backspace: if k>0 then if k=cur_indent then decr(j); indent: @; outdent: @; all_caps: begin offset:="A"-"a"; next_offset:=offset; end; initial_cap: offset:="A"-"a"; {|next_offset=0|} end; {there are no other cases} if j>line_length then @; k:=j {this value is |<=line_length|} @ Lines that exceed |line_length| are broken, and excess material is indented by |cur_indent+2|. @= begin write( display,' ': weight_length+freq_length ); @ for i:=0 to k-1 do write( display, out_buf[i ]); write_ln( display); incr(out_line); for i:=1 to cur_indent+2 do out_buf[i-1]:=' '; i:=cur_indent+2; if j-(k-i)>line_length then overflow('line length'); while k= if k>0 then begin if (out_wt<>0)or(cur_freq<>0) then write( display,out_wt:weight_length,cur_freq:freq_length) else write( display,' ': weight_length+freq_length ); @ for j:=0 to k-1 do write( display, out_buf[j]); write_ln( display); out_wt:=0; incr(out_line); end @ @= if out_line mod 5 = 0 then begin write( display,'......'); if out_line mod 100 = 0 then begin print('.'); if out_line mod 500 = 0 then print(out_line:1); end; end else write( display,' '); @ @d put_out(#)==begin out_buf[j]:= #; incr(j); end @= begin for t:=id_start[n] to id_start[n+1]-1 do begin put_out(id_pool[t] ); offset:=next_offset; end; offset:=0; next_offset:=0; end @ Verbatim material should consist entirely of |ASCII_code| data. Nodes are deleted from |mem| as they are output. @= repeat step_wipe(p); if info(p)= verbatim then err_print('Nested Verbatims found') else if info(p)= end_verbatim then else put_out( zchr(info(p))) ; until info(p)= end_verbatim @ @= begin if k=cur_indent then begin put_out(' '); end; incr(cur_indent); if cur_indent+10>line_length then overflow('line length'); end @ @= begin if k=cur_indent then decr(j); decr(cur_indent); {the value will not be negative} end @ @= begin step_wipe(p); put_out('{'); if n=change_weight then begin v:=val(info(p)); link( p):= null; step_wipe(info(p)); if v>=0 then put_out('+') else begin put_out('-'); v:=-v; end; end else begin v:=info(p); if n=end_section then put_out(':'); end; @; if n=begin_section then put_out(':'); put_out('}'); end @ @= w:=10; while v>=w do w:=10*w; repeat w:=w div 10; put_out( zchr("0"+(v div w))) ; v:=v mod w; until w=1 {this is inefficient, but it works from left to right} @* Levels of translation. Translation lists are actually output only when \.{PROFILE} is operating on its ``outer level.'' Inner levels are also possible; for example, the parsing of an arithmetic expression might involve many levels. The global variables |head| and |tail| always refer to the translation list on the current level; a stack of other head-tail pairs is accessible via |head_ptr|. @= @!head_ptr:pointer; {top of stack for pushed-down translation lists in progress} @!trans_head,@!trans_tail:pointer; {translation list just completed} @ @= head_ptr:=null; @ To enter a new level of translation list building, there's a |push_level| subroutine: @p procedure push_level; var p,@!q:pointer; {two new nodes added to the stack} begin p:=get_avail; q:=get_avail; link(p):=q; link(q):=head_ptr; head_ptr:=p;@/ info(p):=head; info(q):=tail;@/ head:=get_avail; tail:=head; end; @ Conversely, when we lower the level, the current head and tail are placed in variables |trans_head| and |trans_tail|, respectively. @p procedure pop_level; begin trans_head:=head; trans_tail:=tail;@/ head:=info(head_ptr); step_wipe(head_ptr); tail:=info(head_ptr); step_wipe(head_ptr); end; @ It is a simple matter to append the |trans_head/trans_tail| list to the current list: @p procedure app_trans; begin if trans_tail<>trans_head then begin link(tail):=link(trans_head); tail:=trans_tail; end; step_wipe(trans_head); end; @ When |get_next| looks ahead and sees a comment like \.{\{:123\}}, it doesn't immediately delete section~123 from the section stack, because the lookahead process is (by definition) one jump ahead of the parsing process. The section stack is actually updated later, when the translation list specified by |new_trans| is appended to the parser's current |head/tail| list. The |get_next| routine does this delayed updating just before looking ahead for another token. @d new_trans==link(new_trans_loc) @= if new_trans<>null then begin link(tail):=new_trans; repeat tail:=link(tail); if info(tail) = begin_section then begin tail:=link(tail); push_section(info(tail)); end else if info(tail) = end_section then begin tail:=link(tail); pop_section(info(tail)); end else if info(tail) = change_weight then begin tail:=link(tail); add_weight(val(info(tail))); end; until link(tail)=null; new_trans:=null; end @* Getting the next token. A ``top down'' or ``recursive descent'' parsing strategy is used, in which the program has looked ahead one token in the input. A token is a special character like `\.+'; or a special pair of characters like `\.{<=}'; or a reserved word like `\.{ARRAY}'; or an identifier; or a numeric constant; or a string constant. It may be preceded by any number of blanks and/or comments; blanks are removed, but the comments are copied to the output. Whenever a token is acceptable to the syntax, that token is swallowed (never to be put back), and the program looks ahead for another token. This elementary lookahead is done by \.{PROFILE}'s |get_next| routine, which sets up several global variables to show what it has found: \smallskip \hang|token| is either one of the special code values listed below or an |id_pointer| that points to an identifier or reserved word. If it is an identifier, then: \smallskip \hang|cur_code| and |cur_aux| are its |id_code| and |id_aux|. Otherwise, they are zero. \smallskip \hang|cur_val| is the value of an integer constant, or the length of a quoted string, or zero if irrelevant. \smallskip \hang|new_trans| is the head of a translation list for the token that has just been scanned. The translation list includes comments (interpreted if they have special forms, otherwise truncated to a length at most |max_comment|); spaces have been suppressed, except that spaces are inserted before or after reserved words as mentioned earlier. Reserved words have been preceded by an |all_caps| command in the translation. \smallskip \hang|kludge_flag| has been set |true| if the comment `\.{\{\^\}}' was sensed. @= @!token:halfword; {|id_code| or other code in current lookahead token} @!cur_code,@!cur_aux:halfword; {its |id_code| and |id_aux| if it is an identifier} @!cur_val:integer; {integer value of current lookahead token, if appropriate} @!new_trans_loc:pointer; {head of translation list for current lookahead token} @!kludge_flag:boolean; {set to |true| when |get_next| sees an uparrow comment} @ @=new_trans_loc:=get_avail; @ These global variables help |get_next| control the low-level activities associated with taking characters out of the input buffer: @= @!line:integer; {the number of the current line in the current file} @!buf_len:0..buf_size; {the last character position occupied in the buffer} @!loc:0..buf_size; {the next character position to be read from the buffer} @!get_tail:pointer; {tail of the |new_trans| list being built} @ Here now is the program for |get_next|. Remember that a call of |get_next| means, essentially, ``I accept the previous token. Look ahead for another, because I will want to look at it later.'' @d unknown = 32765 {any nonsense value} @d next == incr( loc); oo := zord( buffer[ loc]) @d app(#)==begin link(get_tail):=get_avail; get_tail:=link(get_tail); info(get_tail):=#; end @d set_code==begin token:= oo; app( oo); next ; end @d compress(#)==begin token:=#; app( token); loc := loc+2 ; oo := zord( buffer[ loc]) ; end @p procedure get_next; label continue ; var k : integer; begin @ ; get_tail:=new_trans_loc; token := unknown ; cur_code := 0; cur_aux := 0; while token = unknown do begin if loc > buf_len then get_line; oo := zord( buffer[ loc]) ; if ( letter) or ( oo = "%") then @ else if digit then @ else if quote then @ @ else set_code ; end; end; @ @= procedure get_next; forward; @ @= else if ( oo = "{" ) then get_comment else if ( oo = "(" ) and ( buffer[loc+1] = '*' ) then get_comment else if ( oo = ":" ) and ( buffer[loc+1]='=' ) then compress( ":=" ) else if ( oo = "<" ) and ( buffer[loc+1]='=' ) then compress( "<=" ) else if ( oo = "<" ) and ( buffer[loc+1]='>' ) then compress( "<>" ) else if ( oo = ">" ) and ( buffer[loc+1]='=' ) then compress( ">=" ) else if ( oo = "." ) and ( buffer[loc+1]='.' ) then compress( ".." ) else if ( oo = " " ) then begin next; end {ignore spaces } @ Recall that the |input_ln| procedure might have been forced to break a string at a random place, so this code tries to repair the break. Also, we must distinguish between quoted string constants and character constants, because the latter are of ordinal type. @= begin app(verbatim); cur_val:=0; while quote do begin app( oo); next; while not ( quote) do begin incr(cur_val); app( oo); next; if loc>buf_len then begin if buf_len > buf_size then get_line else begin err_print('! String constant didn''t end') ; goto continue; end; end; end; app( oo); next; if quote then incr(cur_val); end; continue: if cur_val = 1 then begin token:= char_const; cur_code:= char_const; cur_val := info( get_tail) ; {the character} end else begin cur_code:=string_const; token:=string_const; end; app(end_verbatim); end @ @= begin app(verbatim); cur_val:= 0 ; cur_code :=int_const; token :=int_const; while ( digit) do begin k := oo -"0"; cur_val:=10*cur_val + k ; app( oo); next; end; @; app(end_verbatim); end @ @< Scan to the end of a real constant... @>= if ( buffer[loc]='.') and ( buffer[loc+1]<>'.') then begin cur_code :="real"; token :="real"; cur_val:=0; app( oo); next; while ( digit) do begin app( oo); next; end; end; if (buffer[loc]='E')or(buffer[loc]='e') then begin cur_code :="real"; token:="real"; cur_val:=0; app( oo); next; if ( buffer[loc]='+') or ( buffer[loc+1]<>'-') then begin app( oo); next; end; while ( digit) do begin app( oo); next; end; end @ Reading a comment. We must deal with comments enclosed in {\tt (*...*)} brackets, also special comments inserted by \.{TANGLE}. These may be split across an end of line. @p procedure get_comment ; label not_found, exit; const long_comment = 7 ; var k,c_leng, value: integer; code : pointer; brace:boolean ; com: packed array [0..max_comment] of byte ; begin @ @ not_found: @; exit: end; @ @= procedure get_comment ; forward; @ Now |loc| is pointing to either an opening brace or the `(' of an `(*'. @= brace := ( oo = "{" ) ; c_leng := 0; com[0] := "{" ; k := 0; code := null ; incr(loc); if not brace then incr(loc); while (( brace and (buffer[loc] <> '}' )) or ((not brace) and ( (buffer[loc] <> '*' ) or (buffer[loc+1] <> ')' )))) do begin if loc > buf_len then get_line else begin incr(c_leng); if c_leng < max_comment then com[c_leng] := zord( buffer[loc] ); incr(loc); end; end; incr(loc); if not brace then incr(loc); if loc > buf_len then get_line; @ @= if ( c_leng > long_comment ) or ( not brace) then goto not_found; if ( com[ 1]="^") and ( c_leng = 1) then begin kludge_flag:=true; goto not_found; end; if ( com[ 1]="+") or ( com[ 1]="-") or ( com[ 1]=":") then k := 2 else if ( com[ c_leng]=":") then k := 1 else goto not_found; value:=0; while (k <= c_leng) and (com[k]>="0") and (com[k]<="9") do begin value:=10*value+com[k]-"0"; incr(k); end; if value>max_halfword then goto not_found; if ( com[k]=":") and ( k=c_leng) then code:=begin_section else if k <> c_leng+1 then goto not_found else if ( com[ 1]="+") then code:=change_weight else if ( com[ 1]="-") then begin code:=change_weight; value := -value; end else code:=end_section; app(code); app(value); goto exit; @ @= app(verbatim); if c_leng >= max_comment then begin c_leng := max_comment ; for k := max_comment-3 to max_comment-1 do com[k] := "." ; end else incr(c_leng) ; com[ c_leng] := "}" ; for k:= 0 to c_leng do app(com[k ]); app(end_verbatim); @ @= begin token := find_id ; cur_code:=id_code( token ); cur_aux:=id_aux(token ); app(" "); if cur_code= "RESERVED" then app(all_caps); app(token); app(" "); end @* Small syntactic units. Now that we have |get_next| to scan tokens, the next job is to parse slightly larger entities. With luck, we'll eventually work up to the point where we'll be ready to parse a whole \PASCAL\ program. The remainder of the \.{PROFILE} program can be viewed as an expansion of the syntax diagrams at the end of the \PASCAL\ manual into a set of subroutines, as if those diagrams were ``flow charts.'' @ The most common error message that occurs in typical top-down parsing routines is the following. @d expected(#)==err_print('! "',#,'" was expected') @ But very often this error appears in the following combination. @d expect( #)== if token = # then get_next else begin print( 'error: "') ; show_id( #) ; print_ln( '" was expected ') ; error; end; @ Here's a little routine that is used when an identifier had better be next: @p procedure skip_to_id; begin if cur_code =0 then begin err_print('! spurious stuff before the identifier is being skipped'); @.spurious stuff...@> repeat get_next; until cur_code<>0; end; end; @ When we need to swallow a semicolon, |get_semi| does the trick. @p procedure get_semi; begin if token<>";" then begin err_print('! spurious stuff before the semicolon is being skipped'); repeat get_next; until token=";"; end; get_next; end; @ Here's a routine that scans past a list of identifiers, separated by commas, and returns a pointer to a list of their locations (in reverse order). @p function get_id_list:pointer; label exit; var p:pointer; {the list} @!q:pointer; {new item to put on it} begin p:=null; loop@+ begin skip_to_id ; q:=get_avail; info(q):=token; link(q):=p; p:=q; get_next; if token<>"," then goto exit; get_next; end; exit:get_id_list:=p; end; @ Now comes |get_int_const|, which scans an integer constant and returns its value. @p function get_int_const:integer; var negative: boolean; begin negative := false ; if token= "+" then get_next ; if token = "-" then begin negative := true; get_next; end; if token<> int_const then begin cur_val := val( cur_aux) ; if ( cur_code<>int_const ) then expected('integer constant'); end; if negative then cur_val:=-cur_val; get_int_const:=cur_val; get_next ; end; @* Starting to scan declarations. The next few sections are devoted to the task of scanning the declarations that form the major part of a block. Whenever an identifier gets defined, its |id_code| says what kind of identifier it is (|var| etc.) and its |id_aux| points to a structure that gives the specific declaration. VMS \PA\ allows us to arrange the declaration sections in a block in any order, and to repeat them. @p procedure do_declarations; var t,p,q : pointer ; {for working lists} begin repeat if token="label" then @ else if token="const" then @ else if token="type" then @ else if token="var" then @ else if (token="procedure" )or(token="function") then get_routine else if ( token = "begin") or ( token = "end" ) then do_nothing else begin expected( 'declaration or statement ') ; get_semi; end; until ( token = "begin") or ( token = "end" ) ; end; @ @= procedure do_declarations; forward; @ The label part is easy. We just copy it across. @= begin repeat get_next; cur_val:=get_int_const; until token<> "," ; expect(";") ; flush_out; end @ A constant declaration like `\&{const} |a=3|' will be recorded as |id_code=int_const| and |id_aux| will point to a |mem| word whose |val| is the constant value. Real constants have |id_code="real"| and |id_aux=null|; string constants have |id_code=string_const| and |id_aux| the string length. \.{PROFILE} doesn't keep track of the precise values of real and string constants. @= begin get_next; out(indent); skip_to_id; while cur_code <> "RESERVED" do begin p:=token; save(p); get_next; expect("=") ; @; get_semi; flush_out; skip_to_id; end; out(outdent); end @ The constant might be an explicit integer or string or real or |char| constant, or a constant identifier. @= begin if ( token = "+" ) or ( token = "-" ) or ( token = int_const ) then begin id_code( p) := int_const ; q := get_avail ; id_aux( p) := q ; val( q) := get_int_const ; end else if ( ( token = char_const ) or ( token = "real" ) or ( token = int_const ) or ( token = string_const )) then begin id_code( p) := token ; id_aux( p) := cur_val ; get_next; end else if ( ( cur_code = char_const ) or ( cur_code = "real" ) or ( cur_code = int_const ) or ( cur_code = string_const )) then begin id_code( p) := cur_code ; id_aux( p) := cur_aux ; get_next; end else expected('constant'); end; @* Representation of types. The rich type structures of \PASCAL\ must be represented inside of \.{PROFILE} since it is necessary to understand all of the identifiers and variables that come along. We represent a type by a memory word whose |info| field identifies a particular sort of type; the |link| field is an auxiliary argument in case the type has further structure. The rules for each type are given below, assuming that |t| points to a memory word representing the type in question. A subrange type of integers |a..b| is represented by |info(t)="ORDINAL"|, |val(info(link(t)))=a|, and |val(link(link(t)))=b|. An enumerated type such as $(\\{red},\\{yellow},\\{blue})$ is effectively replaced by a subrange type $\\{red}\to\\{blue}$, accompanied by the constant definitions $\\{red}=0$, $\\{yellow}=1$, $\\{blue}=2$. The standard simple types |char|, |integer| and |boolean| are treated as |min_char..max_char|, |-max_int..max_int| and |(false,true)|, respectively. Nearly all of the predeclared types will be created by inserting suitable declarations into the |pool| file. There are a few exceptions, which have to be assembled by hand. First, the |real| type, represented by |info(t)="real"| and |link(t)=null|. Second, when we are parsing an expression and something goes wrong, the expression has unknown type. We then say that its type is |"Bad"|. Then |or_typ| is an unspecified ordinal type. @= real_type, bad_type, or_typ, char_const, int_const, string_const : pointer; @ @= real_type := get_avail; info( real_type ) := "real"; id_code( "real") := "type" ; id_aux( "real"):= real_type ; id_code( "SAME") := "type" ; id_aux( "SAME"):= "SAME" ; or_typ := get_avail; info( or_typ ) := "ORDINAL"; link( or_typ ) := null; int_const := or_typ ; char_const := get_avail; info( char_const ) := "ORDINAL"; link( char_const ) := char_const ; string_const := get_avail; info( string_const ) := "Packed_array"; link( string_const ) := char_const ; bad_type := get_avail; info( bad_type ) := "Bad"; link( bad_type ) :=bad_type ; id_code( "Bad") := "type" ; id_aux( "Bad"):= bad_type ; id_code( "open") := undefined ; id_code( "error") := undefined ; @ Structured types are represented by |info(t)="array"|, |"record"|, |"set"|, or |"file"|, or |"Packed_array"| etc. The value of |link(t)| is |null| if the type is not further specified (e.g., if a standard procedure takes any \&{file} as a parameter). Otherwise, in the case of an array, |info(link(t))| points to the index type (which will be a subrange type), and |link(link(t))| points to the component type (which might be another array). A type definition like `\&{type} |b=array[1..3] of char|' will be recorded as |id_code="type"|; |id_aux| will point to the type representation (in this case a node whose |info| is |"array"| and whose |link| points to a node whose |info| points to a type specification of |1..3| and whose link is a pointer to the |char| type. A \&{var} declaration like `\&{var} |c:integer|' will be recorded as |id_code= "var"| and |id_aux= "Int" |. In the case of a record, we retain only the names and types of fields that might be used, not the particular way they might occur as variants. For example, $$\vbox{\halign{#\hfill\cr \&{record} |a:real|;\cr \quad\&{case} |b:boolean| \&{of}\cr \quad\\{false}: (\ignorespaces |c,d:integer|; \&{case} \\{integer} of |0:(e:real)|);\cr \quad\\{true}: (\ignorespaces |f:packed file of set of char|)\cr \quad\&{end}\cr}}$$ will have the same representation as $$\vbox{\halign{#\hfill\cr \&{record} |a:real|; |b:boolean|; |c,d:integer|;\cr \quad |e:real|; |f:packed file of set of char|;\cr \quad\&{end}\cr}}$$ inside of \.{PROFILE}. A list of fields starts at |link(t)|. If |u| points to a node of this field list, |info(info(u))| points to the field name, |link(info(u))| to the field type, and |link(u)| to the next node in the list. For set and file types, |link(t)| points to the base type involved. Thus, for example, a |file of set of char| would have |info(t)="file"|, |info(link(t))="set"|, and |info(link(link(t)))="char"|. The standard file type |text| is considered to yield a |file of char|, as far as \.{PROFILE} is concerned. A more-or-less standard type |string| is introduced to represent `\ignorespaces|packed array[1..n] of char|\unskip' where |n| is unspecified. Finally, a pointer type like `|^|\\{person}' is represented by |info(t)="^"|, with |link(t)| pointing to the identifier, `\\{person}'. Or |link(t)=null|, in the case of the type corresponding to the standard pointer value |nil|. @ Here now is a subroutine that scans what \PASCAL\ calls a ``simple type.'' The |get_s_type| function returns a pointer to the representation of the type that was scanned. After calling |get_s_type| (and several other subroutines below that use |push_level|), it's necessary to say `|app_trans|' to keep from losing the translation list that was formed, because the translation has been moved out of the |head/tail| list. @p function get_s_type:pointer; var p:pointer; {the resulting type} @!q:pointer; {temporary for list manipulation} @!m,@!n:integer; {subrange boundaries} begin push_level; if cur_code="type" then {a type identifier} begin p:=cur_aux; get_next; end else begin if token="(" then @ else @; @; end; pop_level; get_s_type:=p; end; @ @= begin m:=0; n:=-1; repeat get_next; skip_to_id;@/ incr(n); save(token); id_code(token):=int_const; id_aux(token):=n; get_next; until token<> "," ; expect(")") ; end @ @= begin m:=get_int_const; expect(".."); n:=get_int_const; end @ @= begin p:=get_avail; val(p):=m; q:=get_avail; info(q):=p; p:=get_avail; val(p):=n; link(q):=p; p:=get_avail; info(p):= "ORDINAL" ; link(p):=q; end @ VMS \PA\ allows ``varying'' arrays. The syntax is: |varying[ nn ] of (type)| which is represented by the structure: "VARY" -- ^i -- ^basetype where |i| points to the subrange type |1..nn| as if this were a fixed array. This function is called when we have just seen the key word ``|varying|''. @p function scan_vary_type: pointer; var lo, hi, p, q, base :pointer ; begin get_next; expect( "[") ; lo := get_avail ; val( lo) := 1 ; hi := get_avail ; val( hi) := get_int_const ; expect( "]") ; expect( "of") ; base := get_type ; prepend( lo)( hi) ; prepend( hi)( base) ; prepend( "VARY")( base) ; scan_vary_type := base ; end ; @ The next procedure is similar, but it gets a general type. Here we introduce one of the eccentricities of VMS \PA, called an ``attribute list''. This is a list of identifiers in square brackets. We will just ignore it. @d skip_attributes == if token = "[" then begin repeat get_next until token = "]" ; get_next; end @= function get_field_list:pointer; forward; function get_type:pointer; forward; @ @p function get_type ; label done; var loose: boolean ; {false if \&{packed}} @!p,@!q:pointer; {for list manipulation} begin skip_attributes ; push_level; if token <> "packed" then loose := true else begin loose := false ; get_next; end; if token = "^" then @ else if token = "array" then @ else if token = "varying" then p := scan_vary_type else if token = "file" then @ else if token = "set" then @ else if token = "record" then @ else begin p:=get_s_type; app_trans; end; pop_level; get_type:=p; end; @ @= begin get_next; skip_to_id; p:=get_avail; info(p):= "^" ; link(p):=token; get_next; end @ @= begin get_next; p:=get_avail; if loose then info(p):= "array" else info(p) := "Packed_Array" ; q:=get_avail; link(p):=q; expect("[") ; loop@+ begin info(q):=get_s_type; app_trans; if token<> "," then goto done; get_next; link(q):=get_avail; q:=link(q); if loose then info(q):= "array" else info(q) := "Packed_Array" ; link(q):=get_avail; q:=link(q); end; done: expect("]") ; expect("of") ; link(q):=get_type; app_trans; end @ @= begin get_next; p:=get_avail; if loose then info(p):= "file" else info(p) := "Packed_File" ; expect("of") ; link(p):=get_type; app_trans; end @ @= begin get_next; p:=get_avail; if loose then info(p):= "set" else info(p) := "Packed_Set" ; expect("of") ; link(p):=get_s_type; app_trans; end @ @= begin get_next; p:=get_avail; if loose then info(p):= "record" else info(p) := "Packed_Record" ; link(p):=get_field_list; expect("end") ; end @ The |get_field_list| routine returns a pointer to a field list in the form required by a record type. @p function get_field_list; {declared |forward| above} label continue,done,done1,done2; var p,@!q:pointer; {for list manipulation} @!h,@!t:pointer; {head and tail of list} begin h:=get_avail; t:=h; continue:if token="case" then @ else begin if (cur_code <> "RESERVED" )and(token<>"end") then @; if token=";" then begin get_next; goto continue; end; end; step_wipe(h); get_field_list:=h; end; @ @= begin q:=t; loop@+ begin p:=get_avail; link(t):=p; t:=p; p:=get_avail; info(t):=p; info(p):=token; get_next; if token<> "," then goto done; get_next; skip_to_id; end; done: expect(":") ; p:=get_type; app_trans; while q<>t do begin q:=link(q); link(info(q)):=p; end; end @ @= begin get_next; skip_to_id; p:=token; get_next; if token= ":" then begin get_next; if cur_code <> "type" then expected('type identifier') else begin q:=get_avail; link(t):=q; t:=q; q:=get_avail; info(t):=q; info(q):=p; link(q):=cur_aux; get_next; end; end; expect("of") ; @; end @ @= loop@+ begin if ( ( token = char_const ) or ( token = int_const ) or ( cur_code = char_const ) or ( cur_code = int_const )) then begin @; expect("(") ; link(t):=get_field_list; while link(t)<>null do t:=link(t); expect(")") ; end; if token<>";" then goto done2; get_next; end; done2: @ The following code is used twice; label |done1| needs to be declared in both contexts. VMS: the label |otherwise| is not followed by a colon. @= begin loop begin if ( token = char_const ) or ( token = int_const ) then cur_val:=get_int_const else get_next; if token<> "," then goto done1; get_next; end; done1: if token <> "otherwise" then begin expect(":") ; end; end @ Declaring types and variables. So now we consider actual |type| declarations. These are not too bad, since |get_type| does the hard work. @= begin get_next; out(indent); skip_to_id; while cur_code <> "RESERVED" do begin p:=token; save(p); get_next; expect("=") ; id_code(p):= "type"; id_aux(p):=get_type; app_trans; get_semi; flush_out; skip_to_id; end; out(outdent); end @ Vars add one more twist. @= begin get_next; out(indent); skip_to_id; while cur_code <> "RESERVED" do begin p:=get_id_list; expect(":") ; t:=get_type; app_trans; while p<>null do begin q:=info(p); save(q); id_code(q):= "var"; id_aux(q):=t;@/ step_wipe(p); end; get_semi; flush_out; skip_to_id; end; out(outdent); end @* Declaring subroutines. A procedure declaration causes |id_code| to be | "procedure" |, and |id_aux| will point to a list of nodes that specify the parameters. A function declaration is similar, but |id_aux| points to a word whose |info| points to the result type and whose |link| points to the parameter specification list. For each node |q| on the parameter specification list, |info(q)| points to a node whose |info| and |link| fields correspond to the |id_code| and |id_aux| of the parameter itself. So a function declaration is: Name -- "function" -- ^result -- ^P1 -- ^P2 -- \dots -- ^Pn null and for each parameter P1, we have: P1 -- mechanism -- type This is best explained by an example. Consider the procedure heading $$\hbox{\&{procedure} |b(function f(x:char):boolean; x,y:integer; var@?z:real)|.}$$ This defines the identifier |b|, and (one level deeper) also |f|, |x|, |y|, and |z|. The |x| inside |f| is a dummy name of no importance. The |id_code| for |f| will be | "function" |; and its |id_aux| will point to~|p|, where |info(p)= "Bool" | and |link(p)=q|, and where |info(q)=r| and |link(q)=null|, and where |info(r)= "var"| and |link(r)= "char" |. The |id_code| for |x| and |y| will be | "var"| and the |id_aux| will be | "Int" |. The |id_code| for |z| will be | "Var_Param" | and the |id_aux| will be | "real" |. Finally, the |id_code| for |b| will be | "procedure" |, and its |id_aux| will point to a list of four items, pointing to nodes that copy the |equiv| table entries of |f|, |x|, |y|, and~|z|. If we had a more complex definition like $$\hbox{\&{procedure} \\{big}(\&{procedure} $b$ [as above])}$$ the data structure for |b| would be the same, but no declarations of |f|, |x|, |y|, or~|z| would be made. Standard procedures like `|read|' can have two other sorts of parameters not available to ordinary programmers. If the |info| field of an argument is |"OPT_file"|, it means that a default file variable will be inserted as an argument if the procedure call does not have a file argument in the current position; the |link| field contains the runtime estimate. If the |info| field is |"SPECIAL"|, the argument is allowed to be of any type; in this case the |link| field points to a list of extra runtime weights that apply if the actual parameter is of type |char|, |integer|, |real|, \&{array}, or other, respectively. (These weights include the |store_cost|; and if the first parameter is a |"SPECIAL"|, the weight also includes the calling overhead.) A |"SPECIAL"| is optional; if |info(info(p))="SPECIAL"| in a procedure's argument list, then |link(p)=p|; i.e., the argument list is arbitrarily long. If the result type of a function is |null|, the function returns the type of its last argument. Here are the standard functions of \PASCAL, and a few nonstandard ones needed for system programming (available in Hedrick's \PASCAL\ system). We set things up so that, e.g., |abs|, looks like a function call (even though the compiler expands it in-line). Before we tackle a whole procedure, it will be helpful to get a subroutine out of the way. The following routine scans a parameter list, optionally declaring the identifiers found inside, and returns a pointer to the representation of this parameter list. @p function get_parameter_list(@!decl_params:boolean):pointer; var @!mech, {passing mechanism ( "var", "function" etc.)} @!sub_pars, {parameter list of nested function} @!fin_typ:halfword; {type to the right of a colon} @!names, @!par_list, @!con_array, @!con_index, @!con_typ, p, q: pointer; lo, hi, nn : id_pointer ; begin new_list( par_list) ; if token= "(" then begin repeat get_next; if (token= "procedure")or(token= "function") then @ else @; @; until token<>";"; expect(")") ; end; finish( par_list) ; get_parameter_list:= par_list; end; @ One of the eccentricities of VMS \PA\ is ``foreign mechanism specifiers''. @= begin if token= "var" then begin mech:= "Var_Param" ; get_next; end else if ( token= "%ref" ) or ( token= "%immed" ) or ( token= "%descr" ) then begin mech := "var"; get_next; end else mech := "var"; names:=get_id_list; end @ @= begin mech:=token ; get_next; out(initial_cap); names:=get_id_list; sub_pars:=get_parameter_list(false); end @ This section reads in the rest of the current set of parameters. VMS \PA\ allows formal parameters to have default values. @= if mech<> "procedure" then begin expect(":") ; skip_attributes ; if ( ( token = "OPT_file" ) or ( token = "SPECIAL" ) or (token= "file") or ( token = "SAME" ) )then begin fin_typ := token; get_next; end else if cur_code = "type" then begin fin_typ:=cur_aux; get_next; end else if (token = "packed") or (token = "array") then begin @ end else if (token = "varying") then fin_typ := scan_vary_type else begin fin_typ:="Bad"; expected('type identifier'); end; if token = ":=" then repeat get_next until ( token = ")") or ( token = ";") ; end; @ Now assemble a set of pointers that represents the parameter list. @= if mech= "function" then begin p:= sub_pars; prepend( fin_typ)( p) ; end else if mech= "procedure" then p := sub_pars else p:= fin_typ; q:=p ; prepend( mech)( p) ; while names<>null do begin nn:= info(names); if decl_params then begin save( nn); id_code( nn):=mech; id_aux( nn):= q ; end; append( par_list)( p) ; step_wipe( names) ; end; @ Recall that ordinary arrays are represented by the structure "array" -- index -- base type. A many-dimensional array is represented as "array" -- index -- "array" -- index....-- base type. A conformant index has a declaration like |"a:array[low..high:index]of base_type|". We will in effect replace this with: |low,high:index; a:array[index]of base_type|. First scan the index part; we add Pn -- "CONFORM" -- ^index to the parameter list. declare |id1| and |id2| as |with unspecified values, and the array itself as if it were |"array [type-id]"|. First set up for the parameters... @= new_list( con_array) ; while (token = "packed") or (token = "array") do begin if ( token = "packed") then begin get_next; con_typ := "Packed_Array" ; end else con_typ := "array" ; get_next; expect ( "[") ; @ Now read a conformant subrange and attach it to |par_list|... @= repeat lo := token ; get_next; expect ( "..") ; hi := token ; get_next; expect( ":") ; con_index := get_s_type ; app_trans ; p := con_index ; prepend( "CONFORM")( p) ; append(par_list)( p) ; append(par_list)( p) ; append( con_array)( con_typ); append( con_array)( con_index) ; @ Declare the index bounds if we are declaring parameters... @= if decl_params then begin save( lo) ; id_code( lo) := "var" ; id_aux( lo) := con_index ; save( hi) ; id_code( hi) := "var" ; id_aux( hi) := con_index ; end; if token = ( ";") then get_next ; until (token = "]") ; @ Now finally tack the base type on the end. @= get_next ; expect ( "of") ; end; if cur_code = "type" then begin link( info( con_array)) := cur_aux ; get_next; end else expected('type identifier'); step_wipe( con_array); fin_typ := con_array ; @ And now comes the ``fun'' part. @= procedure get_routine; forward; @ @p procedure get_routine ; var p,z,q,s,t: pointer; n :integer; name, mm:id_pointer; begin t:=token ; out(indent); out(backspace); get_next; skip_to_id; name:=token; save(name); push_save_stack; out(initial_cap); kludge_flag:=false; get_next; @; get_semi; flush_out; skip_to_id; if (token = "extern" ) then get_next else if (token = "forward" ) then @ else begin @ get_block ; @ end; get_semi; flush_out; unsave; out(outdent); cur_freq:=0; end; @ In general, the procedure header will be the next part of the input. The exception occurs if it was declared |forward| within the current block. A declaration in an outer block does not count; and such a declaration will have the wrong value of |save_base|. @= if ( id_code( name) <> "forward" ) or ( info( id_aux( name)) <> save_base) then begin id_code(name):=t; p:=get_parameter_list(true); if t= "function" then begin expect(":") ; skip_attributes ; q:=get_avail; link(q):=p; p:=q; if cur_code= "type" then begin info(p):=cur_aux; get_next; end else begin expected('type identifier'); while token<>";" do get_next; info(p):=null; end; end else if kludge_flag then id_code(name):= "Terminal_Procedure" ; id_aux(name):=p; end else @ Now consider what happens when we read a "forward". Then all the parameter declarations must be taken off the stack and preserved somewhere, to be re-declared once we meet the procedure body. At present the function's data records only the parameter types and not their names, which are lying about on the stack. So we must first rescue the names and attach a list of them to the function data. If the previous declaration was: name -- "function" -- (etc) the new declaration wil be: name -- "forward" -- s -- ^p -- "function" -- (etc as before) where |s| records the value of |save_base| and |p| points to the list of parameter names. @= begin p:=get_avail; q:=p ; n := save_base + 1 ; while n < save_ptr do begin link( q) := get_avail; q := link( q) ; s := get_avail; info( q) := s ; val( s) := save_stack[ n ].int ; n:= n + 2 ; end; q := get_avail; info( p) := q ; info( q) := id_code( name) ; link( q) := id_aux( name) ; q:=get_avail; link( q) := p; info( q) := save_base; id_code( name) := "forward" ; id_aux( name) := q ; get_next; end @ So when we read the body of a |forward| function, we replace everything in its expected place. @= begin t := id_aux( name) ; step_wipe( t) ; s := info( t) ; id_code( name) := info( s) ; id_aux( name) := link( s) ; step_wipe( s) ; if id_code( name) = "function" then s := link( s) ; step_wipe( t) ; {|t| now points to the list of parameter names, and |s| to the list of types} while t <> null do begin p := info( s) ; mm := val( info( t)) ; save( mm) ; if ( info( p) <> "CONFORM" ) then id_code( mm) := info( p) else id_code( mm) := "var" ; id_aux( mm) := link( p) ; step_wipe( t) ; s:= link( s) ; end; end; @* Expressions. The next few subroutines are the top-down parsers for \PASCAL\ expressions. The main routine is called |get_exp|; it returns it result in the |trans_head/trans_tail| translation list, and it also makes |cur_type| point to the type of expression. Meanwhile the assumed run-time cost of evaluating the expression is also taken into account. @= @!cur_type: halfword; {the type of expression found by |get_exp| and its friends} @!cur_length:integer; {the length of string, if |cur_type=string|} @ But before we get into |get_exp|, we need a simpler routine, |get_variable|. @p procedure get_variable; label continue,found; var p,@!q:pointer; {for link manipulation} begin push_level; if (cur_code= "var") or (cur_code= "Var_Param" ) or (cur_code= "%immed") or (cur_code= "%ref" ) or (cur_code= "%descr" ) then begin if cur_code= "Var_Param" then add_weight(var_surcharge); cur_type:=cur_aux; get_next; end else begin expected('variable identifier'); get_next; cur_type:= bad_type ; end; continue:if token= "[" then @ else if token= "." then @ else if token= "^" then @; pop_level; end; @ The VMS |varying| structure is a real weirdo. The documents describe it as equivalent to a |record| with fields |length| and |body|; the |body| is a packed array of the base type. But the components are addressed as |ss[i]|, which is illogical as it ought to be |ss.body[i]|. @= begin repeat get_next; if ( info(cur_type)= "Packed_Array") or ( info(cur_type)= "VARY") then add_weight(packed_index_cost) else begin add_weight(index_cost); if info(cur_type)<> "array" then begin err_print('! subscript on non-array'); cur_type:= bad_type ; @.subscript on non-array@> end; end; p:=link(cur_type); {now |info(p)| is index type, |link(p)| is entry type} get_exp; app_trans; {|cur_type| should now be compatible with |info(p)|, but \.{PROFILE} doesn''t bother to check} cur_type:=link(p); until token<> ","; expect("]") ; goto continue; end @ So we must assemble a bogus type for the |body| of a |varying| type. @= begin get_next; skip_to_id; if info(cur_type)= "Packed_Record" then add_weight(packed_surcharge) else if info(cur_type) = "VARY" then begin if token = "length" then begin cur_type := info( link( cur_type)); goto found ; end else if token = "body" then begin cur_type := link( link( cur_type)); prepend( "Packed_array")( cur_type); goto found ; end else begin err_print('! unknown field'); cur_type:= bad_type ; end; end @ @= else if info(cur_type)<> "record" then begin err_print('! field on nonrecord'); @.field on nonrecord@> cur_type:= bad_type ; goto found; end; p:=link(cur_type); while p<>null do begin q:=info(p); if info(q)=token then begin cur_type:=link(q); goto found; end; p:=link(p); end; err_print('! unknown field'); cur_type:= bad_type ; @.unknown field@> found:get_next; goto continue; end @ @= begin get_next; if info(cur_type)= "file" then cur_type:=link(cur_type) else if info(cur_type)= "Packed_File" then begin add_weight(packed_surcharge); cur_type:=link(cur_type); end else if info(cur_type)= "^" then begin add_weight(point_cost); p:=link(cur_type); if id_code(p)<> "type" then begin err_print('! pointer type never defined'); @.pointer type never defined@> cur_type:= bad_type ; end else cur_type:=id_aux(p); end else err_print('! extra uparrow '); @.extra uparrow@> goto continue; end @ It's time now to face the messy details of procedure and function calls, where standard procedures have nonstandard syntax. VMS: the |open| statement has such a horribly nonstandard syntax that I cannot support it. I will merely scan past it. @p procedure get_call; label exit, done, continue; var t:pointer; {final type, if known} @! bal: 0..100; {Bracket balance for |open| kludge} @!p:pointer; {runs through the parameter list} @!q:pointer; {for list manipulation} @!r:pointer; {type of argument} @!c:halfword; {type code of argument} @!ll:halfword; {length of a string} begin @ @ out(initial_cap); get_next; add_weight(call_overhead); if token= "(" then @; if p<>null then begin q:= link( info(p)) ; if q <> "SPECIAL" then err_print('! argument(s) expected'); @.argument(s) expected@> end; if ( t<>null) and ( t <> "SAME" ) then cur_type:=t; exit: end; @ @= p := null ; t := null ; if cur_code = "procedure" then begin p:=cur_aux; t:=null; end else if cur_code = "function" then begin p:=link(cur_aux); t:=info(cur_aux); end else if cur_code = "forward" then begin p:=info( link( cur_aux)) ; t:=info( p); p:=link( p) ; if t = "procedure" then t:=null else begin t:= info( p) ; p:= link( p); end end else expected( 'procedure or function name'); @ One of the subtle things here is that our \PASCAL\ allows empty arguments to standard functions like |call_i|. @= begin loop begin get_next; @ if (token= ",")or(token= ")" ) then cur_type:= or_typ else begin get_exp; app_trans; end; continue:if p=null then begin err_print('! extra argument(s)'); p:= bad_type; @.extra argument(s)@> end; q:= info( p) ; {The current (formal) parameter} if info( q)= "CONFORM" then begin p:=link(p); p:=link(p); goto continue; end ; if link( q) = "OPT_file" then begin p:=link(p); if (info(cur_type)<> "file") and (info(cur_type)<> "Packed_File") then goto continue; end else if link( q) = "SPECIAL" then @ else @; if token<> "," then goto done; end; done: expect(")") ; end @ Here are two horrible kludges, my apologies: @= if token = "open" then begin get_next; expect("(") ; bal := 1 ; repeat get_next; if token = "(" then incr( bal) ; if token = ")" then decr( bal) ; until bal = 0 ; get_next; add_weight(open_cost); goto exit; end; @ @= if token = "error" then begin get_next; get_next; get_next; if token = ")" then goto done ; end ; @ The program could check for type conflicts between argument and parameter; but it doesn't. @= begin if ( info(cur_type)= "ORDINAL") and ( info(link(q))= "real") then add_weight(int_real_cost); add_weight(store_cost); p:=link(p); end @ @= begin r:=cur_type; ll := cur_length ; n:=0; while token= ":" do begin get_next; incr(n); get_exp; app_trans; add_weight(store_cost); end; if ( n=2) and ( info(r)= "ORDINAL" ) then begin add_weight(int_real_cost); r:= real_type ; end else if ( r= string_const) or ( r= char_const) then add_weight(string_string_cost+ ll*string_string_tax) else begin c:=info(r); if c = "ORDINAL" then add_weight( int_string_cost) else if c = "real" then add_weight( real_string_cost) else if c = "array" then add_weight( array_string_cost) end; end @ The next step up from a variable is a ``factor.'' @p procedure get_factor; label done; var p: pointer; {list manipulation} begin push_level; if cur_code = "real" then begin get_next; cur_type := real_type ; add_weight(fetch_cost); end else if cur_code = int_const then begin get_next; cur_type := int_const ; add_weight(fetch_cost); end else if cur_code = string_const then begin cur_type := string_const ; cur_length :=cur_aux; get_next; add_weight(fetch_cost); end else if cur_code = char_const then begin cur_type := char_const ; get_next; add_weight(fetch_cost); end else if ( cur_code = "extern" ) or ( cur_code = "forward" ) or ( cur_code = "function" ) then get_call else if token = "(" then begin get_next; get_exp; app_trans; expect(")") ; end else if token = "not" then begin cur_type := or_typ ; get_next; add_weight(unary_cost); get_factor; app_trans; end else if token = "[" then @ else begin get_variable; app_trans; add_weight(fetch_cost); end ; pop_level; end; @ @= begin get_next; add_weight(set_cost); cur_type:=null; loop@+ begin if token= "]" then goto done; get_exp; app_trans; add_weight(store_cost); if token= ".." then begin get_next; get_exp; app_trans; add_weight(store_cost); end; if token= "," then get_next else if token<> "]" then begin expected(']'); goto done; end; end; done:p:=get_avail; info(p):= "set" ; link(p):=cur_type; cur_type:=p; end @ Factors make terms. @p procedure get_term; label exit; var t: pointer; {type of the first operand} begin get_factor; app_trans; loop@+ begin t: =cur_type; if token = "*" then @ else if token = "/" then @ else if (token = "div") or (token = "mod" ) then @ else if token = "and" then @ else return end; exit: end; @ @= begin get_next; get_factor; app_trans; if info(t)= "ORDINAL" then if info(cur_type)= "ORDINAL" then add_weight(mult_cost) else if info(cur_type)= "real" then add_weight(int_real_cost+real_mult_cost) else incomp else if info(t)= "real" then if info(cur_type)= "real" then add_weight(real_mult_cost) else if info(cur_type)= "ORDINAL" then begin add_weight(int_real_cost+real_mult_cost); cur_type:= real_type ; end else incomp else if (info(t)= "set" )and(info(cur_type)= "set" ) then add_weight(set_cost) else incomp; end @ @= begin get_next; get_factor; app_trans; if info(t)= "ORDINAL" then add_weight(int_real_cost) else if info(t)<> "real" then incomp; if info(cur_type)= "ORDINAL" then add_weight(int_real_cost) else if info(cur_type)<> "real" then incomp; cur_type:= real_type; add_weight(real_div_cost); end @ @= begin get_next; get_factor; app_trans; if (info(t)<> "ORDINAL" )or(info(cur_type)<> "ORDINAL" ) then incomp; add_weight(div_cost); end @ @= begin get_next; get_factor; app_trans; if (info(t)<> "ORDINAL" )or(info(cur_type)<> "ORDINAL" ) then incomp; add_weight(and_or_cost); end @ Terms make simple expressions. @p procedure get_s_exp; label exit; var t:pointer; {type of first operand} begin if ( token= "+") or ( token= "-") then begin if ( token= "-") then add_weight(unary_cost); get_next; end; get_term; loop@+ begin t:=cur_type; if ( token= "+") or ( token= "-") then @ else if token= "or" then @ else return; end; exit: end; @ @= begin get_next; get_term; if info(t)= "ORDINAL" then if info(cur_type)= "ORDINAL" then add_weight(add_cost) else if info(cur_type)= "real" then add_weight(int_real_cost+real_add_cost) else incomp else if info(t)= "real" then if info(cur_type)= "real" then add_weight(real_add_cost) else if info(cur_type)= "ORDINAL" then begin add_weight(int_real_cost+real_add_cost); cur_type:= real_type ; end else incomp else if (info(t)= "set" )and(info(cur_type)= "set" ) then add_weight(set_cost) else incomp; end @ @= begin get_next; get_term; if (info(t)<> "ORDINAL" )or(info(cur_type)<> "ORDINAL" ) then incomp; add_weight(and_or_cost); end @ Finally, simple expression lead us all the way up to expressions. (Once again, \.{PROFILE} does not bother to verify the validity of types; but the information is present.) @= procedure get_exp; forward; @ @p procedure get_exp; var t:halfword; {code of left operand} begin push_level; get_s_exp; if (token= "in") or (token= ">" ) or (token= ">=") or (token= "<>" ) or (token= "<" ) or (token= "<=" ) or (token= "=" ) then begin t:=token; get_next; get_s_exp; if t= "in" then add_weight (in_cost) else add_weight ( compare_cost) ; cur_type:= or_typ ; end; pop_level; end; @* Declarations. Our next task is to parse what \PASCAL\ syntax calls a ``block.'' The |get_block| routine declares all variables of the block; it has a lot to do. @ All right, let's |get_block|. @= procedure get_block; forward; @ @p procedure get_block; label exit ; begin do_declarations; if token<> "begin" then expected('begin') ; cur_freq:=get_count; {the number of times this block is performed} get_statement; exit: end; @* Statements. The scanning process reaches its glorious heights in the climactic |get_statement| procedure. This subroutine sets the global variable |next_freq| to the value that |cur_freq| should have if another statement follows. @= @!next_freq:integer; {the flow that emanates from the statement just got} @ Basically, |get_statement| is a multiway switch between ten different kinds of \PASCAL\ statements. @= procedure get_statement; forward; @ @p procedure get_statement; label done1,done2,done3; {lots of things get done} var in_freq,@!out_freq:integer; {frequencies before and after} tt, ff, nam : pointer ; {used for |with| statement} @!w:integer; {weight of |while| clause} @!lhs_type:pointer; {type of left-hand side of assignment} @!t: halfword ; {type of right-hand side type} @!n:integer; {number of items in assignment} begin while token=int_const do @; in_freq:=cur_freq; next_freq:=cur_freq; if token = "begin" then @ else if token = "if" then @ else if token = "case" then @ else if token = "while" then @ else if token = "repeat" then @ else if token = "for" then @ else if token = "with" then @ else if token = "goto" then @ else if ( cur_code = "var" ) or( cur_code = "Var_Param" ) or( cur_code = "function" ) then @ else if( cur_code = "extern" ) or( cur_code = "forward" ) or( cur_code = "procedure" ) then get_call else if( cur_code = "Terminal_Procedure" ) then begin get_call; next_freq :=0; end else do_nothing {we just saw an empty statement} end; @ @= begin push_save_stack; get_next; repeat @ until token = "do" ; get_next; flush_out ; out(indent); get_statement; out(outdent); unsave; end @ @= get_variable; tt := cur_type ; app_trans ; if ( info( tt) = "record" ) or ( info( tt) = "Packed_Record" ) then @ else expected( 'variable of record type') ; if token = "," then get_next else if token <> "do" then begin expected( 'do'); err_print ( ' DO forcibly inserted here') ; token := "do" ; end; @ @= begin if info( tt) = "Packed_Record" then add_weight( packed_surcharge); ff := link ( tt) ; {the current field} repeat nam := info( info( ff)); {its name} save( nam) ; id_code( nam) := "var"; id_aux( nam) := link( info( ff)) ; ff := link( ff) ; until ff = null; end @ @= begin out(backspace); get_next; expect(":") ; cur_freq:=get_count; end @ @= begin get_next; if token<>int_const then expected('label')@+else get_next; add_weight(jump_cost); next_freq:=0; end @ @= begin get_next; out(indent); repeat get_statement; if token <> "end" then begin get_semi; flush_out; cur_freq:=next_freq; end; until token= "end" ; flush_out; cur_freq:=next_freq; out(outdent); get_next; end @ @= begin get_next; add_weight(if_cost); get_exp; app_trans; expect("then") ; flush_out; cur_freq:=get_count; in_freq:=in_freq-cur_freq; next_freq:=cur_freq; if token<> "end" then if token<>";" then if token<> "else" then if token<> "until" then {controlled statement is nonempty} begin out(indent); get_statement; out(outdent); end; if token= "else" then begin out_freq:=get_count; {this count is supposedly redundant} if in_freq<>out_freq then @ else flush_out; get_next; out(indent); cur_freq:=out_freq; out_freq:=next_freq; get_statement; next_freq:=next_freq+out_freq; out(outdent); end else next_freq:=in_freq+next_freq; end @ Frequency counts can be ``off'' for a variety of reasons (e.g., when a procedure sometimes doesn't return, or when a user aborts the program). Such discrepancies are indicated by a `\.{\{LOST..\}}' comment line. @= begin flush_out; out(lost_comment); cur_freq:=in_freq-out_freq; flush_out; err_print( ' LOST: ', cur_freq:1, ' at output line ' , out_line:1 ); end @ @= begin get_next; add_weight(case_cost); get_exp; app_trans; expect("of") ; flush_out; out(indent); out_freq:=0; loop@+ begin if token = int_const then begin out(backspace); @; cur_freq:=get_count; in_freq:=in_freq-cur_freq; get_statement; out_freq:=out_freq+next_freq; end; if token<>";" then goto done2; get_next; flush_out; end; done2: flush_out; out(outdent); expect("end") ; cur_freq:=in_freq; next_freq:=out_freq; if cur_freq<>0 then out(lost_comment); end @ The calculation of |next_freq| after a repeat statement is slightly tricky because of the possibility of |goto| statements leading out of the loop. @= begin get_next; out(indent); cur_freq:=get_count; out_freq:=cur_freq; loop@+ begin get_statement; if token= "until" then goto done3; get_semi; flush_out; cur_freq:=next_freq; end; done3:flush_out; cur_freq:=next_freq; out(outdent); get_next; add_weight(repeat_tax); get_exp; app_trans; next_freq:=cur_freq-out_freq+in_freq; end @ The fact that \&{goto}'s can affect the frequencies means that we don't know how often the test of a |while| or |for| loop is performed until after the loop has been entirely scanned. \.{PROFILE} therefore outputs a special line that says `\.{\{WHILE..\}}' or `\.{\{FOR..\}}', after the necessary information has been gathered. These special comments are treated as identifiers, for simplicity. @= @!while_comment,@!lost_comment,@!for_comment: id_pointer; {locations of special comments} @ @= while_comment:= "{WHILE...}" ; lost_comment := "{LOST...}" ; for_comment:= "{FOR...}" ; @ @= begin get_next; add_weight(for_cost); get_variable; app_trans; expect(":=") ; get_exp; app_trans; if ( token<>"to" ) and ( token<>"downto" ) then expected('to or downto')@+else get_next; get_exp; app_trans; expect("do") ; flush_out; out(indent); cur_freq:=get_count; out_freq:=cur_freq; get_statement; out(outdent); flush_out; out(for_comment); cur_freq:=next_freq; add_weight(for_tax); next_freq:=next_freq-out_freq+in_freq; end @ @= begin get_next; add_weight(while_cost); get_exp; app_trans; expect("do") ; w:=out_wt; flush_out; out(indent); cur_freq:=get_count; out_freq:=cur_freq; get_statement; out(outdent); flush_out; out(while_comment); cur_freq:=next_freq; add_weight(w+while_tax); next_freq:=next_freq-out_freq+in_freq; end @ An assignment of an array to an array is treated here as an assignment of a simple variable to a simple variable, since the author didn't want to bother to compute the size of the array. If the user actually uses such assignments, an appropriate `\.{\{+v\}}' comment should be given so that the statement is weighted properly. @d incomp==err_print('! incompatible types') @.incompatible types@> @= begin if cur_code = "function" then begin lhs_type:=info(cur_aux); out(initial_cap); get_next; end else begin get_variable; app_trans; lhs_type:=cur_type; end; expect(":=") ; get_exp; app_trans; add_weight(store_cost); t:=info(cur_type); if info(lhs_type) = "real" then begin if t="ORDINAL" then add_weight(int_real_cost) else if t<> "real" then incomp; end else if ( info(lhs_type) = "Packed_Array") or ( info(lhs_type) = "record" ) then begin if(t<>info(lhs_type))or ((link(cur_type)<>link(lhs_type))and(cur_type<> string_const )) then incomp; if t= "Packed_Array" then begin if cur_type= string_const then n:=cur_length else n:=1; end else begin n:=0; lhs_type:=link(lhs_type); while lhs_type<>null do begin incr(n); lhs_type:=link(lhs_type); end; end; add_weight((n-1)*(fetch_cost+store_cost)); end else if t<>info(lhs_type) then incomp ; end @* The program. Now there's only one thing left to do, namely to parse an entire \PASCAL\ program (meanwhile doing all the profiling). This is where \.{PROFILE} itself begins and ends. @p begin @ ; get_next; {get the first token} expect("program") ; out(initial_cap); skip_to_id; get_next; if token= "(" then begin get_next; p:=get_id_list; expect(")") ; end; get_semi; flush_out;@/ get_block; {this is where most of the work is done} if token <> "." then expected( 'full stop '); out(".");flush_out; {if all went well, that's the `\&{end.}' of the program} @; @; {here files should be closed if the operating system requires it} @; end. @ @=p: pointer; @* Initialising the table of reserved words. This will be done by using the pool file mechanism of \.{WEB}. This has been described at length in various places, notably the \TeX book ( Vol. 3), so here is only a summary. When \.{TANGLE} reads this source file, it converts any string enclosed in double quotes into a number, and writes that number into the corresponding place in the \PA\ output. It also generates a ``pool'' file containing the original strings. What we now have to do is to read the pool file for this program and load all these strings into memory. After we have read in all the reserved words, we must make some adjustments, beacuse there are some words that are not really reserved words bur which had to be mentioned above. We will then alter their |id_codes| so they can be properly redeclared. @.WEB@>@.\TeX book@> @= begin in_pascal := false ; get_set ; reserving := true ; repeat @ until token = max_special ; @ reserving := false ; get_next ; do_declarations ; @ end; @ Some of the POOL file lines will be reserved words. @= begin get_line ; token := find_id ; id_code( token ) := "RESERVED" ; id_aux( token ) := 0 ; end @ The \.{WEB} operation \.{@@\$} denotes the value that should be at the end of this \.{NPROFILE.POO} file; any other value means that the wrong pool file has been loaded. @^check sum@> @= begin if not input_ln(pool_file) then fatal_error('Premature end of POOL file') ; if buffer[1] <> '*' then fatal_error ( 'POOL file didnt have a checksum') ; @.POOL file errors@> a:=0; for k := 2 to 10 do begin oo := zord( buffer[ k]) ; if not digit then fatal_error ('! POOL file check sum doesn''t have nine digits.'); a:=10*a+ oo -"0"; end; if a <> @$ then fatal_error('! POOL file check sum doesn''t match; TANGLE me again.'); in_pascal := true ; get_set ; close( pool_file) ; end @ @= a: integer ; {POOL file checksum} reserving, in_pascal, pool_ok : boolean; k: integer ; @* Pre-declared identifiers. The purpose of this section is to give a straightforward mechanism for constructing the pre-declared identifiers that \NP\ will be able to recognise. The distinction between a ``reserved word'' and a ``pre-declared identifier'' is that the ``reserved words'' govern the parsing process. But they are both constructed by using the Pool file mechanism of \WB, and I want to force them to occupy disjoint portions of that file. Therefore this section must come nearly at the end of \NP.\WB, i.e. after all the reserved words have appeared. First, some jiggery-pokery to get things going: @d declare( #) == {Nothing!!} @= max_special = "MM" ; @ Next, here are some sample definitions: @p declare("const true=1;false=0;maxint=2147483647;minint=-maxint;")@/ declare("minchar=0;maxchar=255;")@/ declare("type boolean=false..true;integer=minint..maxint;")@/ declare("char=minchar..maxchar;")@/ declare("text=file of char;")@/ @ The declarations are written in the usual \PA\ form; they may extend over several lines and each line must be enclosed in double quotes and |declare| called on it. Procedures and functions must have just the header, followed by |extern|. After the last declaration, we add a second marker. For comparison, here is part of the equivalent code from the original version of \PR (some macro definitions are omitted): \begintt = char_loc:=get_avail; info(char_loc):=char_type; int_loc:=get_avail; info(int_loc):=int_type; p:=get_avail; link(int_loc):=p; q:=get_avail; val(q):=-max_int; info(p):=q; q:=get_avail; val(q):=max_int; link(p):=q; bool_loc:=get_avail; info(bool_loc):=int_type; p:=get_avail; link(bool_loc):=p; zero_loc:=get_avail; val(zero_loc):=0; info(p):=zero_loc; one_loc:=get_avail; val(one_loc):=1; link(p):=one_loc; id5("f")("a")("l")("s")("e")(bool_const)(0); id4("t")("r")("u")("e")(bool_const)(1); p:=get_avail; val(p):=max_int; id6("m")("a")("x")("i")("n")("t")(int_const)(p); id7("i")("n")("t")("e")("g")("e")("r")(defined_type)(int_loc); id7("b")("o")("o")("l")("e")("a")("n")(defined_type)(bool_loc); id4("c")("h")("a")("r")(defined_type)(char_loc); \endtt @ And here is how it all works. When \TA\ reads a |declare|, it first evaluates the argument. As this is a string in double quotes, it copies the string into the |pool| file. Then it evaluates the |declare| and solemnly puts nothing into the \PA\ file. Then \PR\ comes to read the |pool| file. First it reads all the reserved words, then a |max_special|, then it comes here. I maintain that this mechanism is much better than the previous one, as you can actually read the declarations. There is a price: it seems that the cost of each procedure or function must be explicitly written into the declaration as an explicit number instead of a macro. I think the improvement outweighs the price. The syntax I want to adopt is that |extern| may optionally be followed by a list of integers in square brackets. The first of these is the cost of the procedure. Multiple integers are only allowed if the procedure has a |"SPECIAL"|. If present, they indicate the cost of processing each such parameter. @ And now here is a heap of declarations. @p declare("type string=varying[100] of char;") @/ declare("function abs(x:real):SAME;extern;") @/ declare("function sqr(x:real):real;extern;") @/ declare("function arctan(x:real):real;extern;") @/ declare("function sin(x:real):real;extern;") @/ declare("function cos(x:real):real;extern;") @/ declare("function ln(x:real):real;extern;") @/ declare("function sqrt(x:real):real;extern;") @/ declare("function exp(x:real):real;extern;") @/ declare("function round(x:real):integer;extern;") @/ declare("function trunc(x:real):integer;extern;") @/ declare("function pred(x:integer):integer;extern;") @/ declare("function succ(x:integer):integer;extern;") @/ declare("function ord(x:integer):integer;extern;") @/ declare("function chr(x:integer):integer;extern;") @/ declare("function odd(x:integer):integer;extern;") @# declare("var input:text;") @/ declare("var output:text;") @# declare("procedure read(var f:OPT_file;data:SPECIAL);extern;") @/ declare("procedure readln(var f:OPT_file;data:SPECIAL);extern;") @/ declare("procedure write(var f:OPT_file;data:SPECIAL);extern;") @/ declare("procedure writeln(var f:OPT_file;data:SPECIAL);extern;") @/ declare("procedure get(var f:OPT_file);extern;") @/ declare("procedure put(var f:OPT_file);extern;") @# declare("procedure open(var f:text;opts:SPECIAL);extern;") @/ declare("procedure close(var f:text;opts:SPECIAL);extern;") @/ declare("procedure reset(var f:text;opts:SPECIAL);extern;") @/ declare("procedure rewrite(var f:text;opts:SPECIAL);extern;") @/ declare("function eof(x:file):boolean;extern;") @/ declare("function eoln(x:file):boolean;extern;") @/ declare("function status(x:file):integer;extern;") @/ declare("procedure page(var f:text);extern;") @# declare("type double = real;") @/ declare("function dble(x:real):real;extern;") @# declare("end.") {this winds up the pool file} @* System-dependent changes. This module should be replaced, if necessary, by changes to the program that are necessary to make \.{PROFILE} work at a particular installation. It is usually best to design your change file so that all changes to previous modules preserve the module numbering; then everybody's version will be consistent with the printed program. More extensive changes, which introduce new modules, can be inserted here; then only the index itself will get a new module number. @^system dependencies@> @* Index. Here is a cross-reference table for the \.{PROFILE} processor. All sections in which an identifier is used are listed with that identifier, except that reserved words are indexed only in unusual cases. Underlined entries correspond to where the identifier was declared. Error messages and a few other things like ``system dependencies'' are indexed here too. @z % % ========================= END OF FILE =============================