.LM10.RM70.PS58,70.SP1.NHY .NF.NJ.NONUMBER # .B20.C THE PME PERFORMANCE MEASUREMENT .B2.C AND EVALUATION PACKAGE .B4.C by .B.C Bert Beander .B.C Technical Languages .B.C Digital Equipment Corporation .B4.C December 5, 1979 .PG # .B10.C TABLE OF CONTENTS .B3.C 1.0 Introduction . . . . . . . . . . . . . . . . . 1 .B.C 2.0 Information Flow in the Package . . . . . . . 2 .B.C 3.0 PMECLOCK: Clock-Driven Sampling . . . . . . . 5 .B.C 4.0 PMETRACE: Trace-Driven Sampling . . . . . . . 7 .B.C 5.0 PMEBUILD: Building the Bucket File . . . . . . 9 .C 5.1 Defining the Program Structure . . . . . . . . 9 .C 5.2 Defining Program Unit Address Ranges . . . . . 11 .C 5.3 Defining Sampling Buckets . . . . . . . . . . 14 .C 5.4 Specifying Options . . . . . . . . . . . . . . 16 .C 5.5 Error Recovery . . . . . . . . . . . . . . . . 17 .C 5.6 Examples of Use . . . . . . . . . . . . . . . 18 .B.C 6.0 PMEHISTO: Printing the Performance Histogram . 20 .B.C 7.0 Suggested Command File Setup . . . . . . . . . 22 .B.F.J.NUMBER0 .PG .HL1 INTRODUCTION .I5 The PME performance measurement and evaluation package is a tool for measuring where a user's program is spending its time. To do so, the package periodically samples the program counter of the running program, determines what program section each such sample falls in, and displays the resulting information in histogram form. .B.I5 The PME package consists of four parts called PMECLOCK, PMETRACE, PMEBUILD, and PMEHISTO. PMECLOCK consists of subroutines which collect program counter samples by trapping a clock interrupt every 10 milliseconds. PMETRACE consists of subroutines which collect program counter samples by tracing the user program; it thus retrieves every single instruction's program counter value, but it also takes much more time than sampling on clock interrupts. .B.I5 PMEBUILD is the program through which the user specifies how his program is to be divided into sections called ^&buckets\&.
Each bucket is defined by an address range, and contains a counter which accumulates the number of program counter samples in that address range. Finally, PMEHISTO is the program which prints the accumulated data in histogram form with one histogram bar per bucket. .B.I5 These four parts are described in detail in Sections 3 - 6 of this manual. But first, Section 2 describes the overall structure of the PME package and how information is communicated between the parts. .PG .HL1 INFORMATION FLOW IN THE PACKAGE .I5 The input to the PME package consists of files and terminal input which specify the structure and address ranges of the program sections whose performance the user wants to measure. The output is the histogram displays which show where the program spends its time. This section describes where the input comes from, where the output goes, and what intermediate files are needed to collect the actual program counter samples. .B.I5 The PMEBUILD program requires as input "program definition statements" which can come from a "program definition file", from the user's terminal, or from a combination of the two. These statements specify the structure of the program to be measured, i.e. how it is divided into phases, how the phases are divided into modules, how the modules are broken into routines, etc. They can also specify the actual start and end addresses of these program units, and they can specify certain options. Finally, program definition statements are used to specify how the program is to be broken into "buckets" for the data collection. As mentioned above, a ^&bucket\& is defined by an address range (such as the start and end addresses of a program module) and contains a counter which records the number of program counter samples found within that range. .B.I5 The exact formats of the program definition statements are described in Section 5 below. The program definition file normally has the extension .PMD.
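.B.I5 The bucket idea described above can be sketched in a few lines of Python. This is only an illustration of tallying program counter samples into address ranges, not the code used by PMEHISTO or PMETRACE. All names and addresses in it are hypothetical, and it assumes that samples falling outside every bucket are simply ignored, which this manual does not state.
.B
.LITERAL
```python
from bisect import bisect_right

def tally(buckets, samples):
    """Count how many program counter samples fall in each bucket.

    buckets: list of (start, end) inclusive address ranges, sorted by
             start address and non-overlapping.
    samples: iterable of program counter values.
    Returns one count per bucket. Samples outside every bucket are
    ignored (an assumption made for this sketch).
    """
    starts = [start for start, _ in buckets]
    counts = [0] * len(buckets)
    for pc in samples:
        i = bisect_right(starts, pc) - 1   # last bucket starting at or below pc
        if i >= 0 and pc <= buckets[i][1]:
            counts[i] += 1
    return counts

# Hypothetical example: two adjacent modules at made-up addresses.
buckets = [(0x200, 0x2AF), (0x2B0, 0x3FF)]
samples = [0x200, 0x210, 0x2B0, 0x3FF, 0x400, 0x1FF]
counts = tally(buckets, samples)
```
.END LITERAL
.B
Keeping the buckets sorted lets each sample be placed with a binary search, which matters when one value is tallied for every instruction executed.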
.B.I5 If an appropriate program definition statement so specifies, the PMEBUILD program can also retrieve much of the information it needs from the user program's linker map (the .MAP file) or from the traceback information in the executable image (the .EXE file). In particular, module name and start and end addresses can be extracted from the .MAP file, and both module and routine names and the corresponding address ranges can be extracted from the .EXE file. .B.I5 The output of PMEBUILD is a single file called a "bucket file". This file contains all necessary information about how the user has divided his program into buckets. Since no program counter values have been tallied in it, it is an "empty" bucket file. Bucket files have the extension .PME by default. .B.I5 When clock-driven traps are used to collect program counter values, the user's program calls the PMECLOCK subroutines which write the collected program counter values out to a "sample file". Its default extension is .PMS. The sample file and the empty bucket file then serve as input to PMEHISTO, which tallies the samples in the appropriate buckets and produces a histogram showing the number of tallies in each bucket. PMEHISTO also produces a "filled" bucket file, also with extension .PME, which contains all information in the empty bucket file plus the counts for each bucket. .B.I5 When tracing is used to collect program counter values, no .PMS file is written because the volume of data collected (one value for every instruction executed) is too large. Instead, the PMETRACE subroutines accept as input an empty bucket file and produce as output a filled bucket file. The filled bucket file can then be passed to PMEHISTO to produce the histogram. .B.I5 The PMEHISTO program produces two pieces of output. One is a histogram file, with a default extension of .HIS, which can be sent to a line printer. The other is a histogram display on the user's terminal. 
This display allows the user to examine one page at a time and to cycle through the histogram repeatedly if he so desires. Each bar in the histogram corresponds to one bucket and shows the relative proportion of the total processor time spent in that bucket. The bucket's symbolic name, address range, and percentage of the total count are also displayed. .B.I5 The overall structure of the PME package, including the flow of data between programs and files, is summarized in the figure on the next page. .PG.NF.NJ +--------------------+ +----------+ +------------+ | Program Definition | | Linker | | Executable | | File (.PMD) and | | Map File | | Image File | | user's terminal | | (.MAP) | | (.EXE) | +---------+----------+ +----+-----+ +-----+------+ | | | | | | V V V ************************************************** * * * PMEBUILD program * * * ************************************************** | | V ****************** +--------------+ ****************** * * | Empty Bucket | * * * User's * | File (.PME) | * User's * * program * +--------+--+--+ * program * * * | | * * * *************** | | *************** * * * PMECLOCK * | +------>* PMETRACE * * * * subroutines *----+ | * subroutines * * ****************** | | ****************** | | | V | | +-------------+ | | | PC Sample | | | | File (.PMS) | | | +------+------+ | | | | V +---------------+ | | +---------------+ | Filled Bucket | | +-----+ | Filled Bucket | | File (.PME) | | | | File (.PME) | +---------------+ | | +-------+-------+ A | | | |Clock Input| | Trace Input | | V V V ************************************************** * * * PMEHISTO program * * * ************************************************** | | | | V V +------------------+ +-----------------+ | Histogram Print- | | Terminal Histo- | | out File (.HIS) | | gram Display | +------------------+ +-----------------+ .B.F.J .PG .HL1 PMECLOCK: CLOCK-DRIVEN SAMPLING .I5 The PMECLOCK subroutines sample the program counter by trapping clock interrupts every 10
milliseconds. They thus accumulate approximately 100 samples per second. The simplest way of doing clock sampling is to link the user program with the /DEBUG=PME qualifier: .B.C $ LINK/DEBUG=PME user-modules .B Here "PME" is an object module in the PME package which includes the PMECLOCK subroutines. When linked in this way, this module is invoked by VMS as if it were the debugger. It thus gets control before the user program. This allows it to initiate clock sampling before starting the user program and to terminate such sampling after the user program terminates. The program counter samples are accumulated in a file called PMEFILE.PMS. .B.I5 For example, if the user program consists of modules A, B, and C, where A is the main program, a clock sample file is built by these two commands: .B.C $ LINK/DEBUG=PME A,B,C .C $ RUN A # .B Note that no editing or recompilation of any source modules is needed to use PMECLOCK in this case. .B.I5 However, if more flexibility is needed in initiating or terminating the collection of program counter samples, or if a different name is desired for the output file, the PMECLOCK subroutines must be explicitly called from the user program. To initiate sampling, the user program calls this entry point: .b.c CALL PME__INIT .b This creates and initializes a sample file whose default name is PMEFILE.PMS. It also requests a clock interrupt in 10 milliseconds and sets up a handler for it. When the interrupt occurs, the handler retrieves the interrupted program counter value, saves it for the output file, and requests another clock interrupt 10 milliseconds later, when the whole cycle repeats itself. The file buffer holds 128 samples, so a file write occurs approximately once every 1.3 seconds. .B.I5 To terminate sampling, the user program calls this entry point: .b.c CALL PME__EXIT .b This stops the clock interrupts, writes the last buffer to the sample file, and closes that file. No more program counter samples are collected thereafter. 
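.B.I5 The sampling figures quoted above follow from simple arithmetic, worked through here as a short Python sketch. The constants come from the text; the function name and the 60-second run length are illustrative only.
.B
.LITERAL
```python
INTERVAL_MS = 10        # clock interrupt period stated in the text
BUFFER_SAMPLES = 128    # samples held by the file buffer, per the text

samples_per_second = 1000 / INTERVAL_MS
seconds_per_write = BUFFER_SAMPLES / samples_per_second

def expected_samples(run_seconds):
    """Approximate number of samples collected over a run of this length."""
    return int(run_seconds * samples_per_second)

# Hypothetical run length, chosen only for illustration.
one_minute = expected_samples(60)
```
.END LITERAL
.B
A program must therefore run for a reasonable length of time while sampling is active before the sample count is large enough to give a meaningful histogram.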
.B.I5 If a name other than the default PMEFILE.PMS is desired for the output file, this call can be used: .b.c CALL PME__SNAME (^&filename\&) .b where ^&filename\& is a Fortran character string containing the desired file name. If no extension is included, it defaults to .PMS. This string must be passed by descriptor (see Appendix C of the VAX11/780 Architecture Handbook); Fortran does this automatically, but other languages may not. The string may contain trailing blanks but must otherwise be a valid VAX/VMS filename. If called, PME__SNAME must be called ^&before\& PME__INIT. .B.I5 PMECLOCK also has two dummy entry points called PME__INAME and PME__ONAME; they both return immediately and do nothing. They are included for compatibility with PMETRACE so that a user program set up to call PMETRACE can be changed to call PMECLOCK by simply relinking it; no source changes or recompilations are needed. .B.I5 After being compiled to call PME__INIT and PME__EXIT (and possibly PME__SNAME), the user's program must be linked to include object module PMECLOCK. The /DEBUG=PME qualifier should ^¬\& be used in this case. .PG .HL1 PMETRACE: TRACE-DRIVEN SAMPLING .I5 The PMETRACE subroutines sample the program counter by tracing the user's program, thus retrieving every single instruction's program counter value. To initiate sampling, the user program calls this entry point: .b.c CALL PME__INIT .b This opens, reads in, and closes an empty bucket file (whose default name is PMEFILE.PME), and it starts tracing the user's program to collect program counter samples. These samples are directly tallied in the proper buckets as defined by the input file. Tracing is a slow process which increases the program's execution time by about 300-fold, but it collects a large number of samples. 
.B.I5 To terminate sampling, the user program calls this entry point: .b.c CALL PME__EXIT .b This stops the tracing, creates a filled bucket file whose default name is PMEFILE.PME, writes the accumulated bucket information to that file, and closes the file. This file can then be used as input to PMEHISTO. No more program counter samples are collected after the PME__EXIT call. .B.I5 If a name other than the default PMEFILE.PME is desired for the input bucket file, this entry point may be called: .b.c CALL PME__INAME (^&filename\&) .b Similarly, if a different name is desired for the output bucket file, this entry point may be called: .b.c CALL PME__ONAME (^&filename\&) .b In either case, ^&filename\& is a Fortran character string containing the desired file name. If no extension is included, it defaults to .PME. This string must be passed by descriptor (see Appendix C of the VAX11/780 Architecture Handbook); Fortran does this automatically but other languages may not. The string may contain trailing blanks but must otherwise be a valid VAX/VMS file name. .B.I5 If called, PME__INAME must be called before PME__INIT and PME__ONAME must be called before PME__EXIT. .B.I5 PMETRACE also has a dummy entry point called PME__SNAME. This entry point returns immediately and does nothing. It is included for compatibility with PMECLOCK so that a program set up to call PMECLOCK can be changed to use PMETRACE by simply relinking it. .B.I5 After being compiled to call PME__INIT and PME__EXIT (and possibly PME__INAME and PME__ONAME), the user's program must be linked to include object module PMETRACE. .PG .HL1 PMEBUILD: BUILDING THE BUCKET FILE .I5 The PMEBUILD program creates a new bucket file based on the program and bucket definitions entered by the user. PMEBUILD is invoked as follows: .b.c $ RUN PMEBUILD .B.I5 PMEBUILD first asks the user if he wishes to specify the input (.PMD) and output (.PME) file names. 
If the answer is "N" (for No), these default names are used: .b.tp2.nf.nj.i8 Input Program Definition File: PMEFILE.PMD .i8 Output Empty Bucket File: PMEFILE.PME .b.f.j If the answer is "Y" (for Yes), PMEBUILD asks the user to enter each of the two file names. A blank response to either query causes the corresponding default name to be used. If SYS$INPUT is specified as the input file, all input is taken from the user's keyboard. Also, if the input file cannot be opened, all input is taken from the keyboard. .B.I5 Once the input file is opened, statements are read from this file until the file ends or an END statement is encountered. If no END statement is found, additional input is solicited from the user's terminal until an END statement is entered. .B.I5 In the sections below, the actual input statements are described in the order they would normally be entered through the program definition file and then the user's keyboard. .HL2 ^&Defining the Program Structure\& .I5 The first thing PMEBUILD needs to know is how the user's program is divided into subunits such as phases, modules, and routines. The possible ^&program\& ^&unit\& ^&kinds\& are declared as follows: .b.nf.nj.i11 DEFINE UNITS: ^&kind1\&, ^&kind2\&, ..., ^&kindn\& .b.f.j This says that the user's program is to be broken into units called ^&kind1\&, ^&kind2\&, ..., ^&kindn\& where ^&kind2\& is a subunit of ^&kind1\&, ^&kind3\& is a subunit of ^&kind2\&, and so on. For example, .b.nf.nj.i12 DEFINE UNITS: PROGRAM, PHASE, MODULE .b.f.j declares that the largest unit, called PROGRAM, is divided into smaller units called PHASEs, and each PHASE is divided into still smaller units called MODULEs. These are names of the user's choosing, and up to ten such subdivisions may be declared. .B.I5 In addition to declaring program unit kinds, DEFINE UNITS declares that all following program unit specifications are to be read in "define units mode," meaning that they declare the structure of the user's program.
This is best illustrated by example: .B.tp11.NF.NJ .I12 DEFINE UNITS: PROGRAM, PHASE, MODULE .I12 PROGRAM CRUNCH .I16 PHASE READ__DATA .I20 MODULE INITIALIZE .I20 MODULE READER .I16 PHASE PROCESS__DATA .I20 MODULE INVERT .I20 MODULE MINIMIZE .I20 MODULE COMPUTE .I16 PHASE PRINT__DATA .I20 MODULE PRINTER .B.F.J Here three ^&kinds\& of program units called PROGRAM, PHASE, and MODULE are declared. A program CRUNCH is then declared and its structure is defined: it consists of three PHASEs, each of which consists of one to three MODULEs. This structure can later be referenced to define sampling buckets. .B.I5 As the example illustrates, a program unit specification has, in its simplest form, the following format: .b.c ^&kind\& ^&unitname\& .b where ^&kind\& is a previously declared program unit kind and ^&unitname\& is the name of the new program unit. In its most general form, a program unit specification looks like this: .B.C ^&kind\& ^&unitname\&, ^&start\& - ^&end\&, ^&step\&, ^&keyword\& .b Here ^&start\& and ^&end\& specify the program unit's start and end addresses in hexadecimal, and ^&step\& specifies that this program unit should be broken into equal-sized buckets, each ^&step\& bytes long, if it is included in the output bucket file. ^&step\& is specified in hexadecimal. ^&keyword\& is only meaningful if ^&kind\& is MODULE; the keyword LINES in this position specifies that high level language line numbers (e.g., Fortran line numbers) should be extracted from the executable image for this module (see Section 5.2 below). The keyword NOLINES specifies that line numbers should not be retrieved. Both the LINES and NOLINES keywords can be overridden by the DEFINE OPTIONS command; see Section 5.4. If no keyword is specified, NOLINES is the default. .B.I5 The ^&start\& - ^&end\&, ^&step\&, and ^&keyword\& fields are all optional.
This specification is thus legal: .b.c MODULE MUMBLE,,10 .b This does not specify module MUMBLE's address range or a keyword, but it does specify its step size to be 10 hexadecimal, or 16 decimal, bytes. .B.I5 Program unit start and end addresses are not normally specified directly in program unit specification statements since PMEBUILD can more conveniently extract this information from the user program's linker map or the traceback information in its executable image. Even much program structure information (such as what modules the program contains) can be determined from these sources. How this is done is explained in Section 5.2 below. .B.I5 Program unit and kind names consist of 1 - 16 characters from the set A - Z, 0 - 9, $, and __, with lower case letters being treated as upper case. .B.I5 Three miscellaneous points should be noted about the DEFINE UNITS statement. First, no program unit kind may be called "DEFINE" or "END" or have the same name as a previously declared kind. Second, the abbreviated statement .b.c DEFINE UNITS .b may be used to switch to "define units mode" from another mode. This causes subsequent program unit specifications to be treated as declarations of additional program structure. It requires that program unit kinds have been declared with a previous DEFINE UNITS statement. And third, the colon and commas in the DEFINE UNITS statement are optional. Thus .b.c DEFINE UNITS PROGRAM PHASE MODULE .b is valid. It may be less readable, but it is easier to type. .HL2 ^&Defining Program Unit Address Ranges\& .I5 Once some program structure has been declared, the statement .b.c DEFINE ADDRESSES .b can be used to enter "define addresses mode". This means that subsequent program unit specifications are allowed to define address ranges, step sizes, and keywords, but only for previously declared program units.
Thus .b.tp2.c DEFINE ADDRESSES # .c MODULE READER, 200-2AF .b declares module READER to have a start address of 200 (hexadecimal) and an end address of 2AF (also hexadecimal). If module READER is not a previously declared program unit, this statement is illegal and results in an error message. .B.I5 A more useful form of the DEFINE ADDRESSES statement is this: .b.c DEFINE ADDRESSES: MAP "CRUNCH.MAP" .b This says that all MODULE address definitions (start and end addresses) are to be retrieved by scanning the specified linker map (file CRUNCH.MAP in this example). The addresses in program section (PSECT) $CODE are normally used since this is the proper default for Fortran programs. If a different PSECT ("LIB$CODE", for example) should be used, this can be specified as follows: .b.c DEFINE ADDRESSES: MAP "CRUNCH.MAP" PSECT LIB$CODE .b.f.j Use of "DEFINE ADDRESSES:#MAP" ^&requires\& that a program unit kind called MODULE exists. If it does not, the statement is in error. .B.I5 If modules are found in the linker map which have not been declared in define units mode, those modules are added to the end of PMEBUILD's program unit list anyway. Thus the three statements .b.tp3.nf.nj.i13 DEFINE UNITS: PROGRAM, MODULE .I13 PROGRAM CHESS .I13 DEFINE ADDRESSES: MAP "CHESS.MAP" .b.f.j are enough to declare program CHESS to consist of all modules found in the linker map CHESS.MAP. In addition, PMEBUILD now knows the address ranges of all those modules. Hence enough information is available to immediately specify sampling buckets. .B.I5 A third form of the DEFINE ADDRESSES statement extracts module, routine, and line number definitions from the traceback information in the user's executable image: .b.nf.nj.i13 DEFINE ADDRESSES: EXE "CRUNCH.EXE" .b.f.j For this to work, a program unit kind called MODULE must exist, i.e., must have been declared with a DEFINE UNITS statement. 
Routine information is extracted only if program unit kind ROUTINE exists, where ROUTINE is a subunit of MODULE. Similarly, line numbers are extracted only if kind LINE exists, where LINE is a subunit of both MODULE and ROUTINE. Kind ROUTINE does not have to exist for line numbers to be retrieved. In Fortran, for example, where there is exactly one routine per object module, it is enough that kinds MODULE and LINE exist. .B.I5 DEFINE ADDRESSES:#EXE also requires that the executable image has been linked, and its modules compiled, with the /TRACEBACK or /DEBUG option. However, since /TRACEBACK is a default option for the linker and most languages, it usually need not be explicitly specified. .B.I5 If new modules are found in the .EXE file, PMEBUILD adds them and their contained routines to the end of the program unit list. Thus the three statements .b.tp3.nf.nj.c DEFINE UNITS: PROGRAM, MODULE, ROUTINE .c PROGRAM CHECKERS # .c DEFINE ADDRESSES: EXE "CHECKERS.EXE" # .b.f.j are enough to declare program CHECKERS to consist of all modules and routines found in the traceback information in CHECKERS.EXE. Enough information is then available to create sampling buckets. Similarly, if modules have been explicitly declared in define addresses mode but routines and line numbers have not, the routines and line numbers are inserted after the proper modules in PMEBUILD's program unit list. .B.I5 Large high-level language programs can easily have enough lines to overflow PMEBUILD's internal tables. For this reason, line numbers are not extracted from the executable image by default. They are extracted for a module X only under two circumstances:#(1) the DEFINE OPTIONS:#LINES command has been used, or (2) the DEFINE OPTIONS:#SOMELINES option has been set and the LINES keyword (Section 5.1) has been specified for module X. 
(Since SOMELINES is a default option, it is enough to specify the LINES keyword in the module declaration.)##The user can thus specify that all line numbers be extracted with the DEFINE OPTIONS:#LINES command, or he can specify which lines he wants on a module by module basis. Line numbers have numeric names without leading zeroes. Thus "LINE 10" names the line given as 0010 in the left margin of a Fortran listing. .B.I5 Like DEFINE ADDRESSES:#MAP, DEFINE ADDRESSES:#EXE normally extracts only those address ranges which are in program section $CODE, since this is the proper default for Fortran programs. If a different PSECT should be used, it is specified as follows: .B.C DEFINE ADDRESSES:#EXE "CHECKERS.EXE" PSECT LIB$CODE .B Here LIB$CODE is the desired program section. .B.I5 For typing convenience, the keyword ADDRESSES can be shortened to ADDR and the colon thereafter omitted in all DEFINE ADDRESSES statements. .HL2 ^&Defining Sampling Buckets\& .I5 The actual creation of sampling buckets is done in "define sampling mode" which is entered with the .b.c DEFINE SAMPLING .B statement. Subsequent program unit specifications then cause sampling buckets to be defined. These specifications are of two kinds. First, the statement .b.i14 ^&kind\& ^&unitname\&, ^&start\& - ^&end\&, ^&step\& .b causes the specified program unit (which must be previously declared) to be broken into equal-sized buckets covering ^&step\& bytes each. The optional ^&start\& - ^&end\& field specifies the start and end of the desired address range ^&relative\& to the start of the program unit. Thus if routine X covers addresses 2000 - 23AB (hexadecimal), "ROUTINE X, 100 - 1FF, 10" causes 16 buckets covering the address range 2100 - 21FF to be generated. This address range must be wholly contained in the specified program unit (routine X in this case). If ^&start\& - ^&end\& is omitted, the whole program unit is covered. The ^&step\& field is also optional.
If a step size has already been specified in define units or define addresses mode, it need not be specified again. If no step size is defined at all, the whole program unit constitutes a single bucket. .B.I5 The following are examples of legal specification statements: .b.tp4.c MODULE INVERT,,40 # .c MODULE COMPUTE # .c ROUTINE GETLINE, 100-1FF, 20 .c ROUTINE THINK, 40 - 225 # .b Note that ^&start\&, ^&end\&, and ^&step\& are all specified in hexadecimal. .B.I5 The second kind of sampling specification is of this form: .b.c ^&kind1\& ^&unitname\& BY ^&kind2\& .b Here ^&kind1\& and ^&kind2\& are program unit kinds where ^&kind2\& is a subunit of ^&kind1\&. This says that each subunit of kind ^&kind2\& within the program unit defined by ^&kind1\& ^&unitname\& constitutes a separate sampling bucket. For example, .b.i16 PHASE PROCESS__DATA BY MODULE .b causes each module in phase PROCESS__DATA to constitute a sampling bucket. However, if one of those modules has a defined step size, it will in turn be broken into enough buckets of that size to cover the module. .B.I5 Consider what happens in this simple example: .b.NF.NJ.tp10 .I12 DEFINE UNITS: PROGRAM, MODULE .I12 PROGRAM FIDO .I16 MODULE MAIN .I16 MODULE SUB1 .I16 MODULE SUB2 .I12 DEFINE ADDRESSES: EXE "FIDO.EXE" .I12 MODULE SUB1,,20 .I12 DEFINE SAMPLING .I12 PROGRAM FIDO BY MODULE .I12 END .B.F.J The program structure is defined, all start and end addresses are extracted from the executable image, and the step size is set to be 20 hexadecimal (32 decimal) bytes for module SUB1. Program FIDO is then broken into sampling buckets by module. Module MAIN becomes one sampling bucket, module SUB1 is divided into as many 32-byte buckets as are needed to cover its address range, and module SUB2 becomes one bucket. Program counter values will be collected and eventually displayed in terms of these buckets. .B.I5 Each statement entered in define sampling mode creates what is called a "bucket group". 
The following restrictions apply to bucket groups: (1)#At most ten bucket groups may be specified; (2)#No two bucket groups may have overlapping address ranges; (3)#No two buckets within a bucket group may have overlapping address ranges; and (4)#No bucket or bucket group may include both positive and negative addresses (i.e., cover both system space and user space). .B.I5 After the last sampling specification, an .b.c END .b statement must be entered. This causes PMEBUILD to output the desired bucket file and then stop. .HL2 ^&Specifying Options\& .I5 Certain options can be communicated to PMEBUILD through the DEFINE OPTIONS statement: .b.nf.nj.i11 DEFINE OPTIONS: opt1, opt2, ..., optn .b.f.j where opt1, opt2, ..., optn are options keywords. The following options keywords are allowed: .b.lm+20.i-15 LIST#########-#This causes all program definition statements to be listed on the user's terminal as they are read from the .PMD file or keyboard. .b.i-15 NOLIST#######-#This suppresses the listing of program definition statements. .b.i-15 PRINT########-#This causes information about each bucket group and the buckets therein to be printed out each time a bucket group is created in define sampling mode. .b.i-15 NOPRINT######-#This suppresses the bucket group printout. .b.i-15 RELADDR######-#This sets a flag in the bucket file which causes PMEHISTO to print program-unit-relative addresses in the histogram printout. .b.i-15 ABSADDR######-#This sets the same bucket file flag so that PMEHISTO prints absolute addresses in the histogram printout. .b.i-15 CLEARCOUNTS##-#This sets a flag in the bucket file which causes all bucket counts to be cleared to zeroes before a new set of program counter values is tallied in the file. The effect is that a filled bucket file with this bit set can serve as input to PMETRACE and PMEHISTO where an empty bucket file is expected; it is not necessary to rerun PMEBUILD if a filled bucket file already has the desired bucket definitions.
.b.i-15 ACCUMCOUNTS##-#This sets the same flag so that the bucket counts are not cleared before new counts are tallied. The effect is that repeated sampling runs through PMETRACE (for trace data) or PMEHISTO (for clock-interrupt data) with the same bucket file cause the sampling counts to be accumulated through ascending versions of the bucket file. This is useful for a user who wants to collect representative data from a large number of sampling runs. .b.i-15 LINES########-#This causes subsequent DEFINE ADDRESSES: EXE commands to extract line numbers from the executable image for ^&all\& modules in the program. .b.i-15 SOMELINES####-#This causes subsequent DEFINE ADDRESSES: EXE commands to extract line numbers ^&only\& for those modules which were entered with the LINES keyword in define units or define addresses mode (see Section 5.1). .b.i-15 NOLINES######-#This prevents subsequent DEFINE ADDRESSES:#EXE commands from extracting any line numbers from the executable image. .LM-20 .B.I5 If no options are specified through DEFINE OPTIONS statements, default options corresponding to these statements are in effect: .b.tp2.nf.nj.c DEFINE OPTIONS: NOLIST, NOPRINT, RELADDR .c DEFINE OPTIONS: CLEARCOUNTS, SOMELINES # .f.j.b For typing convenience, the colon and commas are optional in the DEFINE OPTIONS statement. .HL2 ^&Error Recovery\& .I5 When erroneous input is entered, PMEBUILD gives an error message and in general aborts the semantic effect of the errant statement. However, when defining sampling buckets, it is possible to unintentionally overflow the bucket file or internal tables in PMEBUILD. Furthermore, this may not be detected until the END statement is entered. To recover from this situation, PMEBUILD solicits additional input from the user, who can then enter this command: .b.c DEFINE CLEAR .B This clears PMEBUILD's internal bucket and bucket group tables, after which the user can redo all his bucket definitions from scratch.
All program structure definitions remain intact, however. .HL2 ^&Examples of Use\& .I5 Here we give two examples of using PMEBUILD, one simple and one more complicated. In the simple case, the user has a program whose performance he wants to measure by module, but he does not set up a program definition file. The following run stream sets up the bucket file: .b.lm+5.tp5.nj.nf $#LINK USERPROG,PMECLOCK $#RUN PMEBUILD Do you wish to specify file names?#(Yes or No):#N Enter DEFINE statement:#DEFINE UNITS:#PROGRAM, MODULE Enter program unit spec:#PROGRAM X Enter program unit spec:#DEFINE ADDRESSES:#EXE "X.EXE" Enter address spec:#DEFINE SAMPLING .TP6 Enter sampling spec:#PROGRAM X BY MODULE Enter sampling spec:#END .b ***Bucket File created*** .b $ .b.lm-5.f.j Prompts and other output from PMEBUILD are shown in mixed case and the user's input is shown in upper case. The six input lines (from "DEFINE UNITS" to "END") represent the minimum input to PMEBUILD, but are enough to give a complete breakdown of the whole program by module. .B.I5 In the next example, a more complicated program structure is declared with the following program definition file: .b.lm+10.nf.nj.tp9 DEFINE UNITS:#PROGRAM, PHASE, MODULE PROGRAM GRAVEL .I8 MODULE GRAVEL .I4 PHASE ONE .I8 MODULE FEED .I4 PHASE TWO .I8 MODULE CRUNCH .I8 MODULE CRUSH .I8 MODULE GRIND .I4 PHASE THREE .I8 MODULE SIEVE .I8 MODULE SHOVEL PROGRAM SYSTEM$SPACE, 80000000-FFFFFFFF .TP3 DEFINE ADDRESSES:#EXE "GRAVEL.EXE" MODULE CRUSH,,100 DEFINE SAMPLING .B.LM-10.F.J Assuming that this file has the name PMEFILE.PMD, the following run stream builds a bucket file: .b.lm+5.tp9.nf.nj $#LINK GRAVEL,FEED,CRUNCH,CRUSH, - GRIND,SIEVE,SHOVEL,PMECLOCK $#RUN PMEBUILD Do you wish to specify filenames?#(Yes or No):#N Enter sampling spec:#PHASE TWO BY MODULE Enter sampling spec:#END .B ***Bucket File created*** .b $ .b.f.j.lm-5 In this case, everything except the actual sampling specification has been set up in the program definition file.
Hence only two statements ("PHASE TWO BY MODULE" and "END")
need to be entered when PMEBUILD is run. A program definition
file takes more time to set up initially, but it saves time
and trouble when repeated PMEBUILD runs with differing bucket
configurations are expected.
.PG
.HL1 PMEHISTO: PRINTING THE PERFORMANCE HISTOGRAM
.I5
PMEHISTO is the reduction program which displays what
fraction of the user program's total time is spent in each
bucket. It is invoked with this command:
.b.c
$ RUN PMEHISTO
.B.I5
PMEHISTO first asks the user whether clock-generated or
trace-generated data is to be processed. If the answer is "C"
(for Clock), a sample file (.PMS) and an empty bucket file
(.PME) serve as input, and a filled bucket file (.PME) and the
histogram are the output. If the answer is "T" (for Trace), a
filled bucket file (.PME) is the input and the histogram is
the output.
.B.I5
In either case, PMEHISTO also asks the user whether he wants
to specify the names of the input and output files involved.
If the answer is "N" (for No), the following default names are
used:
.b.tp5.lm+12.nf.nj.i-2
For Clock-driven sampling:
Input Sampling File:    PMEFILE.PMS
Input Bucket File:      PMEFILE.PME
Output Bucket File:     PMEFILE.PME
Output Histogram File:  PMEFILE.HIS
.B.tp3.I-2
For Trace-driven sampling:
Input Bucket File:      PMEFILE.PME
Output Histogram File:  PMEFILE.HIS
.b.lm-12.f.j
If the answer is "Y" (for Yes), PMEHISTO asks the user for
each of the desired file names. A blank response to any such
query causes the corresponding default name (as listed above)
to be used, with one exception:#for clock-driven sampling, if
the input bucket file's name is specified but the output
bucket file's is not, the specified name is used for both
files.
.B.I5
PMEHISTO produces the reduced output in two forms. The first
is a histogram file whose default name is PMEFILE.HIS. This
file is suitable for listing on the line printer.
It shows the symbolic name of each program unit sampled, the
relative or absolute address range of each bucket, the
percentage of the total time spent in that bucket, and a
histogram bar proportional to that percentage. The histogram
is scaled so that the bucket with the largest reference count
has a bar that spans the full width of the histogram. Some
summary statistics are also printed at the end of the
histogram.
.B.I5
The second form of the reduced output is a histogram display
on the user's terminal. This display is identical to that
meant for the line printer except that each page is much
shorter, since the terminal can display only 24 lines at a
time. After each page of the display, the user is asked to
enter a command. The following commands are accepted:
.b.tp5.lm+7.nf.nj
Carriage Return#--#Display next histogram page
R#--#Restart histogram display from beginning
S#--#Skip to the summary statistics display
K#--#Kill the display and exit program
H#--#Help:#print this display command list
.b.lm-7.f.j
After completing the display, PMEHISTO automatically restarts
it from the beginning. The user can thus cycle through the
histogram as many times as he wants. The only way to stop the
display is to enter the K command.
.B.I5
The following figure illustrates what one page of the terminal
display looks like:
.B2.NF.NJ.TP23
Performance Measurement and Evaluation Histogram Page 1
.B
PROGRAM COMPILER BY PHASE
.B
                        +----+----+----+----+----+----+----+----+
FRONT__END |    0 - C2E6 |**************************************** 22.4%
ALLOCATOR |    0 - 1CBD |** 0.8%
OUTBIN    |    0 - 1C50 |********* 4.7%
OPTIMIZER |    0 - 46A6 |************************* 13.9%
CODE__GEN  |    0 - 1FFF |********* 5.0%
            2000 - 3FFF |********** 5.6%
            4000 - 5FFF |************************* 13.9%
            6000 - 6583 |******* 4.3%
FINAL     |    0 - 227F |********** 5.8%
                        +----+----+----+----+----+----+----+----+
.B
Scaling: 8.3 counts/asterisk
.B2.F.J
This illustration has been compressed a bit to make it fit the
page.
The actual histogram is slightly wider.
.PG
.HL1 SUGGESTED COMMAND FILE SETUP
.I5
Frequent users of the PME package may find a command file
useful for doing routine setup operations. The following
command file, which assumes that all PME components are found
in the [PERFDIR] directory, is suggested as a model:
.b.lm+5.nf.nj
$#WRITE SYS$OUTPUT "---PME Setup Run---"
$ ! Ask whether the program must be relinked
$10:
$#INQUIRE ANSWER "Do you want to relink? (Y or N)"
$#IF ANSWER .EQS. "Y" THEN GOTO 20
$#IF ANSWER .EQS. "N" THEN GOTO 50
$#GOTO 10
$ ! Choose the sampling method
$20:
$#INQUIRE ANSWER "Clock or Trace sampling? (C or T)"
$#IF ANSWER .EQS. "C" THEN GOTO 30
$#IF ANSWER .EQS. "T" THEN GOTO 40
$#GOTO 20
$ ! Link with the clock-driven sampling routines
$30:
$#WRITE SYS$OUTPUT "---Clock-driven sampling---"
$#LINK/MAP PROGNAME,...,[PERFDIR]PMECLOCK
$#GOTO 50
$ ! Link with the trace-driven sampling routines
$40:
$#WRITE SYS$OUTPUT "---Trace-driven sampling---"
$#LINK/MAP PROGNAME,...,[PERFDIR]PMETRACE
$ ! Purge old versions, copy the definition file, define commands
$50:
$#PURGE PROGNAME.*
$#COPY [PERFDIR]PMEFILE.PMD PMEFILE.PMD
$#PMEBUILD :==#"RUN [PERFDIR]PMEBUILD"
$#PMEHISTO :==#"RUN [PERFDIR]PMEHISTO"
$ !
$#WRITE SYS$OUTPUT "---PME Setup Done---"
.lm-5.f.j
.B.I5
The command file handles the task of linking the user's
program with the proper sampling subroutines, and it sets up
PMEFILE.PMD in the user's default directory. It also defines
two new commands, namely
.b.c
$#PMEBUILD
.BR
and
.c
$#PMEHISTO
.b
which run the corresponding programs. This is a convenient
shorthand.
.B.I5
Using this command file, which copies PMEFILE.PMD to the
user's directory, and then using all the default file names
has another advantage: all input and output files have the
name "PMEFILE"; only the extensions differ. They can thus all
be deleted by this single command:
.b.c
$#DELETE PMEFILE.*;*
.b
All the garbage accumulated by repeated PME runs can be
removed in a single stroke.
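.B.I5
As a closing illustration, the histogram scaling rule described in Section 6 (the bucket with the largest count gets a full-width bar) can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not PMEHISTO's actual code: the function name, the sample counts, the 40-column bar width, and the round-to-nearest rule are all hypothetical.

```python
def histogram_rows(counts, total, width=40):
    """Return (counts_per_asterisk, rows) for a dict of bucket counts.

    Assumptions (not taken from PMEHISTO itself): bars are `width`
    columns wide and bar lengths are rounded to the nearest asterisk.
    """
    peak = max(counts.values())
    if peak == 0 or total == 0:
        return 0.0, [(name, "", 0.0) for name in counts]
    rows = []
    for name, count in counts.items():
        stars = round(count * width / peak)   # largest bucket -> full width
        percent = 100.0 * count / total       # share of all samples
        rows.append((name, "*" * stars, percent))
    return peak / width, rows


# Hypothetical sample counts; `total` may include samples outside any bucket.
sample = {"FRONT_END": 332, "ALLOCATOR": 12, "OUTBIN": 70}
scale, rows = histogram_rows(sample, total=1482)
for name, bar, percent in rows:
    print(f"{name:<10}|{bar:<40} {percent:5.1f}%")
print(f"Scaling: {scale:.1f} counts/asterisk")
```

With these hypothetical counts the scale works out to 8.3 counts per asterisk, which is the same form as the "Scaling" line in the sample display of Section 6.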