.PAGE SIZE 60,68
.RIGHT MARGIN 68
.LAYOUT 1,2
.CENTER; Record Definition Tutorial
.BLANK
.CENTER; B.#Z.#Lederman
.CENTER; System Resources Corp.
.CENTER; 128 Wheeler Rd.
.CENTER; Burlington, MA 01803-5139
.CENTER; (617) 270-9228
.NOTE Abstract
This session will attempt to go beyond the material in the Datatrieve manuals, and show how various types of problems may be solved using the options available within the record definition. Some introductory material will be given, and then the session will concentrate on showing how a record definition can be derived for a file of unknown layout.
.END NOTE
.TITLE Record Definition Tutorial
.SUBTITLE B.#Z.#Lederman
.PARAGRAPH
It is not unreasonable to state that the record definition is the foundation of any Datatrieve application, as it is the reference by which all data is stored and retrieved. Therefore, the first rule for any application is:
.BLANK.CENTER
KNOW YOUR APPLICATION
.BLANK
from which immediately follows:
.BLANK.CENTER
KNOW YOUR DATA
.BLANK
An often unsuspected benefit of taking an existing manual operation and implementing it on Datatrieve is that the people involved must sit down and figure out exactly what pieces of data they are dealing with, and in what manner: this is frequently the first time anyone has actually done so, and they are usually surprised by the amount of data involved.
.PARAGRAPH
Some applications move onto Datatrieve almost automatically. If you are using a pre-printed form (and almost every company has some sort of printed form for orders, absence reports, pencil requisitions, etc.), then you can simply copy the fields into the record definition: there are cases of new users moving applications like this in one day. If your records are not as well organized, then you must analyze them yourself. If you are using Datatrieve to read an existing file created by some other program, then it is necessary to obtain the file record layout and follow it.
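As a sketch of how directly a printed form can map to a record definition, here is a minimal definition for an imaginary absence-report form (the form itself and all of the field names and sizes are invented for this illustration):
.BLANK.NO JUSTIFY.NO FILL
DEFINE RECORD ABSENCE__RECORD
01 ABSENCE__REC.
   10 EMPLOYEE__NAME PIC X(30).
   10 DEPARTMENT PIC X(12).
   10 DATE__ABSENT USAGE DATE.
   10 REASON PIC X(40).
;
.BLANK.JUSTIFY.FILL
Each blank on the form becomes one field, sized to match the space the form allows for it.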
.PARAGRAPH
Keep in mind that, while it is nice to get a good record definition at the beginning, it is always possible to define a new domain and record and read the data from the old domain into the new one: so if you have 10,000 records stored, and find you need to add a field, don't panic.
.BLANK.TEST PAGE 3.CENTER
Data Types.
.PARAGRAPH
Although the data types are explained in the manual, a brief review is appropriate. One important factor to keep in mind is that there are no "Datatrieve-type" data types and no "Datatrieve-type" files. Datatrieve works with standard VMS (or PDP-11) data types and files, and all of these data types are found in programs written in other languages.
.BLANK.TEST PAGE 3
CHARACTER STRINGS store one ASCII character per byte. They are the only way to store character data in files.
.BLANK.TEST PAGE 6
DISPLAY (the default for numeric) stores one ASCII digit per byte. This mode is easy to transfer to other languages (including DCL), the file can be printed out directly on terminals and printers for examination, and it is easy to determine the space used in REDEFINES clauses. However, it uses more space in the record than other types, and math operations will often be slower.
.BLANK.TEST PAGE 7
COMP#/#INTEGER (BYTE, WORD, LONG, QUAD) is the most efficient method of storing whole numbers, using the least space and having the fastest math operations. It will transfer to most other languages. The file cannot be examined directly, and usually cannot be redefined. The field is 1, 2, 4, or 8 bytes depending upon the size of the PIC clause if it is declared as COMP or INTEGER, and this must be known if compatibility with other languages is required.
.BLANK.TEST PAGE 7
COMP__1#/#REAL is most efficient for real numbers (numbers with fractions), and will cover a wide range from very large to very small numbers without having to know the range when defining the record.
Math is almost as fast as with integers but, due to floating-point rounding, this type should usually not be used for sums of money. It usually cannot be redefined, and will transfer to most other languages (but not all). The field is 4 bytes by default.
.BLANK.TEST PAGE 4
COMP__2#/#DOUBLE is like REAL: it uses more space but has more precision (more decimal places of accuracy). It transfers to slightly fewer languages, and is 8 bytes in size.
.BLANK.TEST PAGE 8
COMP__3#/#PACKED is like DISPLAY but slightly more compact, having about two digits per byte: the formula in the manual must be used for the exact number. My recommendation is that this type be reserved for compatibility with other languages which require the use of packed data. It is also used by other computer systems, in which case it may be more easily transferred than integer or real.
.BLANK.TEST PAGE 3
COMP__5#/#ZONED is like PACKED with different sign handling. The same comments apply.
.BLANK.TEST PAGE 3
COMP__6 is a data type used by COBOL, and is 2, 4, 6, or 8 bytes long depending upon the PIC length. I recommend using this only to read existing files.
.BLANK.TEST PAGE 5
DATE is a special data type unique to DEC. It is an 8-byte integer, but the numeric value is converted to a date when displayed. Many problems with the DATE field type may be avoided by remembering that internally the date is always a number, and not a character string. DATE can be very useful, as it allows comparisons on date and time for record selection (such as all records after 14__AUG__1987#12:15:00.00), and allows sorting records on date and time.
.BLANK.TEST PAGE 3.CENTER
Getting Started: a text file with SYSGEN parameters.
.PARAGRAPH
The easiest file to read is usually one where all of the information is normal ASCII text, because such a file is very easy to examine (you just have to TYPE it on your terminal or EDIT it). If you're lucky, the fields will even be aligned.
.PARAGRAPH
An example of this is the file SYSGEN creates with a description of all of the system parameters in it. It is created with the following commands:
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 7
$ SET DEFAULT SYS$SYSTEM
$ MC SYSGEN
SYSGEN> USE CURRENT
SYSGEN> SET /OUTPUT = PARAMS.LOG
SYSGEN> SHOW /ALL
SYSGEN> EXIT
.BLANK
The beginning of this file looks like this:
.BLANK
Parameters in use: Active
Parameter Name    Current    Default    Minimum    Maximum  Unit      Dynamic
--------------    -------    -------    -------    -------  ----      -------
PFCDEFAULT             64         32          0        127  Pages        D
KFILSTCNT              16          4          2        255  Slots
GBLSECTIONS           400        128         20       4095  Sections
GBLPAGES            20500       4096        512         -1  Pages
GBLPAGFIL            2048       1024        128         -1  Pages
MAXPROCESSCNT         128         72         12       8192  Processes
PROCSECTCNT            32         32          5       1024  Sections
MINWSCNT               20         20         10         -1  Pages
PAGFILCNT               2          2          1         63  Files
.BLANK.JUSTIFY.FILL
.PARAGRAPH
Producing a record definition for this file is simple: all you have to do is count how many columns each text field occupies, paying attention to the actual data and not the header text at the top of the page (though the headers do suggest good names for your fields). The results should look like this:
.BLANK.NO JUSTIFY.NO FILL
REDEFINE RECORD TPARAMS__RECORD
01 TPARAMS__REC.
   10 NAME PIC X(16).
   10 CURRENT PIC X(18).
   10 SUBCUR REDEFINES CURRENT.
      20 FILLER PIC X(8).
      20 RCUR PIC X(10).
   10 DEFAULT PIC X(10).
   10 MINIMUM PIC X(10).
   10 MAXIMUM PIC X(10).
   10 FILLER PIC X.
   10 UNIT PIC X(12).
   10 DYNAMIC PIC X.
;
.BLANK.JUSTIFY.FILL.PARAGRAPH
FILLER is used to skip over those areas which just have blank spaces and no useful information.
.PARAGRAPH
This reads the file as text. Since some of the data is numeric, it would be nice to convert it to numbers. This is quite simple: create another domain with the fields defined as numeric:
.BLANK.NO JUSTIFY.NO FILL
REDEFINE RECORD PARAMS__RECORD
01 PARAMS__REC.
   10 NAME PIC X(16).
   10 CURRENT USAGE LONG EDIT__STRING ZZZ,ZZZ,ZZ9.
   10 DEFAULT USAGE LONG EDIT__STRING ZZZ,ZZZ,ZZ9.
   10 MINIMUM USAGE LONG EDIT__STRING ZZZ,ZZZ,ZZ9.
   10 MAXIMUM USAGE LONG EDIT__STRING ZZZ,ZZZ,ZZ9.
   10 UNIT PIC X(12).
   10 DYNAMIC PIC X.
;
.BLANK.JUSTIFY.FILL
and then read the data from the text file domain and write it to the numeric file. The error messages you would get by trying to convert fields which are text into numbers can be avoided by selecting only valid records:
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 3
FOR TPARAMS WITH UNIT NE "Ascii", "Unit Dynami", " "
   STORE PARAMS USING PARAMS__RECORD = TPARAMS__RECORD
.BLANK.JUSTIFY.FILL
Notice that if the field names match in the two records, then all you have to do is equate the two records: Datatrieve will automatically do the conversion from text to numbers.
.BLANK 2.TEST PAGE 5.CENTER
More than one record type in a file.
.PARAGRAPH
Unfortunately, not all text files have nicely aligned columns: the output of an INSTALL#LIST/FULL command is one such file which is not so easily processed. A portion of a typical listing looks like this:
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 7
DISK$VAXVMSRL4:.EXE
   A1;5           Open Shar Prv
      Entry access count = 50
      Current / Maximum shared = 0 / 7
      Global section count = 5
      Privileges = SYSNAM GRPNAM TMPMBX WORLD OPER NETMBX SYSPRV
.BLANK 2.TEST PAGE 5
DISK$VAXVMSRL4:.EXE
   ANALIMDMP;1    Prv
      Entry access count = 0
      Privileges = CMKRNL CMEXEC
.BLANK.TEST PAGE 4
   AUTHORIZE;1    Prv
      Entry access count = 4
      Privileges = CMKRNL
.BLANK.TEST PAGE 6
   BACKTRANS;1    Open Shar Prv
      Entry access count = 0
      Current / Maximum shared = 0 / 0
      Global section count = 3
      Privileges = GRPNAM PRMGBL
.BLANK.JUSTIFY.FILL
and so on. Fortunately, some things do occur in the same place in each record: for example, the image name always starts in column 4.
The goal is to get the information in the listing 'normalized' into something like this:
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 12
                                              MAX
DIRECTORY    IMAGE        QUAL     ENTRY    SHARED    GLOBAL
.BLANK
.EXE         A1           O  SP       50         7         5
.EXE         ANALIMDMP       P         0         0         0
.EXE         AUTHORIZE       P         4         0         0
.EXE         BACKTRANS    O  SP        0         0         3
.EXE         CDU          OH P         2         1         0
.EXE         CMS          O S          0         0         2
.EXE         COPY         OHS         20         2         1
.EXE         DCL          OHS L   16,859         5         1
.BLANK
The record definition for this domain is quite simple:
.BLANK
DEFINE RECORD INSTALL__P__RECORD OPTIMIZE
01 INS__P__REC.
   10 DIRECTORY PIC X(64) EDIT__STRING T(48).
   10 IMAGE PIC X(32) EDIT__STRING T(24).
   10 QUAL PIC X(8).
   10 QUALIFIERS REDEFINES QUAL.
      20 OPEN PIC X.        ! matches INSTALL abbreviations
      20 HDR PIC X.
      20 SHAR PIC X.
      20 PRV PIC X.
      20 PROT PIC X.
      20 LNKBL PIC X.
      20 CMODE PIC X.
      20 NOPURG PIC X.
   10 ENTRY PIC 9(6) EDIT__STRING ZZZ,ZZ9.
   10 MAX__SHARED PIC 9(6) EDIT__STRING ZZZ,ZZ9.
   10 GLOBAL PIC 9(6) EDIT__STRING ZZZ,ZZ9.
;
.JUSTIFY.FILL
The trick is to get from one format to the other. In this case, the work has to be done in the processing of the data rather than all in the record definition.
.BLANK
The input record definition looks like this:
.BLANK.NO JUSTIFY.NO FILL
DEFINE RECORD INSTALL__IN__RECORD OPTIMIZE
01 INS__IN__REC.
   10 TEXT__IN PIC X(80).
   10 PARSE__FIELDS REDEFINES TEXT__IN.
      20 DISK PIC XXX.
      20 IMAGE PIC X(77).
      20 PARSE__2 REDEFINES IMAGE.
         30 BLANK PIC X(5).
         30 TYPE PIC X(10).
;
.BLANK.JUSTIFY.FILL
The feature which is of most use here is REDEFINES, which allows a portion of the record to be handled in more than one way. For example, TEXT__IN makes the entire record one long text field, but PARSE__FIELDS allows it to be sub-divided into a section where the disk name will appear and one where the image name will appear. But since other information also shows up in the record under the image name, we further sub-divide IMAGE into a section where there should only be blank space, and a section where some text will appear which will tell us what kind of record we are reading.
.PARAGRAPH
If you print the IMAGE field of this record definition, you get this:
.BLANK.NO JUSTIFY.NO FILL
ET NOVERIFY
T/FULL
.BLANK
K$VMSRL5:.EXE SPELL;36 Open Hdr Shar        <--
Entry access count = 10
Current / Maximum shared = 0 / 2
Global section count = 1
.BLANK.JUSTIFY.FILL
And similarly the TYPE field looks like this:
.BLANK.NO JUSTIFY.NO FILL
VERIFY
L
.BLANK
RL5:
0
BEGIN                                          ! if there is one
                                               ! image name, no ";"
   T__IMAGE = FN$STR__EXTRACT(IMAGE, 1, (I - 1))
   IF FN$STR__LOC (IMAGE, "Open") > 0 THEN T__OPEN = "O"
   IF FN$STR__LOC (IMAGE, "Hdr") > 0 THEN T__HDR = "H"
   IF FN$STR__LOC (IMAGE, "Shar") > 0 THEN T__SHAR = "S"
   IF FN$STR__LOC (IMAGE, "Prv") > 0 THEN T__PRV = "P"
   IF FN$STR__LOC (IMAGE, "Prot") > 0 THEN T__PROT = "P"
   IF FN$STR__LOC (IMAGE, "Lnkbl") > 0 THEN T__LNKBL = "L"
   IF FN$STR__LOC (IMAGE, "Cmode") > 0 THEN T__CMODE = "C"
   IF FN$STR__LOC (IMAGE, "Nopurg") > 0 THEN T__NOPURG = "N"
END
END
END
END
.BLANK.JUSTIFY.FILL
The only 'trick' here, if there is one, is the use of FN$STR__LOC to find certain text strings no matter where they occur in a field, rather than the usual record definition layout where fields occur in specific positions; FN$STR__EXTRACT can then be used to extract something from the input record and put it into a fixed field in the output record.
.COMMENT ~~~~~
.BLANK.TEST PAGE 3.CENTER
A more difficult example: a non-text file.
.PARAGRAPH
When deriving a record definition for an existing data file, it helps of course to have some idea of what is in the file. A record definition, even one written for another language (COBOL, C, FORTRAN, etc.), would supply most, if not all, of what is needed. For this example, however, I'm going to assume that no starting information exists.
.PARAGRAPH
Even if no information exists, the operating system will tell you something about the file: for example, DIRECTORY/FULL.
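The listing which follows came from a command of roughly this form (the filespec is the one for the sample log file examined throughout this example):
.BLANK.NO JUSTIFY.NO FILL
$ DIRECTORY /FULL SYS$SYSROOT:[SYSERR]RPCSWL.LOG
.BLANK.JUSTIFY.FILL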
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 10
Directory SYS$SYSROOT:[SYSERR]
.BLANK
RPCSWL.LOG;1                  File ID:  (1206,174,0)
Size:    30/33                Owner:    [SYSTEM]
Created: 21-MAR-1990 09:43:22.71
Revised: 30-AUG-1990 09:50:46.22 (33)
File organization:  Sequential
Record format:      Variable length, maximum 269 bytes
Record attributes:  Carriage return carriage control
.BLANK.JUSTIFY.FILL
I've deleted a few of the listing lines to bring it down to the relevant information. The listing tells me that this is a sequential file, that records can vary in length, and gives the maximum record size. This last item is very useful for record definitions. Another very useful utility is ANALYZE/RMS__FILE/FDL, which produces a File Definition Language file with similar information.
.PARAGRAPH
If the file in question is an indexed file, then the ANALYZE utility will tell you how many indexed keys there are, their positions in the file, and their data types. This is very useful, because it gives you a known field within the file as a point of reference. It happens that the sample file I've chosen is sequential, so we'll have to derive all of the information for ourselves.
.BLANK.TEST PAGE 3.CENTER
Exploring with the Editor.
.PARAGRAPH
The next utility to try is a text editor. It's difficult to reproduce the editor screen on paper, but if I use the TPU editor on this file I get something which looks like this (after substituting the tilde ("~") character for all characters which won't print on my printer):
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 6
~~~~~~~~~~e~~s~ß~T~~~~6~~~~À~ È6(tT~~~~~~5DTS66$DUA0:[SYS0.SYSCOMMON
~~~~~~~~~~e~~s~ß~[~~~~6~~~~À~@¹×ê~@~~~~~~5DTS66$DUA0:[SYS0.SYSCOMMON
~~~~~~~~~~e~~s~ß~U~~~~6~~~~À~~ÃJÎ~@~~~~~~5DTS66$DUA0:[SYS0.SYSCOMMON
~~~~~~~~~~e~~s~ß~}~~~~6~~~~À~~f~J~@~~~~~~5DTS66$DUA0:[SYS0.SYSCOMMON
~~~~~~~~~~e~~s~ß~U~~~~6~~~~À~~f~%~@~~~~~~5DTS66$DUA0:[SYS0.SYSCOMMON
.BLANK.JUSTIFY.FILL
Since this wraps past 80 columns, I also shifted the data to the left until I had seen the entire record.
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 6
A0:[SYS0.SYSCOMMON.][SYSEXE]RPC$SWLPROC.EXE;2~~~~~~~~~~~~RPC$SWL~~~~
A0:[SYS0.SYSCOMMON.][SYSEXE]RPC$SWLPROC.EXE;2~~~~~~~~~~~~RPC$SWL~~~~
A0:[SYS0.SYSCOMMON.][SYSEXE]RPC$SWLPROC.EXE;2~~~~~~~~~~~~RPC$SWL~~~~
A0:[SYS0.SYSCOMMON.][SYSEXE]RPC$SWLPROC.EXE;2~~~~~~~~~~~~RPC$SWL~~~~
A0:[SYS0.SYSCOMMON.][SYSEXE]RPC$SWLPROC.EXE;2~~~~~~~~~~~~RPC$SWL~~~~
.BLANK.TEST PAGE 6
~~~~~~~RPC$SWL~~~~~~~~~SYSTEM ~~%RPC$__SWL-I-RESTART, RPC__SWL
~~~~~~~RPC$SWL~~~~~~~~~SYSTEM ~~%RPC$__SWL-I-RESTART, RPC__SWL
~~~~~~~RPC$SWL~~~~~~~~~SYSTEM ~~%RPC$__SWL-I-RESTART, RPC__SWL
~~~~~~~RPC$SWL~~~~~~~~~SYSTEM ~~%RPC$__SWL-I-RESTART, RPC__SWL
~~~~~~~RPC$SWL~~~~~~~~~SYSTEM ~~%RPC$__SWL-I-RESTART, RPC__SWL
.BLANK.TEST PAGE 6
ESTART, RPC__SWL started
ESTART, RPC__SWL started
ESTART, RPC__SWL started
ESTART, RPC__SWL started
ESTART, RPC__SWL started
.BLANK.JUSTIFY.FILL
There is actually more data than is shown here: many of the 'non-printing' characters I had to blank out do show up on a CRT screen: in particular, bytes which contain nulls print as an "NL" (in one character). This makes it easy to spot sections of records which are just null filler.
.PARAGRAPH
Now what can be learned from this? First, it's easy to see that the data 'lines up' in columns. This is good news: it means we can define fields at fixed positions in the record definition and they will match all of the records. With some other types of records, the data isn't always in fixed positions. Something else which can be seen is that some of the data is text, and it's easy to count to the columns where it starts and ends. Finally, at the beginning of the records we can see some characters which change from record to record and don't seem to make sense as text: these probably represent binary numerical data.
.BLANK.TEST PAGE 3.CENTER
The DUMP utility.
.PARAGRAPH
This utility is very helpful in looking at the data in a file.
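As a sketch of the commands involved (the qualifier spellings are those of the VMS DUMP utility; the default display is longword hexadecimal):
.BLANK.NO JUSTIFY.NO FILL
$ DUMP SYS$SYSROOT:[SYSERR]RPCSWL.LOG                ! longword hex, the default
$ DUMP /BYTE /DECIMAL SYS$SYSROOT:[SYSERR]RPCSWL.LOG
$ DUMP /WORD SYS$SYSROOT:[SYSERR]RPCSWL.LOG
.BLANK.JUSTIFY.FILL
Both of the dumps shown below were produced this way.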
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 10
Dump of file SYS$SYSROOT:[SYSERR]RPCSWL.LOG;1 on 18-OCT-1990
File ID (1206,174,0)   End of file block 30 / Allocated 33
.BLANK
Record number 1 (00000001), 269 (010D) bytes
.BLANK
DFF07300 00650000 00000000 00000000 ..........e..s.ß 000000
7E66A003 C0000000 0036FE00 00005501 .U....6....À..f~ 000010
24363653 54443500 01000400 93401825 %.@......5DTS66$ 000020
4F435359 532E3053 59535B3A 30415544 DUA0:[SYS0.SYSCO 000030
50525D45 58455359 535B5D2E 4E4F4D4D MMON.][SYSEXE]RP 000040
00323B45 58452E43 4F52504C 57532443 C$SWLPROC.EXE;2. 000050
53244350 52070000 00000000 00000000 ...........RPC$S 000060
45545359 530C0000 00000000 00004C57 WL.........SYSTE 000070
535F2443 50522584 01202020 2020204D M      ..%RPC$__S 000080
5052202C 54524154 5345522D 492D4C57 WL-I-RESTART, RP 000090
20202064 65747261 7473204C 57535F43 C__SWL started   0000A0
20202020 20202020 20202020 20202020                  0000B0
20202020 20202020 20202020 20202020                  0000C0
20202020 20202020 20202020 20202020                  0000D0
20202020 20202020 20202020 20202020                  0000E0
20202020 20202020 20202020 20202020                  0000F0
      20 20202020 20202020 20202020              ... 000100
.BLANK.JUSTIFY.FILL
I've only shown one record here to save space, but normally you would want to look at several records to find things which match up. You would also dump the same records with the /BYTE and /WORD qualifiers (the default shown here is /LONG) to see if data falls on byte or word boundaries, and you might also dump the data in /DECIMAL to see if there are clearly increasing numbers, or numeric data you recognize.
.PARAGRAPH
If you see text, you should also check the byte just before the text starts to see if its value is the same as the length of the field. For example, the same record above dumped with /DECIMAL and /BYTE looks like this:
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 8
  0   0   0   0   0   0   0   0  ........    0
-33 -16 115   0   0 101   0   0  ..e..s.ß    8
  0  54  -2   0   0   0  85   1  .U....6.   16
126 102 -96   3 -64   0   0   0  ...À..f~   24
  1   0   4   0-109  64  24  37  %.@.....   32
 36  54  54  83  84  68  53   0  .5DTS66$   40
 89  83  91  58  48  65  85  68  DUA0:[SY   48
 79  67  83  89  83  46  48  83  S0.SYSCO   56
 83  91  93  46  78  79  77  77  MMON.][S   64
 80  82  93  69  88  69  83  89  YSEXE]RP   72
 79  82  80  76  87  83  36  67  C$SWLPRO   80
  0  50  59  69  88  69  46  67  C.EXE;2.   88
  0   0   0   0   0   0   0   0  ........   96
 83  36  67  80  82   7   0   0  ...RPC$S  104
  0   0   0   0   0   0  76  87  WL......  112
 69  84  83  89  83  12   0   0  ...SYSTE  120
  1  32  32  32  32  32  32  77  M      .  128
 83  95  36  67  80  82  37-124  .%RPC$__S  136
 83  69  82  45  73  45  76  87  WL-I-RES  144
 80  82  32  44  84  82  65  84  TART, RP  152
116 115  32  76  87  83  95  67  C__SWL st  160
 32  32  32 100 101 116 114  97  arted     168
 32  32  32  32  32  32  32  32            176
....
.BLANK.JUSTIFY.FILL
In the second character of line '40' the byte has a value of 53, and the character string which follows, starting "DTS66", is 53 characters long. Similarly, the third byte in line '104' is 7 and the string "RPC$SWL" is seven characters long, and the third byte in line '120' is 12 and what follows is 12 characters long if you include the trailing blank spaces. One common type of data on VMS systems is the "counted ASCII string", where the first byte gives the length of the string, and these fields are probably of this data type: it is at least a reasonable first guess, even though all of the fields in this particular file end up being the same length.
.BLANK.TEST PAGE 3.CENTER
First record definition.
.PARAGRAPH
Since some of the fields are text, we can write a preliminary record definition and start using Datatrieve to look at the file.
.BLANK.NO JUSTIFY.NO FILL
REDEFINE RECORD RPC1__RECORD OPTIMIZE
01 RPC1__REC.
   10 FX.
      20 FILLER PIC X(41).
   10 HIDE1.
      20 FILLER PIC X(96).
   10 TEXT1 REDEFINES HIDE1.
      20 C1 USAGE BYTE.
      20 T1 PIC X(64).
      20 C2 USAGE BYTE.
      20 T2 PIC X(15).
      20 C3 USAGE BYTE.
      20 T3 PIC X(12).
      20 W2 USAGE WORD.
   10 STATUS__TEXT PIC X(132) EDIT__STRING T(48).
   10 PRINT1.
.TEST PAGE 4
      20 TASK__NAME COMPUTED BY
         FORMAT FN$STR__EXTRACT(T1, 1, C1) USING X(64)
         EDIT__STRING T(48).
.TEST PAGE 4
      20 PROCESS__NAME COMPUTED BY
         FORMAT FN$STR__EXTRACT(T2, 1, C2) USING X(15)
         EDIT__STRING T(15).
.TEST PAGE 4
      20 OWNER COMPUTED BY
         FORMAT FN$STR__EXTRACT(T3, 1, C3) USING X(12)
         EDIT__STRING T(12).
;
.BLANK.JUSTIFY.FILL
This record definition has a 'trick' in it. Since the text fields are potentially of variable length, and are padded with nulls, I want to convert them to fields which are text only. I also would like to have a record definition where only the fields I want print out by default. When there is a REDEFINES clause in a record definition, the first group is the one that prints by default. So by doing the following:
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 4
   10 HIDE1.
      20 FILLER PIC X(96).
   10 TEXT1 REDEFINES HIDE1.
.BLANK.JUSTIFY.FILL
it is the HIDE1 group which prints by default: but that group contains only FILLER, so it won't print at all. I then put the 'real' fields under the redefining TEXT1 header. The final COMPUTED#BY field definitions look a bit odd, but this is what is needed to override the built-in string length and edit string limitations of the FN$STR__EXTRACT function. This function extracts a string of a given length from an input string: here the strings are extracted starting at position 1, for the length given by the byte containing the string length in the record (C1, C2, or C3).
.PARAGRAPH
This record definition will give me the following data:
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 6
STATUS__TEXT   : %RPC$__SWL-I-RESTART, RPC__SWL started
TASK__NAME     : DTS66$DUA0:[SYS0.SYSCOMMON.][SYSEXE]RPC$SWLPROC.
                 EXE;2
PROCESS__NAME  : RPC$SWL
OWNER          : SYSTEM
.BLANK.JUSTIFY.FILL
.BLANK.TEST PAGE 3.CENTER
Decoding binary fields.
.PARAGRAPH
So far so good: but now I have to start looking at the binary fields. Usually the first step is to define a number of longword integer fields, to see if anything comes out making sense.
From looking at the file, the first 10 bytes always seem to be null, so I'm going to skip them, start at the "right" (the first text field I know comes out correctly), and work back.
.BLANK.NO JUSTIFY.NO FILL
REDEFINE RECORD RPC2__RECORD OPTIMIZE
01 RPC2__REC.
   10 FX.
      20 FILLER PIC X(41).
   10 BIN1 REDEFINES FX.
      20 FILLER PIC X(9).
      20 A USAGE LONG.
      20 B USAGE LONG.
      20 C USAGE LONG.
      20 D USAGE LONG.
      20 E USAGE LONG.
      20 F USAGE LONG.
      20 G USAGE LONG.
      20 H USAGE LONG.
   10 HIDE1.
      20 FILLER PIC X(96).
[remainder of record definition remains unchanged]
.BLANK.JUSTIFY.FILL
If I print out some of these new fields, I get this:
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 14
DTR> for rpc2 print e, f, g, h
.BLANK
        E            F          G        H
.BLANK
 62914560    629040800    9650200    65540
 62914560   1243440800    9650200    65540
 62914560   -833961088    9650202    65540
 62914560   -354961088    9650202    65540
 62914560   1547094048    9655412    65540
 62914560   1932194048    9655412    65540
 62914560  -1083305504    9657423    65540
 62914560   -577505504    9657423    65540
 62914560   1388511904    9661051    65540
.BLANK.JUSTIFY.FILL
Something else worth doing is to look at the hexadecimal values of some of these fields:
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 14
DTR> for rpc2 print e, f, g, h, fn$hex(h)
.BLANK
        E            F          G        H   FN$HEX
.BLANK
 62914560    629040800    9650200    65540    10004
 62914560   1243440800    9650200    65540    10004
 62914560   -833961088    9650202    65540    10004
 62914560   -354961088    9650202    65540    10004
 62914560   1547094048    9655412    65540    10004
 62914560   1932194048    9655412    65540    10004
 62914560  -1083305504    9657423    65540    10004
 62914560   -577505504    9657423    65540    10004
 62914560   1388511904    9661051    65540    10004
.BLANK.JUSTIFY.FILL
The next part depends somewhat on guesswork (or trial and error): the text fields have to do with starting a process, processes have UICs, and the value of field H looks like a valid UIC for the SYSTEM account which is starting this process, so I'm going to assign it that meaning.
One way to check this is to link in the FN$FAO function (which has been published in the Newsletter and included on the SIG tapes), as it can change a binary UIC to an identifier:
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 14
DTR> for rpc2 print e, f, g, h, fn$fao("!%I", h, 0,0,0,0,0,0,0)
.BLANK
        E            F          G        H   String
.BLANK
 62914560    629040800    9650200    65540   [SYSTEM]
 62914560   1243440800    9650200    65540   [SYSTEM]
 62914560   -833961088    9650202    65540   [SYSTEM]
 62914560   -354961088    9650202    65540   [SYSTEM]
 62914560   1547094048    9655412    65540   [SYSTEM]
 62914560   1932194048    9655412    65540   [SYSTEM]
 62914560  -1083305504    9657423    65540   [SYSTEM]
 62914560   -577505504    9657423    65540   [SYSTEM]
 62914560   1388511904    9661051    65540   [SYSTEM]
.BLANK.JUSTIFY.FILL
Next, let's look at the two fields just before the UIC. G is slowly increasing, while F takes on a large number of values which vary a lot. As previously mentioned, VMS stores a standard date and time value in a quadword, which is two longwords, and the low-order longword comes first. Since a date and time are likely to be stored in a log, it makes sense to try looking at these two fields as one date and time:
.BLANK.NO JUSTIFY.NO FILL
REDEFINE RECORD RPC3__RECORD OPTIMIZE
01 RPC3__REC.
   10 FX.
      20 FILLER PIC X(41).
   10 BIN1 REDEFINES FX.
      20 FILLER PIC X(9).
      20 A USAGE LONG.
      20 B USAGE LONG.
      20 C USAGE LONG.
      20 D USAGE LONG.
      20 E USAGE LONG.
      20 DATE USAGE DATE.
      20 UIC USAGE LONG.
   10 HIDE1.
[remainder of record is unchanged]
.BLANK.TEST PAGE 10
DTR> for rpc3 print date, fn$time(date)
.BLANK
       DATE        FN$TIME
.BLANK
21-Mar-1990    09:43:22.89
21-Mar-1990    09:44:24.33
21-Mar-1990    10:02:25.08
21-Mar-1990    10:03:12.98
16-Apr-1990    07:33:51.65
16-Apr-1990    07:34:30.16
26-Apr-1990    07:31:56.03
[records removed to save space]
17-Aug-1990    10:51:29.72
24-Aug-1990    08:19:59.68
24-Aug-1990    08:20:14.92
.BLANK.JUSTIFY.FILL
Since this field in all of the records comes out as a valid date which increases in time, and all of the dates fall between the file creation date and last update date, this probably is a date and time field.
.PARAGRAPH
When decoding a file in this manner, one hopes that at least the nature of some of the data fields will be known in advance. When I started this exercise, I did not know any of the data fields: I have since come across some information which confirms what I have shown so far, and which indicates the probable nature of fields D and E. I did not pursue this record definition any further because it was giving me the information I needed at the time; decoding other binary fields is largely a matter of trial and error, following the procedure shown above. I did "clean up" the formatting of some of the fields, however, so that the definition now looks like this:
.BLANK.NO JUSTIFY.NO FILL
REDEFINE RECORD RPCA__RECORD OPTIMIZE
01 RPCA__REC.
   10 FX.
      20 FILLER PIC X(29).
   10 BIN1 REDEFINES FX.
      20 FILLER PIC X(21).
      20 PC__LONGWORD USAGE LONG.
      20 PSL__LONGWORD USAGE LONG.
   10 PRINT1.
      20 DATE1 USAGE DATE.
      20 TIME1 COMPUTED BY FN$TIME(DATE1) EDIT__STRING T(11).
.TEST PAGE 3
      20 UIC COMPUTED BY
         FN$FAO("!%U", UIC__LONGWORD, 0,0,0,0,0,0,0)
         EDIT__STRING X(9).
.TEST PAGE 3
      20 ACNT COMPUTED BY
         FN$FAO("!%I", UIC__LONGWORD, 0,0,0,0,0,0,0)
         EDIT__STRING X(14).
      20 PC COMPUTED BY FN$HEX(PC__LONGWORD) EDIT__STRING T(8).
      20 PSL COMPUTED BY FN$HEX(PSL__LONGWORD) EDIT__STRING T(8).
   10 HIDE2.
      20 UIC__LONGWORD USAGE LONG.
      20 FILLER PIC X(96).
   10 TEXT2 REDEFINES HIDE2.
      20 X1 USAGE LONG.
      20 C1 USAGE BYTE.
      20 T1 PIC X(64).
      20 C2 USAGE BYTE.
      20 T2 PIC X(15).
      20 C3 USAGE BYTE.
      20 T3 PIC X(12).
      20 W2 USAGE WORD.
   10 STATUS__TEXT PIC X(132) EDIT__STRING T(48).
   10 PRINT2.
.TEST PAGE 4
      20 TASK__NAME COMPUTED BY
         FORMAT FN$STR__EXTRACT(T1, 1, C1) USING X(64)
         EDIT__STRING T(48).
.TEST PAGE 4
      20 PROCESS__NAME COMPUTED BY
         FORMAT FN$STR__EXTRACT(T2, 1, C2) USING X(15)
         EDIT__STRING T(15).
.TEST PAGE 4
      20 OWNER COMPUTED BY
         FORMAT FN$STR__EXTRACT(T3, 1, C3) USING X(12)
         EDIT__STRING T(12).
;
.BLANK.JUSTIFY.FILL
.BLANK.TEST PAGE 3.CENTER
Second example: the ANALYZE/DISK__STRUCTURE/USAGE file.
.PARAGRAPH
The ANALYZE utility will produce a data file containing statistics on all files on a volume: specifically, the owner of each file, and how much disk space is allocated to and used by the file. This can be very useful in determining which users are taking up disk space: but no utility is provided to summarize the data. Datatrieve will easily report the data once there is a record definition to match the data file.
.PARAGRAPH
When I first decoded this file, I don't think there was any information available on its format (or at least I didn't find any). It happens that there is now some format information available, in the form of a C language header file.
A portion of SYS$LIBRARY:USGDEF.H is shown below:
.BLANK.NO JUSTIFY.NO FILL
struct usgdef {
    unsigned char usg$b_type;           /* Record type */
    unsigned long int usg$l_serialnum;  /* Pack serial number */
    char usg$t_strucname [12];          /* Volume set name */
    char usg$t_volname [12];            /* Volume name */
    char usg$t_ownername [12];          /* Volume owner name */
    char usg$t_format [12];             /* Volume format type */
    unsigned int usg$q_time [2];        /* Time usage file created */
    } ;
struct usgdef1 {
    char usgdef$$_fill_1;
    unsigned long int usg$l_fileowner;  /* File owner UIC */
    unsigned long int usg$l_allocated;  /* Blocks allocated */
    unsigned long int usg$l_used;       /* Blocks used */
    unsigned short int usg$w_dir_len;   /* Length of directory */
                                        /* string */
    unsigned short int usg$w_spec_len;  /* Length of complete */
                                        /* file specification */
    char usg$t_filespec [406];          /* File spec */
                                        /* "[dir]nam.typ;ver" */
    } ;
.BLANK.JUSTIFY.FILL
If we dump a portion of the usage file and compare it with the above definitions, we see a good match. There are several problems which can occur with this type of data file, however. The first is that there are two different types of records in the same file: this is usually called "unnormalized" data. Fortunately, the REDEFINES clause will help us get around this: we can effectively define two types of record within the same file. Another problem is that the complete file specification is of variable size, and unlike the previous example is not blank or null filled: however, a similar COMPUTED#BY field can take care of this. A related problem is that there are not separate fields for the file directory and the remainder of the file specification, but a little effort with COMPUTED#BY fields will also solve this.
.PARAGRAPH
The record definition I finally derived looks like this:
.BLANK.NO JUSTIFY.NO FILL
REDEFINE RECORD DISK__USE__RECORD OPTIMIZE
01 DISK__USE__REC.
   10 TYPE USAGE BYTE.
   10 COVERUP.
      20 FILLER PIC X(120).
   10 IDENTIFICATION REDEFINES COVERUP.
      20 SERIAL__NUMBER USAGE LONG.
      20 STRUCTURE__NAME PIC X(12).
      20 VOLUME__NAME PIC X(12).
      20 OWNER__NAME PIC X(12).
      20 FORMAT PIC X(12).
      20 CREATION__DATE USAGE DATE.
      20 CREATION__TIME COMPUTED BY FN$TIME(CREATION__DATE).
   10 FILE__RECORD REDEFINES COVERUP.
      20 FILEOWNER USAGE LONG.
.TEST PAGE 3
      20 IDENTIFIER COMPUTED BY
         FN$FAO("!%I", FILEOWNER, 0,0,0,0,0,0,0)
         EDIT__STRING X(24).
      20 ALLOCATED USAGE LONG EDIT__STRING ZZZ,ZZZ,ZZ9.
      20 USED USAGE LONG EDIT__STRING ZZZ,ZZZ,ZZ9.
      20 DIR__LEN USAGE WORD.
      20 SPEC__LEN USAGE WORD.
      20 FILESPEC PIC X(96) EDIT__STRING T(76).
.TEST PAGE 3
      20 DIRECTORY COMPUTED BY
         FORMAT FN$STR__EXTRACT(FILESPEC, 1, DIR__LEN) USING X(64)
         EDIT__STRING T(64).
.TEST PAGE 4
      20 FILE__SPECIFICATION COMPUTED BY
         FORMAT FN$STR__EXTRACT(FILESPEC, (DIR__LEN + 1), SPEC__LEN)
         USING X(64) EDIT__STRING T(64).
;
.BLANK.JUSTIFY.FILL
Defining the record as all FILLER within group COVERUP and then defining the two groups IDENTIFICATION and FILE__RECORD solves the problem of having two different record types within the file: one can say FOR DISK__USAGE WITH TYPE = 2 PRINT FILE__RECORD, for example, to use the correct field layout on the appropriate records. The two fields DIRECTORY and FILE__SPECIFICATION use FN$STR__EXTRACT in a manner similar to the first example to remove the appropriate portions of the file specification field. Notice how DIR__LEN is used as the length of the DIRECTORY field, and then as the starting position of the FILE__SPECIFICATION field.
.BLANK.TEST PAGE 3.CENTER
Converting date formats and CHOICE.
.PARAGRAPH
A problem which is often encountered is reading a file containing a date and/or time which is not stored in the standard VMS date type. While Datatrieve can convert many kinds of date formats directly, there are some which are difficult to handle.
I discovered that many of the data files used by ALL-IN-1 have such date fields, which look like this:
.BLANK
1990070809101112
.BLANK
and work out to be the year, month, day, hour, minute, second, and fraction of a second, in ASCII characters. This format does allow sorting into date and time order while the data is in ASCII characters, but it does not allow indexed and range retrievals as easily as the VMS quadword date type does.
.PARAGRAPH
Converting this data to a standard VMS quadword date using Datatrieve would be quite easy if it weren't for the fact that ALL-IN-1 sometimes leaves the fields blank. When the fields are treated as numbers, blanks don't translate to anything reasonable. Fortunately, the CHOICE statement can be used within a record definition to assign one of several values to a COMPUTED#BY field depending on what data is encountered. What follows is part of one such record definition:
.BLANK.NO JUSTIFY.NO FILL
REDEFINE RECORD AI1__NETWORK__RECORD OPTIMIZE
01 AI1__NETWORK__REC.
   10 USER__NAME PIC X(30).
   10 NODE PIC X(6).
   10 DATE1.
      20 FILLER PIC X(16).
.TEST PAGE 6
      20 LAST__UPDATE COMPUTED BY
         CHOICE OF
            (IY1 EQ " ") THEN FN$DATE("18-NOV-1858 00:00:00.00")
            ELSE FN$DATE(ID1 | "-" | IM1 VIA MONTH__TABLE | "-" |
                 IY1 | " " | IH1 | ":" | II1 | ":" | IS1 | "." | IF1)
         END__CHOICE.
.TEST PAGE 3
      20 UPDATE__TIME COMPUTED BY FN$TIME(DATE1.LAST__UPDATE)
         EDIT__STRING X(11).
   10 DI1 REDEFINES DATE1.
      20 IY1 PIC XXXX.
      20 IM1 PIC XX.
      20 ID1 PIC XX.
      20 IH1 PIC XX.
      20 II1 PIC XX.
      20 IS1 PIC XX.
      20 IF1 PIC XX.
!
! other fields followed these.
!
.BLANK.JUSTIFY.FILL
The CHOICE statement gives one of two results, depending on whether the date field is blank or not: if it is, the arbitrary base date and time for the VMS quadword date is inserted; if not, the actual date and time is computed from the input fields.
Notice how, in the record definition, the date and time string is built up by concatenating all of the necessary data fields and delimiting characters, and that the month number is converted to a character string via a table. The table is quite simple:
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 8
REDEFINE TABLE MONTH__TABLE
01 : "JAN",
02 : "FEB",
03 : "MAR",
[ etcetera ]
12 : "DEC"
END__TABLE
.BLANK.JUSTIFY.FILL
It is important that the months be in upper case for this particular function to work.
.PARAGRAPH
The data file looks like this:
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 7
BROWN           DTS66   1990070809101112
CHAN            DTS66
JONES           DTS66   1990090800000000
LEDERMAN        DTS66   1990102910111213
SMITH           DTS66
WILLIAMS        DTS66   1989010100000000
.BLANK.JUSTIFY.FILL
but when processed with Datatrieve it looks like this:
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 10
USER                        LAST          UPDATE
NAME            NODE        UPDATE        TIME
.BLANK
BROWN           DTS66    8-Jul-1990   09:10:11.12
CHAN            DTS66   18-Nov-1858   00:00:00.00
JONES           DTS66    8-Sep-1990   00:00:00.00
LEDERMAN        DTS66   29-Oct-1990   10:11:12.13
SMITH           DTS66   18-Nov-1858   00:00:00.00
WILLIAMS        DTS66    1-Jan-1989   00:00:00.00
.BLANK.JUSTIFY.FILL
.COMMENT ~~~~
.BLANK.TEST PAGE 3.CENTER Possible alignment problems (if dealing with PDP-11s).
.PARAGRAPH
The length of each field is of interest when using the REDEFINES clause, when the data must be read by other programs, or when you are trying to match an existing file. There is a hidden problem which sometimes occurs that should be noted:
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 6
DEFINE RECORD GOTCHA
01 GOTCHA.
   03 A PIC X.
   03 B PIC 99 USAGE IS INTEGER.
;
.BLANK.JUSTIFY.FILL
It might be thought that this record is 3 bytes long, and on a VAX it is, but on a PDP-11 it isn't: it is 4 bytes long, because of something called word alignment. INTEGER, REAL, DOUBLE and DATE fields must start on a word boundary, which is an even number of bytes: if they don't, Datatrieve inserts a hidden byte to align the data; in this case, between fields A and B.
This byte takes up space in the record but can never be accessed unless you change the definition. This will also occur on the VAX if the ALIGN clause is used to force integers to fall on word boundaries, which is sometimes done for compatibility with other languages. I find that the best way to work out a record length is with a pencil and a piece of graph paper ruled in squares, or a printer form layout sheet, a CRT display form, or an old coding sheet. By marking off the fields, using one square per byte, the length of each field and its alignment is easily determined. This is especially important with the REDEFINES clause, which will be shown later.