.PAGE SIZE 62, 60
.RIGHT MARGIN 60
.CENTER
^&Commonly Asked DATATRIEVE Questions _& Answers\&
.BLANK 2.CENTER
Session Chair:
.BLANK.CENTER
Larry Jasmann
.BLANK.CENTER
U.S. Coast Guard
.BLANK.CENTER
Burke VA
.BLANK 2.CENTER
Joe H. Gallagher
.BLANK.CENTER
4GL Solutions
.BLANK.CENTER
Kansas City, MO
.BLANK 2.CENTER
Andy Schneider
.BLANK.CENTER
Developer, VAX-DATATRIEVE
.BLANK 2.CENTER
Dick Azzi
.BLANK.CENTER
Motorola
.BLANK.CENTER
Phoenix, AZ
.BLANK 2.CENTER
Chris Wool
.BLANK.CENTER
E.I. DuPont
.BLANK.CENTER
Wilmington, DE
.BLANK 2.CENTER
B.#Z.#Lederman
.BLANK.CENTER
Brooklyn, N.Y.
.BLANK 2.CENTER
Transcribed by B.#Z.#Lederman
.TITLE Commonly Asked DATATRIEVE Questions _& Answers
.SUBTITLE DT007 Spring 1986 Dallas
.NOTE Abstract
.BLANK 2
This is a transcription of a panel presentation which answers some of
the most common questions asked about DATATRIEVE.
Some of the material has been reordered when that would group logical
subjects together.
The transcription may paraphrase some questions or answers for clarity,
and the transcriber apologizes in advance for any misspelled names.
This paper follows the usual convention of placing square brackets
around interpretations or material supplied by the editor.
Throughout this paper DTR is an abbreviation for DATATRIEVE.
.END NOTE
.RIGHT MARGIN 55
.BLANK 3.TEST PAGE 5.CENTER
Why is DATATRIEVE so slow?
.BLANK
(Larry:) DTR has a lot of power, and does a lot of things, but it also
"sits on top of" one of three other products: RMS, DBMS, or Rdb.
If you use DTR in such a way as to cause, for example, RMS to do a
sequential search of a file containing 20,000 records, you should not be
surprised if it takes a long time to respond with an answer.
If you are a programmer [in a traditional language] you probably wouldn't
do such a silly thing when writing a program, but when you are using DTR
interactively on a large file it's really easy to do this.
Joe has some slides which show the difference between retrieving data
with keys and without keys.
[Figure 1]# You can see that when you get beyond 1000 records, the amount of time to access a file sequentially skyrockets compared with keyed retrieval.
(Joe:) This example is in fact a CROSS, that is you are doing a relational
join between two domains: the second domain does not have a key in one
case, and does in the other.
The performance ratio is essentially the same as for a simple lookup doing
a single keyed retrieval compared with a single sequential search.
The point to be made is that DTR is only as fast as what it sits on:
it's because DTR hides some of the details of what is going on below it
that many times it's possible to do something that seems perfectly
reasonable to you, but is very slow because the file design is not
appropriate for that function, and performance suffers considerably.
(Larry:) A corollary: it's not altogether clear unless you've studied it
which constructs in DTR will cause sequential searches, and this is
something you need to know well if you have a big file.
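.PARAGRAPH
[The following illustration was added by the editor and was not part of
the panel discussion; the domain and field names are invented.
Assume an indexed RMS file behind a domain called EMPLOYEES, whose
EMP__ID field is the primary key and whose LAST__NAME field is not keyed.
The first PRINT below lets RMS go through the index directly to the
record; the second forces a sequential pass over the entire file, which
is the kind of retrieval that produces the times shown in Figure 1.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 5
READY employees
PRINT employees WITH emp__id = 12345
PRINT employees WITH last__name = "SMITH"
.BLANK.JUSTIFY.FILL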
(Dick:) My first answer to this normally is: if you are on a VT100, try
hitting the NO#SCROLL key again.
That happens quite often: that will slow the system [your application].
Along with the sequential portion, the number of keys [in an indexed file]
has a direct bearing on how fast the application will be: this doesn't
matter much on a read, but on a write, the more keys there are the slower
it will be.
[Figure 2]# (Joe:) If you are going to retrieve records on both keys, the
time needed to store the extra key will be well worth it in the retrieval.
However, if you are not going to retrieve records on that key, you are
going to pay an overhead price.
It's important to choose the keys carefully, to use only those which will
be used for retrieval.
These underlying factors in file design determine how fast the application
will be in DATATRIEVE.
(Andy:) One of the advantages of using keys is that RMS will do sorting
for you.
When you create a primary key and DTR says to RMS "give me the records",
RMS gives them back in sorted order.
If you have an application and you have a primary key, and you enter a
command which sorts on that key, why would it take so long?
DTR isn't super smart: if you explicitly order DTR to sort the data, it
isn't able to tell that it's already sorted.
So: don't go out and automatically sort everything, figure out what the
primary keys are, and don't sort fields that are already sorted.
How do you know when your retrieval is using keys?
One way is to do this: instead of just running the DTR image, run it with
DEBUG.
.BLANK
DEBUG SYS$SYSTEM:DTR32
.BLANK
You get some VAX#DEBUG headers and messages, and then the prompt: simply
say "GO", which initializes the debugger and puts you in DATATRIEVE.
What happens is that when you perform an RSE, if a key is being used you
will receive an informational message on your terminal for every key being
used in your RSE or Boolean or whatever.
If you were assuming that three keys are being used, this way DTR will
tell you if those keys are actually being used or not.
If it isn't using them, then perhaps there is a flaw in your design, and
you can go back and work on it.
This is a debugging technique to see if what you are doing is what you
thought you were doing.
.PARAGRAPH
(Larry:) Another thing you need to know is the difference between FIND and
FOR statements.
If you do a FIND and create a current collection, subsequent operations on
that collection are going to be done sequentially.
[Figure 3]# This is usually the minimum amount of time you will save with
a FOR, but there are other savings that are also obtained.
You should remember that any operation on the collection is not keyed,
even if it looks as if it was.
The only time a FIND is better than a FOR is if you have a very large
domain, and you can do a FIND to collect a relatively small number of
records, and are going to do several operations on that small collection.
For example, if you have 10,000 records and want to work on a subset of 50
or 60 records, it makes sense to use a FIND, otherwise not.
(Dick:) While we are talking about FINDs, it's important to remember that
when you do a SORT, even to do a PRINT (for example,
PRINT#FIRST#5#---#SORTED#BY#field), DTR is going to do the SORT first.
If you can do a FIND and reduce the number of records you are going to be
using, and then SORT that small number, you will save a lot of time
because DTR will not have to sort the whole file.
.PARAGRAPH
(Andy:) Another bottleneck is access to the dictionary (the CDD).
It's crucial, especially at initial access time, that the dictionary not
be "top heavy".
A lot of people make the mistake of putting everything into CDD$TOP, and
then when you want to ready a domain the amount of time that it takes for
CDD to access the pieces in the dictionary is extremely high.
If you have a lot of stuff in CDD$TOP and not much in subdirectories,
create a good tree structure and move stuff down the tree.
.PARAGRAPH
(Joe:) There is one area I run into that most users don't run into.
From a scientific and medical standpoint, we have some users who do
calculations in DTR rather than in some other language (like FORTRAN), so
in fact they are doing a lot of calculations in DTR.
In many cases they created complex procedures where all of the temporary
variables are declared something like PIC#999V999 (string variables).
If for some reason you have to do heavy calculations in DTR you gain a
substantial return by converting those variables to COMP variables (REAL,
INTEGER) because DTR does a fair amount of conversion, and these are CPU
intensive activities where numbers in one format have to be converted to
other formats with a lot of sanity checks.
If you are going to do a lot of calculations (such as a data base of
scientific data) you get a performance improvement by making the variables
the appropriate data type.
.PARAGRAPH
(Joe:) If you think you are running slow, and you don't know if it's you
or other programs [on the system], there is a pair of functions within
VAX-DTR which will allow you to initialize a timer and then show the
amount of elapsed time between the initialization and the show time.
[For example, the following procedure was used when testing the CROSS
statement with and without keys to obtain the data shown in Figure 1.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 7
FN$INIT__TIMER
FN$SHOW__TIMER
PRINT field1, field2 OF domain1 CROSS domain2 OVER field1
FN$SHOW__TIMER
.BLANK.JUSTIFY.FILL
You can place them around various sections of code and find those sections
that are actually running slow.
It will give you information about elapsed clock time, CPU time, page
faults, etc., and that's very helpful.
If something is not running as fast as you think it ought to, you can go
and look and decide if it's your process or someone else who is hogging
the system [if CPU time is small but elapsed time is large].
(Bart:) On DTR-11 and PRO-DTR you don't have INIT__TIMER, but you can do
remote DTR and the log file will have some information on times and what
DTR is doing with the retrievals.
This will also work for VAX-DTR, and is another way to find out what is
going on inside DTR.
You can always do a remote DTR to your own node.
.BLANK 3.TEST PAGE 5.CENTER
Why can't I put a READY inside my BEGIN-END loop?
.BLANK
(Andy:) [A suggested source for information on this point is the VAX-DTR
internals session], but basically this has to do with the difference
between commands and statements.
In a nutshell, the reason you can't put a command like READY within a
BEGIN-END block is because commands go through one path [when being
processed by DTR] and statements go through another.
When you put a BEGIN-END around something, what you have done is created
one big statement out of everything within the BEGIN-END block.
If you stick a command in there and DTR runs across it, it's not down the
right path [internally] to execute it.
(Larry:) The main thing is to understand that there are such things as
commands and statements and that one can't go in the other.
With the way the language is constructed there is little need to do that:
there are ways around it.
(Joe:) A simpler explanation is that statements manipulate the data within
an environment, and commands change that environment.
By putting a command within a BEGIN-END loop you've changed the
environment while trying to work in it.
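.PARAGRAPH
[Editor's illustration, not from the panel: because READY is a command, it
goes before the BEGIN-END block, and the statements inside then run in the
environment the command set up.
The domain and field names here are invented.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 6
READY personnel SHARED READ
BEGIN
   FOR personnel WITH dept = "SALES"
      PRINT last__name, salary
END
.BLANK.JUSTIFY.FILL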
.BLANK 3.TEST PAGE 5.CENTER
Can I read DTR files from another language?
.BLANK
(Dick:) There is no such thing as DTR files.
There are RMS files, Rdb files, DBMS, etc.
DTR does not create files of its own.
You can read RMS from BASIC, COBOL, or any language.
The converse of that is DTR can read files created by other languages:
even the editor, if you are careful.
(Bart:) The problem with the editor is that you may not align everything
properly.
You may also run into the problem where you create a record definition 80
bytes long, and you go into the editor and type data 80 bytes long, but
the editor creates a variable length file.
When you ready the domain DTR will give you a warning message that the
file types don't match, but it will then go ahead and read it anyway.
If DTR gets a record which is too short, it pads it out (and may give an
error message): if it's too long, DTR truncates the record (and in the
past, especially on the PDP-11, tends to abort).
If you are in doubt, put a FILLER field on the end of the record
definition to make the record definition too long: you may get warning
messages but DTR will go ahead and read the data.
You can then write it to another file with fixed length records and DTR
will be happy.
(Larry:) A related question is, what if you have a nasty system manager
who won't tell you what the file is like and you are trying to read it?
The answer is, use the RMS utilities to find out how long the record is,
then create a record definition of the same length with one big field with
a PIC length the length of the record and EDIT__STRING#T(80) [to make the
data fit on the usual CRT screen], ready the domain, print some records
out, and by looking at it you can usually figure out where the fields are,
and revise your record definition.
(Bart:) Remember to ready the domain read only and shared, until you are
certain you have the record definition correct.
You don't want to modify anything until you know what it is.
.BLANK 3.TEST PAGE 5.CENTER
Can you sort on a non-keyed field?
.BLANK
(Dick:) You can sort on any field you have in your record.
(Not COMPUTED__BY fields on a PDP-11, but any real field.)# On a VAX, it
should be any field.
.BLANK 3.TEST PAGE 5.CENTER
Why can't I prompt for a domain or a field?
.BLANK
(Andy:) DTR is really forgiving, but there are certain features intended
to be used in some places and not others.
Prompting, when you do a *.prompt, is for value expressions and value
expressions only.
A value expression is a value for a field, or a piece of text.
Value expressions don't include things like key words or names of things,
which is what a domain is.
When you say READY *.---, you are asking the prompt to supply a domain
name, but a prompt can only supply a value expression, so DTR will say
"oh no I won't!".
Essentially, the contents of a quoted string are what it grabs, so when
you prompt for anything it must be a value expression or piece of text.
There are some workarounds, one being logical names: you prompt for a
string, do an FN$--- to create a logical name translation, ready the
logical name, and DTR will translate one level of logical names.
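.PARAGRAPH
[Editor's illustration, not from the panel, with invented names: the first
statement below prompts for a value expression and works; the second asks
the prompt to supply a domain name, which is the case DTR refuses.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 4
PRINT employees WITH last__name = *.name__wanted
READY *.which__domain
.BLANK.JUSTIFY.FILL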
.BLANK 3.TEST PAGE 5.CENTER
How many records can I have in my domain?
.CENTER
How large a record can I have?
.BLANK
(Bart:) Basically, the number of records you can have in your domain is
limited by how large your disk is (or your disk quota if applicable).
As for the size of the record: on the PDP-11, if it's very large you will
run out of pool space.
On the VAX there may be a limit, but I don't know anyone who has hit it.
(Comment from audience indicating it had been reached.)# There is a system
wide RMS limit on the maximum size for any record on a VAX, and I believe
it's set around 32,000 bytes.
As far as the number of records, it's limited by the amount of disk space,
and I've done domains with over 130,000 records.
(Comment from audience indicating a user with 6000+ byte records, stating
that the application seemed a bit slow, but when the application was
broken up into smaller pieces with relevant sections connected by crosses,
it ran faster.
Doesn't it make more sense to keep the record size smaller?)# (Bart:) It's
partially record size, and partially the number of fields.
If you don't need all 6000 bytes at once, breaking it up into smaller
pieces that most logically go together will save you overhead.
The other possibility is to have more than one record definition for the
same file and use FILLER to skip over the pieces that aren't needed at
that time, and that also cuts down the number of fields that DTR has to
know about.
Either of those approaches would give an improvement.
(User: if you use FILLER, it cuts down the number of fields, but then you
have the same number of FILLER fields.)# You use one FILLER field to skip
over all of them.
(Andy:) One important point we are looking at for way in the future is
that access to the CDD is very inefficient for metadata.
For every attribute you have for every field, DTR has to make a call to
the CDD.
If you have 400 fields, and each field has a name, a query header, an edit
string, a query name, a missing value, and a default value, DTR makes one
call for each [at least 2800 calls including the PIC clause].
If you can cut unnecessary attributes, or you have fields that aren't used
often and you can skip over with FILLER, DTR jumps over FILLER and its
internal field tree is much, much simpler.
Also, less memory is used, as it allocates a big block for each field, and
this block is the same size for a field with no attributes or with many
attributes.
If you can eliminate fields, you save time and memory not allocating
blocks.
(Larry:) Besides, anyone with a record that has 6000 bytes in it needs to
go back and re-evaluate how the data is being structured.
(User:) The records were a complete record of our field engineers,
including their education, experience, etc.
In essence, we had 10 major areas of interest, and instead of 1 record we
really had 10.
It worked a lot faster [after we changed to 10 records.]# (Bart:) Not just
speed but other considerations apply: if you give someone write access to
that domain, they now have access to everything in there, and do you
really want to give them everything at once?
From the management standpoint you also want to separate the data.
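.PARAGRAPH
[Editor's sketch of the FILLER technique mentioned above, with invented
names and sizes: a second record definition for the same 6000 byte file
describes only the fields needed for one task and skips the rest with a
single FILLER field, so DTR has far fewer fields to keep track of.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 8
DEFINE RECORD engineer__summary__rec USING
01 engineer__summary.
   03 emp__id      PIC X(6).
   03 last__name   PIC X(20).
   03 FILLER       PIC X(5974).
;
.BLANK.JUSTIFY.FILL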
.BLANK 3.TEST PAGE 5.CENTER
Can I do menu-driven applications in DTR?
.BLANK
(Larry:) Yes, and I know of about 4 different methods.
The first is the way NOT to do it, and that is to use DCL and have it call
DTR every time you need to do something (having the menu in DCL).
This will work, but it's inefficient: you go through all of the overhead
of starting up DTR whenever you want to do something.
.PARAGRAPH
I like to use the call interface, and a little program that feeds
procedures back to DTR.
Essentially, DTR tells the program what it wants to do next, and the
program tells DTR to run it [a procedure] next.
That works very well.
.PARAGRAPH
(Dick Azzi:) I like to "pre-compile" DTR.
Andy mentioned that anything within a BEGIN-END block is treated as one
statement, and DTR always has to parse the next statement it is going to
work on.
We take maybe 75 to 100 "programs", put them all within a large BEGIN-END
block with a menu so DTR treats it all as one statement: it takes 15 to 20
minutes to parse that statement (we bring it up on Monday morning and
leave it up until everyone goes home Friday night).
Included on the menu are "sleep" and "pause" functions, which bring up an
FMS screen with a no-echo password so that the user can leave a terminal
and get back in only if they enter the same password.
This process works well in our application, where we treat DTR as the
center of the universe: if we have to do something in DCL we will spawn
out of DTR, work in DCL (things like word processing, PHONE, running
another program), and then return to DTR where all of the pre-compiled
statements are still active, all READYs are still there, etc.
This gives a very quick turn-around on menu response.
.PARAGRAPH
(Larry:) Another method is with logical names [calling a procedure which
has one fixed name: a logical name assignment is made from the menu to
translate that logical name to the name of one of a set of "real"
procedures].
There are probably other methods as well.
.PARAGRAPH
(Mike Nickolas, Bank Ohio:) Another method of doing menus in DTR is to
have a simple procedure called DISPLAY__MENU which has a print statement
to clear the screen and displays abbreviated procedure names such as
":M1"; each such procedure does only one function, such as ADD [a record],
and so requires only a very short compilation time.
.PARAGRAPH
(Chris:) I'd like to take exception to the remark about how not to do a
menu [using DCL, described above].
There are times when a DCL menu is appropriate.
If you have several choices on the menu and only one is going to be DTR,
the startup delay occurs only after the choice of that option is made.
The main menu can come up very quickly in a DCL procedure, and if there is
only one choice which goes to DTR, or there are many options only one or
two of which go to DTR, the total delay is less than if you have to go in
and out of DTR a lot.
.PARAGRAPH
Using logical names is very similar to the "pre-compile" method.
The difference is that in your choice statement you use
FN$CREATE__LOGICAL, and at the bottom of the choice statement you invoke
the logical name, and DTR will execute the procedure named in the create
logical function: you can even go back and invoke the procedure you are in
now.
This appears to be recursive use of DTR but in fact is not really working
recursively.
.PARAGRAPH
(Susan Krantz, NKF Engineering:) Another way I use DCL and DTR together is
in DBMS applications where I have a COBOL program using FMS doing an
update function, and then a menu in the beginning asking whether I want to
use the update program or go into DTR and do my reporting.
That's a good combination because they are either changing the database or
they are doing reports, and once you are in the report module everything
is pre-compiled.
.BLANK
(Larry:) As Chris said, if you are doing one-shots then the DCL menu is
good, but on the other hand if you are going to switch back and forth
between updating and reporting I'd use the call interface and integrate
[DTR] right in [to the COBOL program].
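.PARAGRAPH
[Editor's sketch, not from the panel, of a very small "pre-compiled" menu
of the kind described above.
The procedure names DISPLAY__MENU, ADD__EMPLOYEE, and EMPLOYEE__REPORT and
the domain EMPLOYEES are invented and would have to exist in the
dictionary; the READY comes before everything else because it is a
command.
The exact statements (WHILE, IF-THEN, prompting) should be checked against
the DTR version in use.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 11
READY employees SHARED WRITE
DECLARE answer PIC X.
answer = "?"
WHILE answer NE "X"
   BEGIN
   :display__menu
   answer = *.menu__option
   IF answer = "A" THEN :add__employee
   IF answer = "R" THEN :employee__report
   END
.BLANK.JUSTIFY.FILL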
.BLANK 3.TEST PAGE 5.CENTER
Will the DTR Call Interface support new languages?
.PARAGRAPH
(Ron Swift, Xerox:) We use a FORTRAN interface for some of [those
applications, such as were discussed for the previous question] which
allows us to leave the domains open.
It allows us to go through a menu, and appears to speed up tremendously
what we're doing.
My question is, will there ever be a "C" interface to DTR?
.PARAGRAPH
(Andy:) Do you mean, will there ever be something which DTR ships as part
of its kit to allow you to automatically run with "C"?
You can do it today, but you have to create your own DAB.
The bottom line is: any language which conforms to the VAX calling
standard can use callable DTR today, which means "C" can use it.
We have chosen a subset to ship with our kit, which means DABs, examples,
and so forth.
We have been asked for "C" in the past and that may come, although it
doesn't stop you from using it today, it just means you have to do some
legwork up front.
.PARAGRAPH
(Bart:) A little advertisement for the DECUS library: if the first person
to do it would please submit it to the library, then everyone else will
get it.
.BLANK 3.TEST PAGE 7.CENTER
How do I use nested FOR loops?
.BLANK.CENTER
(i.e., how do I optimize access to two domains?)
.PARAGRAPH
(Bob Brown, INTEL:) In regard to optimization, could someone explain to me
how nested FOR loops work, and where you put the keys (on the inside or
outside); it's kind of difficult to understand from the manual.
.PARAGRAPH
(Bart:) [Rather than transcribing the problem description, the example
below shows the outline of a nested FOR statement being used to access two
domains, with records in the second domain being selected according to the
match of some field in the second domain equaling a field in the first
domain.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 8
FOR domain__1
   BEGIN
   FOR domain__2 WITH field__2 = field__1
      BEGIN
      --- work done here on one or both domains ---
      END
   END
.BLANK.JUSTIFY.FILL
The field that you are specifying in the second domain (the inner loop)
[labeled field__2, belonging to domain__2 in the example above] is the one
that should be keyed.
This is exactly the same as the example shown earlier for the CROSS, where
the second domain went from no key to a key.
If it's a VIEW, and you specify the first domain, and then the second
domain occurs for a field equal to the first domain, it's the second
domain where the field should be keyed.
For all cases, for a record in the first domain, DTR has to find the
matching record in the second domain.
[The following illustrates the case of a VIEW.
As in the first example, field__2 belonging to domain__2 is the one which
normally should be keyed.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 10
DEFINE DOMAIN view__domain OF domain__1, domain__2 USING
01 FIRST OCCURS FOR domain__1.
   10 field__1 FROM domain__1.
   --- other fields from domain__1.
   10 SECOND OCCURS FOR domain__2 WITH field__2 = field__1.
      --- other fields from domain__2.
;
.BLANK.JUSTIFY.FILL
(Larry:) Please understand that the field doesn't HAVE to be keyed, but if
you want to take advantage of the keys, it must be the second domain that
is keyed.
(Bart:) If you would like it to run in a reasonable amount of time, the
second should be keyed.
.BLANK
(Bob Brown:) Even if the outside loop has an RSE?
.BLANK
(Bart:) Yes, because the outside loop is going to go in the order you
specify in the RSE: it may or may not be keyed, depending on how you do
it.
But if you want the matches on the inner loop to be fast, then the second
domain has to be keyed so the retrieval can be on the key.
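.PARAGRAPH
[Editor's illustration of the outline above with invented names: ORDERS is
read in the outer loop, its CUSTOMER__NUMBER field holds the matching
value, and CUST__ID is assumed to be the primary key of the indexed file
behind CUSTOMERS, so every pass through the inner loop is a keyed lookup
rather than a sequential search.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 7
FOR orders WITH status = "OPEN"
   BEGIN
   FOR customers WITH cust__id = customer__number
      PRINT order__number, cust__name, balance
   END
.BLANK.JUSTIFY.FILL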
.BLANK 3.TEST PAGE 5.CENTER
"Field --- is undefined or used out of context"?
.BLANK 2
(Joe:) [We now have] the infamous "undefined or used out of context" error
message.
This is probably the most frustrating and common occurrence for beginning
users of DTR.
There are some obvious things, such as misspellings, that cause this
problem.
The real underlying cause is a cry for help from DTR because it does not
understand what you have told it.
There is something in the command that it does not understand, and when it
doesn't understand, it often cannot tell you exactly where it stopped
understanding.
(Larry:) Here's the point: a lot of times when you get "undefined ..." the
error is not on the quoted string in the error message; the error is going
to be somewhere back upstream from that string.
What I tell my users is to put your finger on that string, and where I'm
pointing is where the error is [using your right hand, the error is
somewhere left and up].
This can happen because DTR continues to parse for a while until finally
things don't make sense, and then you get the error: the thing that caused
it not to understand could be a comma or a space or a word encountered
previously.
(Joe:) You look at the thing it's complaining about and back: sometimes
what it's complaining about is a misspelling, but it's also possible that
it's something back up the line.
This happens because DTR is parsing and compiling, and it has a context
within which it has to communicate with you, trying to understand what
you're telling it: at this point it's saying "I don't understand anymore".
.BLANK 3.TEST PAGE 6.CENTER
When do I use hierarchies?
.CENTER
(OCCURS clauses in record definitions)
.PARAGRAPH
(Ron Wilson, Wilson Concrete:) I wonder if there are any rules about when
one should use "flat" records versus hierarchies?
.PARAGRAPH
(Larry:) [missing from the tape].
(Bart:) Basically, if you use an OCCURS, you limit how you can access that
subordinate data.
If you have a very well defined application where some data is definitely
subordinate to a main data piece, and you are absolutely, positively
always going to be getting the subordinate data with the main data, then
an OCCURS might make sense.
The problem is that when you do use an OCCURS, it makes it difficult to
get to the subordinate data only.
(Larry:) It's really hard to manipulate within an OCCURS clause.
You are better off putting it in another domain, using a CROSS, and
treating it in a relational way.
.PARAGRAPH
(Joe:) I'd like to offer a dissenting opinion.
There are some applications I've used in a medical database where the data
is naturally hierarchical; and because the underlying data is naturally
hierarchical, DTR is, in my opinion, the best tool for accessing
hierarchical data.
There are certain prices you must pay in order to do that, but I would
argue the other way around.
If the data is naturally hierarchical in the way it's used, it should be
stored hierarchically, either in an OCCURS within a domain, or in separate
domains using a VIEW which in effect creates a hierarchy.
(Bart:) [section missing from tape].
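.PARAGRAPH
[Editor's sketch, not from the panel, of a hierarchical record using
OCCURS; all names and sizes are invented.
Each patient record carries a repeating group of test results, which is
convenient when the results are always retrieved with the patient, but, as
noted above, makes it awkward to work with the results by themselves.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 9
DEFINE RECORD patient__rec USING
01 patient.
   03 patient__id    PIC X(8).
   03 test__count    PIC 99.
   03 test OCCURS 1 TO 20 TIMES DEPENDING ON test__count.
      06 test__name  PIC X(10).
      06 result      PIC 999V99.
;
.BLANK.JUSTIFY.FILL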
.BLANK 3.TEST PAGE 5.CENTER
How do I pass information from DTR back to DCL?
.PARAGRAPH
[name of questioner missing from tape:] ... that logical names created
through DTR are user mode logical names.
Is there any way that [DTR can create logical names in other modes:
question wasn't finished as Andy Schneider was shaking his head "no"].
We use DTR in certain situations to pass values out to DCL and use those
values: now we can't do that.
.BLANK
(Andy:) FN$CREATE__LOGICAL creates a user mode logical name for DTR's
purposes only, because 9 people out of 10 will use the logical name while
in DTR to optimize [an application] and don't want to have it "kicking
around" afterwards where everyone picks it up.
What I would suggest, if you want the logical name to be [in existence]
afterwards, is to use a DTR procedure to create an indirect command file
that you execute when you leave DTR to create the logical names.
(Questioner:) Just write it to a file, basically.
(Andy and Larry:) Or create your own function [to create a logical name in
some table or mode other than user].
And submit it to the DECUS library.
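.PARAGRAPH
[Editor's sketch of the indirect command file idea above, with invented
names; the exact spacing and any column heading in the output file may
need a little cleanup, but the idea is simply to write the DCL you want
into a file from inside DTR and execute it after you exit.
From DCL, @DEFLOG.COM then defines the logical name so it is still there
after DTR has exited.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 5
DECLARE part__no PIC X(10).
part__no = *.part__number
PRINT "$ DEFINE CURRENT__PART ", part__no ON DEFLOG.COM
.BLANK.JUSTIFY.FILL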