.PAGE SIZE 62, 60
.RIGHT MARGIN 60
.CENTER
^&Commonly Asked DATATRIEVE Questions _& Answers\&
.BLANK 2.CENTER
Session Chair:
.BLANK.CENTER
Larry Jasmann
.BLANK.CENTER
U.S. Coast Guard
.BLANK.CENTER
Burke VA
.BLANK 2.CENTER
Joe H. Gallagher
.BLANK.CENTER
4GL Solutions
.BLANK.CENTER
Kansas City, MO
.BLANK 2.CENTER
Andy Schneider
.BLANK.CENTER
Developer, VAX-DATATRIEVE
.BLANK 2.CENTER
Dick Azzi
.BLANK.CENTER
Motorola
.BLANK.CENTER
Phoenix, AZ
.BLANK 2.CENTER
Chris Wool
.BLANK.CENTER
E.I. DuPont
.BLANK.CENTER
Wilmington, DE
.BLANK 2.CENTER
B.#Z.#Lederman
.BLANK.CENTER
Brooklyn, N.Y.
.BLANK 2.CENTER
Transcribed by B.#Z.#Lederman
.TITLE Commonly Asked DATATRIEVE Questions _& Answers
.SUBTITLE DT007 Spring 1986 Dallas
.NOTE Abstract
.BLANK 2
This is a transcription of a panel presentation which answers some of
the most common questions asked about DATATRIEVE.
Some of the material has been reordered when that would group logical
subjects together.
The transcription may paraphrase some questions or answers for clarity,
and the transcriber apologizes in advance for any misspelled names.
This paper follows the usual convention of placing square brackets
around interpretations or material supplied by the editor.
Throughout this paper DTR is an abbreviation for DATATRIEVE.
.END NOTE
.RIGHT MARGIN 55
.BLANK 3.TEST PAGE 5.CENTER
Why is DATATRIEVE so slow?
.BLANK
(Larry:) DTR has a lot of power, and does a lot of things, but it also
"sits on top of" one of three other products: RMS, DBMS, or Rdb.
If you use DTR in such a way as to cause, for example, RMS to do a
sequential search of a file containing 20,000 records, you should not be
surprised if it takes a long time to respond with an answer.
If you are a programmer [in a traditional language] you probably wouldn't
do such a silly thing when writing a program, but when you are using DTR
interactively on a large file it's really easy to do this.
Joe has some slides which show the difference between retrieving data
with keys and without keys.
[Figure 1]# You can see that when you get beyond 1000 records, the amount of time to access a file sequentially skyrockets compared with keyed retrieval.
(Joe:) This example is in fact a CROSS, that is you are doing a relational
join between two domains: the second domain does not have a key in one
case, and does in the other.
The performance ratio is essentially the same as for a simple lookup doing
a single keyed retrieval compared with a single sequential search.
The point to be made is that DTR is only as fast as what it sits on:
it's because DTR hides some of the details of what is going on below it
that many times it's possible to do something that seems perfectly
reasonable to you, but is very slow because the file design is not
appropriate for that function, and performance suffers considerably.
(Larry:) A corollary: it's not altogether clear unless you've studied it
which constructs in DTR will cause sequential searches, and this is
something you need to know well if you have a big file.
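.PARAGRAPH
[The following illustration was added by the editor and was not part of
the panel discussion; the domain and field names are invented.
Assume an indexed RMS file behind a domain called EMPLOYEES, whose
EMP__ID field is the primary key and whose LAST__NAME field is not keyed.
The first PRINT below lets RMS go through the index directly to the
record; the second forces a sequential pass over the entire file, which
is the kind of retrieval that produces the times shown in Figure 1.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 5
READY employees
PRINT employees WITH emp__id = 12345
PRINT employees WITH last__name = "SMITH"
.BLANK.JUSTIFY.FILL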
(Dick:) My first answer to this normally is: if you are on a VT100, try
hitting the NO#SCROLL key again.
That happens quite often: that will slow the system [your application].
Along with the sequential portion, the number of keys [in an indexed file]
has a direct bearing on how fast the application will be: this doesn't
matter much on a read, but on a write, the more keys there are the slower
it will be.
[Figure 2]# (Joe:) If you are going to retrieve records on both keys, the
time needed to store the extra key will be well worth it in the retrieval.
However, if you are not going to retrieve records on that key, you are
going to pay an overhead price.
It's important to choose the keys carefully, to use only those which will
be used for retrieval.
These underlying factors in file design determine how fast the application
will be in DATATRIEVE.
(Andy:) One of the advantages of using keys is that RMS will do sorting
for you.
When you create a primary key and DTR says to RMS "give me the records",
RMS gives them back in sorted order.
If you have an application and you have a primary key, and you enter a
command which sorts on that key, why would it take so long?
DTR isn't super smart: if you explicitly order DTR to sort the data, it
isn't able to tell that it's already sorted.
So: don't go out and automatically sort everything, figure out what the
primary keys are, and don't sort fields that are already sorted.
How do you know when your retrieval is using keys?
One way is to do this: instead of just running the DTR image, run it with
DEBUG.
.BLANK
DEBUG SYS$SYSTEM:DTR32
.BLANK
You get some VAX#DEBUG headers and messages, and then the prompt: simply
say "GO", which initializes the debugger and puts you in DATATRIEVE.
What happens is that when you perform an RSE, if a key is being used you
will receive an informational message on your terminal for every key being
used in your RSE or Boolean or whatever.
If you were assuming that three keys are being used, this way DTR will
tell you if those keys are actually being used or not.
If it isn't using them, then perhaps there is a flaw in your design, and
you can go back and work on it.
This is a debugging technique to see if what you are doing is what you
thought you were doing.
.PARAGRAPH
(Larry:) Another thing you need to know is the difference between FIND and
FOR statements.
If you do a FIND and create a current collection, subsequent operations on
that collection are going to be done sequentially.
[Figure 3]# This is usually the minimum amount of time you will save with
a FOR, but there are other savings that are also obtained.
You should remember that any operation on the collection is not keyed,
even if it looks as if it was.
The only time a FIND is better than a FOR is if you have a very large
domain, and you can do a FIND to collect a relatively small number of
records, and are going to do several operations on that small collection.
For example, if you have 10,000 records and want to work on a subset of 50
or 60 records, it makes sense to use a FIND, otherwise not.
(Dick:) While we are talking about FINDs, it's important to remember that
when you do a SORT, even to do a PRINT (for example,
PRINT#FIRST#5#---#SORTED#BY#field), DTR is going to do the SORT first.
If you can do a FIND and reduce the number of records you are going to be
using, and then SORT that small number, you will save a lot of time
because DTR will not have to sort the whole file.
.PARAGRAPH
(Andy:) Another bottleneck is access to the dictionary (the CDD).
It's crucial, especially at initial access time, that the dictionary not
be "top heavy".
A lot of people make the mistake of putting everything into CDD$TOP, and
then when you want to ready a domain the amount of time that it takes for
CDD to access the pieces in the dictionary is extremely high.
If you have a lot of stuff in CDD$TOP and not much in subdirectories,
create a good tree structure and move stuff down the tree.
.PARAGRAPH
(Joe:) There is one area I run into that most users don't run into.
From a scientific and medical standpoint, we have some users who do
calculations in DTR rather than in some other language (like FORTRAN), so
in fact they are doing a lot of calculations in DTR.
In many cases they created complex procedures where all of the temporary
variables are declared something like PIC#999V999 (string variables).
If for some reason you have to do heavy calculations in DTR you gain a
substantial return by converting those variables to COMP variables (REAL,
INTEGER) because DTR does a fair amount of conversion, and these are CPU
intensive activities where numbers in one format have to be converted to
other formats with a lot of sanity checks.
If you are going to do a lot of calculations (such as a data base of
scientific data) you get a performance improvement by making the variables
the appropriate data type.
.PARAGRAPH
(Joe:) If you think you are running slow, and you don't know if it's you
or other programs [on the system], there is a pair of functions within
VAX-DTR which will allow you to initialize a timer and then show the
amount of elapsed time between the initialization and the show time.
[For example, the following procedure was used when testing the CROSS
statement with and without keys to obtain the data shown in Figure 1.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 7
FN$INIT__TIMER
FN$SHOW__TIMER
PRINT field1, field2 OF domain1 CROSS domain2 OVER field1
FN$SHOW__TIMER
.BLANK.JUSTIFY.FILL
You can place them around various sections of code and find those sections
that are actually running slow.
It will give you information about elapsed clock time, CPU time, page
faults, etc., and that's very helpful.
If something is not running as fast as you think it ought to, you can go
and look and decide if it's your process or someone else who is hogging
the system [if CPU time is small but elapsed time is large].
(Bart:) On DTR-11 and PRO-DTR you don't have INIT__TIMER, but you can do
remote DTR and the log file will have some information on times and what
DTR is doing with the retrievals.
This will also work for VAX-DTR, and is another way to find out what is
going on inside DTR.
You can always do a remote DTR to your own node.
.BLANK 3.TEST PAGE 5.CENTER
Why can't I put a READY inside my BEGIN-END loop?
.BLANK
(Andy:) [A suggested source for information on this point is the VAX-DTR
internals session], but basically this has to do with the difference
between commands and statements.
In a nutshell, the reason you can't put a command like READY within a
BEGIN-END block is because commands go through one path [when being
processed by DTR] and statements go through another.
When you put a BEGIN-END around something, what you have done is created
one big statement out of everything within the BEGIN-END block.
If you stick a command in there and DTR runs across it, it's not down the
right path [internally] to execute it.
(Larry:) The main thing is to understand that there are such things as
commands and statements and that one can't go in the other.
With the way the language is constructed there is little need to do that:
there are ways around it.
(Joe:) A simpler explanation is that statements manipulate the data within
an environment, and commands change that environment.
By putting a command within a BEGIN-END loop you've changed the
environment while trying to work in it.
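.PARAGRAPH
[Editor's illustration, not from the panel: because READY is a command, it
goes before the BEGIN-END block, and the statements inside then run in the
environment the command set up.
The domain and field names here are invented.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 6
READY personnel SHARED READ
BEGIN
   FOR personnel WITH dept = "SALES"
      PRINT last__name, salary
END
.BLANK.JUSTIFY.FILL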
.BLANK 3.TEST PAGE 5.CENTER
Can I read DTR files from another language?
.BLANK
(Dick:) There is no such thing as DTR files.
There are RMS files, Rdb files, DBMS, etc.
DTR does not create files of its own.
You can read RMS from BASIC, COBOL, or any language.
The converse of that is DTR can read files created by other languages:
even the editor, if you are careful.
(Bart:) The problem with the editor is that you may not align everything
properly.
You may also run into the problem where you create a record definition 80
bytes long, and you go into the editor and type data 80 bytes long, but
the editor creates a variable length file.
When you ready the domain DTR will give you a warning message that the
file types don't match, but it will then go ahead and read it anyway.
If DTR gets a record which is too short, it pads it out (and may give an
error message): if it's too long, DTR truncates the record (and in the
past, especially on the PDP-11, tends to abort).
If you are in doubt, put a FILLER field on the end of the record
definition to make the record definition too long: you may get warning
messages but DTR will go ahead and read the data.
You can then write it to another file with fixed length records and DTR
will be happy.
(Larry:) A related question is, what if you have a nasty system manager
who won't tell you what the file is like and you are trying to read it?
The answer is, use the RMS utilities to find out how long the record is,
then create a record definition of the same length with one big field with
a PIC length the length of the record and EDIT__STRING#T(80) [to make the
data fit on the usual CRT screen], ready the domain, print some records
out, and by looking at it you can usually figure out where the fields are,
and revise your record definition.
(Bart:) Remember to ready the domain read only and shared, until you are
certain you have the record definition correct.
You don't want to modify anything until you know what it is.
.BLANK 3.TEST PAGE 5.CENTER
Can you sort on a non-keyed field?
.BLANK
(Dick:) You can sort on any field you have in your record.
(Not COMPUTED__BY fields on a PDP-11, but any real field.)# On a VAX, it
should be any field.
.BLANK 3.TEST PAGE 5.CENTER
Why can't I prompt for a domain or a field?
.BLANK
(Andy:) DTR is really forgiving, but there are certain features intended
to be used in some places and not others.
Prompting, when you do a *.prompt, is for value expressions and value
expressions only.
A value expression is a value for a field, or a piece of text.
Value expressions don't include things like key words or names of things,
which is what a domain is.
When you say READY *.---, you are asking the prompt to supply a domain
name, but a prompt can only supply a value expression, so DTR will say
"oh no I won't!".
Essentially, the contents of a quoted string are what it grabs, so when
you prompt for anything it must be a value expression or piece of text.
There are some workarounds, one being logical names: you prompt for a
string, do an FN$--- to create a logical name translation, ready the
logical name, and DTR will translate one level of logical names.
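.PARAGRAPH
[Editor's illustration, not from the panel, with invented names: the first
statement below prompts for a value expression and works; the second asks
the prompt to supply a domain name, which is the case DTR refuses.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 4
PRINT employees WITH last__name = *.name__wanted
READY *.which__domain
.BLANK.JUSTIFY.FILL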
.BLANK 3.TEST PAGE 5.CENTER
How many records can I have in my domain?
.CENTER
How large a record can I have?
.BLANK
(Bart:) Basically, the number of records you can have in your domain is
limited by how large your disk is (or your disk quota if applicable).
As for the size of the record: on the PDP-11, if it's very large you will
run out of pool space.
On the VAX there may be a limit, but I don't know anyone who has hit it.
(Comment from audience indicating it had been reached.)# There is a system
wide RMS limit on the maximum size for any record on a VAX, and I believe
it's set around 32,000 bytes.
As far as the number of records, it's limited by the amount of disk space,
and I've done domains with over 130,000 records.
(Comment from audience indicating a user with 6000+ byte records, stating
that the application seemed a bit slow, but when the application was
broken up into smaller pieces with relevant sections connected by crosses,
it ran faster.
Doesn't it make more sense to keep the record size smaller?)# (Bart:) It's
partially record size, and partially the number of fields.
If you don't need all 6000 bytes at once, breaking it up into smaller
pieces that most logically go together will save you overhead.
The other possibility is to have more than one record definition for the
same file and use FILLER to skip over the pieces that aren't needed at
that time, and that also cuts down the number of fields that DTR has to
know about.
Either of those approaches would give an improvement.
(User: if you use FILLER, it cuts down the number of fields, but then you
have the same number of FILLER fields.)# You use one FILLER field to skip
over all of them.
(Andy:) One important point we are looking at for way in the future is
that access to the CDD is very inefficient for metadata.
For every attribute you have for every field, DTR has to make a call to
the CDD.
If you have 400 fields, and each field has a name, a query header, an edit
string, a query name, a missing value, and a default value, DTR makes one
call for each [at least 2800 calls including the PIC clause].
If you can cut unnecessary attributes, or you have fields that aren't used
often and you can skip over with FILLER, DTR jumps over FILLER and its
internal field tree is much, much simpler.
Also, less memory is used, as it allocates a big block for each field, and
this block is the same size for a field with no attributes or with many
attributes.
If you can eliminate fields, you save time and memory not allocating
blocks.
(Larry:) Besides, anyone with a record that has 6000 bytes in it needs to
go back and re-evaluate how the data is being structured.
(User:) The records were a complete record of our field engineers,
including their education, experience, etc.
In essence, we had 10 major areas of interest, and instead of 1 record we
really had 10.
It worked a lot faster [after we changed to 10 records.]# (Bart:) Not just
speed but other considerations apply: if you give someone write access to
that domain, they now have access to everything in there, and do you
really want to give them everything at once?
From the management standpoint you also want to separate the data.
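.PARAGRAPH
[Editor's sketch of the FILLER technique mentioned above, with invented
names and sizes: a second record definition for the same 6000 byte file
describes only the fields needed for one task and skips the rest with a
single FILLER field, so DTR has far fewer fields to keep track of.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 8
DEFINE RECORD engineer__summary__rec USING
01 engineer__summary.
   03 emp__id      PIC X(6).
   03 last__name   PIC X(20).
   03 FILLER       PIC X(5974).
;
.BLANK.JUSTIFY.FILL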
.BLANK 3.TEST PAGE 5.CENTER
Can I do menu-driven applications in DTR?
.BLANK
(Larry:) Yes, and I know of about 4 different methods.
The first is the way NOT to do it, and that is to use DCL and have it call
DTR every time you need to do something (having the menu in DCL).
This will work, but it's inefficient: you go through all of the overhead
of starting up DTR whenever you want to do something.
.PARAGRAPH
I like to use the call interface, and a little program that feeds
procedures back to DTR.
Essentially, DTR tells the program what it wants to do next, and the
program tells DTR to run it [a procedure] next.
That works very well.
.PARAGRAPH
(Dick Azzi:) I like to "pre-compile" DTR.
Andy mentioned that anything within a BEGIN-END block is treated as one
statement, and DTR always has to parse the next statement it is going to
work on.
We take maybe 75 to 100 "programs", put them all within a large BEGIN-END
block with a menu so DTR treats it all as one statement: it takes 15 to 20
minutes to parse that statement (we bring it up on Monday morning and
leave it up until everyone goes home Friday night).
Included on the menu are "sleep" and "pause" functions, which bring up an
FMS screen with a no-echo password so that the user can leave a terminal
and get back in only if they enter the same password.
This process works well in our application, where we treat DTR as the
center of the universe: if we have to do something in DCL we will spawn
out of DTR, work in DCL (things like word processing, PHONE, running
another program), and then return to DTR where all of the pre-compiled
statements are still active, all READYs are still there, etc.
This gives a very quick turn-around on menu response.
.PARAGRAPH
(Larry:) Another method is with logical names [calling a procedure which
has one fixed name: a logical name assignment is made from the menu to
translate that logical name to the name of one of a set of "real"
procedures].
There are probably other methods as well.
.PARAGRAPH
(Mike Nickolas, Bank Ohio:) Another method of doing menus in DTR is to
have a simple procedure called DISPLAY__MENU which has a print statement
to clear the screen and displays abbreviated procedure names such as
":M1"; each such procedure does only one function, such as ADD [a record],
and so requires only a very short compilation time.
.PARAGRAPH
(Chris:) I'd like to take exception to the remark about how not to do a
menu [using DCL, described above].
There are times when a DCL menu is appropriate.
If you have several choices on the menu and only one is going to be DTR,
the startup delay occurs only after the choice of that option is made.
The main menu can come up very quickly in a DCL procedure, and if there is
only one choice which goes to DTR, or there are many options only one or
two of which go to DTR, the total delay is less than if you have to go in
and out of DTR a lot.
.PARAGRAPH
Using logical names is very similar to the "pre-compile" method.
The difference is that in your choice statement you use
FN$CREATE__LOGICAL, and at the bottom of the choice statement you invoke
the logical name, and DTR will execute the procedure named in the create
logical function: you can even go back and invoke the procedure you are in
now.
This appears to be recursive use of DTR but in fact is not really working
recursively.
.PARAGRAPH
(Susan Krantz, NKF Engineering:) Another way I use DCL and DTR together is
in DBMS applications where I have a COBOL program using FMS doing an
update function, and then a menu in the beginning asking whether I want to
use the update program or go into DTR and do my reporting.
That's a good combination because they are either changing the database or
they are doing reports, and once you are in the report module everything
is pre-compiled.
.BLANK
(Larry:) As Chris said, if you are doing one-shots then the DCL menu is
good, but on the other hand if you are going to switch back and forth
between updating and reporting I'd use the call interface and integrate
[DTR] right in [to the COBOL program].
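.PARAGRAPH
[Editor's sketch, not from the panel, of a very small "pre-compiled" menu
of the kind described above.
The procedure names DISPLAY__MENU, ADD__EMPLOYEE, and EMPLOYEE__REPORT and
the domain EMPLOYEES are invented and would have to exist in the
dictionary; the READY comes before everything else because it is a
command.
The exact statements (WHILE, IF-THEN, prompting) should be checked against
the DTR version in use.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 11
READY employees SHARED WRITE
DECLARE answer PIC X.
answer = "?"
WHILE answer NE "X"
   BEGIN
   :display__menu
   answer = *.menu__option
   IF answer = "A" THEN :add__employee
   IF answer = "R" THEN :employee__report
   END
.BLANK.JUSTIFY.FILL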
.BLANK 3.TEST PAGE 5.CENTER
Will the DTR Call Interface support new languages?
.PARAGRAPH
(Ron Swift, Xerox:) We use a FORTRAN interface for some of [those
applications, such as were discussed for the previous question] which
allows us to leave the domains open.
It allows us to go through a menu, and appears to speed up tremendously
what we're doing.
My question is, will there ever be a "C" interface to DTR?
.PARAGRAPH
(Andy:) Do you mean, will there ever be something which DTR ships as part
of its kit to allow you to automatically run with "C"?
You can do it today, but you have to create your own DAB.
The bottom line is: any language which conforms to the VAX calling
standard can use callable DTR today, which means "C" can use it.
We have chosen a subset to ship with our kit, which means DABs, examples,
and so forth.
We have been asked for "C" in the past and that may come, although it
doesn't stop you from using it today, it just means you have to do some
legwork up front.
.PARAGRAPH
(Bart:) A little advertisement for the DECUS library: if the first person
to do it would please submit it to the library, then everyone else will
get it.
.BLANK 3.TEST PAGE 7.CENTER
How do I use nested FOR loops?
.BLANK.CENTER
(i.e., how do I optimize access to two domains?)
.PARAGRAPH
(Bob Brown, INTEL:) In regard to optimization, could someone explain to me
how nested FOR loops work, and where you put the keys (on the inside or
outside); it's kind of difficult to understand from the manual.
.PARAGRAPH
(Bart:) [Rather than transcribing the problem description, the example
below shows the outline of a nested FOR statement being used to access two
domains, with records in the second domain being selected according to the
match of some field in the second domain equaling a field in the first
domain.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 8
FOR domain__1
   BEGIN
   FOR domain__2 WITH field__2 = field__1
      BEGIN
      --- work done here on one or both domains ---
      END
   END
.BLANK.JUSTIFY.FILL
The field that you are specifying in the second domain (the inner loop)
[labeled field__2, belonging to domain__2 in the example above] is the one
that should be keyed.
This is exactly the same as the example shown earlier for the CROSS, where
the second domain went from no key to a key.
If it's a VIEW, and you specify the first domain, and then the second
domain occurs for a field equal to the first domain, it's the second
domain where the field should be keyed.
For all cases, for a record in the first domain, DTR has to find the
matching record in the second domain.
[The following illustrates the case of a VIEW.
As in the first example, field__2 belonging to domain__2 is the one which
normally should be keyed.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 10
DEFINE DOMAIN view__domain OF domain__1, domain__2 USING
01 FIRST OCCURS FOR domain__1.
   10 field__1 FROM domain__1.
   --- other fields from domain__1.
   10 SECOND OCCURS FOR domain__2 WITH field__2 = field__1.
      --- other fields from domain__2.
;
.BLANK.JUSTIFY.FILL
(Larry:) Please understand that the field doesn't HAVE to be keyed, but if
you want to take advantage of the keys, it must be the second domain that
is keyed.
(Bart:) If you would like it to run in a reasonable amount of time, the
second should be keyed.
.BLANK
(Bob Brown:) Even if the outside loop has an RSE?
.BLANK
(Bart:) Yes, because the outside loop is going to go in the order you
specify in the RSE: it may or may not be keyed, depending on how you do
it.
But if you want the matches on the inner loop to be fast, then the second
domain has to be keyed so the retrieval can be on the key.
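.PARAGRAPH
[Editor's illustration of the outline above with invented names: ORDERS is
read in the outer loop, its CUSTOMER__NUMBER field holds the matching
value, and CUST__ID is assumed to be the primary key of the indexed file
behind CUSTOMERS, so every pass through the inner loop is a keyed lookup
rather than a sequential search.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 7
FOR orders WITH status = "OPEN"
   BEGIN
   FOR customers WITH cust__id = customer__number
      PRINT order__number, cust__name, balance
   END
.BLANK.JUSTIFY.FILL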
.BLANK 3.TEST PAGE 5.CENTER
"Field --- is undefined or used out of context"?
.BLANK 2
(Joe:) [We now have] the infamous "undefined or used out of context" error
message.
This is probably the most frustrating and common occurrence for beginning
users of DTR.
There are some obvious things, such as misspellings, that cause this
problem.
The real underlying cause is a cry for help from DTR because it does not
understand what you have told it.
There is something in the command that it does not understand, and when it
doesn't understand, it often cannot tell you exactly where it stopped
understanding.
(Larry:) Here's the point: a lot of times when you get "undefined ..." the
error is not on the quoted string in the error message; the error is going
to be somewhere back upstream from that string.
What I tell my users is to put your finger on that string, and where I'm
pointing is where the error is [using your right hand, the error is
somewhere left and up].
This can happen because DTR continues to parse for a while until finally
things don't make sense, and then you get the error: the thing that caused
it not to understand could be a comma or a space or a word encountered
previously.
(Joe:) You look at the thing it's complaining about and back: sometimes
what it's complaining about is a misspelling, but it's also possible that
it's something back up the line.
This happens because DTR is parsing and compiling, and it has a context
within which it has to communicate with you, trying to understand what
you're telling it: at this point it's saying "I don't understand anymore".
.BLANK 3.TEST PAGE 6.CENTER
When do I use hierarchies?
.CENTER
(OCCURS clauses in record definitions)
.PARAGRAPH
(Ron Wilson, Wilson Concrete:) I wonder if there are any rules about when
one should use "flat" records versus hierarchies?
.PARAGRAPH
(Larry:) [missing from the tape].
(Bart:) Basically, if you use an OCCURS, you limit how you can access that
subordinate data.
If you have a very well defined application where some data is definitely
subordinate to a main data piece, and you are absolutely, positively
always going to be getting the subordinate data with the main data, then
an OCCURS might make sense.
The problem is that when you do use an OCCURS, it makes it difficult to
get to the subordinate data only.
(Larry:) It's really hard to manipulate within an OCCURS clause.
You are better off putting it in another domain, using a CROSS, and
treating it in a relational way.
.PARAGRAPH
(Joe:) I'd like to offer a dissenting opinion.
There are some applications I've used in a medical database where the data
is naturally hierarchical; and because the underlying data is naturally
hierarchical, DTR is, in my opinion, the best tool for accessing
hierarchical data.
There are certain prices you must pay in order to do that, but I would
argue the other way around.
If the data is naturally hierarchical in the way it's used, it should be
stored hierarchically, either in an OCCURS within a domain, or in separate
domains using a VIEW which in effect creates a hierarchy.
(Bart:) [section missing from tape].
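.PARAGRAPH
[Editor's sketch, not from the panel, of a hierarchical record using
OCCURS; all names and sizes are invented.
Each patient record carries a repeating group of test results, which is
convenient when the results are always retrieved with the patient, but, as
noted above, makes it awkward to work with the results by themselves.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 9
DEFINE RECORD patient__rec USING
01 patient.
   03 patient__id    PIC X(8).
   03 test__count    PIC 99.
   03 test OCCURS 1 TO 20 TIMES DEPENDING ON test__count.
      06 test__name  PIC X(10).
      06 result      PIC 999V99.
;
.BLANK.JUSTIFY.FILL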
.BLANK 3.TEST PAGE 5.CENTER
How do I pass information from DTR back to DCL?
.PARAGRAPH
[name of questioner missing from tape:] ... that logical names created
through DTR are user mode logical names.
Is there any way that [DTR can create logical names in other modes:
question wasn't finished as Andy Schneider was shaking his head "no"].
We use DTR in certain situations to pass values out to DCL and use those
values: now we can't do that.
.BLANK
(Andy:) FN$CREATE__LOGICAL creates a user mode logical name for DTR's
purposes only, because 9 people out of 10 will use the logical name while
in DTR to optimize [an application] and don't want to have it "kicking
around" afterwards where everyone picks it up.
What I would suggest, if you want the logical name to be [in existence]
afterwards, is to use a DTR procedure to create an indirect command file
that you execute when you leave DTR to create the logical names.
(Questioner:) Just write it to a file, basically.
(Andy and Larry:) Or create your own function [to create a logical name in
some table or mode other than user].
And submit it to the DECUS library.
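.PARAGRAPH
[Editor's sketch of the indirect command file idea above, with invented
names; the exact spacing and any column heading in the output file may
need a little cleanup, but the idea is simply to write the DCL you want
into a file from inside DTR and execute it after you exit.
From DCL, @DEFLOG.COM then defines the logical name so it is still there
after DTR has exited.]
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 5
DECLARE part__no PIC X(10).
part__no = *.part__number
PRINT "$ DEFINE CURRENT__PART ", part__no ON DEFLOG.COM
.BLANK.JUSTIFY.FILL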