File:   IMPLEMENTATION.NOTES  (renamed to AAAIMP.NOT for distribution)

        General notes on TED/SED internal structures and organization
        Or: Things you need to know to port the code to another system.

        27-NOV-1985 11:59       Brian Nelson
        03-FEB-1986 11:21

Background:

TED (also known as SED) is, in concept, a fairly old text editor, the
first version having been written in Macro-11 (PDP-11) several years
ago.  The current version, written in 'C', runs on VAX/VMS version 4.x
and on the PDP-11 under P/OS v2, RSX11M+ 2.1 and RSTS/E v9.x.  It
provides both full screen and line edit modes, each callable from the
other.

TED maintains a copy of the file being edited in a work file, which is
LRU cached.  Each block (512 bytes) contains a group of lines, referred
to as a bucket.  A bucket is organized as follows:

        word 0          word 1          .....   word '_BCK_FIL'
        len(1st_line)   len(2nd_line)   .....   len(Nth_line)

The remaining bytes are used for actual data storage, up to either the
end of the block or until '_BCK_FIL' lines are in the bucket.  On the
PDP-11 a word is 16 bits; on the VAX a longword is 32 bits.

        #define _tmpsize        512
        #define _bck_size       _tmpsize - ( _bck_fil*sizeof(int))

        struct  tempblock {
                int     rsize[_bck_fil] ;
                char    txt[_bck_size]  ;
        } ;
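For someone porting the code, it may help to see how a single line is
pulled out of a bucket.  The following is only an illustrative sketch,
not code taken from TED: it assumes the line bodies are stored back to
back in txt[] in the same order as the rsize[] entries (the notes imply
this but do not show it), and the value given for _bck_fil is made up.

        /* Illustrative sketch only, not TED source.  Assumes line
         * bodies are stored back to back in txt[] in rsize[] order;
         * the value of _bck_fil below is made up for the example.
         */
        #define _bck_fil        64      /* illustrative value only      */
        #define _tmpsize        512
        #define _bck_size       (_tmpsize - (_bck_fil * sizeof(int)))

        struct  tempblock {
                int     rsize[_bck_fil] ;
                char    txt[_bck_size]  ;
        } ;

        /* Return a pointer to line 'n' (0 based) of a bucket already in
         * memory, and its length through 'len'.  'n' must be less than
         * the number of lines actually stored in the bucket.
         */
        char    *bucket_line(struct tempblock *b, int n, int *len)
        {
                int     i, offset ;

                for (i = 0, offset = 0 ; i < n ; i++)
                        offset += b->rsize[i] ; /* skip earlier lines */
                *len = b->rsize[n] ;
                return ( &b->txt[offset] ) ;
        }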
Each bucket is linked to its successor via a forward linked list.  Each
entry in the linked list has a field for the character count of the
bucket, a field for the number of lines in the bucket, and a field for
the number of the next bucket.  A link of zero implies the end of the
list.  The way the index (list) is stored and accessed in memory depends
on the hardware.  The current index being used is always pointed to by
WORKPT.  The actual access to the index is done with a C macro, as
explained below.

The data:

        struct  tempindex {
                indexlink       link    ;
                indextype       lcount  ;
        #if     CHARCOUNT
                indextype       chcount ;
        #endif
        } ;

Access to the data:

        #if     VAXVMS
        #define _gworkpt(offset)        workpt+offset
        #endif

        #if     RSX
        extern  struct  tempindex       *mapwin() ;
        #define _gworkpt                mapwin
        #endif

Since each index entry consists of (at minimum) a line count for that
bucket and a link to the next index entry, locating a given line may
mean running through the entire list before the correct entry is found.
This need not always be the case.  If there is a current context (from
the previous operation) we can look ahead to see whether an index entry
is linked to the current one, and then check whether the desired line is
there.  If the current context is not valid, or the link is zero, we
must start at the listhead and look for the line.  If the next index
entry does not contain the line we could continue the search from that
point, but in practice one may as well start over from the listhead.
This optimization is only needed during forward searches in the file
being edited.  Since forward searching is very common, the small
improvement in access time can reduce the cpu time needed to search an
entire file by a factor of three.

Example (the 'then' below is presumably a do-nothing macro used for
readability):

        if ( (lineno >= windpt->botlin)  &&  (lineno <= windpt->toplin) )
        then    return( lineno - windpt->botlin ) ;
        else    {
                if ((oldcurmode = curmode[w]) == ST_UPDATE) then putback() ;
                temp = 0 ;
                if (    (curbucket[w] = (_gworkpt(curbucket[w]))->link) != 0
                &&      oldcurmode == ST_NORMAL_MODE  &&  ctx_valid )
                then    {
                        windpt->botlin  = windpt->toplin + 1 ;
                        windpt->toplin += (_gworkpt(curbucket[w]))->lcount ;
                        temp = (   ( lineno >= windpt->botlin )
                                && ( lineno <= windpt->toplin ) ) ;
                        } ;
                if ( temp == 0 )
                then    {
                        curbucket[w]   = firbucket[w] ;
                        windpt->botlin = 1 ;
                        windpt->toplin = (_gworkpt(curbucket[w]))->lcount ;
                        while ( ! (   ( lineno >= windpt->botlin )
                                   && ( lineno <= windpt->toplin ) ) )
                                {
                                windpt->botlin  = windpt->toplin + 1 ;
                                curbucket[w]    = (_gworkpt(curbucket[w]))->link ;
                                windpt->toplin += (_gworkpt(curbucket[w]))->lcount ;
                                } ;
                        } ;
                } ;

Linked list access:

VAX/VMS

For VMS, the index is simply allocated via a call to LIB$GET_VM.  When
the list grows too large, another call is made to LIB$GET_VM to create a
new, larger area, the list (index) is copied to the new area, and the
old space is deallocated by LIB$FREE_VM.  Access to the index is simply
via WORKPT, since the VAX has no problem with address space; the macro
to access it is simply a no-op.  Another area of virtual memory is
allocated for work file caching.

PDP-11 based

The PDP-11 presents problems.  First of all, not all PDP-11's support
I/D addressing, and not all DEC executives support I/D user mode mapping
(as of this writing, only RSX11M+ 2.1 and RSTS/E 9.x do).  The problem
is made worse by the fact that one almost has to use RMSRES on the
PRO/350, so we immediately lose 8KW of address space, leaving 24KW.  Of
course, TED is overlaid into 5 co-trees, but that still leaves no room
for the index list.  The solution is to make the _gworkpt macro call a
routine which dynamically maps the passed list element number to an
offset in an 18KW dynamic region which is mapped with one APR, from
120000 to 137777.  The mapping is optimized, of course, to minimize
dynamic remaps (MAP$S) by the executive.  The allocation of this region
is as follows:

        0K..4K   (4KW)  Cache buffer pool for the work file
        5K..12K  (8KW)  Index (linked list) area.  Must be remapped by
                        the exec if we pass a 4KW boundary.
        13K..16K (4KW)  Index for editing window #1
        16K..17K (2KW)  Buffering for screen editor macros

Of course, if the current region mapping context is invalid for the
desired access, a dynamic remap (MAP$S) is done.  For instance, the work
file cacher (described below) calls a routine to map the cache before
its access to it, and then to unmap and map back to the index area after
it completes its read or write to the cache.  The fixed 8KW area allows
for 4096(10) elements in the linked list; under ideal conditions that is
sufficient for a 4000 block file.  One would more likely run into
integer overflow problems by then.
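To make this concrete for a porter, here is a sketch of the kind of
windowed access mapwin() has to provide.  It is not the TED source:
remap_window() stands in for the MAP$S executive directive, the
typedefs for indexlink/indextype are assumed to be 16-bit words, and
the offset of the index area within the region simply follows the
layout table above.

        /* Sketch only, not TED source.  The 18KW region is viewed
         * through a single 4KW APR window at virtual 120000..137777(8);
         * before an index element is touched, the 4KW piece of the
         * region containing it must be the piece currently mapped.
         * remap_window() is a stand-in for the real MAP$S directive.
         */
        typedef unsigned short  indexlink ;     /* assumed 16 bits on the 11 */
        typedef unsigned short  indextype ;

        struct  tempindex {
                indexlink       link    ;
                indextype       lcount  ;
        } ;

        #define APR_WINDOW      ((char *) 0120000)  /* virtual base of window */
        #define WINDOW_BYTES    (4096 * 2)          /* one APR maps 4KW       */
        #define INDEX_BASE      (4096 * 2)          /* index area follows the */
                                                    /* 4KW cache buffer pool  */

        extern  void    remap_window(int region_offset) ; /* stand-in for MAP$S */

        static  int     mapped_offset = -1 ;    /* region offset now in window */

        struct  tempindex       *mapwin(int element)
        {
                int     byte_off, piece ;

                byte_off = INDEX_BASE + element * sizeof(struct tempindex) ;
                piece    = byte_off - (byte_off % WINDOW_BYTES) ;

                if (piece != mapped_offset) {   /* remap only when forced to */
                        remap_window(piece) ;
                        mapped_offset = piece ;
                }
                return ( (struct tempindex *) (APR_WINDOW + (byte_off - piece)) ) ;
        }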
The exception to this is the I/D space version of TED, TEDID.TSK.  This
version sets up two regions, one for the indexes and one for the work
file caching, which eliminates the constant remapping that would
otherwise occur.  Additionally, it is almost three times faster in file
loads, as is TEDSUP.TSK (RMSRES in supervisor mode), due to the
elimination of cluster library remapping.

Work file caching:

As noted, the file being edited is loaded into a work file.  The file is
direct access, the desired record being determined via the index list.
The file is cached in a clustered LRU cache, the clustersize being the
number of buckets (512 byte blocks) in a cache entry.  This means that
the cache is divided into SIZE_OF_CACHE/(512*CLUSTERSIZE) segments.  The
clustering improves read/write throughput in the event of a cache fault,
since it is much more efficient to write, say, 2048 bytes at a time than
to do four separate 512 byte writes (or reads).  Each cache cluster,
then, contains several adjacent buckets.  Since most access to text
files is sequential, and mostly read-only, there is a very high
likelihood that the next bucket actually desired will be the next one in
the cluster, within the limits of the cluster size.

The cache is allocated 4KW on the PDP-11 in 1024 byte clusters, allowing
two buckets per cluster.  On VMS, the cache size is a non-linear
function of WSQUOTA and the clustersize is four (4), thus caching four
buckets per cache entry.  In my case, with a WSQUOTA of 1000 and a
WSEXTENT of 4000, TED will allocate 330 virtual pages of memory for the
cache, which translates to 82 clusters.

There is one other optimization: if the input file size is greater than
the number of buckets that can be cached (NUMBER_OF_CLUSTERS*CLUSTERSIZE),
the cacher will change the cache clustersize to as much as it can and
use only one cache page, thus creating a very large buffer used for
writes ONLY during the initial file load.  After the file is loaded it
invalidates the cache and resets the clustersize to the normal values.
This greatly speeds up the loading of large files; on VMS it can use a
cache page size of 177000(8) (65024(10)) bytes to effect efficient (and
few) writes to the work file at startup.

Statistics are kept (STAT command).
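For reference when porting, here is a rough sketch of a clustered LRU
cache of the shape described above.  It is not the TED code: the
cluster count and clustersize are illustrative, and read_cluster() /
write_cluster() are hypothetical stand-ins for the actual work file I/O.

        /* Sketch only, not TED source.  CLUSTERSIZE adjacent buckets
         * live in each cache entry; a bucket request becomes a cluster
         * request, and on a miss the least recently used cluster is
         * written back (if it was modified) and its slot reused.
         */
        #define BUCKETSIZE      512
        #define CLUSTERSIZE     4       /* buckets per cache entry (VMS value) */
        #define NCLUSTERS       82      /* illustrative; really derived from   */
                                        /* WSQUOTA on VMS, from 4KW on the 11  */

        struct  cachent {
                int     clu ;           /* cluster number, -1 if slot unused   */
                int     dirty ;         /* must be written back before reuse   */
                long    lastuse ;       /* pseudo timestamp for LRU choice     */
                char    data[CLUSTERSIZE * BUCKETSIZE] ;
        } ;

        static  struct  cachent cache[NCLUSTERS] ;
        static  long            now ;

        extern  void    read_cluster (int clu, char *buf) ;     /* hypothetical */
        extern  void    write_cluster(int clu, char *buf) ;     /* hypothetical */

        void    init_cache(void)
        {
                int     i ;
                for (i = 0 ; i < NCLUSTERS ; i++)
                        cache[i].clu = -1 ;
        }

        /* Return a pointer to the in-cache copy of 'bucket'.  A caller
         * that modifies the bucket would also set the entry's dirty flag.
         */
        char    *getbucket(int bucket)
        {
                int     clu = bucket / CLUSTERSIZE ;
                int     i, lru = 0 ;

                for (i = 0 ; i < NCLUSTERS ; i++) {
                        if (cache[i].clu == clu) {              /* cache hit   */
                                cache[i].lastuse = ++now ;
                                return ( &cache[i].data[(bucket % CLUSTERSIZE) * BUCKETSIZE] ) ;
                        }
                        if (cache[i].lastuse < cache[lru].lastuse)
                                lru = i ;                       /* oldest slot */
                }
                if (cache[lru].clu != -1 && cache[lru].dirty)   /* cache miss  */
                        write_cluster(cache[lru].clu, cache[lru].data) ;
                read_cluster(clu, cache[lru].data) ;
                cache[lru].clu     = clu ;
                cache[lru].dirty   = 0 ;
                cache[lru].lastuse = ++now ;
                return ( &cache[lru].data[(bucket % CLUSTERSIZE) * BUCKETSIZE] ) ;
        }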
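The load-time twist described above might look roughly like the
following.  Again this is only an illustration with made-up names
(cache_bytes(), invalidate_cache()), not the actual cacher.

        /* Sketch of the load-time optimization only: when the file
         * being loaded cannot fit in the cache anyway, borrow the whole
         * cache as one huge cluster for sequential writes while the
         * work file is first built, then return to normal clustered
         * operation.  All of the names here are made up.
         */
        #define BUCKETSIZE      512

        extern  int     cache_bytes(void) ;         /* total bytes in the cache */
        extern  void    invalidate_cache(void) ;    /* throw away all entries   */

        static  int     clustersize = 4 ;           /* normal buckets/cluster   */

        void    begin_file_load(long file_blocks)
        {
                long    cached_buckets = cache_bytes() / BUCKETSIZE ;

                if (file_blocks > cached_buckets)   /* won't fit: switch to one */
                        clustersize = (int) cached_buckets ;  /* giant cluster  */
        }

        void    end_file_load(void)
        {
                invalidate_cache() ;    /* nothing cached survives the switch   */
                clustersize = 4 ;       /* back to the normal clustersize       */
        }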