A SIMPLE SPELLING CHECKER FOR EVE

written by:

    Thomas Wolfe
    Jet Propulsion Laboratory
    Mail Stop 125/123
    4800 Oak Grove Drive
    Pasadena, CA 90019

    Office    (818) 354-6983
    Secretary (818) 354-2048


CONTENTS

1       INTRODUCTION
2       SYSTEM WIDE LOGICAL NAMES
3       USER LOGIN.COM LOGICAL NAMES AND SYMBOLS
4       USING THE STANDALONE SPELLING CHECKER
5       USING EXTENDED EVE
5.1       THE SPELL COMMAND
5.1.1       SPELL OR SPELL BUFFER
5.1.2       SPELL PARAGRAPH
5.1.3       SPELL C
5.1.4       SPELL DCL
5.1.5       SPELL FORTRAN
5.1.6       SPELL MACRO
5.1.7       SPELL RNO
5.2       THE LOAD COMMAND
5.3       THE UPDATE COMMAND
5.4       THE CTRL D KEY
5.5       THE CTRL L KEY
5.6       THE CTRL P KEY
6       HINTS AND KINKS
7       DEFINITION OF A WORD
8       DICTIONARY SEARCH
9       POSSIBLE FUTURE IMPROVEMENTS
10      USER MODIFICATION OF EXTENDED EVE
11      SPELLING CHECKER FILES AND DESCRIPTIONS
11.1      AAAREADME.MEM
11.2      BUILD_ALL.COM
11.3      BUILD_COMMON_DICT.EXE
11.4      BUILD_PROJECT_DICT.EXE
11.5      BUILD_USER_DICT.EXE
11.6      COMMON.DICT
11.7      COMMON.INDEX
11.8      COMMON_DICT.RPT
11.9      COMMON_DICT_REPORT.EXE
11.10     COMMON_WORDS.DAT
11.11     DICT_1_WORDS.DAT
11.12     DICT_2_WORDS.DAT
11.13     EDITOR.GBL
11.14     EDITOR.TPU
11.15     LINK_TPU_CALLUSER.COM
11.16     PROJECT.DICT
11.17     PROJECT_WORDS.DAT
11.18     SPELL.MEM
11.19     SPELL.EXE
11.20     SPELL_INCLUDE.FOR
11.21     SPELL_RV.EXE
11.22     SPELLIB.OLB
11.23     STATS.EXE
11.24     TPU_CALLUSER.EXE
11.25     TEST_COMMON_DICT.EXE
11.26     TEST_COMMON_INDEX.EXE
11.27     TEST_PROJECT_DICT.EXE
11.28     TEST_USER_DICT.EXE
11.29     USER.DICT
11.30     USER_WORDS.DAT


1  INTRODUCTION

A spelling checker that is an extension to the EVE editor is provided.
A separate standalone spelling checker patterned after the LBL
software tools SPELL utility is also provided.

Both spelling checkers use three dictionaries to test the spelling of
words: a common dictionary (standard English words), a project
dictionary (acronyms, etc.) and a user defined dictionary. The user
defined dictionary can be created/updated while in an EVE edit
session.

Utilities are provided to build all three dictionaries from text files
containing one word per line.
The source word file for the common dictionary must be in ascending
(lexical) sort order. The project and user source word files need not
be sorted.

The common dictionary currently contains 91,000+ words. The project
dictionary can contain 1,000 words (or 10,000 bytes). The user
dictionary can contain 200 words (or 2,000 bytes). The maximum word
size is currently 31 characters. The above limits can be modified by
changing a few parameters and recompiling the programs.


2  SYSTEM WIDE LOGICAL NAMES

The following is an example of DCL commands that define the system
wide logical names for the spelling checker(s). They point to the
directories containing the common and project dictionaries and the
shared image file that is part of the extended EVE editor.

    $ DEFINE/SYSTEM/EXEC COMMON$DICTIONARY SYSTEMDISK:[SPELL]
    $ DEFINE/GROUP PROJECT$DICTIONARY SYSTEMDISK:[SPELL]
    $!
    $ DEFINE/SYSTEM/EXEC TPU$CALLUSER -
          SYSTEMDISK:[SPELL]TPU_CALLUSER.EXE


3  USER LOGIN.COM LOGICAL NAMES AND SYMBOLS

The following should be added to the user LOGIN.COM file. The
directory where the user dictionary will reside is defined, as well as
symbols for the extended EVE and the standalone spelling checker.

    $ DEFINE USER$DICTIONARY USERDISK:[USERDIR]
    $!
    $ E :== EDIT/TPU/SECTION=SYSTEMDISK:[SPELL]EDITOR.GBL
    $!
    $ SPELL :== $SYSTEMDISK:[SPELL]SPELL.EXE


4  USING THE STANDALONE SPELLING CHECKER

To use the standalone spelling checker, enter the command SPELL
followed by the name of the file to be checked. If no file is entered
the program prompts for one. Each line in the file will be displayed
on the terminal. Each word that is not found in any of the
dictionaries will be indicated by a row of asterisks under it.


5  USING EXTENDED EVE

Several new EVE commands have been defined. They are entered by first
hitting the "DO" key. After the "Command:" prompt, enter one of the
new commands.

Several control keys (that have nothing to do with the spelling
checker) have also been defined.
I found them useful.


5.1  THE SPELL COMMAND

The SPELL command will check the spelling of words in a specified
range or type of file. If a range is specified, the file is assumed to
be a text file and all words are spell checked. If a particular type
of file is specified, only some lines/words will be checked for
spelling. For example, the spelling checker knows a little about
FORTRAN syntax and will only spell check comments and character
constants.

Note: only the first character (upper or lower case) of the BUFFER,
PARAGRAPH, FORTRAN, etc. parameter needs to be entered.

The SPELL command notifies the user that it is loading the
dictionaries. A warning message is displayed if the project and/or the
user dictionary is not found. Execution is terminated if the common
dictionary cannot be found.

Each word is displayed in reverse video as it is checked. If the word
cannot be found in any of the dictionaries, the prompt "Enter
Replacement Word :" is displayed. At this point the user has several
options:

    o  Enter a carriage return, which accepts the current spelling of
       the word.

    o  Enter one or more characters (followed by a carriage return) to
       be substituted for the word.

    o  Enter a CTRL Z, which stops the spell checking activity and
       returns to the normal edit mode.


5.1.1  SPELL OR SPELL BUFFER

To spell check the entire buffer, enter SPELL, SPELL BUFFER or
SPELL B. Every word in the buffer will be spell checked. Words do not
span across the end of line boundary.


5.1.2  SPELL PARAGRAPH

To spell check the current paragraph, enter SPELL PARAGRAPH or
SPELL P. Paragraphs are delimited by blank lines or by the beginning
or end of the buffer. Every word in the paragraph will be spell
checked. Words do not span across the end of line boundary.


5.1.3  SPELL C

To spell check a C source code file, enter SPELL C. Only comments are
spell checked. A comment is everything starting with "/*" and ending
with "*/".
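
The comment rule above can be illustrated with a short Python sketch.
This is not the TPU code used by extended EVE; the function names
c_comments and unknown_words and the tiny word set are hypothetical,
and the sketch assumes that a word is a run of alphabetic characters
compared without regard to case. It does not try to recognize "/*"
inside a C string literal.

```python
import re

def c_comments(source):
    """Return the text found between each /* and the next */."""
    return re.findall(r"/\*(.*?)\*/", source, flags=re.DOTALL)

def unknown_words(text, dictionary):
    """Return the words (runs of letters) that are not in dictionary."""
    words = re.findall(r"[A-Za-z]+", text)
    return [w for w in words if w.lower() not in dictionary]

# Hypothetical sample: only the comment text is examined, so the
# misspelling inside the string constant is not reported.
sample = 'int n; /* numbr of items */ char *s = "strng"; /* the name */'
known = {"of", "items", "the", "name"}

flagged = [w for c in c_comments(sample) for w in unknown_words(c, known)]
print(flagged)
```
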

5.1.4  SPELL DCL

To spell check a DCL command file, enter SPELL DCL or SPELL D. Only
comments are spell checked. A comment is everything after a "!" on a
line.


5.1.5  SPELL FORTRAN

To spell check a FORTRAN source code file, enter SPELL FORTRAN or
SPELL F. Only comments and character constants will be checked. A
comment is any line with a 'C' in column one or everything after a '!'
on a line. A character constant is anything delimited by single quote
marks on a line. Two passes through the file are made. The first
checks comments and the second checks character constants.


5.1.6  SPELL MACRO

To spell check a MACRO source code file, enter SPELL MACRO or SPELL M.
Only comments are spell checked. A comment is everything after a ";"
on a single line.


5.1.7  SPELL RNO

To spell check a RUNOFF source file, enter SPELL RUNOFF or SPELL R.
Every line is spell checked except those with a period in column one.


5.2  THE LOAD COMMAND

The LOAD command creates a new buffer and fills it with the words in
the user dictionary. If no user dictionary is found, the buffer is
filled with a default set of words. After the new buffer is filled, it
may be edited (changed) to reflect the user's current needs.


5.3  THE UPDATE COMMAND

The UPDATE command initializes the user dictionary and fills it with
the words found in the current buffer. If no user dictionary exists,
one is created. If one already exists, it is overwritten. The words
are inserted into the user dictionary in lexical sort order. If the
current buffer was created by the LOAD command, the user is returned
to the previous buffer.

If an error occurs during the update step, the user dictionary may be
left in an undefined state. Stop and correct the problem.


5.4  THE CTRL D KEY

Delete the current line.


5.5  THE CTRL L KEY

Display information about the location of the current line in relation
to the complete file.

5.6  THE CTRL P KEY

Display the current position of the cursor on the line.


6  HINTS AND KINKS

The replacement word is not checked for spelling. You can replace a
misspelled word with another misspelled word.


7  DEFINITION OF A WORD

In the spelling checker(s) a word is defined as one or more contiguous
alphabetic characters (case insensitive). Internally, words are stored
in 32 byte arrays. The first byte of the array is the length of the
word, followed by the word.


8  DICTIONARY SEARCH

The dictionary files are searched in the following order: user
dictionary, project dictionary, common dictionary index and then the
common dictionary. The first three dictionaries are in memory and a
binary search is used. The common dictionary is searched sequentially,
starting with the record number supplied by searching the common
dictionary index. The common dictionary index contains every 20th word
in the common dictionary file.


9  POSSIBLE FUTURE IMPROVEMENTS

Expand the abilities of the special SPELL commands (C, FORTRAN, etc.).

Tune the common dictionary access algorithm to minimize the word
access time.

Convert the common dictionary from a VAX/VMS relative file to an ISAM
file with the key being the word. This would eliminate searching the
common dictionary index file.

Rewrite the dictionary search code (currently FORTRAN) in a more
appropriate language such as C or MACRO.

Add the spelling checker to EDT.

Make the common index, project and user dictionaries mapped global
sections.

Allow the user to add words to a special dictionary that only exists
for a given edit session. Room has been left in the common dictionary
index data structures for this purpose.


10  USER MODIFICATION OF EXTENDED EVE

It is expected that most users will want to add their own extensions
to the editor. There are several ways of doing this. One is to modify
EDITOR.TPU. This would make it difficult to receive updates and
corrections to the current editor.
The best way is to use the /COMMAND qualifier when executing the
editor. This slows down image activation if there are many additions.
On the plus side, the user gets the latest and greatest version of the
editor every time. A third way is to extend the editor and keep a
local *.GBL file. If a user does this and uses the procedure
TPU$LOCAL_INIT, the following should be included to ensure that the
spelling checker works correctly.

    eve$arg1_spell := 'string';
    dictionary$available := 0;
    dictionary$buffer := 0;
    default$buffer := 0;
    define_key ('eve_what_line', CTRL_L_KEY);
    define_key ('eve_delete_line', CTRL_D_KEY);
    define_key ('eve_show_position', CTRL_P_KEY);


11  SPELLING CHECKER FILES AND DESCRIPTIONS

The following is a description of the files that make up the spelling
checker(s). The source code files are not described.


11.1  AAAREADME.MEM

A brief description of the spelling checker(s). Generated by RUNOFF.


11.2  BUILD_ALL.COM

A DCL command file that compiles and links all of the spelling checker
programs. It also rebuilds all libraries.


11.3  BUILD_COMMON_DICT.EXE

This program builds the common dictionary and index files from the
text file COMMON_WORDS.DAT. COMMON_WORDS.DAT is a file that contains
common dictionary source words (one per line). The first set of
contiguous non-blank characters is used as the word. COMMON_WORDS.DAT
must be in ascending (lexical) sort order.


11.4  BUILD_PROJECT_DICT.EXE

This program builds the project dictionary file from the text file
PROJECT_WORDS.DAT. PROJECT_WORDS.DAT is a file that contains
dictionary source words (one per line). The first set of contiguous
non-blank characters is used as the word.


11.5  BUILD_USER_DICT.EXE

This program builds the user dictionary file from the text file
USER_WORDS.DAT. USER_WORDS.DAT is a file that contains dictionary
source words (one per line). The first set of contiguous non-blank
characters is used as the word.
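
The source word file rule used by the three build programs (the first
set of contiguous non-blank characters on each line is the word) can
be sketched in Python as follows. This is only an illustration, not
the FORTRAN source; the names read_words and is_ascending are
hypothetical. Skipping blank lines and comparing words without regard
to case are assumptions, since the text above does not say how the
build programs treat either.

```python
def read_words(lines):
    """Take the first run of non-blank characters on each line."""
    words = []
    for line in lines:
        fields = line.split()
        if fields:                  # assumption: blank lines are skipped
            words.append(fields[0])
    return words

def is_ascending(words):
    """Check ascending order; case is ignored (an assumption)."""
    lowered = [w.lower() for w in words]
    return all(a <= b for a, b in zip(lowered, lowered[1:]))

# Hypothetical input: anything after the first field is ignored.
source = ["apple  extra text", "", "   Baker", "cat"]
words = read_words(source)
print(words, is_ascending(words))
```

A check like is_ascending would only matter for COMMON_WORDS.DAT,
since the project and user word files do not have to be sorted.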

11.6  COMMON.DICT

Common dictionary file created by BUILD_COMMON_DICT.EXE. This file is
a VAX/VMS relative file with fixed length records of 32 bytes.


11.7  COMMON.INDEX

Common dictionary index file created by BUILD_COMMON_DICT.EXE.


11.8  COMMON_DICT.RPT

Report file generated by COMMON_DICT_REPORT.


11.9  COMMON_DICT_REPORT.EXE

Generates a report containing all of the words in the common
dictionary. The report is generated from the file COMMON.DICT.


11.10  COMMON_WORDS.DAT

91,000+ words used to build the common dictionary and common
dictionary index. It is a combination of DICT_1_WORDS.DAT and
DICT_2_WORDS.DAT. The following statistics describe the file:

    Word size     Count
    ---------     -----
         1            3
         2           69
         3          780
         4         2952
         5         5872
         6         9500
         7        12785
         8        13800
         9        12794
        10        10708
        11         8148
        12         5690
        13         3771
        14         2249
        15         1273
        16          673
        17          361
        18          139
        19           80
        20           33
        21           13
        22            9
        23            1
        24            2
        25            1
        26            0
        27            1
        28            0
        29            0
        30            0
        31            0
        32            0

    Total word count       91707
    Average word length     7.75

    Common Dictionary Index Statistics

    Pointers used           4587
    Pointers free            113
    Byte buffer used       44752
    Byte buffer free        1248


11.11  DICT_1_WORDS.DAT

87,000+ words that could be used to build the common dictionary and
common dictionary index if a smaller dictionary is needed.
This dictionary originally came from an old LBL software tools
distribution on a DECUS tape and was modified locally. The following
statistics describe the file:

    Word size     Count
    ---------     -----
         1            3
         2           62
         3          711
         4         2684
         5         5342
         6         8730
         7        12027
         8        13176
         9        12287
        10        10333
        11         7914
        12         5561
        13         3684
        14         2207
        15         1256
        16          664
        17          354
        18          139
        19           80
        20           33
        21           13
        22            9
        23            1
        24            2
        25            1
        26            0
        27            1
        28            0
        29            0
        30            0
        31            0
        32            0

    Total word count       87274
    Average word length     8.81


11.12  DICT_2_WORDS.DAT

45,000+ words that could be used to build the common dictionary and
common dictionary index if a smaller dictionary is needed.
The following statistics describe the file:

    Word size     Count
    ---------     -----
         1            0
         2           48
         3          602
         4         2332
         5         4123
         6         6226
         7         7503
         8         7113
         9         6095
        10         4698
        11         3107
        12         1937
        13         1129
        14          520
        15          234
        16           80
        17           50
        18           14
        19            3
        20            0
        21            0
        22            0
        23            0
        24            0
        25            0
        26            0
        27            0
        28            0
        29            0
        30            0
        31            0
        32            0

    Total word count       45814
    Average word length     6.98


11.13  EDITOR.GBL

Extended EVE editor.


11.14  EDITOR.TPU

TPU source code for the extended EVE editor.


11.15  LINK_TPU_CALLUSER.COM

DCL command file that links the TPU call user shared image
TPU_CALLUSER.EXE. This command file is provided because of the special
nature of the link. None of the other programs need a special link
command.


11.16  PROJECT.DICT

Sample project dictionary created by BUILD_PROJECT_DICT.EXE.


11.17  PROJECT_WORDS.DAT

Sample word list for the project dictionary. This file is input to
BUILD_PROJECT_DICT.EXE.


11.18  SPELL.MEM

This document. Generated by RUNOFF.


11.19  SPELL.EXE

Standalone spelling checker.


11.20  SPELL_INCLUDE.FOR

A FORTRAN include file defining the internal dictionary data
structures. Used by almost every routine.


11.21  SPELL_RV.EXE

Standalone spelling checker that uses reverse video. It does not work
well and only works on VT100 terminals. It was a quick test program.
It should be converted to use the VAX/VMS screen management routines.

11.22  SPELLIB.OLB

An object library containing spell checker routines. This library must
be linked to almost every program.


11.23  STATS.EXE

Generates statistics from the text word files used to build
dictionaries. The information generated is useful for determining the
size of internal data structures that hold the dictionaries.


11.24  TPU_CALLUSER.EXE

Routines that access the dictionaries for extended EVE.


11.25  TEST_COMMON_DICT.EXE

A program that tests the validity of the dictionary file built by
BUILD_COMMON_DICT.EXE. Currently it does very little.


11.26  TEST_COMMON_INDEX.EXE

A program that tests the validity of the dictionary index file built
by BUILD_COMMON_DICT.EXE. Currently it does very little.


11.27  TEST_PROJECT_DICT.EXE

A program that tests the validity of the dictionary file built by
BUILD_PROJECT_DICT.EXE. Currently it does very little.


11.28  TEST_USER_DICT.EXE

A program that tests the validity of the dictionary file built by
BUILD_USER_DICT.EXE. Currently it does very little.


11.29  USER.DICT

Sample user dictionary built by extended EVE or BUILD_USER_DICT.EXE.


11.30  USER_WORDS.DAT

Sample word list for the user dictionary. This file is input to
BUILD_USER_DICT.EXE.
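
As a closing illustration, the word record layout of section 7 and the
search order of section 8 can be sketched in Python. This is not a
transcription of the FORTRAN routines in SPELLIB.OLB. All function
names are hypothetical, the records are padded with blanks (an
assumption; the fill byte is not stated), record numbers are counted
from zero rather than as VAX/VMS relative record numbers, and the
index is assumed to start with the first word of the common
dictionary.

```python
from bisect import bisect_left, bisect_right

RECORD_SIZE = 32   # one length byte plus up to 31 characters
INDEX_STEP = 20    # the index holds every 20th common dictionary word

def pack_word(word):
    """Build a 32 byte record: length byte, word, blank fill (assumed)."""
    if not word.isalpha() or len(word) > RECORD_SIZE - 1:
        raise ValueError("a word is 1 to 31 alphabetic characters")
    data = word.lower().encode("ascii")
    return (bytes([len(data)]) + data).ljust(RECORD_SIZE, b" ")

def unpack_word(record):
    """Recover the word from a record built by pack_word."""
    return record[1:1 + record[0]].decode("ascii")

def in_sorted(word, words):
    """Binary search of a sorted in-memory word list."""
    i = bisect_left(words, word)
    return i < len(words) and words[i] == word

def build_index(common):
    """Return (index words, record numbers) for every 20th word."""
    recs = list(range(0, len(common), INDEX_STEP))
    return [common[r] for r in recs], recs

def in_common(word, common, index_words, index_recs):
    """Binary search the index, then scan the dictionary from there."""
    slot = bisect_right(index_words, word) - 1
    if slot < 0:
        return False               # sorts before the first word
    start = index_recs[slot]
    for candidate in common[start:start + INDEX_STEP]:
        if candidate == word:
            return True
        if candidate > word:
            break                  # passed the place it would occupy
    return False

def is_known(word, user, project, common, index_words, index_recs):
    """Search order: user, project, common index, common dictionary."""
    w = word.lower()
    return (in_sorted(w, user) or in_sorted(w, project)
            or in_common(w, common, index_words, index_recs))
```

With a 20 word step, the sequential part of a lookup reads at most 20
records of the common dictionary, which is the trade-off section 9
proposes to tune.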