HSM Internal Operation
(Confidential...do not disclose)

The HSM package operates through an intercept driver (using the techniques I published a couple of years back in an attempt to standardize such intercepts and keep products from crashing each other), which inserts some special FDT code in disk drivers. It does so in such a way that each disk unit gets its own intercept unit and thus, at least potentially, separate processing. It intercepts the IO$_ACCESS and IO$_DEACCESS functions to handle file moving. There is also logic to intercept IO$_MODIFY and IO$_CREATE, which is used (when enabled) to notice extends or creates that will fail due to disk space exhaustion and to fire up the "make room" script should this happen, so as to avoid open/create failure due to lack of disk space, if this can be done.

The system assumes that files that have been migrated offline still have their file headers present, but that they are zero length. This is generally done by setting the EOF point of the file to 0 and using the truncate I/O call to free all space in the file. Thus the file headers are never in an illegal state, as DEC-supported operations are used for space management. (The EOF pointer is set back to the original size after this for user convenience.)

It further assumes that file migration is accompanied by adding an ACE to the file ACL. This ACE is a special application ACE tagged by a magic pattern so it can be clearly recognized. When a file is out-migrated, an ISAM file (one per disk) indexed by file ID gets information about the file name and new location, which the inswap operation can then use to get it back. One must work with file IDs at this point since, by the time one gets to the driver, that is the form the I/O request takes.

In the intercept code, to avoid extra context switches, the driver saves the original I/O context and issues a new I/O request to read the ACL into a temporary buffer in pool. On return it finds our ACE if present.
(If the ACE is too long, the driver must get its associated daemon to do this and poke it back.) If the ACE is absent, the driver just reissues the original I/O lower on the kernel stack and returns after deallocating the pool it used. If the ACE is present, it is inspected for migration tags and softlinks.

Softlinks are handled in the driver directly, replacing the file ID in the user FIB and changing the user channel to the correct device (and queueing it to the correct other intercept unit to be replaced at close time). The softlinks have the file ID of the link and the destination device name plus a few flag bits (for r/o softlinks) in the ACE itself. Thus they are blindingly fast.

If the ACE flags that the daemon must be called, a message is composed and sent via EXE$WRITEMBX to a mailbox being read by the daemon. (This sort of communication is used for FQdriver, FRdriver and others and was chosen because it is a documented and supported interface, which I prefer.) The message includes the address of a special kernel AST in the driver, which the daemon calls, and the address of the pool structure, which is used as an AST argument. This makes it easy for the AST code to get the original context back and issue the user's I/O for him again, AFTER the daemon has run, and then unwind everything and return.

The daemon either does the inswap itself or spawns a subprocess to do it. Since the daemon is running as a normal program, it has access to all VMS and RMS (and network) facilities and can adopt whatever methods it pleases. Since at the time of the replacement the user open has NOT occurred yet, there are no locking problems. The daemon will have a record lock outstanding on its database record during this time, which doubles as a convenient way to ensure that other opens from elsewhere in the cluster are delayed until the current operation finishes; the daemon delays and retries when it finds the record locked for this reason.
Therefore only the first open in a cluster will do the inswap, with no further complexity. It should be noted that the daemon and processes in its job do not get filtered in their file access; neither does the XQP. There are a good many synchronization issues and details not covered here, but these are the basics.

The extra I/O to read the ACE is quite low cost: the first I/O reads the file header from disk, and the user open immediately afterwards then finds it in the header cache. Doing a monitor file command will show that the header cache hit rate on disks rises to 50% or more with this program in use. Since there's no extra disk hit, the added time is minimal. (If one operates with header cache disabled, the hit is more noticeable.)

Users generally will not notice HSM in operation, except that the inswap time might occasionally cause delays that would be a problem if users don't expect them. The system does NOT move files in active use, as that would induce delays that could not be predicted and which could be quite hazardous. (The SP/32 "Hierarchy" product does this, and when they move a hot file, they stop all processes accessing the hot file while they move it. If one of the processes happens to be doing realtime process control, monitoring a dose of something, or the like, that's too bad. While there are ways to move open files, the best of them involve at least trebling I/O load during the move and require adding extra code to the data read path, slowing down ALL reads. HSM on the other hand hits only the open path, where software designs ordinarily do not expect extreme speed.)

File headers are marked with ACEs in the following format (extensible by adding more codes):

   ace:    .byte   length
           .byte   ace$c_info      ;info ACE = application ACE
           .word   flags           ;hidden, protected both get set
           .long   info-flags      ;1 means call the daemon
           .ascii  /GEHS/          ;flag this is my HSM ACE
           .blkb   data            ;data, up to 244 bytes

Data format:

   00 - nothing.
        Terminates list.
   01 - "inspectme" - flags daemon should be called. Always accompanied by info-flags bit 1 set.
   02 - "moveme" - flags that the file has been moved.
   06 - "softlink" - format: 06,len,flags, variable length. Flags and len are bytes. Devicename is used for ioc$searchdev; only text in the ACE. Descriptor is constructed on the fly.
   07 - "temporary" - used for time tagging. Not used by anything at the moment. Format: 07,len, Total 16 bytes.

Softlink flags are not fully defined; 0 is normal soft links, 1 is readonly links (softlink if readonly open, else move).

Data is recorded in an ISAM file with variable-length records, keyed by file ID. One file per disk is used, though for volume sets a file can have two (or more) aliases (using set file/enter), as the whole file ID including RVN is used.

Format of the ISAM record needed:

   file ID (8 characters)
   file size (4 bytes) (We only care about # blocks)
   Flags (4 bytes). Flags move type, whatever...
   Strings. Variable size. Contains:
      S(original-file-spec)
      A(ace entry after ID part)
      E(softlink dest. filename) (if applicable)
      F(dest file spec) (if applicable) (not currently used)

Currently the dest. filename is constructed algorithmically and includes the FID of the original file, the filename, and some parts of the device name; files are stored in an area designated by a logical name and retrieved from there.

The file moving script moves and marks one file at a time, taking care to truncate data ONLY if the file is safely moved elsewhere. Ordering of operations is such that the worst that can happen is that a file is truncated but not marked, with the copy nevertheless in existence in the backup area. On "inswap" of a file, the ACE is erased.
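
As an illustration only, the ACE layout and data codes listed above can be walked with a few lines of Python. This is a sketch, not the driver's code, and it rests on several guesses that the text does not settle: that the length byte counts the whole ACE in bytes, that "1 means call the daemon" refers to the low bit of info-flags, that codes 01 and 02 are single bytes with no payload, and that the len byte in codes 06 and 07 counts the whole entry including the code and len bytes. The softlink payload is returned as raw bytes, since its exact field layout is not given here. The names parse_hsm_ace, BARE_TAGS and SIZED_TAGS are made up for this example.

```python
import struct

# Header as listed above: length, type, flags, info-flags, "GEHS".
# The VAX is little-endian, hence "<".
HEADER = struct.Struct("<BBHL4s")
HSM_MAGIC = b"GEHS"
CALL_DAEMON = 0x1  # ASSUMPTION: "1 means call the daemon" is the low bit

BARE_TAGS = {0x01: "inspectme", 0x02: "moveme"}     # ASSUMPTION: no payload
SIZED_TAGS = {0x06: "softlink", 0x07: "temporary"}  # code, len, payload


def parse_hsm_ace(ace: bytes):
    """Return a dict describing an HSM ACE, or None if it is not one."""
    if len(ace) < HEADER.size:
        return None
    length, ace_type, flags, info_flags, magic = HEADER.unpack_from(ace)
    if magic != HSM_MAGIC:
        return None  # some other application's ACE
    # ASSUMPTION: the length byte counts the whole ACE in bytes.
    if length < HEADER.size or length > len(ace):
        raise ValueError("ACE length byte is inconsistent with the buffer")

    data = ace[HEADER.size:length]
    entries = []
    pos = 0
    while pos < len(data):
        tag = data[pos]
        if tag == 0x00:  # 00 terminates the list
            break
        if tag in BARE_TAGS:
            entries.append((BARE_TAGS[tag], b""))
            pos += 1
        elif tag in SIZED_TAGS:
            if pos + 1 >= len(data):
                raise ValueError("truncated entry")
            size = data[pos + 1]  # ASSUMPTION: counts the whole entry
            if size < 2 or pos + size > len(data):
                raise ValueError("bad entry length")
            # Payload is kept opaque (for a softlink it starts with flags).
            entries.append((SIZED_TAGS[tag], data[pos + 2:pos + size]))
            pos += size
        else:
            raise ValueError(f"unknown data code {tag:#04x}")
    return {
        "type": ace_type,
        "flags": flags,
        "call_daemon": bool(info_flags & CALL_DAEMON),
        "entries": entries,
    }


# Example: an "inspectme" code followed by a made-up 5-byte softlink entry.
# The ACE type value 0 is a placeholder, not the real ace$c_info value.
data = b"\x01" + b"\x06\x05\x00AB" + b"\x00"
ace = HEADER.pack(HEADER.size + len(data), 0, 0, CALL_DAEMON, HSM_MAGIC) + data
print(parse_hsm_ace(ace))
```

Checking the magic pattern before anything else mirrors the point made earlier: an ACE without the GEHS tag belongs to someone else and is left alone.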