                        JNET_LINKWATCH 01-003

                           Hunter Goatley
                      goathunter@WKUVX1.BITNET
                           August 5, 1991

JNET_LINKWATCH is a detached process that will query a remote BITNET node
and wait for a response. If no response is received within a specified
time period, a NETWORK alarm is generated suggesting that the link status
be checked:

    %%%%%%%%%%% OPCOM 16-JUL-1991 11:16:28.23 %%%%%%%%%%%
    Message from user SYSTEM on WKUVX1
    No response detected from BITNET node ULKYVM -- check the link status

Optionally, a mail message can be sent to a list of addresses.

This program grew from the frustration of rare communications problems
that kept the BITNET line connected, but clogged. In an effort to alert
operators that the line should be checked, this program periodically
verifies that the remote node is reachable.

The NETWORK alarm is sent to all operator terminals enabled to receive
NETWORK alarms. You can enable a terminal to receive these alarms using
either of the following DCL commands:

    $ REPLY/ENABLE              !Receive all classes
    $ REPLY/ENABLE=NETWORK      !Receive only network alarms

This program uses the documented Jnet API (Application Programmer's
Interface) for Jnet v3.5. It will not work with older versions of Jnet.

JNET_LINKWATCH can be configured via logicals. The logicals would normally
be defined in the system logical name table (LNM$SYSTEM_TABLE), but may be
defined in the process table for the detached process.

Only one logical is required: JNET_LW_NODES. The equivalence string is
expected to be the name of one or more nodes that are to be queried.
Multiple nodes must be separated by commas.

Building JNET_LINKWATCH
-----------------------
To build JNET_LINKWATCH, just execute the supplied command procedure
BUILD_JNET_LINKWATCH.COM. BLISS and MACRO sources are supplied; if you
don't have BLISS, the MACRO version will be assembled.
The commands needed to build JNET_LINKWATCH from the BLISS sources are:

    $ BLISS JNET_LINKWATCH,HG_SEND_MAIL
    $ LINK/NOTRACE JNET_LINKWATCH,HG_SEND_MAIL,SYS$INPUT/OPTIONS
    JANSHR/SHARE
    $

Running JNET_LINKWATCH
----------------------
JNET_LINKWATCH should normally be run as a detached process under the
SYSTEM account. This would typically be done using a command similar to
the following:

    $ RUN/DETACH/OUTPUT=NL: dev:[dir]JNET_LINKWATCH

where dev:[dir] is replaced with the proper device and directory names.

To aid in debugging, JNET_LINKWATCH writes status messages out to
SYS$OUTPUT. This allows you to run JNET_LINKWATCH interactively and watch
its actions. Should you want to capture the output of the detached
process, point the /OUTPUT qualifier at a log file instead of NL:

    $ RUN/DETACH JNET_LINKWATCH/OUTPUT=JAN_LOG:JNET_LINKWATCH.LOG

This line, and all related logical definitions, should be added to either
JANSITE.COM or JANSITECOMMON.COM to ensure that the detached process is
started each time Jnet is started.

Stopping JNET_LINKWATCH
-----------------------
The JCP command STOP should be used to stop JNET_LINKWATCH. You should
specify the hook name, which is LNKWATCH by default. For example:

    $ RUN JAN_SYS:JCP
    JCP> STOP LNKWATCH

If the detached process is deleted by any other means, you may need to
remove the hook name before restarting JNET_LINKWATCH. To manually remove
the hook, use the JCP command REMOVE/HOOK:

    $ RUN JAN_SYS:JCP
    JCP> REMOVE/HOOK LNKWATCH

JNET_LINKWATCH Logicals
-----------------------
All of the logicals used by JNET_LINKWATCH are prefixed by the string
JNET_LW_. Each is described below. These logicals are checked each cycle
so that they can be changed at will without having to restart
JNET_LINKWATCH.

    JNET_LW_NODES       - One or more remote node names, separated by commas
    JNET_LW_HOOK        - The Jnet hook name. Default is LNKWATCH.
    JNET_LW_CYCLE       - Time between cycles. Default is 30 minutes.
    JNET_LW_DELAY       - Timeout for remote response. Default is 30 seconds.
    JNET_LW_RCMD        - Command to query remote node.
                          Default: CPQ U OPERATOR.
    JNET_LW_PRCNAM      - Detached process's name. Default is "Jnet LinkWatch."
    JNET_LW_MAIL_USERS  - List of users to receive e-mail when a node is down.

JNET_LW_NODES
-------------
This is the only required logical. The equivalence string is treated as
the name of one or more nodes to query. Multiple nodes should be separated
by commas. Examples:

    $ DEFINE/SYSTEM/EXEC JNET_LW_NODES ULKYVM             !Check one
    $ DEFINE/SYSTEM/EXEC JNET_LW_NODES "WKYUVM,ULKYVM"    !Check two

JNET_LW_HOOK
------------
The hook name can be displayed using the JCP command SHOW HOOKS. The
default hook name is LNKWATCH. This name must be unique on the system; you
can override the default hook name using JNET_LW_HOOK:

    $ DEFINE/SYSTEM/EXEC JNET_LW_HOOK CHECKIT

JNET_LW_CYCLE
-------------
The time between query cycles. The default is 30 minutes. The value of
this logical must be a valid VMS time:

    $ DEFINE/SYSTEM/EXEC JNET_LW_CYCLE "0 01:00:00"       !One hour

JNET_LW_DELAY
-------------
The amount of time to wait for a response from a remote node before
assuming the link is down and generating the NETWORK alarm. The default is
30 seconds; you may need to increase this value if the node is several
hops away. The value of this logical must be a valid VMS time:

    $ DEFINE/SYSTEM/EXEC JNET_LW_DELAY "0 00:00:45"       !45 seconds

JNET_LW_RCMD
------------
The command to be sent to the remote system can be any valid or invalid
command, as long as only one response is generated. The default command,
"CPQ U OPERATOR", queries to see if the operator is logged on. This
command will generate a one-line response indicating that the operator is
or isn't logged on.

The only information JNET_LINKWATCH uses is the name of the remote node,
which it checks against the queried node. This means that the remote
system can return an error like "Unrecognized command" and everything is
still OK; all JNET_LINKWATCH cares about is that the response originates
from the system to which the message was sent.
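
The other logicals are each shown with an example definition; for
JNET_LW_RCMD, a definition might look like the sketch below. The first
command simply sets the documented default explicitly. The second is an
alternative that assumes the remote node accepts CPQ commands, with MAINT
as a hypothetical user ID chosen only for illustration. Use one or the
other, and substitute a command known to produce a single response on the
node being queried:

    $ DEFINE/SYSTEM/EXEC JNET_LW_RCMD "CPQ U OPERATOR"    !Documented default
    $ DEFINE/SYSTEM/EXEC JNET_LW_RCMD "CPQ U MAINT"       !MAINT is hypothetical

Because the logicals are checked each cycle, a new definition should take
effect on the next cycle without restarting JNET_LINKWATCH.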

If no response is received within the timeout period, the following alarm
is generated:

    %%%%%%%%%%% OPCOM 16-JUL-1991 11:16:28.23 %%%%%%%%%%%
    Message from user SYSTEM on WKUVX1
    No response detected from BITNET node ULKYVM -- check the link status

If any response is received other than a response from the target node, an
alarm is generated and the received response is included in the alarm
message. For example, if a link is down, a node between the host and
remote nodes may report something like "(WKYUVM) - Link ULKYVM is not
connected". Because the response did not come from the target node, an
alarm is generated:

    %%%%%%%%%%% OPCOM 16-JUL-1991 14:49:43.60 %%%%%%%%%%%
    Message from user SYSTEM on WKUVX1
    Error querying BITNET node ULKYVM
    (WKYUVM) - Link ULKYVM is not connected

JNET_LW_PRCNAM
--------------
The default process name for the detached process is "Jnet LinkWatch."
You can override the name using this logical:

    $ DEFINE/SYSTEM/EXEC JNET_LW_PRCNAM "Jnet BigBro"

JNET_LW_MAIL_USERS
------------------
List of users who are to receive e-mail when a remote node is unreachable.
Defining this logical causes JNET_LINKWATCH to send e-mail in addition to
the OPCOM alarm, so that the proper personnel are notified when a link is
down.

The equivalence string is treated as one or more addresses to which e-mail
is to be sent. Multiple addresses should be separated by commas. You can
specify any valid VMS Mail address, including MX, PMDF, and Jnet addresses.
Examples:

    $ DEFINE/SYSTEM/EXEC JNET_LW_MAIL_USERS POSTMASTER    !Only one
    $ DEFINE/SYSTEM/EXEC JNET_LW_MAIL_USERS "POSTMASTER,SYSTEM"
    $ DEFINE/SYSTEM/EXEC JNET_LW_MAIL_USERS MX%"""OPERATOR@WKYUVM.BITNET"""

Questions, comments, or suggestions are welcome.

Hunter Goatley, VAX Systems Programmer    E-mail: goathunter@WKUVX1.BITNET
Academic Computing, STH 226               Voice:  (502) 745-5251
Western Kentucky University
Bowling Green, KY 42101