GREP(1V)                  USER COMMANDS                  GREP(1V)

Sun Release 4.1                     Last change: 30 November 1988

NAME
     grep, egrep, fgrep - search a file for a string or regular
     expression

SYNOPSIS
     grep [ -bchilnsvw ] [ -e expression ] [ filename... ]

     egrep [ -bchilnsv ] [ -e expression ] [ -f filename ]
          [ expression ] [ filename... ]

     fgrep [ -bchilnsvx ] [ -e string ] [ -f filename ]
          [ string ] [ filename... ]

SYSTEM V SYNOPSIS
     /usr/5bin/grep [ -bchilnsvw ] [ -e expression ]
          [ filename... ]

AVAILABILITY
     The System V version of this command is available with the
     System V software installation option.  Refer to Installing
     SunOS 4.1 for information on how to install optional
     software.

DESCRIPTION
     Commands of the grep family search the input filenames (the
     standard input default) for lines matching a pattern.
     Normally, each line found is copied to the standard output.
     grep patterns are limited regular expressions in the style
     of ed(1).  egrep patterns are full regular expressions
     including alternation.  fgrep patterns are fixed strings -
     no regular expression metacharacters are supported.  In
     general, egrep is the fastest of these programs.

     Take care when using the characters `$', `*', `[', `^',
     `|', `(', `)', and `\' in the expression, as these
     characters are also meaningful to the shell.  It is safest
     to enclose the entire expression argument in single quotes
     '...'.

     When any of the grep utilities is applied to more than one
     input file, the name of the file is displayed preceding
     each line which matches the pattern.  The filename is not
     displayed when processing a single file, so if you actually
     want the filename to appear, use /dev/null as a second file
     in the list.

OPTIONS
     -b   Precede each line by the block number on which it was
          found.  This is sometimes useful in locating disk
          block numbers by context.
     -c   Display a count of matching lines rather than
          displaying the lines which match.
     -h   Do not display filenames.
     -i   Ignore the case of letters in making comparisons -
          that is, upper and lower case are considered
          identical.
     -l   List only the names of files with matching lines
          (once) separated by NEWLINE characters.
     -n   Precede each line by its relative line number in the
          file.
     -s   Work silently, that is, display nothing except error
          messages.  This is useful for checking the error
          status.
     -v   Invert the search to only display lines that do not
          match.
     -w   Search for the expression as a word as if surrounded
          by \< and \>.  This applies to grep only.
     -x   Display only those lines which match exactly - that
          is, only lines which match in their entirety.  This
          applies to fgrep only.
     -e expression
          Same as a simple expression argument, but useful when
          the expression begins with a `-'.
     -e string
          For fgrep the argument is a literal character string.
     -f filename
          Take the regular expression (egrep) or a list of
          strings separated by NEWLINE (fgrep) from filename.

SYSTEM V OPTIONS
     The -s option to grep indicates that error messages for
     nonexistent or unreadable files should be suppressed, not
     that all messages except for error messages should be
     suppressed.

REGULAR EXPRESSIONS
     The following one-character regular expressions match a
     single character:

     c    An ordinary character (not one of the special
          characters discussed below) is a one-character regular
          expression that matches that character.
     \c   A backslash (\) followed by any special character is a
          one-character regular expression that matches the
          special character itself.  The special characters are:

          o  `.', `*', `[', and `\' (period, asterisk, left
             square bracket, and backslash, respectively), which
             are always special, except when they appear within
             square brackets ([]).
          o  `^' (caret or circumflex), which is special at the
             beginning of an entire regular expression, or when
             it immediately follows the left of a pair of square
             brackets ([]).
          o  `$' (currency symbol), which is special at the end
             of an entire regular expression.

          A backslash followed by one of `<', `>', `(', `)',
          `{', or `}', represents a special operator in the
          regular expression; see below.
     .    A `.' (period) is a one-character regular expression
          that matches any character except NEWLINE.
     [string]
          A non-empty string of characters enclosed in square
          brackets is a one-character regular expression that
          matches any one character in that string.  If,
          however, the first character of the string is a `^'
          (a circumflex or caret), the one-character regular
          expression matches any character except NEWLINE and
          the remaining characters in the string.  The `^' has
          this special meaning only if it occurs first in the
          string.  The `-' (minus) may be used to indicate a
          range of consecutive ASCII characters; for example,
          [0-9] is equivalent to [0123456789].  The `-' loses
          this special meaning if it occurs first (after an
          initial `^', if any) or last in the string.  The `]'
          (right square bracket) does not terminate such a
          string when it is the first character within it
          (after an initial `^', if any); that is, []a-f]
          matches either `]' (a right square bracket) or one of
          the letters a through f inclusive.  The four
          characters `.', `*', `[', and `\' stand for themselves
          within such a string of characters.

     The following rules may be used to construct regular
     expressions:

     *    A one-character regular expression followed by `*'
          (an asterisk) is a regular expression that matches
          zero or more occurrences of the one-character regular
          expression.  If there is any choice, the longest
          leftmost string that permits a match is chosen.
     \( and \)
          A regular expression enclosed between the character
          sequences \( and \) matches whatever the unadorned
          regular expression matches.  This applies only to
          grep.
     \n   The expression \n matches the same string of
          characters as was matched by an expression enclosed
          between \( and \) earlier in the same regular
          expression.  Here n is a digit; the sub-expression
          specified is that beginning with the nth occurrence of
          \( counting from the left.  For example, the
          expression ^\(.*\)\1$ matches a line consisting of two
          repeated appearances of the same string.
     Concatenation
          The concatenation of regular expressions is a regular
          expression that matches the concatenation of the
          strings matched by each component of the regular
          expression.
     \<   The sequence \< in a regular expression constrains the
          one-character regular expression immediately following
          it only to match something at the beginning of a
          "word"; that is, either at the beginning of a line, or
          just before a letter, digit, or underline and after a
          character not one of these.
     \>   The sequence \> in a regular expression constrains the
          one-character regular expression immediately following
          it only to match something at the end of a "word";
          that is, either at the end of a line, or just before a
          character which is neither a letter, digit, nor
          underline.
     \{m\}  \{m,\}  \{m,n\}
          A regular expression followed by \{m\}, \{m,\}, or
          \{m,n\} matches a range of occurrences of the regular
          expression.  The values of m and n must be
          non-negative integers less than 256; \{m\} matches
          exactly m occurrences; \{m,\} matches at least m
          occurrences; \{m,n\} matches any number of occurrences
          between m and n inclusive.  Whenever a choice exists,
          the regular expression matches as many occurrences as
          possible.
     ^    A circumflex or caret (^) at the beginning of an
          entire regular expression constrains that regular
          expression to match an initial segment of a line.
     $    A currency symbol ($) at the end of an entire regular
          expression constrains that regular expression to match
          a final segment of a line.
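The rules above can be tried from a shell. What follows is a minimal sketch, not part of the original manual page: it assumes a present-day grep that accepts \< and \> (the GNU and BSD versions do), and the sample input lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample input, one candidate string per line.
sample() {
    printf '%s\n' 'abcabc' 'abcabd' 'aaa' 'aaaaa' 'cat' 'concat cats'
}

# \(...\) with \1: lines made of the same string repeated twice.
doubled=$(sample | grep '^\(.*\)\1$')

# \{m,n\}: lines consisting of between 2 and 4 a's and nothing else.
runs=$(sample | grep '^a\{2,4\}$')

# \< and \>: lines containing "cat" as a whole word.
words=$(sample | grep '\<cat\>')

printf 'doubled: %s\n' "$doubled"
printf 'runs:    %s\n' "$runs"
printf 'words:   %s\n' "$words"
```

The patterns are enclosed in single quotes, as the DESCRIPTION recommends, so that the shell passes the backslashes, `*', `^', and `$' through to grep unchanged.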
     The construction

          ^entire regular expression$

     constrains the entire regular expression to match the
     entire line.

     egrep accepts regular expressions of the same sort grep
     does, except for \(, \), \n, \<, \>, \{, and \}, with the
     addition of:

     *    A regular expression (not just a one-character regular
          expression) followed by `*' (an asterisk) is a regular
          expression that matches zero or more occurrences of
          that regular expression.  If there is any choice, the
          longest leftmost string that permits a match is
          chosen.
     +    A regular expression followed by `+' (a plus sign) is
          a regular expression that matches one or more
          occurrences of that regular expression.  If there is
          any choice, the longest leftmost string that permits a
          match is chosen.
     ?    A regular expression followed by `?' (a question mark)
          is a regular expression that matches zero or one
          occurrence of that regular expression.  If there is
          any choice, the longest leftmost string that permits a
          match is chosen.
     |    Alternation: two regular expressions separated by `|'
          or NEWLINE match either a match for the first or a
          match for the second.
     ()   A regular expression enclosed in parentheses matches a
          match for the regular expression.

     The order of precedence of operators at the same
     parenthesis level is `[ ]' (character classes), then `*'
     `+' `?' (closures), then concatenation, then `|'
     (alternation) and NEWLINE.

---------------
Regex info from ARCHIE

(4) "regex"

This is the DEFAULT search method.  ed(1) regular expressions.
Searches the database with the user (search) string which is
given in the form of an ed(1) regular expression.

NOTE: Unless specifically anchored to the beginning (with ^) or
end (with $) of a line, ed(1) regular expressions have ".*"
prepended and appended to them.  For example, it is NOT NECESSARY
to say

     prog .*xnlock.*

since

     prog xnlock

will suffice.  Thus the regex match becomes a simple substring
match.
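The same substring behaviour can be observed with grep, which also matches an unanchored pattern anywhere in a line. The sketch below uses grep as a stand-in for the archie search; the list of names is invented for the example and is not real archie output.

```shell
#!/bin/sh
# Made-up list of names standing in for the archie database.
names() {
    printf '%s\n' 'xnlock.tar.Z' 'contrib/xnlock' 'xlock.c' 'README'
}

plain=$(names | grep 'xnlock')        # unanchored pattern
padded=$(names | grep '.*xnlock.*')   # explicit .* on both sides
anchored=$(names | grep '^xnlock')    # pattern must start the line

[ "$plain" = "$padded" ] && echo 'plain and padded select the same lines'
printf 'anchored: %s\n' "$anchored"
```

Only the anchored form narrows the result, because `^' ties the match to the beginning of the line.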
An "ed(1) regular expression" (from here on called RE) is the
particular type of regular expression used in the "ed" editor
under Unix.  For those interested in all the gory details of REs,
see the help for "regex" (which is incomplete, at the moment
:-(); otherwise what follows should be sufficient for most needs.

A regular expression is a convenient way to search for a set of
specific strings matching a pattern.  To be able to specify such
a pattern with only the ordinary set of printable characters we
have to co-opt some of them.  For example, in a RE the period
means _any_ single character, while an asterisk, '*', means zero
or more occurrences of the *PRECEDING* RE.  For example:

     knob      - matches any string containing the substring
                 'knob'
     a*splat   - matches strings that contain zero or more a's
                 followed by the string 'splat'
     #.*#      - would match anything containing a '#' followed
                 by zero or more occurrences of _any_ character,
                 followed by another '#'

Other special characters that may be useful are '[' and ']',
which are used together.  They can be used to specify either a
set of characters to match or a set of characters to not match.
An example of the first case is:

     [abcd]

which matches any one of the four letters, while an example of
the second case is:

     [^abcd]

in which the '^' _in_the_first_position_ means that any character
_not_ in the list will be matched.  As well, ranges can be
specified with a '-'.

     [a-z]

matches any lower case letter and,

     [^a-z]

matches any character other than a lower case letter.
Furthermore, you can specify multiple ranges such as:

     [%@a-z0-9]    or    [^A-Za-z]

meaning: match '%' or '@' or any lower case letter or digit, and
match any character other than a letter, respectively.

When you want to match a character which has a special meaning
you should precede it by a backslash, '\'.
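The bracket and backslash rules can be checked with grep. This is an illustrative sketch only: the sample strings are invented, grep stands in for the archie search, and the C locale is forced on the assumption that ranges such as a-z should follow ASCII order.

```shell
#!/bin/sh
# Force ASCII ordering so that a-z means exactly the lower case letters.
LC_ALL=C
export LC_ALL

# Made-up sample strings, one per line.
sample() {
    printf '%s\n' 'abc' 'XYZ' '123' '%' 'a.b' 'axb'
}

lower=$(sample | grep '[a-z]')          # contains a lower case letter
nonlower=$(sample | grep '^[^a-z]*$')   # contains no lower case letter
mixed=$(sample | grep '^[%@a-z0-9]*$')  # only '%', '@', a-z, and digits
anydot=$(sample | grep 'a.b')           # '.' matches any character
dot=$(sample | grep 'a\.b')             # backslash makes '.' literal

printf '%s\n' 'lower:' "$lower" 'nonlower:' "$nonlower" \
    'mixed:' "$mixed" 'anydot:' "$anydot" 'dot:' "$dot"
```

Comparing anydot with dot shows why the backslash matters: the unescaped period also accepts 'axb'.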
Some final examples of REs are:

     [Mm]ac\.txt         - match anything containing the string
                           "Mac.txt" or "mac.txt"
     [^aeiou][^aeiou]*   - match any string containing at least
                           one non-vowel character; anchored as
                           ^[^aeiou][^aeiou]*$ it matches only
                           strings consisting entirely of
                           non-vowels
     foo-v[0-9]\.tar\.Z  - match "foo-v0.tar.Z" through
                           "foo-v9.tar.Z"

Good luck, and remember that many things can be found with only a
simple substring (e.g. latex).
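The final examples can be exercised the same way. The sketch below again uses grep in place of archie, with an invented list of names. Under grep, the non-vowel pattern needs `^' and `$' before it requires the whole line to consist of non-vowels.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Made-up names, one per line.
names() {
    printf '%s\n' 'Mac.txt' 'mac.txt' 'macXtxt' 'foo-v3.tar.Z' \
        'foo-v10.tar.Z' 'rhythm' 'aeiou'
}

mac=$(names | grep '[Mm]ac\.txt')           # literal '.', either case of M
foo=$(names | grep 'foo-v[0-9]\.tar\.Z')    # exactly one digit after "v"
some=$(names | grep -c '[^aeiou][^aeiou]*') # count lines with a non-vowel
only=$(names | grep '^[^aeiou][^aeiou]*$')  # lines with no vowels at all

printf 'mac:\n%s\n' "$mac"
printf 'foo: %s\n' "$foo"
printf 'lines containing a non-vowel: %s\n' "$some"
printf 'lines made only of non-vowels: %s\n' "$only"
```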