Grep Fgrep Egrep
At this point grep and egrep depart from one another. egrep stands for extended grep. The POSIX 1003.2 standard defined a set of regular expression characters called modern, extended, or full regular expressions. The regular expressions I cited earlier are frequently called older or basic regular expressions. There is some overlap between the two, and recent versions of grep can be made to behave like egrep by using the -E option.

The egrep utility uses extended regular expressions. A useful one is the plus (+) character, which works like the asterisk (*) but means "one or more" rather than "zero or more." Using egrep in the above example with a + instead of an * excludes "ct" from the results, because "ct" does not contain one or more vowels:

$ egrep 'c[aeiou]+t' somewords.txt
cat
coat
coot
cot
cout
cut
$

With grep, the same "one or more" search is written as a vowel followed by zero or more vowels:

$ grep 'c[aeiou][aeiou]*t' somewords.txt
cat
coat
coot
cot
cout
cut
$

In egrep, the three repetition characters are:

* = zero or more occurrences
+ = one or more occurrences
? = zero or one occurrence

The vertical bar (|) separates alternatives, so a single egrep command can search for either of two patterns:

$ egrep 'c[aeiou]+t|p[aeiou]+l' somewords.txt
cat
coat
coot
cot
cut
cet
cit
pal
paella
paul
paula
peal
peel
pool
$

In the following example, the first part of the command is entered on one line, and Enter is pressed while the single quote is still open. The shell prompts for additional input with > and keeps accepting lines until the closing quote appears. grep treats each line as a separate search pattern and prints any line that matches one of them. This trick works with any version of grep.

$ grep 'c[aeiou][aeiou]*t
> p[aeiou][aeiou]*l' somewords.txt
cat
coat
coot
cot
cut
cet
cit
pal
paella
paul
paula
peal
peel
pool
$

Parentheses group part of an expression, so the alternation can apply to just that part:

$ egrep '([Ss]ome|[Aa]ny)one' somewords.txt
someone
Someone
anyone
Anyone
$
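The equivalences above can be checked without the article's somewords.txt. This is a minimal sketch that builds a small stand-in word list (its contents are an assumption, not the original file) and compares the extended and basic spellings, using grep -E as the option form of egrep. It assumes a POSIX shell with mktemp available.

```shell
#!/bin/sh
# Hypothetical stand-in for somewords.txt, one word per line.
words=$(mktemp)
printf '%s\n' cat coat coot cot cout cut ct cet cit pal paul pool pl > "$words"

# Extended "one or more vowels" versus the basic "vowel, then zero or more vowels".
ext=$(grep -E 'c[aeiou]+t' "$words")
basic=$(grep 'c[aeiou][aeiou]*t' "$words")
[ "$ext" = "$basic" ] && echo "+ and [aeiou][aeiou]* agree"

# Zero or more vowels also accepts "ct": prints 9.
grep -c 'c[aeiou]*t' "$words"

# ? means zero or one vowel; -x makes the whole line match.
# Prints cat, cot, cut, ct, cet, cit (coat, coot and cout have two vowels).
grep -E -x 'c[aeiou]?t' "$words"

# egrep alternation versus newline-separated patterns in plain grep.
alt=$(grep -E 'c[aeiou]+t|p[aeiou]+l' "$words")
nl=$(grep 'c[aeiou][aeiou]*t
p[aeiou][aeiou]*l' "$words")
[ "$alt" = "$nl" ] && echo "| and newline-separated patterns agree"

rm -f "$words"
```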
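Curly braces set how many times the preceding expression may repeat. egrep uses them as-is; grep requires them to be escaped:

egrep        grep            Meaning
[a-z]{2,4}   [a-z]\{2,4\}    Two through four lowercase letters
[a-z]{4}     [a-z]\{4\}      Exactly four lowercase letters
[a-z]{4,}    [a-z]\{4,\}     Four or more lowercase letters
[a-z]{,4}    [a-z]\{,4\}     Zero through four lowercase letters (GNU extension)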
Escaping a special character with a backslash makes it match itself:

Character   Matches
.           Any character
\.          A period
$           End of line
\$          A dollar sign
*           Zero or more occurrences of the preceding expression
\*          An asterisk
\           Nothing; it is the escape character
\\          A backslash
|           Creates an "or" branch between two expressions (egrep)
\|          A vertical bar (egrep)

It can be hard to remember all of the grep and egrep characters that have a special meaning, and regular expressions are, unfortunately, far from regular. You have already seen that curly braces can be escaped in grep and, when escaped, acquire a special meaning. The same is true for parentheses and angle brackets. The following characters have special meanings in grep or egrep:
In egrep:
| ^ $ . * + ? ( ) [ { } \
In grep:
^ $ . * \( \) [ \{ \} \ \< \>
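A quick way to see the difference between a special character and its escaped form is to count matches on a few sample lines. This sketch uses hypothetical input lines chosen to exercise ., \., \$, | and \|. It also assumes GNU grep for the \< and \> word boundaries.

```shell
#!/bin/sh
# Hypothetical sample lines for the special characters listed above.
f=$(mktemp)
printf '%s\n' 'a.b' 'axb' 'cost $5' 'a|b' 'cat' 'concatenate' > "$f"

grep -c 'a.b' "$f"         # . is any character: a.b, axb, a|b (3)
grep -c 'a\.b' "$f"        # \. is a literal period: a.b (1)
grep -c '\$5' "$f"         # \$ is a literal dollar sign: cost $5 (1)
grep -E -c 'a\|b' "$f"     # in egrep, \| is a literal bar: a|b (1)
grep -E -c 'axb|cat' "$f"  # unescaped | is alternation: axb, cat, concatenate (3)
grep -c '\<cat\>' "$f"     # GNU word boundaries: cat but not concatenate (1)

rm -f "$f"
```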
The last collection of grep and egrep search pattern options is a shorthand for describing a class of characters. The class names are used inside a bracket expression, so "any digit" is written [[:digit:]].

[:alpha:]   Any alphabetic character
[:lower:]   Any lowercase character
[:upper:]   Any uppercase character
[:digit:]   Any digit
[:alnum:]   Any alphanumeric character (alphabetic or digit)
[:space:]   Any white space character (space, tab, newline, vertical tab, form feed, carriage return)
[:graph:]   Any printable character, except space
[:print:]   Any printable character, including space
[:punct:]   Any punctuation (i.e., a printable character that ...
[:cntrl:]   Any control (nonprintable) character

For example, the following command finds lines containing ten digits in a row:
$ egrep '[[:digit:]]{10}' somenumbers.txt
1234554321
$
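A few more class-based searches can be sketched the same way. The input lines below are hypothetical, and the results assume the C or a UTF-8 English locale, since what counts as uppercase or punctuation depends on the locale.

```shell
#!/bin/sh
# Hypothetical input mixing digits, letters, punctuation and white space.
f=$(mktemp)
printf '%s\n' '1234554321' '555-1234' 'Hello' 'hello world' > "$f"
printf 'tab\there\n' >> "$f"   # a line containing a real tab character

# Class names go inside a bracket expression: [[:digit:]], not [:digit:].
grep -E '[[:digit:]]{10}' "$f"                   # 1234554321
grep -E -x '[[:digit:]]{3}-[[:digit:]]{4}' "$f"  # 555-1234
grep -E '^[[:upper:]][[:lower:]]+$' "$f"         # Hello
grep -c '[[:space:]]' "$f"                       # 2: the space and the tab lines
grep -c '[[:punct:]]' "$f"                       # 1: the hyphen in 555-1234

rm -f "$f"
```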
The following patterns combine these features:

1. egrep -n '\([0-9]{3}\)[0-9]{3}-[0-9]{4}' somenumbers.txt
2. egrep -n '[0-9]{5}-?[0-9]{0,4}' somenumbers.txt
3. egrep -in 'p\.? *o\.? +(box|drop)' someaddresses.txt
4. egrep -n '(^| +)cat( +|$)' sometext.txt

The -n option prefixes each matching line with its line number. Pattern 1 searches for telephone numbers in the form (nnn)nnn-nnnn: a three-digit area code in parentheses, followed by three digits, a hyphen, and four digits. Pattern 2 searches for ZIP codes, with or without the hyphen and four-digit extension: five digits, followed by zero or one hyphen, followed by zero to four digits. Pattern 3 searches for lines containing P.O. Box addresses by using a case-insensitive search (-i) for "p," followed by zero or one period, zero or more spaces, "o," zero or one period, one or more spaces, and finally "box" or "drop." This should match most of the styles of data entry for a P.O. Box, including "PO Box," "PO BOX," "P.O. Box," "P O Box," "P. O. Drop," and so on. Pattern 4 matches the word "cat" by searching for it where it is preceded by the beginning of a line or one or more spaces, and followed by one or more spaces or the end of a line. This search will not match "concatenate."
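The four patterns can be tried against small made-up files, since the article's sample files are not shown; the file contents here are assumptions. In this sketch, pattern 3 is written with o\.? so the period after "o" is optional, as the description intends, and the hyphen is written without a backslash because recent GNU grep releases may warn about a stray backslash before -. grep -E stands in for egrep.

```shell
#!/bin/sh
# Hypothetical stand-ins for somenumbers.txt, someaddresses.txt and sometext.txt.
nums=$(mktemp); addrs=$(mktemp); text=$(mktemp)
printf '%s\n' '(555)123-4567' '555-1234' '12345' '12345-6789' > "$nums"
printf '%s\n' 'PO Box 12' 'P.O. Box 7' 'P O Box 3' 'p. o. drop 9' '12 Main St' > "$addrs"
printf '%s\n' 'the cat sat' 'cat' 'concatenate' 'a cat' > "$text"

echo '1: phone numbers'
grep -E -n '\([0-9]{3}\)[0-9]{3}-[0-9]{4}' "$nums"   # line 1
echo '2: zip codes'
grep -E -n '[0-9]{5}-?[0-9]{0,4}' "$nums"            # lines 3 and 4
echo '3: P.O. boxes'
grep -E -in 'p\.? *o\.? +(box|drop)' "$addrs"        # lines 1 to 4
echo '3, period after o required: count'
grep -E -ic 'p\.? *o\. +(box|drop)' "$addrs"         # 2: misses PO Box and P O Box
echo '4: the word cat'
grep -E -n '(^| +)cat( +|$)' "$text"                 # lines 1, 2 and 4

rm -f "$nums" "$addrs" "$text"
```

For pattern 4, GNU grep's -w option (match whole words only) is a shorter alternative, although it also treats punctuation as a word boundary, so "cat," would match as well.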