Using grep and its alternatives for source code (ack/ag/git-grep/cgrep/sgrep/jq/xgrep) and fuzzy searches (agrep/tre)

grep@man prints lines matching a pattern. Two variant programs, egrep and fgrep, are also available: egrep is the same as grep -E, and fgrep is the same as grep -F.


# matching control
'-E,-F,-G,-P' interpret PATTERN as extended regexp, fixed string, basic regexp (default) or perl regexp
'-i/--ignore-case' case insensitive
'-w/--word-regexp' select only those lines containing matches that form whole words

# output control
'-c/--count' output only match count
'-l/--files-with-matches' output only file names
'-m/--max-count=NUM' stop reading a file after NUM matching lines
'-q/--quiet/--silent' don't write any output, exit immediately with zero status if any match is found
'--color[=always|never|auto]' surround the matched string in color
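The output-control flags above are mostly useful in scripts. A minimal sketch, assuming GNU grep and a throwaway sample directory (the file names and contents are made up for illustration):

```shell
#!/bin/sh
# Hypothetical sample data; file names and contents are assumptions.
tmp=$(mktemp -d)
printf 'error: disk full\nok\nerror: timeout\n' > "$tmp/app.log"
printf 'ok\nok\n' > "$tmp/other.log"

# -c prints the number of matching lines instead of the lines themselves.
count=$(grep -c 'error' "$tmp/app.log")

# -m 1 stops reading after the first matching line.
first=$(grep -m 1 'error' "$tmp/app.log")

# -l prints only the names of the files that contain a match.
files=$(cd "$tmp" && grep -l 'error' app.log other.log)

# -q prints nothing; only the exit status tells whether a match was found.
if grep -q 'timeout' "$tmp/app.log"; then
    found=yes
else
    found=no
fi

echo "$count | $first | $files | $found"
rm -rf "$tmp"
```

Because -q exits as soon as the first match is seen, it is usually the cheapest way to test for a pattern inside an if.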

# output prefix
'-H/--with-filename' output file name for each match
'-n/--line-number' output match line number
'-A/--after-context=NUM','-B/--before-context=NUM','-C/--context=NUM' output NUM lines of context after/before/around each match

# file selection
'--exclude=GLOB','--include=GLOB' exclude/include-only files whose base name matches GLOB
'-R/-r/--recursive' read files recursively
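A short sketch of how the prefix, context, and file-selection flags combine, assuming GNU grep (--include is a GNU option) and a made-up source tree:

```shell
#!/bin/sh
# Hypothetical source tree; all names and contents are assumptions.
tmp=$(mktemp -d)
mkdir -p "$tmp/src/lib"
printf 'int main(void) {\ninit();\nreinit();\nreturn 0;\n}\n' > "$tmp/src/main.c"
printf 'void init(void);\n' > "$tmp/src/lib/init.h"
printf 'call init() first\n' > "$tmp/src/README.txt"
cd "$tmp" || exit 1

# -r recurses into src; --include keeps only *.c and *.h files, so
# README.txt is skipped. -n adds line numbers; in a recursive search each
# match is also prefixed with its file name. sort fixes the output order.
hits=$(grep -rn --include='*.c' --include='*.h' 'init' src | LC_ALL=C sort)

# -w matches whole words only: "init" counts, "reinit" does not.
words=$(grep -cw 'init' src/main.c)

# -A 1 prints one line of trailing context after each match.
ctx=$(grep -A 1 'reinit' src/main.c)

printf '%s\n' "$hits" "$words" "$ctx"
cd / && rm -rf "$tmp"
```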

# regexp howto
'.' matches any single char
'[...]' matches any one char from the list, e.g. [[:alnum:]], [[:digit:]]; '[^...]' matches any one char not in the list
'^','$' match at the beginning/end of a line
'?' '*' '+' '{n}' '{n,}' '{,m}' '{n,m}' match quantifiers
'|' matches either regexp
'()' group regexps
in basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; use the backslashed versions \?, \+, \{, \|, \(, and \) instead
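The difference between basic and extended syntax is easiest to see side by side. A minimal sketch, assuming GNU grep (\? and \| in basic regexps are GNU extensions) and a made-up word list:

```shell
#!/bin/sh
# Hypothetical word list used only for illustration.
words=$(printf 'color\ncolour\ncolouur\ncat\ndog\na+b\n')

# ERE (-E): ? is a quantifier, | is alternation, () groups.
ere=$(printf '%s\n' "$words" | grep -E '^(colou?r|cat)$')

# BRE (default): the same operators are written \?, \|, \( and \).
bre=$(printf '%s\n' "$words" | grep '^\(colou\?r\|cat\)$')

# In BRE an unescaped + is an ordinary character, so this finds "a+b".
plus=$(printf '%s\n' "$words" | grep 'a+b')

# -F treats the pattern as a fixed string: the dot is not a wildcard.
fixed=$(printf 'a.b\naxb\n' | grep -F 'a.b')

# Interval quantifier: "u" repeated exactly two times.
twice=$(printf '%s\n' "$words" | grep -E '^colou{2}r$')

printf '%s\n' "$ere" "$bre" "$plus" "$fixed" "$twice"
```

The first two commands select the same three lines (color, colour, cat); only the escaping differs.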

from grep examples and howto use grep
see why GNU grep is fast

ack/ack@man is a grep-like Perl script optimized for code search; it is faster than grep on source trees because it skips unnecessary files. It searches the current directory recursively by default, ignores meta directories (.git), binaries, and backups (~), prints line numbers, highlights matches in color, and supports Perl regexps.

# install
$ sudo apt-get install ack-grep | sudo yum install ack (EPEL)
$(deb) sudo dpkg-divert --local --divert /usr/bin/ack --rename --add /usr/bin/ack-grep

ack [options] PATTERN [FILE...]

# matching control
'-w/--word-regexp' force PATTERN to match only whole words
'-Q/--literal' quote all metacharacters in PATTERN, so that it is treated as a literal

# file selection
'--[no]ignore-dir=DIRNAME' ignore/don't ignore directory
'--type=[no]TYPE' specify the types of files to include or exclude from a search
'--type-set=[NAME]=.[ext],.[another-ext]' adds types
'--help-types' list types

# output control
'-A/--after-context=NUM','-B/--before-context=NUM','-C/--context=NUM' output NUM lines of context after/before/around each match
'-c/--count' output only match count
'--group/--nogroup' groups matches by file name

from ack@xmodulo

ag is like ack but faster; it skips files matched by the patterns in '.gitignore' and '.agignore'.

# install
$ sudo apt-get install silversearcher-ag | sudo yum install the_silver_searcher (EPEL) | cinst ag (windows/chocolatey)

git-grep is similar to ack/ag but built into git; by default it searches only the tracked files of a git repo.

git grep [options] <pattern> [<pathspec>...]

# file selection (defaults to tracked files in the working tree)
'--cached' searches blobs registered in the index file
'--no-index' searches files in the current directory that is not managed by Git
'--untracked' also searches in untracked files

# matching control
'-E,-F,-G,-P' interpret PATTERN as extended regexp, fixed string, basic regexp (default) or perl regexp
'-i/--ignore-case' ignores case
'--max-depth DEPTH' descend at most DEPTH levels of directories for each pathspec
'-w/--word-regexp' match the pattern only at word boundary
'-v/--invert-match' select non-matching lines
'-e,--and,--or,--not,()' specify how multiple patterns are combined using Boolean expressions

# output control
'-c/--count' print only the number of matching lines per file
'--color[=always|auto|never]' show colored matches
'-h/-H' suppress/show the file name for each match (shown by default)
'-n/--line-number' prefix line number
'-q/--quiet' don't write any output, exit immediately with zero status if any match is found
'-A/--after-context=NUM','-B/--before-context=NUM','-C/--context=NUM' output NUM lines of context after/before/around each match
'-p/--show-function' show preceding line with function name
'-W/--function-context' show the whole function in which the match was found
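A small sketch of the file-selection and Boolean options above, assuming git is installed; the throwaway repository, file names, and contents are made up for illustration:

```shell
#!/bin/sh
# Hypothetical throwaway repository; names and contents are assumptions.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q . 2>/dev/null
printf 'int add(int a, int b)\n{\n    /* TODO: check overflow */\n    return a + b;\n}\n' > math.c
printf 'TODO: write docs\n' > notes.txt
git add math.c            # notes.txt stays untracked

# By default only tracked files are searched, so notes.txt is not reported.
tracked=$(git grep -n 'TODO')

# --untracked also searches untracked files; -l prints file names only.
all=$(git grep -l --untracked 'TODO')

# Boolean combination: lines that contain "return" and not "TODO".
# -h suppresses the file name prefix.
combo=$(git grep -h -e 'return' --and --not -e 'TODO')

# -c prints the number of matching lines per file.
counts=$(git grep -c 'TODO')

printf '%s\n' "$tracked" "$all" "$combo" "$counts"
cd / && rm -rf "$repo"
```

No commit is needed for this to work: git grep without a revision searches the working-tree copies of the files that are in the index.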

cgrep/cgrep@ubuntu context-aware grep for source code. Another alternative to ack/ag.

# install
$ wget | sudo apt-get install

cgrep [OPTIONS] [ITEM]

# context filters and semantic (generic)
'-c/--code' search in source code
'-m/--comment' search in comments
'-l/--literal' search in string literals
'-S/--semantic' "code" patterns: _, _1, _2... (identifiers), $, $1, $2... (optionals), ANY, KEY, STR, CHR, NUM, HEX, OCT, OR
e.g. "_1(_1 && $)" searches for move constructors, "struct OR class _ { OR : OR <" searches for a class declaration

# search for a variable
$ cgrep -r --identifier VARname

# search recursively for headers
$ cgrep -r --header "stdio.h"

# search for call (from any struct or pointer) to 'func' with '5' as 2nd argument
$ cgrep --code --semantic '_1 . OR -> func ( _2 , 5, _3 )' file.c

from cgrep@github

sgrep grep for structured text files. The data model of sgrep is based on regions, which are non-empty substrings of text. Regions are typically occurrences of constant strings or meaningful text elements, which are recognizable through some delimiting strings.

# install
$ sudo apt-get install sgrep | sudo yum install sgrep (Olea)

# show all blocks delimited by braces
$ sgrep '"{" .. "}"' file.c

# show the outermost blocks that contain "sort" or "nest"
$ sgrep 'outer("{" .. "}" containing ("sort" or "nest"))' file.c

# show all lines containing "sort" but not "nest" in files with the extension .c, each preceded by the name of the file
$ sgrep -o "%f:%r" '"\n" _. "\n" containing "sort" not containing "nest"' *.c

# show the beginning of conditional statements, consisting of "if" followed by a condition in parentheses, in files *.c
# ignore "if"s appearing within comments "/* ... */" or on compiler control lines beginning with '#':
$ sgrep '"if" not in ("/*" quote "*/" or ("\n#" .. "\n")) .. ("(" .. ")")' *.c

from sgrep@man

jq@github command-line JSON processor in C (no extra dependencies).
You can use it to slice, filter, map, and transform structured data, as an alternative to awk, sed, and grep.

# install
$ sudo yum install jq (EPEL) | sudo apt-get install jq

$ cat json.txt
{"name": "Google",
 "location": {"street": "1600 Amphitheatre Parkway","city": "Mountain View", "state": "California","country": "US"},
 "employees": [{"name": "Michael","division": "Engineering"},{"name": "Laura","division": "HR"},{"name": "Elise","division": "Marketing"}]
}

# parse object
$ cat json.txt | jq '.name' 

# parse nested object
$ cat json.txt | jq '.location.city'
"Mountain View"

# parse array
$ cat json.txt | jq '.employees[0].name'

# extract specific fields from object
$ cat json.txt | jq '.location | {street, city}' 
{"city": "Mountain View","street": "1600 Amphitheatre Parkway"}

from How to parse JSON string via command line on Linux and jq tutorial

xgrep@man searches the content of XML files

# install
$ sudo yum install xgrep (EPEL) | sudo apt-get install xgrep

'-x xpath' xpath specification of the elements of interest
'-s string' string format in base-element:element/regex/,element/regex/,... where base-element is the name of the elements within which a match should be attempted, the match succeeding if, for each element/regex/ pair, the content of an element of that name is matched by the corresponding regex. If multiple -s flags are specified, a match by any one of them is returned.

# find all person elements with "Smith" in the content of the name element and "2000" in the content of the hiredate element
$ xgrep -s 'person:name/Smith/,hiredate/2000/' *.xml

agrep@wiki “approximate grep” is a proprietary fuzzy grep. TRE/agrep@man is a lightweight, robust, and efficient POSIX compliant regexp matching library with some exciting features such as approximate (fuzzy) matching.

# install
$ sudo apt-get install tre-agrep | sudo yum install agrep (EPEL)
$(deb) sudo dpkg-divert --local --divert /usr/bin/agrep --rename --add /usr/bin/tre-agrep

agrep [OPTION]... PATTERN [FILE]...

# regexp selection and interpretation
'-i/--ignore-case' ignore case distinctions
'-k/--literal' treat PATTERN as a literal string
'-w/--word-regexp' force PATTERN to match only whole words
'-v/--invert-match' select non-matching records instead of matching records

# approximate matching settings
'-D/--delete-cost=NUM' set cost of missing characters to NUM
'-I/--insert-cost=NUM' set cost of extra characters to NUM
'-S/--substitute-cost=NUM' set cost of incorrect characters to NUM
Note that a deletion (a missing character) and an insertion (an extra character) together constitute a substituted character, but the cost will be that of a deletion and an insertion added together.
'-E/--max-errors=NUM' select records that have at most NUM errors.
'-#' select records that have at most # errors (# is a digit between 0 and 9)

# output control
'--color' show colored matches
'-c/--count' print only the number of matching records per file
'-s/--show-cost' print match cost
'-H/--with-filename' prefix with file name
'-l/--files-with-matches' only print file name

$ tre-agrep -5 -s -i resume example.txt

from How to do fuzzy search with tre-agrep

