Lately I have been focused on e-mail archive migrations, and I often find myself sifting through thousands of log files. The other day I needed to pull entries from roughly 1,500 logs, about 8GB of data. Using grep, I was able to extract the information I was looking for into a file that I named execution_times.

grep -E -ron "Execution time:.{0,20}" > /media/RAMDISK-16GB/logs/execution_times

Grep: what do all these options mean?

-E interpret PATTERN as an extended regular expression
-r recurse through all the files (with no directory given, grep starts from the current working directory)
-o show only the part of a line matching PATTERN
-n print the line number with output lines
. the dot matches almost any character (except for line break characters)
{0,20} means match up to 20 characters after the search text Execution time:
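
With -r, -o, and -n combined, every match comes back as file:line number:matched text, so each line of execution_times looks something like the lines below. The file names, line numbers, and durations here are invented for illustration, and they assume the application writes durations in an HH:MM:SS style:

archive01/worker.log:3121:Execution time: 00:07:42
archive02/worker.log:845:Execution time: 00:00:19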

The resulting file was 300MB in size and 5 million lines long! For this particular task I realized that I needed to sort the results. Unfortunately, both Microsoft Excel and LibreOffice Calc cap the number of rows at 1,048,576.
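
If you want to confirm numbers like these on your own run, wc counts the lines and du reports the file size (the path below is simply where my output file landed):

wc -l /media/RAMDISK-16GB/logs/execution_times
du -h /media/RAMDISK-16GB/logs/execution_times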

This is where I turned to the Linux sort command. The sort that I needed to perform involved the 5th column (counting left to right, with : as the separator). This would let me zero in on the number of minutes certain processes within my application were taking to complete.
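
Field numbering is worth double checking before sorting. Assuming a duration in HH:MM:SS form, a line such as archive01/worker.log:3121:Execution time: 00:07:42 splits on : into the file, the line number, the text Execution time, then hours, minutes, and seconds, which puts the minutes in field 5. A quick way to see exactly what sort will treat as column 5 on your own data:

cut -d : -f 5 execution_times | head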

sort -nk5 -r -t : execution_times -o sby_minutes_execution_times

[Screenshot: sorted data]
Sort: what do all these switches mean?

-n indicates this will be a numerical sort
-t followed by : tells sort to use the : as a separator to define columns
-k followed by a number indicates which column to sort on
-r reverses the sort, which in this case sorts largest to smallest values
-o defines the output file
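
Once the sort finishes, the longest-running entries sit at the top of sby_minutes_execution_times, so a quick head shows the worst offenders (20 is just an arbitrary number of lines to display):

head -n 20 sby_minutes_execution_times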

That one-liner alone sorted 5 million lines in a matter of seconds!
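
If you are curious how long it takes on your own hardware, prefixing the command with time will report it:

time sort -nk5 -r -t : execution_times -o sby_minutes_execution_times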