Parsing Linux Logs

I often spend countless hours pouring through server logs. If I think there’s a possible application issue I’ll be sifting through system and application logs. If there’s a connectivity issue I may be digging through Apache logs. Either way, without the proper tools it can be a tedious job. Oh heck, who am I kidding? It’s tedious regardless.

Let’s see if we can help you out a little.

If I could give any one tip regarding log files it would be this: pipe-delimit your logs! This is one of the biggest beefs I have and something I plan on rectifying. Not all of our logs do this and it’s a royal PITA trying to do research when your delimiter is a space and you have spaces in your data. I’ve seen the colon used also but what happens when you have the time (which you should)? The pipe (|) is the (imho) perfect delimiter…especially if you love awk like me. 🙂

Let’s look at this default log format as an example.

192.168.2.20 - - [28/Jul/2006:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395
127.0.0.1 - - [28/Jul/2006:10:22:04 -0300] "GET / HTTP/1.0" 200 2216

Now let’s pipe-delimit it.

192.168.2.20|-|-|[28/Jul/2006:10:27:10 -0300]|"GET /cgi-bin/try/ HTTP/1.0"|200|3395
127.0.0.1|-|-|[28/Jul/2006:10:22:04 -0300]|"GET / HTTP/1.0"|200|2216

Now there’s no confusion as to where each field sits.

In tracking this issue it’s been important to know how many times per day the problem arises. Since we store several days’ worth of logs it easy to get a daily count. If you’re not familiar with the -c option of grep, give it a shot. It gives you a count of how many times your search criteria was found.

Let’s use this sample log from http://www.monitorware.com.

64.242.88.10|-|-|[07/Mar/2004:16:06:51 -0800]|"GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.1"|200|4523
64.242.88.10|-|-|[07/Mar/2004:16:10:02 -0800]|"GET /mailman/listinfo/hsdivision HTTP/1.1"|200|6291
64.242.88.10|-|-|[07/Mar/2004:16:11:58 -0800]|"GET /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1"|200|7352
64.242.88.10|-|-|[07/Mar/2004:16:20:55 -0800]|"GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1"|200|5253
64.242.88.10|-|-|[07/Mar/2004:16:23:12 -0800]|"GET /twiki/bin/oops/TWiki/AppendixFileSystem?template=oops HTTP/1.1"|200|11382
64.242.88.10|-|-|[07/Mar/2004:16:24:16 -0800]|"GET /twiki/bin/view/Main/PeterThoeny HTTP/1.1"|200|4924
64.242.88.10|-|-|[07/Mar/2004:16:29:16 -0800]|"GET /twiki/bin/edit/Main/Header_checks?topicparent=Main.Config HTTP/1.1"|401|12851
64.242.88.10|-|-|[07/Mar/2004:16:30:29 -0800]|"GET /twiki/bin/attach/Main/OfficeLocations HTTP/1.1"|401|12851
64.242.88.10|-|-|[07/Mar/2004:16:31:48 -0800]|"GET /twiki/bin/view/TWiki/WebTopicEditTemplate HTTP/1.1"|200|3732
64.242.88.10|-|-|[07/Mar/2004:16:32:50 -0800]|"GET /twiki/bin/view/Main/WebChanges HTTP/1.1"|200|40520

Let’s say you want to know how many 401 errors happened today. You can grep for it and then pipe the output to wc -l to get a count.

grep "|401|" sample.log | wc -l
       3

Notice that you need to include the pipe characters and enclose the search expression with quotes.

Here’s a sidebar tip. This log file is called sample.log and I wanted to create multiple copies. Here’s my trick since you can’t use cp to copy to multiple destinations

for target in sample-20160325.log sample-20160326.log; do cp sample.log $target; done;

So piping grep to wc works, but what about when you have multiple files? This is where the -c option comes into play.

ls -al sample*
-rw-r--r--@ 1 runlevl4  staff  1118 Mar 26 19:11 sample-20160325.log
-rw-r--r--@ 1 runlevl4  staff  1118 Mar 26 19:11 sample-20160326.log
-rw-r--r--@ 1 runlevl4  staff  1118 Mar 26 19:06 sample.log
 
grep "|401|" sample* | wc -l
       9

Now we know that we had 9 occurrences of the 401, but how many per file?

grep -c "|401|" sample*
sample-20160325.log:3
sample-20160326.log:3
sample.log:3

Now we can see exactly how many per day.

But grep is so last decade. awk is where it’s at, right? Right. 😉 So since we have a nice pipe-delimited file we can search by field. This is just what we need for expertly slicing and dicing. If we look back to our piped log we can count out that the HTTP response code is the 6th field. In awk we use the $ to denote the field, therefore we’re interested in $6. The ~ (tilde) is used to indicate “matches” and !~ indicates “does not match”. We’ll also use awk’s internal FS variable to specify the Field Separator (can be shortened to just F). The /xxx/ denotes our regular expression.

awk -F"|" '$6 ~ /401/' sample.log 
64.242.88.10|-|-|[07/Mar/2004:16:06:51 -0800]|"GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.1"|401|4523
64.242.88.10|-|-|[07/Mar/2004:16:29:16 -0800]|"GET /twiki/bin/edit/Main/Header_checks?topicparent=Main.Config HTTP/1.1"|401|12851
64.242.88.10|-|-|[07/Mar/2004:16:30:29 -0800]|"GET /twiki/bin/attach/Main/OfficeLocations HTTP/1.1"|401|12851

Beautiful! We quickly identified all records with a 401 result code. If we use a file wildcard we can see them all.

awk -F"|" '$6 ~ /401/' sample*
64.242.88.10|-|-|[07/Mar/2004:16:06:51 -0800]|"GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.1"|401|4523
64.242.88.10|-|-|[07/Mar/2004:16:29:16 -0800]|"GET /twiki/bin/edit/Main/Header_checks?topicparent=Main.Config HTTP/1.1"|401|12851
64.242.88.10|-|-|[07/Mar/2004:16:30:29 -0800]|"GET /twiki/bin/attach/Main/OfficeLocations HTTP/1.1"|401|12851
64.242.88.10|-|-|[07/Mar/2004:16:06:51 -0800]|"GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.1"|401|4523
64.242.88.10|-|-|[07/Mar/2004:16:29:16 -0800]|"GET /twiki/bin/edit/Main/Header_checks?topicparent=Main.Config HTTP/1.1"|401|12851
64.242.88.10|-|-|[07/Mar/2004:16:30:29 -0800]|"GET /twiki/bin/attach/Main/OfficeLocations HTTP/1.1"|401|12851
64.242.88.10|-|-|[07/Mar/2004:16:06:51 -0800]|"GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.1"|401|4523
64.242.88.10|-|-|[07/Mar/2004:16:29:16 -0800]|"GET /twiki/bin/edit/Main/Header_checks?topicparent=Main.Config HTTP/1.1"|401|12851
64.242.88.10|-|-|[07/Mar/2004:16:30:29 -0800]|"GET /twiki/bin/attach/Main/OfficeLocations HTTP/1.1"|401|12851

Unfortunately we don’t see the log the error occurred in. Let’s fix that.

for file in sample*; do export file; awk -F"|" '$6 ~ /401/ {count++} END {print ENVIRON["file"], count}' $file; done;
sample-20160325.log 3
sample-20160326.log 3
sample.log 3

Let me break this down for you.

for file in sample*;Begins a for loop to iterate through all files beginning with "sample".
do export file;exports the filename to a variable called "file".
awk -F"|" '$6 ~ /401/ {count++} END {print ENVIRON["file"], count}' $file;The meat of our command, this takes each file and increases the value of count for each occurrence. The awk ENVIRON command allows you to reference an environment variable (in this case our filename). Then we print the filename and the count.
done;This just closes the loop.

What if you just want to see what errors you had regardless of the result code? I’ve added a new log file to the mix and threw in some non-401 error codes. Let’s change the last command to find everything that isn’t a HTTP 200.

for file in sample*; do export file; awk -F"|" '$6 !~ /200/ {count++} END {print ENVIRON["file"], count}' $file; done;
sample-20160324.log 5
sample-20160325.log 3
sample-20160326.log 3
sample.log 3

We can see that we have a total of 14 failed requests. Let’s scale the command back a bit to see what they are.

awk -F"|" '$6 !~ /200/ {print FILENAME,">>" ,$0}' sample*
sample-20160324.log >> 64.242.88.10|-|-|[07/Mar/2004:16:06:51 -0800]|"GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.1"|503|4523
sample-20160324.log >> 64.242.88.10|-|-|[07/Mar/2004:16:20:55 -0800]|"GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1"|500|5253
sample-20160324.log >> 64.242.88.10|-|-|[07/Mar/2004:16:29:16 -0800]|"GET /twiki/bin/edit/Main/Header_checks?topicparent=Main.Config HTTP/1.1"|401|12851
sample-20160324.log >> 64.242.88.10|-|-|[07/Mar/2004:16:30:29 -0800]|"GET /twiki/bin/attach/Main/OfficeLocations HTTP/1.1"|401|12851
sample-20160324.log >> 64.242.88.10|-|-|[07/Mar/2004:16:32:50 -0800]|"GET /twiki/bin/view/Main/WebChanges HTTP/1.1"|302|40520
sample-20160325.log >> 64.242.88.10|-|-|[07/Mar/2004:16:06:51 -0800]|"GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.1"|401|4523
sample-20160325.log >> 64.242.88.10|-|-|[07/Mar/2004:16:29:16 -0800]|"GET /twiki/bin/edit/Main/Header_checks?topicparent=Main.Config HTTP/1.1"|401|12851
sample-20160325.log >> 64.242.88.10|-|-|[07/Mar/2004:16:30:29 -0800]|"GET /twiki/bin/attach/Main/OfficeLocations HTTP/1.1"|401|12851
sample-20160326.log >> 64.242.88.10|-|-|[07/Mar/2004:16:06:51 -0800]|"GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.1"|401|4523
sample-20160326.log >> 64.242.88.10|-|-|[07/Mar/2004:16:29:16 -0800]|"GET /twiki/bin/edit/Main/Header_checks?topicparent=Main.Config HTTP/1.1"|401|12851
sample-20160326.log >> 64.242.88.10|-|-|[07/Mar/2004:16:30:29 -0800]|"GET /twiki/bin/attach/Main/OfficeLocations HTTP/1.1"|401|12851
sample.log >> 64.242.88.10|-|-|[07/Mar/2004:16:06:51 -0800]|"GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.1"|401|4523
sample.log >> 64.242.88.10|-|-|[07/Mar/2004:16:29:16 -0800]|"GET /twiki/bin/edit/Main/Header_checks?topicparent=Main.Config HTTP/1.1"|401|12851
sample.log >> 64.242.88.10|-|-|[07/Mar/2004:16:30:29 -0800]|"GET /twiki/bin/attach/Main/OfficeLocations HTTP/1.1"|401|12851

I threw a couple of extra things in here for you. I’m not looping but I still want to display the filename. awk uses the FILENAME variable to reference the file passed in at the command line. The print command lets me specify which fields I want to print. First I print the filename, then I use “>>” as a separator. The $0 refers to the entire line.

And there you go. You’ve successfully searched your logs for important information! Pat yourself on the back.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.