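A good regex for parsing the Apache Common Log Format
I've been working on a custom real-time logging system (kind of like Analytics) in Perl. This is not the first time I have needed to parse an Apache web log to mine some data out of it, so I figured I would share the regex that gets the job done. Note that it also expects the quoted referer and user-agent fields at the end of each line, which is what Apache calls the "combined" variant of the Common Log Format.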
/^(\S+) (\S+) (\S+) \[([^\]\[]+)\] \"([^"]*)\" (\S+) (\S+) \"?([^"]*)\"? \"([^"]*)\"/o
This works fine in Perl and PHP, and I imagine it should be fine in any language that handles PCRE.
Here's a Perl snippet showing how to capture the sub-patterns:
# Run this right after a successful match against the regex above.
my ( $remote_ip, $rfc931, $authuser, $date_time, $request,
     $status, $bytes, $referer, $user_agent )
    = ( $1, $2, $3, $4, $5, $6, $7, $8, $9 );

# Split the request line into its parts; keep "-" if it is malformed.
my ( $method, $url, $protocol ) = qw( - - - );
if ( $request =~ /([a-zA-Z]*)\s(\S+)\s(HTTP\/1\.[01])/ ) {
    ( $method, $url, $protocol ) = ( $1, $2, $3 );
}
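If you want to see the two pieces working together, here is a minimal sketch that wraps the regex and the capture code in one function. The helper name parse_line, the hash keys, and the sample log line are my own choices for illustration, not part of any library, and I dropped the /o flag since the pattern is precompiled with qr// here.

```perl
use strict;
use warnings;

# The pattern from above, precompiled once with qr//.
my $log_re = qr/^(\S+) (\S+) (\S+) \[([^\]\[]+)\] "([^"]*)" (\S+) (\S+) "?([^"]*)"? "([^"]*)"/;

# parse_line is a hypothetical helper: it returns a hash ref of fields,
# or undef when the line does not match the pattern.
sub parse_line {
    my ($line) = @_;
    my @fields = $line =~ $log_re or return;

    my %entry;
    @entry{qw(remote_ip rfc931 authuser date_time request
              status bytes referer user_agent)} = @fields;

    # Split the request line; keep "-" placeholders if it is malformed.
    @entry{qw(method url protocol)} = qw(- - -);
    if ( $entry{request} =~ /([a-zA-Z]*)\s(\S+)\s(HTTP\/1\.[01])/ ) {
        @entry{qw(method url protocol)} = ( $1, $2, $3 );
    }
    return \%entry;
}

# Made-up sample line in the combined log layout.
my $sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
           . '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
           . '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"';

my $entry = parse_line($sample);
if ($entry) {
    printf "%s %s %s -> %s\n", @{$entry}{qw(remote_ip method url status)};
}
```

Returning undef for lines that do not match lets the caller skip or count bad lines instead of reading stale values out of $1 through $9. Also keep in mind that the request sub-pattern only recognizes HTTP/1.0 and HTTP/1.1, so anything else leaves method, url and protocol as "-".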
Hope that helps!