Good regex for parsing the Apache combined log format

I've been working on a custom real-time logging system (kinda like Analytics) in Perl. Anyway, this isn't the first time I've needed to parse an Apache web log to mine some data out of it, so I figured I'd share the regex that gets the job done.

/^(\S+) (\S+) (\S+) \[([^\]\[]+)\] \"([^"]*)\" (\S+) (\S+) \"?([^"]*)\"? \"([^"]*)\"/o

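This works fine in Perl and PHP, and I imagine it should work in any language with PCRE-style regexes. One caveat: the trailing /o flag is Perl-specific (it just asks Perl to compile the pattern only once), so drop it when porting. PHP's preg_* functions, for example, reject it as an unknown modifier.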
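If the one-liner is hard to read, here's a sketch of the same pattern spread out with the /x flag and named captures (needs Perl 5.10 or later). $combined_re and parse_log_line are hypothetical names of mine. Making the referer and user-agent fields optional, so plain Common Log Format lines match too, is also my own tweak; the original pattern requires them.

```perl
use strict;
use warnings;

# Same fields as the one-line regex, written with /x (unescaped whitespace
# and #-comments are ignored) and named captures. Needs Perl 5.10+.
my $combined_re = qr{
    ^ (?<remote_ip>\S+)
    [ ] (?<rfc931>\S+)              # identd user, usually "-"
    [ ] (?<authuser>\S+)            # HTTP auth user, usually "-"
    [ ] \[ (?<date_time>[^\]\[]+) \]
    [ ] " (?<request>[^"]*) "
    [ ] (?<status>\S+)
    [ ] (?<bytes>\S+)               # "-" when no body was sent
    (?:                             # referer and user agent (combined format),
        [ ] "? (?<referer>[^"]*) "?     # made optional here so plain
        [ ] " (?<user_agent>[^"]*) "    # Common Log Format lines match too
    )?
}x;

# Hypothetical helper: returns a hashref of the named fields,
# or undef if the line doesn't look like an access-log entry.
sub parse_log_line {
    my ($line) = @_;
    return unless $line =~ $combined_re;
    return +{%+};    # copy; %+ only holds named groups that captured something
}

my $rec = parse_log_line(
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" '
  . '200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
);
print "$rec->{remote_ip} $rec->{status} $rec->{request}\n";
```

Getting a hash back means the calling code reads $rec->{status} instead of remembering that status is $6.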

Here's a Perl snippet showing how to capture the sub-patterns:

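use strict;
use warnings;

# Read log lines from STDIN
while ( my $line = <STDIN> ) {
	# Skip anything that doesn't look like a combined-format entry
	next unless $line =~ /^(\S+) (\S+) (\S+) \[([^\]\[]+)\] "([^"]*)" (\S+) (\S+) "?([^"]*)"? "([^"]*)"/;
	my (
		$remote_ip,
		$rfc931,     # identd user, usually "-"
		$authuser,   # HTTP auth user, usually "-"
		$date_time,  # e.g. 10/Oct/2000:13:55:36 -0700
		$request,    # e.g. GET /index.html HTTP/1.1
		$status,
		$bytes,      # "-" when no body was sent
		$referer,
		$user_agent
	) = ( $1, $2, $3, $4, $5, $6, $7, $8, $9 );
	# Split the request line; the fields stay "-" if it's malformed
	my ( $method, $url, $protocol ) = qw( - - - );
	if ( $request =~ /^([a-zA-Z]+)\s(\S+)\s(HTTP\/[\d.]+)$/ ) {   # HTTP/1.0, 1.1, 2.0, ...
		( $method, $url, $protocol ) = ( $1, $2, $3 );
	}
	# ...use the fields here
}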
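The $date_time field comes out as a string like 10/Oct/2000:13:55:36 -0700 (Apache's default timestamp layout). If you want to bucket hits by time, here's a minimal sketch that converts it to Unix epoch seconds with the core Time::Local module. apache_time_to_epoch and %MONTH are hypothetical names, and it assumes English month abbreviations and that default layout.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical lookup table: month abbreviation -> 0-based month index
my %MONTH = ( Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
              Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11 );

# Convert "10/Oct/2000:13:55:36 -0700" to epoch seconds (UTC).
# Returns undef if the string doesn't have the expected layout.
sub apache_time_to_epoch {
    my ($date_time) = @_;
    my ( $d, $mon, $y, $h, $m, $s, $sign, $oh, $om ) =
        $date_time =~ m{^(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$}
        or return;
    return unless exists $MONTH{$mon};

    # timegm() treats the fields as UTC, but they are the server's local
    # time, so subtract the zone offset to get back to real UTC.
    my $offset = ( $oh * 3600 + $om * 60 ) * ( $sign eq '-' ? -1 : 1 );
    return timegm( $s, $m, $h, $d, $MONTH{$mon}, $y ) - $offset;
}

my $epoch = apache_time_to_epoch('10/Oct/2000:13:55:36 -0700');
print "$epoch\n";    # 971211336
```

Working in epoch seconds makes per-minute or per-hour buckets a simple integer division, e.g. int($epoch / 60).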

Hope that helps!
