<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>CTLUG &#187; parse</title>
	<atom:link href="http://www.supergluetech.com/wp/tag/parse/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.supergluetech.com/wp</link>
	<description>Cookeville TN Linux Users Group</description>
	<lastBuildDate>Thu, 10 Jun 2010 05:21:15 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Good regex for parsing apache CommonLog format</title>
		<link>http://www.supergluetech.com/wp/2009/10/good-regex-for-parsing-apache-commonlog-format/</link>
		<comments>http://www.supergluetech.com/wp/2009/10/good-regex-for-parsing-apache-commonlog-format/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 19:55:49 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[logs]]></category>
		<category><![CDATA[parse]]></category>

		<guid isPermaLink="false">http://www.supergluetech.com/wp/?p=16</guid>
		<description><![CDATA[Simple regex and snippet for parsing Apache logs]]></description>
			<content:encoded><![CDATA[<p>I've been working on a custom real-time logging system (kind of like Analytics) in Perl. This is not the first time I have needed to parse an Apache web log to mine some data out of it, so I figured I would share the regex that gets the job done.</p>

<div class="wp_syntax"><div class="code"><pre class="perl" style="font-family:monospace;"><span style="color: #339933;">/^</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">\S</span><span style="color: #339933;">+</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#40;</span><span style="color: #0000ff;">\S</span><span style="color: #339933;">+</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#40;</span><span style="color: #0000ff;">\S</span><span style="color: #339933;">+</span><span style="color: #009900;">&#41;</span> \<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#91;</span><span style="color: #339933;">^</span>\<span style="color: #009900;">&#93;</span>\<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">+</span><span style="color: #009900;">&#41;</span>\<span style="color: #009900;">&#93;</span> \<span style="color: #ff0000;">&quot;([^&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">*</span><span style="color: #009900;">&#41;</span>\<span style="color: #ff0000;">&quot; (<span style="color: #000099; font-weight: bold;">\S</span>+) (<span style="color: #000099; font-weight: bold;">\S</span>+) <span style="color: #000099; font-weight: bold;">\&quot;</span>?([^&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">*</span><span style="color: #009900;">&#41;</span>\<span style="color: #ff0000;">&quot;? <span style="color: #000099; font-weight: bold;">\&quot;</span>([^&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">*</span><span style="color: #009900;">&#41;</span>\<span style="color: #ff0000;">&quot;/o</span></pre></div></div>

<p>This works fine in Perl and PHP, and I imagine it should work in any language that handles PCRE.</p>
<p>Here's a Perl snippet showing how to capture the sub-patterns:</p>

<div class="wp_syntax"><div class="code"><pre class="perl" style="font-family:monospace;"><span style="color: #b1b100;">my</span> <span style="color: #009900;">&#40;</span>
	<span style="color: #0000ff;">$remote_ip</span><span style="color: #339933;">,</span>
	<span style="color: #0000ff;">$rfc931</span><span style="color: #339933;">,</span>
	<span style="color: #0000ff;">$authuser</span><span style="color: #339933;">,</span>
	<span style="color: #0000ff;">$date_time</span><span style="color: #339933;">,</span>
	<span style="color: #0000ff;">$request</span><span style="color: #339933;">,</span>
	<span style="color: #0000ff;">$status</span><span style="color: #339933;">,</span>
	<span style="color: #0000ff;">$bytes</span><span style="color: #339933;">,</span>
	<span style="color: #0000ff;">$referer</span><span style="color: #339933;">,</span>
	<span style="color: #0000ff;">$user_agent</span>
<span style="color: #009900;">&#41;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">$1</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">$2</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">$3</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">$4</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">$5</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">$6</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">$7</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">$8</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">$9</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">my</span> <span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">$method</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">$url</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">$protocol</span> <span style="color: #009900;">&#41;</span> <span style="color: #339933;">=</span> <span style="color: #000066;">qw</span><span style="color: #009900;">&#40;</span> <span style="color: #339933;">-</span> <span style="color: #339933;">-</span> <span style="color: #339933;">-</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">$request</span> <span style="color: #339933;">=~</span> <span style="color: #009966; font-style: italic;">/([a-zA-Z]*)\s(\S+)\s(HTTP\/1\.[01])/</span> <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
     <span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">$method</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">$url</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">$protocol</span> <span style="color: #009900;">&#41;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">$1</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">$2</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">$3</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Hope that helps!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.supergluetech.com/wp/2009/10/good-regex-for-parsing-apache-commonlog-format/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
