Configure Logstash to read Apache HTTP Server logs and add GeoIP information to them

Logstash is a tool for managing logs. The basic idea is that you configure Logstash to read a log file, it enriches each log record, and then it writes the records to Elasticsearch. You can then use Kibana to view the logs.

I wanted to figure out where my web traffic is coming from, so I configured Logstash to read the HTTP server access log and used its geoip capability to look up the location of each request from the client IP address and store it in Elasticsearch. My Logstash configuration is in httpaccess.conf, the file passed to Logstash in the command below. Before starting, I downloaded the GeoCity database from MaxMind and configured Logstash to use it. Next, I started an Elasticsearch server on my local machine to collect the logs and used the following command to start the Logstash server:

java -jar logstash-1.3.2-flatjar.jar agent -f httpaccess.conf
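
The contents of httpaccess.conf are not reproduced in this post. As a rough sketch only, a configuration for this kind of pipeline might look like the following. It assumes the access log is in Apache combined format, and both file paths are hypothetical placeholders rather than the values actually used here.

```
# Sketch of an httpaccess.conf for Logstash 1.3.x (not the original file).
input {
  file {
    # Hypothetical path to the Apache access log
    path => "/var/log/apache2/access.log"
    # Read the file from the start instead of only tailing new lines
    start_position => "beginning"
  }
}

filter {
  # Parse each line with the stock combined-log pattern;
  # this produces fields such as clientip, verb, request and response
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG}" ]
  }
  # Look up the location of the client IP extracted by grok
  geoip {
    source => "clientip"
    # Hypothetical path to the MaxMind database downloaded earlier
    database => "/path/to/GeoLiteCity.dat"
  }
}

output {
  # Send events to the Elasticsearch server on the local machine
  elasticsearch {
    host => "localhost"
  }
}
```

The order of the filters matters in a setup like this: geoip can only run after grok has created the clientip field it reads from.
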
Once the Logstash server was started, I could see it parsing the logs and posting them to Elasticsearch. For example, for the following log statement

129.143.71.36 - - [31/Aug/2011:08:35:17 -0700] "GET /favicon.ico HTTP/1.1" 200 3935 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10"
I could see Logstash converting it into the following JSON before posting it to Elasticsearch

1 comment:

Anonymous said...

Hi Sunil, I followed your documentation for MapReduce and am doing the same as you. The only difference is that I am using the GeoIP2-City.mmdb file and "DatabaseReader reader = new DatabaseReader.Builder(cityFile).withCache(new CHMCache()).build();". I am getting a Jackson bind exception on AWS EMR .....