Determining connection source from honeyd.log – cymruwhois version

InfoSanity’s honeyd-geoip.py script has been useful for analysing the initial findings from a HoneyD installation, but one of weaknesses identified in the geolocation database used by the script was that a large proportion of the source IP addresses connecting to the honeypot environment weren’t none within the database. Markus pointed me in the direction of the cymruwhois (discussed previously)python module as an alternative. I’ve re-written the initial script, below:

#!/usr/bin/python
from cymruwhois import Client
import sys
logfile = open('/var/log/honeypot/honeyd.log', 'r')
source = []
for line in logfile:
    source.append(line.split(' ')[3])
src_country = []
src_count = []
c=Client()
results=c.lookupmany_dict(set(source))
for res in results:
    country = results[res].cc
    try:
        pos = src_country.index( country )
        src_count[pos] += 1
    except:
        src_country.append( country )
        src_count.append( 1 )
for i in range( 0, ( len( src_country ) - 1 ) ):
    sys.stdout.write( "%s:\t%i\n" %( src_country[i], src_count[i] ) )

So far this has resulted in far fewer unknown source locations, 249 using geoip compared to 3 using cymruwhois. The downside unfortunately is performance, the cymruwhois communicates with a remote host to gather information compared with the geolocation database that is already stored locally on the machine. Both perform some local caching of results/data however so I would expect the performane difference to decrease as larger datasets are analysed.
Using the newer script, based on the same 24hr data set, the top ten host countries communicating with InfoSanity’s honeyd environment are:

RU:     397
US:     234
TW:     179
BR:     158
CN:     123
RO:     107
DE:     101
IT:     96
JP:     91
AR:     86

— Andrew Waite

Join the conversation

1 Comment

  1. Hi Andrew,
    I had just found the tools available on the cymru site whilst browsing during my lunch break last week. Appreciate your work as my python skills aren’t too good yet 🙂
    Cheers,
    Cooper

Leave a comment

Your email address will not be published. Required fields are marked *