Determining connection source from honeyd.log – cymruwhois version
InfoSanity’s honeyd-geoip.py script has been useful for analysing the initial findings from a HoneyD installation, but one of the weaknesses identified in the geolocation database used by the script was that a large proportion of the source IP addresses connecting to the honeypot environment weren’t known within the database. Markus pointed me in the direction of the cymruwhois Python module (discussed previously) as an alternative. I’ve rewritten the initial script, below:
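#!/usr/bin/python
from cymruwhois import Client
import sys

# Collect the source IP address from every log entry.
# Assumption: the source IP is the fourth space-separated field of a honeyd.log line.
logfile = open('/var/log/honeypot/honeyd.log', 'r')
source = []
for line in logfile:
    source.append(line.split(' ')[3])
logfile.close()

src_country = []
src_count = []

# Look up every unique source address in one bulk query.
c = Client()
results = c.lookupmany_dict(set(source))

# Tally the number of unique source addresses per country code.
for res in results:
    country = results[res].cc
    try:
        pos = src_country.index(country)
        src_count[pos] += 1
    except ValueError:
        src_country.append(country)
        src_count.append(1)

# Print the count for every country seen.
for i in range(len(src_country)):
    sys.stdout.write("%s:\t%i\n" % (src_country[i], src_count[i]))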
So far this has resulted in far fewer unknown source locations: 249 using geoip compared to 3 using cymruwhois. The downside, unfortunately, is performance: cymruwhois communicates with a remote host to gather information, whereas the geolocation database is already stored locally on the machine. Both perform some local caching of results/data, however, so I would expect the performance difference to decrease as larger datasets are analysed.
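To illustrate why caching should narrow the gap on repeated runs, here is a minimal sketch of a file-backed cache placed in front of a slow lookup. This is not how cymruwhois or the geolocation library cache internally; `cached_lookup`, its `lookup` callable and the JSON cache file are all hypothetical names I've made up for the example. The idea is simply that only addresses not seen before trigger a remote query.

```python
import json
import os


def cached_lookup(ips, lookup, cache_path):
    """Return {ip: country_code}, calling `lookup` only for uncached IPs.

    `lookup` is a hypothetical callable that takes a list of IP strings and
    returns a dict of {ip: country_code}; it stands in for the remote query.
    """
    # Load results saved by earlier runs, if any.
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as handle:
            cache = json.load(handle)

    # Only addresses we have never resolved need a remote lookup.
    missing = sorted(set(ips) - set(cache))
    if missing:
        cache.update(lookup(missing))
        with open(cache_path, "w") as handle:
            json.dump(cache, handle)

    # Addresses the lookup could not resolve map to None.
    return {ip: cache.get(ip) for ip in set(ips)}
```

On a second run over an overlapping log, `lookup` would only be called with the new addresses, so the remote cost grows with the number of previously unseen sources rather than with the size of the log.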
Using the newer script, based on the same 24hr data set, the top ten host countries communicating with InfoSanity’s honeyd environment are:
RU: 397
US: 234
TW: 179
BR: 158
CN: 123
RO: 107
DE: 101
IT: 96
JP: 91
AR: 86
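The script itself prints countries in the order they were first seen, so a ranking like the one above needs a sorting step. A minimal sketch of one way to do that with the standard library's `collections.Counter` follows; `top_countries` and `format_report` are hypothetical helper names, and the input is assumed to be a plain list of country codes, one per source address.

```python
from collections import Counter


def top_countries(country_codes, limit=10):
    """Return the `limit` most common country codes as (code, count) pairs."""
    return Counter(country_codes).most_common(limit)


def format_report(ranked):
    """Render (code, count) pairs one per line, e.g. 'RU: 397'."""
    return "\n".join("%s: %i" % (code, count) for code, count in ranked)
```

With the lookup results from the script, something like `format_report(top_countries([results[ip].cc for ip in results]))` should give a ranked list directly.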
— Andrew Waite