Analysis: Honeypot Datasets

Earlier this week Markus released two anonymised data sets from live Dionaea installations. The full write-up and data sets can be found on the newly migrated news feed here. Perhaps unsurprisingly, I couldn’t help but run the data through my statistics scripts to get a quick idea of what was seen by the sensors.
This caused some immediate problems. Before the data was released, Markus had contacted me to point out (or complain) that the performance of my script is less than ideal. Performance wasn’t an issue I had encountered, but the database from the sensor I run is ~1MB, while the smaller of the released data sets is ~300MB and the larger is 4.1GB. I immediately tried to rectify the problem and am proud to report,…
I failed miserably. I tried moving some of the counting and loops out of the Python code and into more complex SQL queries, on the theory that databases, being designed for working with sets of data, should handle large datasets more efficiently. The theory proved false: the changes actually increased run-time by about 20%, so I won’t be releasing them. Good job I’ve never claimed to be a developer. All this being said, the script still crunches through the raw data in 30 seconds and 3 minutes respectively.
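For anyone wondering what that kind of change looks like, here is a minimal sketch of the two approaches side by side. It is not the actual statistics script: the `submissions` table, its columns and the sample rows are all made up for illustration, and the real Dionaea database is laid out differently.

```python
import sqlite3

# Hypothetical, simplified schema: one row per submission.
# The table name, columns and rows are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submissions (ts TEXT, source_ip TEXT, md5 TEXT)")
conn.executemany(
    "INSERT INTO submissions VALUES (?, ?, ?)",
    [
        ("2009-12-07 10:43:48", "192.0.2.1", "aaa"),
        ("2009-12-07 10:55:24", "192.0.2.2", "bbb"),
        ("2009-12-07 11:10:27", "192.0.2.2", "bbb"),
        ("2009-12-07 11:12:59", "192.0.2.1", "aaa"),
        ("2009-12-07 11:13:55", "192.0.2.1", "ccc"),
    ],
)


def stats_in_python(conn):
    """Fetch every row and do the counting in Python."""
    total = 0
    samples = set()
    sources = set()
    for source_ip, md5 in conn.execute("SELECT source_ip, md5 FROM submissions"):
        total += 1
        samples.add(md5)
        sources.add(source_ip)
    return total, len(samples), len(sources)


def stats_in_sql(conn):
    """Let the database do the counting with aggregate functions."""
    return conn.execute(
        "SELECT COUNT(*), COUNT(DISTINCT md5), COUNT(DISTINCT source_ip) "
        "FROM submissions"
    ).fetchone()


print(stats_in_python(conn))
print(stats_in_sql(conn))
```

Both functions return the same three numbers (submissions, unique samples, unique source IPs). Which one is faster depends on the data, the indexes and the queries involved, so it is worth timing both before committing to either.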
Without further ado, the Berlin data-set:

Statistics engine written by Andrew Waite –
Number of submissions: 2726
Number of unique samples: 133
Number of unique source IPs: 639
First sample seen: 2009-11-05 12:02:48.104760
Last sample seen: 2009-12-07 11:13:55.930130
SystemrRunning: 31 days, 23:11:07.825370
Average daily submissions: 87.935483871
Most recent submissions:
2009-12-07 11:13:55.930130,,, ae8705a7b4bf8c13e5d8214d374e6c34
2009-12-07 11:12:59.389940,, ftp://1:1@, 14a09a48ad23fe0ea5a180bee8cb750a
2009-12-07 11:10:27.296370,, tftp://, df51e3310ef609e908a6b487a28ac068
2009-12-07 10:55:24.607140,, tftp://, df51e3310ef609e908a6b487a28ac068
2009-12-07 10:43:48.872170,, ftp://1:1@, 14a09a48ad23fe0ea5a180bee8cb750a

And Paris:

Statistics engine written by Andrew Waite –
Number of submissions: 749518
Number of unique samples: 2064
Number of unique source IPs: 30808
First sample seen: 2009-11-30 03:10:24.591650
Last sample seen: 2009-12-07 08:46:23.657530
SystemrRunning: 7 days, 5:35:59.065880
Average daily submissions: 107074.0
Most recent submissions:
2009-12-07 08:46:23.657530,,, d45895e3980c96b077cb4ed8dc163db8
2009-12-07 08:46:20.985190,,, 94e689d7d6bc7c769d09a59066727497
2009-12-07 08:46:21.000540,,, 908f7f11efb709acac525c03839dc9e5
2009-12-07 08:46:18.398500,,, ed12bcac6439a640056b4795d22608da
2009-12-07 08:46:15.753080,,, 94e689d7d6bc7c769d09a59066727497
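A quick note on the "Average daily submissions" figures: judging purely from the numbers above (this is an inference from the output, not a reading of the script), they match the submission count divided by the number of whole days running, ignoring the leftover hours. The sketch below reproduces that arithmetic from the first and last timestamps; the function name and the second, elapsed-time average are my own additions for comparison.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def daily_average(first_seen, last_seen, submissions):
    """Return (running time, average per whole day, average per elapsed day)."""
    running = datetime.strptime(last_seen, FMT) - datetime.strptime(first_seen, FMT)
    # Assumption: the reported figure divides by whole days only.
    by_whole_days = submissions / running.days
    # Alternative: divide by the full elapsed time, including the partial day.
    by_elapsed = submissions / (running.total_seconds() / 86400)
    return running, by_whole_days, by_elapsed


berlin = daily_average("2009-11-05 12:02:48.104760", "2009-12-07 11:13:55.930130", 2726)
paris = daily_average("2009-11-30 03:10:24.591650", "2009-12-07 08:46:23.657530", 749518)

for name, (running, whole, elapsed) in (("Berlin", berlin), ("Paris", paris)):
    print(name, running, round(whole, 3), round(elapsed, 3))
```

Dividing by whole days slightly overstates the daily rate whenever the sensor has been running for a partial extra day, which is the case for both data sets here.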

I still need to dig further into the data; there’ll be another post in the making if I uncover anything interesting…
— Andrew Waite

Join the conversation


    1. I didn’t do much mate, just working with others’ data. Need to work on getting my own system more ‘popular’, don’t get anywhere near the hit-rate of these systems. Still got a long way to go
