As I started life as a Linux server admin, I’m only too aware that many attackers see remote access functionality as a way into a system, and as SSH is the de facto standard for Linux access it is a prime target for attack. The stats collected by DShield give an indication of the extent of the problem.
As a result, the Kippo honeypot is something that I’ve had on my radar for a while. For a number of reasons I hadn’t found time to implement the system in a live environment, but a recent post on the Diatel blog suggested that installation may be quick and pain free.
Kippo is described by its author (Desaster) as:
Kippo is a medium interaction SSH honeypot designed to log brute force attacks and, most importantly, the entire shell interaction performed by the attacker.
Kippo is inspired, but not based on Kojoney.
Installation for me was painless: running a Debian system, I downloaded the latest archive to disk, unpacked it and installed the python-twisted package (I hadn’t read Mig5’s comment until after install so now need to go back and live on the bleeding edge…). Starting the system is as simple as invoking ./start.sh, but I did hit a couple of problems when trying to do so:
- First, I was logged in as root when I first tried to start the system (not clever I know, was testing…). Kippo encounters an error when started by a root user. As Desaster rightly states, it’s not wise to run Kippo as a root user anyway and running as a regular user resolves the issue.
- Second, when running as a normal user I got a ‘meaningful’ error of “Failed to load application: ‘NoneType’ object has no attribute ‘get’.” A quick piece of Google-fu led me to this ticket, which explained that Kippo was missing the file kippo.cfg. As the ticket explains, copying kippo.cfg.dist to kippo.cfg corrected the issue and produced a fully functional system.
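Both start-up problems can be checked for before invoking ./start.sh. The following is only a minimal sketch of such a pre-flight check, not part of Kippo itself: the function name prepare_kippo and its euid parameter are my own, and it assumes kippo.cfg.dist sits in the top level of the unpacked archive, as it did for me.

```python
import os
import shutil


def prepare_kippo(install_dir, euid=None):
    """Guard against the two start-up problems described above.

    Hypothetical helper: refuses to continue as root and creates
    kippo.cfg from kippo.cfg.dist when the config file is missing.
    """
    if euid is None:
        euid = os.geteuid()  # Unix only
    if euid == 0:
        raise RuntimeError("refusing to continue as root; use a regular user")

    cfg = os.path.join(install_dir, "kippo.cfg")
    dist = os.path.join(install_dir, "kippo.cfg.dist")
    if not os.path.exists(cfg):
        # Same fix as the manual copy described in the ticket.
        shutil.copyfile(dist, cfg)
        return "created"
    return "present"
```

Calling prepare_kippo('/path/to/kippo') as the unprivileged user before ./start.sh would cover both cases; the path is a placeholder.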
There are a couple of key files that can be edited to change the feel of the system that is provided to malicious users:
- kippo.cfg contains runtime information including log location, fake hostname etc.
- kippo.tac contains an array ‘users’, which lists the username/password combination which the emulated SSH login will accept as ‘valid’.
- The honeyfs/ directory goes so far as to allow you to create a ‘real’ filesystem for the malicious user to interact with, potentially copying a live server’s filesystem to the directory to help camouflage the emulated system (after sensitive data is removed/sanitised, obviously). I haven’t tried this myself yet but it is definitely on my to-do list.
From initial testing I’ve got high hopes for Kippo becoming a mainstay in my honeypot toolbox; the interaction session provided to a malicious user is reasonably convincing at first glance, and I particularly like the trick to keep users logged in after they think they’ve sent an ‘exit’ command to close the session. It could get some interesting results.
For post-compromise analysis Kippo also provides an interesting utility, utils/playlog.py. This allows you to replay a malicious terminal session in real time, typos and all, to truly provide a feel for the malicious user’s interaction with the session. To help whet your appetite whilst I wait for someone to target my Kippo installation, Kippo has a few demos of the playlog capabilities from compromise attempts. Get your demos here.
I’ve been a bit lax in writing this post; around a month ago Miguel Jacq got in contact to let me know about a couple of errors he encountered when running InfoSanity’s mimic-nepstats.py with a small data set. Basically, if your log file did not include any submissions, or covered a period shorter than 24 hours, the script would crash out. Not the biggest problem, as most will be working with larger data sets, but annoying nonetheless.
Not only did Miguel let me know about the issues, he was also gracious enough to provide a fix; the updated script can be found here. An example of the script in action is below:
cat /opt/dionaea/var/log/dionaea.log| python mimic-nepstats_v1-1.py
Statistics engine written by Andrew Waite – http://www.infosanity.co.uk
Number of submissions: 84
Number of unique samples: 39
Number of unique source IPs: 65
First sample seen: 2010-06-08 08:25:39.569003
Last sample seen: 2010-06-21 15:24:37.105594
System Uptime: 13 days, 6:58:57.536591
Average daily submissions: 6
Most recent submissions:
2010-06-21 15:24:37.105594, 22.214.171.124, emulate://, 56b8047f0f50238b62fa386ef109174e
2010-06-21 15:18:08.347568, 126.96.36.199, tftp://188.8.131.52/ssms.exe, fd28c5e1c38caa35bf5e1987e6167f4c
2010-06-21 15:17:08.391267, 184.108.40.206, tftp://220.127.116.11/ssms.exe, bb39f29fad85db12d9cf7195da0e1bfe
2010-06-21 06:29:03.565988, 18.104.22.168, tftp://22.214.171.124/ssms.exe, fd28c5e1c38caa35bf5e1987e6167f4c
2010-06-20 23:34:15.967299, 126.96.36.199, http://188.8.131.52/trying.exe, 094e2eae3644691711771699f4947536
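For anyone curious about the small-data-set crash, the sketch below shows the kind of guard that avoids it. This is not Miguel’s actual patch, just an illustration under my own assumptions: the function name average_daily is hypothetical, and I’m assuming the average is the submission count divided by whole days of uptime, which matches the figures above (84 submissions over 13 days gives 6).

```python
from datetime import datetime


def average_daily(submissions, first_seen, last_seen):
    """Average submissions per whole day of uptime.

    Returns 0 for an empty log, and treats any period shorter than a day
    as one day so that the division can never be by zero.
    """
    if submissions == 0 or first_seen is None or last_seen is None:
        return 0
    days = (last_seen - first_seen).days
    return submissions // max(days, 1)


first = datetime(2010, 6, 8, 8, 25, 39)
last = datetime(2010, 6, 21, 15, 24, 37)
print("Average daily submissions: %d" % average_daily(84, first, last))
```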
– Andrew Waite
Amun has been running away quite happily in my lab since initial install. From a statistics perspective my work has been made really easy, as Miguel Cabrerizo has previously taken one of the InfoSanity statistic scripts written for Nepenthes and Dionaea and adapted it to parse Amun’s submission.log files.
Results generated from the script in my environment are below, if you’re wanting to get an overview of submissions from another Amun sensor the script has been uploaded alongside the other InfoSanity resources and is available here.
~$ cat /opt/amun/logs/submissions.log* | ./amun_submission_stats.py
Statistics engine written by Andrew Waite (www.infosanity.co.uk) modified by Miguel Cabrerizo (diatel.wordpress.com)
Number of submissions : 25
Number of unique samples : 25
Number of unique source IPs: 18
Origin of the malware:
Ukraine : 1
None : 7
Poland : 2
Romania : 1
United States : 8
Russian Federation : 2
Hungary : 1
Norway : 1
Bulgaria : 2
MS08067 : 13
DCOM : 12
Most recent submissions:
2010-05-31, 11:37:22, 184.108.40.206, 63.exe, acf5c09d547417fe53c163ec09199cab, MS08067
2010-05-30, 19:23:09, 220.127.116.11, 63.exe, 89b578839f1c39f79d48e5f9e70b5e2f, MS08067
2010-05-28, 10:27:03, 18.104.22.168, 63.exe, f7c4f677218070ab52d422b3c018a4ba, MS08067
2010-05-27, 16:23:14, 22.214.171.124, ssms.exe, 1f8a826b2ae94daa78f6542ad4ef173b, DCOM
2010-05-24, 19:46:35, 126.96.36.199, 63.exe, 53979f1820886f089a75689ed15ecf6e, MS08067
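The counts in the summary are simple tallies over the individual submissions. As a rough illustration (this is not the code from amun_submission_stats.py, and it parses the ‘most recent submissions’ rows as printed above rather than Amun’s raw log format, which I haven’t reproduced here), the same figures can be derived like this:

```python
from collections import Counter

# The five rows shown above, exactly as the script printed them.
ROWS = """\
2010-05-31, 11:37:22, 184.108.40.206, 63.exe, acf5c09d547417fe53c163ec09199cab, MS08067
2010-05-30, 19:23:09, 220.127.116.11, 63.exe, 89b578839f1c39f79d48e5f9e70b5e2f, MS08067
2010-05-28, 10:27:03, 18.104.22.168, 63.exe, f7c4f677218070ab52d422b3c018a4ba, MS08067
2010-05-27, 16:23:14, 22.214.171.124, ssms.exe, 1f8a826b2ae94daa78f6542ad4ef173b, DCOM
2010-05-24, 19:46:35, 126.96.36.199, 63.exe, 53979f1820886f089a75689ed15ecf6e, MS08067
"""


def summarise(text):
    """Tally submissions, unique samples, unique sources and exploits."""
    rows = [line.split(", ") for line in text.splitlines() if line.strip()]
    return {
        "submissions": len(rows),
        "unique_samples": len({row[4] for row in rows}),  # MD5 column
        "unique_sources": len({row[2] for row in rows}),  # source IP column
        "vulnerabilities": Counter(row[5] for row in rows),
    }


summary = summarise(ROWS)
for name, hits in summary["vulnerabilities"].most_common():
    print("%s : %d" % (name, hits))
```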
A comment on a recent post asked for a comparison between different honeypots. While this is far from conclusive and only focuses on a single aspect of the technologies, one of InfoSanity’s Nepenthes sensors ‘saw’ more attacks in the last 24 hours than my Amun installation did in the almost three weeks shown above. As both are running within the same, small, IP allocation I think I’m safe in assuming that one IP isn’t actually receiving a disproportionate level of interest from the bad guys and bots that are out there.
– Andrew Waite
InfoSanity’s honeyd-geoip.py script has been useful for analysing the initial findings from a HoneyD installation, but one of the weaknesses identified in the geolocation database used by the script was that a large proportion of the source IP addresses connecting to the honeypot environment weren’t known within the database. Markus pointed me in the direction of the cymruwhois (discussed previously) Python module as an alternative. I’ve re-written the initial script, below:
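#!/usr/bin/python
from cymruwhois import Client
import sys

logfile = open('/var/log/honeypot/honeyd.log', 'r')

# Collect the source address from every log line.
source = []
for line in logfile:
    # Field 3 is assumed to be the source IP, as in honeyd's usual
    # "timestamp proto flag src-ip src-port dst-ip dst-port" layout.
    source.append(line.split(' ')[3])
logfile.close()

src_country = []
src_count = []

# One bulk lookup for the unique addresses, returned as ip -> record.
c = Client()
results = c.lookupmany_dict(set(source))

for res in results:
    country = results[res].cc
    try:
        pos = src_country.index(country)
        src_count[pos] += 1
    except ValueError:
        # First time this country code has been seen.
        src_country.append(country)
        src_count.append(1)

# Print every country, including the last one in the list.
for i in range(len(src_country)):
    sys.stdout.write("%s:\t%i\n" % (src_country[i], src_count[i]))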
So far this has resulted in far fewer unknown source locations: 249 using geoip compared to 3 using cymruwhois. The downside, unfortunately, is performance; cymruwhois communicates with a remote host to gather information, whereas the geolocation database is already stored locally on the machine. Both perform some local caching of results/data, however, so I would expect the performance difference to decrease as larger datasets are analysed.
Using the newer script, based on the same 24hr data set, the top ten host countries communicating with InfoSanity’s honeyd environment are:
RU: 397
US: 234
TW: 179
BR: 158
CN: 123
RO: 107
DE: 101
IT: 96
JP: 91
AR: 86
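The tallying and top-ten ordering could also be done with collections.Counter rather than a pair of parallel lists. The sketch below is an alternative I haven’t run against the live data: tally_countries and format_top are hypothetical names, and the Record tuple only stands in for the objects cymruwhois returns (all the code relies on is a .cc attribute). In real use the mapping would come from Client().lookupmany_dict(set(source)).

```python
from collections import Counter, namedtuple


def tally_countries(results):
    """Count lookup results per country code.

    `results` maps IP address -> record with a `.cc` attribute, which is
    the shape of the dictionary returned by a bulk lookup.
    """
    return Counter(record.cc for record in results.values())


def format_top(counts, n=10):
    """Return the n busiest countries as 'CC: hits' strings."""
    return ["%s: %d" % (cc, hits) for cc, hits in counts.most_common(n)]


# Stand-in data so the example runs without network access; the
# addresses are documentation ranges, not real honeypot traffic.
Record = namedtuple("Record", "cc")
fake_results = {
    "192.0.2.1": Record("RU"),
    "192.0.2.2": Record("RU"),
    "198.51.100.7": Record("US"),
}
print("\n".join(format_top(tally_countries(fake_results))))
```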
– Andrew Waite
Since posting my Python whois class it has led to a (relatively) high volume of search hits pointing people to it. So I’d like to apologise for inflicting my code on other people. After a recent post with the honeyd-geoip.py script I was pointed in the direction of Team Cymru’s whois service and accompanying Python script. If you’ve not come across the stuff released by Team Cymru I would strongly suggest that you take a look. I always manage to find some interesting new info across the three overall sections: Monitoring, Services and Reading Room.
Making my life easier, Justin Azoff has released a Python module hosted on github for the whois.cymru.com service. Using the client is incredibly simple, as the sample code included in the package shows:
>>> import socket
>>> ip = socket.gethostbyname("www.google.com")
>>> from cymruwhois import Client
>>> c=Client()
>>> r=c.lookup(ip)
>>> print r.asn
15169
>>> print r.owner
GOOGLE - Google Inc.
Overall Justin’s client works faster than my own attempt, especially as it has functions specifically designed for bulk lookups. If you’re working with IP, whois or geolocation data I’d suggest giving the cymruwhois utility a look. Thanks to Justin and the Team Cymru people for releasing tools and info that make my work easier.
Following on from my work with gathering statistics from the Honeypot systems that I run I have released a limited alpha of a new script/tool that I am working on. The tool provides access to common result sets from the sqlite database, without the requirement for remembering the database architecture and entering lengthy SQL statements by hand.
Disclaimer first: the tool doesn’t do anything outrageously new, and most of the SQL queries have been borrowed from Markus’ post on SQL logging with Dionaea when the feature was first introduced. However, I have found the script makes my analysis of the honeypot logs simpler and quicker, and I’ve had a positive reaction from the limited few that have had a copy of the script before this post. Hopefully it will be of use to others.
Usage is relatively simple, shown below:
Dionaea database query collection
Author: Andrew Waite – http://www.InfoSanity.co.uk
Inspiration from carnivore.it article:
/path/to/python dionaea-sqlquery.py --query #
Where # is:
1: Port Attack Frequency
2: Attacks over a day
3: Popular Malware Downloads
4: Busy Attackers
5: Popular Download Locations
6: Connections in last 24 hours
The script can be found here. There is still a good level of work to be undertaken to tidy up the output, potentially allowing for output in different formats, and I also want to add additional and more complex queries as time progresses. If you have any success, failure, comments or suggestions please let me know.
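To give a flavour of what sits behind one of the menu options, here is a rough sketch of a ‘Port Attack Frequency’ style query. It is not lifted from dionaea-sqlquery.py: the function name is hypothetical, and the table and column names (connections, connection_type, local_port) are my assumption about Dionaea’s sqlite logging schema, so check them against your own database before relying on it.

```python
import sqlite3

# Assumed schema: a `connections` table with one row per connection,
# where accepted inbound connections have connection_type = 'accept'.
PORT_FREQUENCY_SQL = """
SELECT local_port, COUNT(*) AS hits
FROM connections
WHERE connection_type = 'accept'
GROUP BY local_port
ORDER BY hits DESC
LIMIT ?
"""


def port_attack_frequency(db, limit=10):
    """Return (port, hits) tuples for the most frequently hit ports."""
    return db.execute(PORT_FREQUENCY_SQL, (limit,)).fetchall()


# Tiny in-memory stand-in so the example runs without a real sensor.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE connections (connection_type TEXT, local_port INTEGER)")
demo.executemany(
    "INSERT INTO connections VALUES (?, ?)",
    [("accept", 445), ("accept", 445), ("accept", 445),
     ("accept", 135), ("listen", 80)],
)
for port, hits in port_attack_frequency(demo):
    print("%5d  %d" % (port, hits))
```

Against a real sensor the connection would be opened on the sqlite file your Dionaea instance logs to, rather than the in-memory stand-in.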
– Andrew Waite
After too long away from the project I have been trying to implement some additional functionality to my submissions2stats script for parsing Nepenthes log files. Something that I’ve had in mind for a while is utilising Whois data to better analyse the source of the malware submissions.
I had assumed that this would be relatively simple; after all, the ability to import any required functionality is an integral part of geek humour. This wasn’t to be the case, as I was unable to find anything this time around (although I didn’t discover giskismet until after I’d written my kismet2gmapstatic scripts). To cover the functionality I have written a short Python class that queries a 3rd party whois service for a provided IP address and provides methods to access the returned data.
Whois information for 188.8.131.52
Inetnum: 184.108.40.206 – 220.127.116.11
N.B. Text is tab delimited in actual usage
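For illustration, the general shape of such a class is sketched below. This is not my released code: it talks the plain whois protocol on port 43 directly rather than using the 3rd party service, the server name is just an example default, and the names query_whois and WhoisRecord are made up for this sketch. It assumes the response is made up of ‘key: value’ lines with ‘%’ or ‘#’ comment lines.

```python
import socket


def query_whois(ip, server="whois.ripe.net", port=43, timeout=10):
    """Send a plain whois query and return the raw text response."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall((ip + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")


class WhoisRecord:
    """Parse 'key: value' lines from a whois response."""

    def __init__(self, text):
        self.fields = {}
        for line in text.splitlines():
            if line.startswith(("%", "#")) or ":" not in line:
                continue  # skip comments and free text
            key, value = line.split(":", 1)
            # Keep the first occurrence of each key.
            self.fields.setdefault(key.strip().lower(), value.strip())

    def get(self, name, default=None):
        """Case-insensitive access to a parsed field."""
        return self.fields.get(name.lower(), default)
```

Usage would be along the lines of WhoisRecord(query_whois(ip)).get('inetnum'), with the network call kept separate so the parsing can be tested offline.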
I’ve started adding the class’ functionality into my submissions2stats script. So far things are progressing well and hopefully I should be able to have an updated script available shortly.
I’ve spent the day adding some additional functionality to my GPS mapping proof of concept (original here).
The second release, kismet2gmapstatic-0_2.py, changes the script’s output to wrap the Google Maps API call in a self-contained HTML page, and contains multiple map images to mitigate the URL length limit.
The third release, kismet2gmapstatic-0_3.py, builds on the HTML framework and includes additional information on each mapped access point: SSID, channel and available encryption options. This will likely be the final release of kismet2gmapstatic in this form; the code has grown organically without any real planning and as a result is a hideous mess, but as a PoC I feel it has served its purpose. I still have several ideas and additional functionality that I would like to implement, so watch this space for similar tools in the future.
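The idea behind the multiple images is simply to split the access points into batches and emit one map URL per batch. A minimal sketch of that approach follows; it is not the code from kismet2gmapstatic-0_2.py, and the function names, the batch size argument and the page layout are all my own choices for illustration.

```python
from html import escape


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def html_page(map_urls, title="kismet2gmapstatic"):
    """Wrap one <img> per map URL in a self-contained HTML page."""
    images = "\n".join(
        '<img src="%s" alt="map %d">' % (escape(url, quote=True), n)
        for n, url in enumerate(map_urls, 1)
    )
    return "<html><head><title>%s</title></head>\n<body>\n%s\n</body></html>" % (
        escape(title), images)


# 120 access points in batches of 50 gives three images.
batches = chunked(list(range(120)), 50)
print([len(batch) for batch in batches])
```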
I’m still following my recent interest in wireless networks and devices. In the past month I gained a USB GPS receiver (which I forgot to write about, may have a short review shortly). After adding GPS capability to my wardrive setup I proceeded to scan the local area, then hit a brick wall. There appears to be (could be my Google-fu is failing me) a lack of available tools to meaningfully use the captured data.
After a lot of digging I stumbled upon this script, designed to parse the output from Kismet and generate a .kml file to import into Google Earth. Unfortunately, I’ve been unable to get this to work as Google Earth complains when opening the file. Could be a version issue so your mileage may vary; if anyone does get it working please let me know.
The PerryGeo code did however provide an excellent foundation to utilise the Kismet log file and generate different output. To this end I have released a basic proof of concept script that generates a static map via the Google Maps API. If you want to do anything similar, or want to extend or modify my image code I found the Google documentation to be invaluable.
To the tool itself, starting with a disclaimer:
This tool should not be used for illegal or malicious purposes. It was created to visualise network locations and implemented encryption technologies, in an effort to enhance previous analysis of wireless network statistics.
For each discovered access point, the script places a marker on the map, colour coded to level of encryption: Open access points are green, WEP encrypted access points are yellow, whilst WPA encrypted APs are red.
The Google Maps API appears to have a limit on the length of URL that it is able to support; as a result the script limits the plotted APs to the first 50 in a given Kismet XML log file. This should be sufficient for site surveys, but is less useful for mapping the results from a wardrive trip. I haven’t managed to locate any firm documentation on this limit; if anyone is able to shed any light or knows a workaround I’d appreciate a heads up.
Below is an example of the tool’s output (actually, it just outputs the URL, which in turn requests that Google create the image). The image is created from a subset of data collected during a drive around the Angel of the North.
This is still very early days for the tool (started coding 24 hours ago) so any feedback, issues or feature requests would be appreciated. Download available here: kismet2gmapstatic-0_1b.py
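The colour coding and the 50-marker cap boil down to building one markers parameter per access point. The sketch below shows the idea only and is not the released script: the function name and the (encryption, lat, lon) tuples are hypothetical, the coordinates are made up, and the endpoint is the current Static Maps one, which also expects an API key that I’ve left out.

```python
from urllib.parse import quote

# Colour scheme described above.
MARKER_COLOURS = {"open": "green", "wep": "yellow", "wpa": "red"}
MAX_MARKERS = 50  # cap used by the script to stay under the URL limit


def static_map_url(access_points, size="512x512",
                   base="https://maps.googleapis.com/maps/api/staticmap"):
    """Build a static map URL with one coloured marker per access point.

    `access_points` is a list of (encryption, latitude, longitude)
    tuples; only the first MAX_MARKERS are plotted.
    """
    params = ["size=" + size]
    for encryption, lat, lon in access_points[:MAX_MARKERS]:
        marker = "color:%s|%.6f,%.6f" % (MARKER_COLOURS[encryption], lat, lon)
        # Percent-encode the '|' separator but keep ':' and ',' readable.
        params.append("markers=" + quote(marker, safe=":,"))
    return base + "?" + "&".join(params)


# Illustrative coordinates only.
print(static_map_url([("open", 54.914, -1.589), ("wpa", 54.915, -1.590)]))
```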
This utility is substantially larger than my previous two releases (although still small) so I’ll not include source code here; head to Infosanity for the submissions2stats.py file. Usage is fairly simple: read the logged_submissions file into stdin and let the script do its job.
Statistics are quite general at this stage, mainly compiling overall statistics from the log file including:
- Total number of submissions
- Number of unique malware samples (based on MD5 hashes)
- Number of unique source IPs
- Run time
- Average daily submissions
- Five most recent submissions
By default the script outputs plaintext to standard out, but this can be changed to HTML via the --output=html command-line flag.
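As a rough idea of how an output switch like this can be wired up (a sketch only, not the code in submissions2stats.py; the function names and the ‘text’ choice are my own naming, and only the html value comes from the script’s actual usage):

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; plaintext is the default output."""
    parser = argparse.ArgumentParser(description="submission statistics")
    parser.add_argument("--output", choices=["text", "html"], default="text")
    return parser.parse_args(argv)


def render(stats, fmt="text"):
    """Render (label, value) pairs as plaintext or as an HTML list."""
    if fmt == "html":
        items = "\n".join("<li>%s: %s</li>" % (label, value)
                          for label, value in stats)
        return "<ul>\n%s\n</ul>" % items
    return "\n".join("%s: %s" % (label, value) for label, value in stats)


args = parse_args(["--output=html"])
print(render([("Total number of submissions", 0)], args.output))
```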
I’m going to hold back releasing any example output from my own servers as I wanted to generate the statistics for use in an upcoming presentation I’m giving for local group Super Mondays. If you’re free and in the area (Newcastle, UK) on May 26th please stop by for the event and to say hi.
If you’re running a Nepenthes server I’d appreciate any feedback or issues running the script. I’m still looking to flesh the system’s capabilities out, so any suggestions/requests for additional features or statistics would be appreciated (contact(no-spam)[at]infosanity[dot]co[dot]uk ).
N.B. The latest versions of all Infosanity tools related to statistic generation for Nepenthes can be found here.