Starting with HoneyD

Since reading Virtual Honeypots I’ve been wanting to implement a HoneyD system, developed by Niels Provos. From it’s own site, HoneyD is:

a small daemon that creates virtual hosts on a network. The hosts can be configured to run arbitrary services, and their personality can be adapted so that they appear to be running certain operating systems. Honeyd enables a single host to claim multiple addresses – I have tested up to 65536 – on a LAN for network simulation. Honeyd improves cyber security by providing mechanisms for threat detection and assessment. It also deters adversaries by hiding real systems in the middle of virtual systems.

My initial experience getting HoneyD running was frustration to say the least. Going with Debian to provide a stable OS, the install process should have been as simple as apt-get install honeyd. While keeping upto date with a Debian system can sometimes be difficult, the honeyd package is as current as it gets with version 1.5c.
For reasons that I can’t explain, this didn’t work first (or second) time so I reverted to compiling from source. The process could have been worse, only real stumbling block I hit was a naming clash within Debian’s package names. HoneyD requires the ‘dumb network’ package libdnet, but if you apt-get install libdnet you get Debian’s DECnet libraries. On Debian and deriviates you need libdumbnet1.
HoneyD’s configuration has the ability to get very complex depending on what you are looking to achieve. Thankfully a sample configuration is provided that includes examples of some of the most common configuration directives. Once you’ve got a config sorted (the sample works perfectly for testing), starting the honeyd is simple: honeyd -f /path/to/config-file. There are plenty of other runtime options available, but I haven’t had time to fully experiment with all of them; check the honeyd man pages for more information.
As well as emulating hosts and network topologies, HoneyD can be configured to run what it terms ‘subsystems’. Basically this are scripts that can be used to provide additional functionality on the emulated systems for an attacker/user to interact with. Some basic (and not so basic) subsystems are included with HoneyD. Some additional service emulation scripts that have been contributed to the HoneyD project can be found here. As part of the configuration, HoneyD can also pass specified IP/Ports through to live systems, either more indepth/specialised honeypot system or a full ‘real’ system to combine low and high interaction honeypot.
I’m still bearly scratching the surface of what HoneyD is capable of, and haven’t yet transfered my system to a live network to generate any statistics, but from my reading, research and experimentation I have high expectations.
— Andrew Waite

Book Review: Virtual Honeypots

It took longer than I had wanted, but I have just finished reading through Virtual Honeypots: From Botnet Tracking to Intrusion Detection. The book is written by Niels Provos, creator of HoneyD (among other things) and Thorsten Holz.
Given the authors I had high expectation when the delivery came through, thankfully it didn’t disappoint. Unsurprisingly the first chapter provides an overview of honypotting in general, covering high and low interaction systems over both physical and virtual systems, additionally the chapter introduces some core tools for your toolkit.
The next two chapters cover both high and low interaction honeypots respectively. I really liked the coverage of hi-int honeypots, it was this idea that drew me towards honeypots in the first place the idea of watching an attacker carefully exploit and utilise a dummy system always appealed. The material provided gives a great foundation for starting with a high interaction honeypot and some best practice advice for how to do so securely and safely. While I have read many reports and case studies that involved honeypots I have had difficulty finding in depth setup information and advice, leaving high interaction honeypots feeling a bit like black magic. The author’s information cuts through all the mystery allowing the reader to get a firm understanding of the topic. Likewise the discussion of low-interaction honeypots was equally well covered, although as I’ve spent some time with low-int systems in the past this chapter was more of a refresher than providing unknown information as I had found with the hi-int section.
Given that Neils is one of the books authors, it shouldn’t be too much of a surprise that HoneyD is covered in depth. For me, this was the most useful section of the book. As honeyd is one of the older publicly available low-int systems I had mistakenly assumed that one of the newer systems would provide more functionality, after reading through the material and regularly going ‘ooh’ out loud honeyd is now firmly at the top of my ‘need to implement’ list.
The book also covers honeypot systems that are designed for specialised purposes. For malware collection, the authors mainly focus on Nepenthes, but also touch on Honeytrap among others. This was the only section that I found to be slightly dated, as the Nepenthes’ newly released sprirtual successor Dionaea was not covered. But as the fundamental material is very well explained, Nepenthes is still a very functional system and the inherent similarities between Nepenthes and Dionaea the material still useful regardless so the chapter still provides an excellent foundation if you’re wanting to start collecting malware.
An interesting chapter covers the idea of hybrid honeypots, which is the idea of using low-int systems to monitor and handle the bulk of traffic, while forwarding anything unknown or unusual to a high-int system for more indepth analysis of the attack traffic. Unfortunately at this point openly available hybrid systems are limited, with the more functional systems being kept closed by the researchers and companies that build them (but I have just found Honeybrid while looking for a good link for hybrid systems which I wasn’t aware of. Looks promising…)
The last chapter covering honeypot systems looks at client-side honeypots, designed to look for client-side attacks. As client-side attacks have become more prominent over the last few years this is an evolving area of research, but as the attack vector is newer than traditional attacks, the honeypot systems aren’t as mature as more traditional systems. This isn’t an area that I’m experienced with so I can’t comment too much on the systems detailed by the authors, but they cover several honeyclient systems in great detail, and I’m intending to use the chapter as a foundation for implementing the systems and techniques proposed.
As well as detailing the use of honeypot systems, the authors also provide a brilliant discussion of ways that attackers (or users) can determine that they are interacting with a honeypot system. While the detailed descriptions for ways to identify a honeypot system is interesting and important from a theoretical standpoint, from previous experience running honeypot systems there are more than enough attackers and automated threats that blindly assume the system is legitimate to still enable honeypots to provide plenty of benefit to the honeypot administrators.
The book finishes up with an fairly detailed discussion of both tracking botnets using the information gathered from honeypot systems (this chapter is available as a sample PDF download from thanks to InformIT, here) and analysing the malware sample reports provided by CWSandbox. While both chapters are useful in he context of honeypot systems I didn’t think there was enough room to provide the reader with anything beyond a general overview of the topics, which if you were interested in the topic enough to purchase the book, then the reader will likely already have a similar level of understanding to the information provided.
There is also a chapter covering case studies of actual incidents that were captured by the books authors during their research. I’ve always been a fan of case studies, so enjoyed this chapter, it definitely helps whet the appetite to implement the technologies covered by the book.
Overall I really enjoyed the book, if you’re interested in systems and network monitoring, honeypots or malware then this book should probably be on your bookshelf.
Andrew Waite

2009: A review

Well, the year is nearly over and it seems everyone is in a reflective mode so I thought I’d join in. And I’m glad I did, didn’t really just how turbulent year I’ve had. I’d better (on pain of death) start with the none technical, as it is around 12 months since I got engaged to my long-time girlfriend.
Back to the technical: The InfoSanity blog went live in February with the first post. Originally I was far from confident that I would be able to keep up blogging as I had a ‘fear’ of social media and web2.0, but nearly a year on I’m still here and despite some peaks and troughs posting articles regularly. I’ve found it a great platform for getting ideas out of my head and into practice, hopefully I’ve managed to be of benefit to others in the process.
Lab environment: February was also when I purchased the server for my virtual lab environment. This has got be the best buy of the year, providing a solid framework for testing and experimenting with everything else I have done this year. Lab environments also seem to be one of the areas that gathers a lot of interest from others, the two posts discussing configuration of virtual networks and guest systems were InfoSanity’s most popular posts this year by a good margin. In the process of improving my lab environment I also read Thomas Wilhelm’s Professional Penetration Testing book and reviewed it for the Ethical Hacker Network, for which I’m indebted to Don for organising.
Wireless: Included in my long list of purchases this year was an Alfa AWUS036H wireless card and a BU-353 GPS Reciever. This resulted in a basic attempt to write a utility to create maps from the results of wardriving with Kismet, whilst the short development time of the project was enjoyable it was promptly shelved once people introduced me to Jabra’s excellent giskismet. It also resulted in the creation of the still to be field-tested, James Bond-esque warwalking case.
Honeypots: Whilst I had had Nepenthes honeypot system running before the turn of the year, I hadn’t really worked with it in earnest until the first post on the subject in February, and subsequent statistic utilities. These posts also became the topic for my first experience with public speaking, for local (and rapidly expanding) technical group, SuperMondays. As the technology has improved the honeypot system has recently been migrated over to Nepenthes’ spiritual successor Dionaea. Over the year I have also had the pleasure and privilege of talking with Markus Koetter (lead dev of Nepenthes and Dionaea) and Lukas Rist (lead dev of Glastopf), these guys *really* know their stuff.
Public Speaking: As mentioned above I gave my first public talk for SuperMondays, discussing Nepenthes honeypots and the information that can be gathered from them. Unfortunately (or thankfully) there is only limited footage available for the session as the camera’s battery ran out of juice. My second session was for a group of Northumbria University’s Forensics and Ethical Hacking students as an ‘expert speaker’, and I still think they mistook me for someone else. This time a recording was available thanks to a couple of the students, full review and audio available here. My public speaking is still far from perfect, coming out at a rapid fire pace, but I’m over my initial dread and actually quite enjoy it. Hopefully they’ll be additional opportunities in the future.
Friends and Contacts: Throughout the year I have ended up in contact with some excellent and interesting people; from real-world network events like SuperMondays and Cloudcamp, old school discussions in forums (EH-Net) and IRC channels, to the ‘2.0’ of Twitter (@infosanity btw). Along with good debates and discussions I’d also like to think I’ve made some good friendships, too many people to name (and most wouldn’t want to be associated 😉 ) but you know who you are.
So that’s the year in brief, couple of smaller activities along the way, from investigating newly released attack vectors to trying my hand at lock picking. In hindsight it has been one hell of a year, and with some of the side projects in the pipeline I’m expecting 2010 to be even better. Onwards and upwards.
— Andrew Waite

Fuzzy hashing, memory carving and malware identification

I’ve recently been involved in a couple of discussions for different ways for identifying malware. One of the possibilities that has been brought up a couple of times is fuzzy hashing, intended to locate files based on similarities to known files. I must admit that I don’t fully understand the maths and logic behind creating fuzzy hash signatures or comparing them. If you’re curious Dustin Hurlbut has released a paper on the subject, Hurlbut’s abstract does a better job of explaining the general idea behind fuzzy hashing.

Fuzzy hashing allows the discovery of potentially incriminating documents that may not be located using traditional hashing methods. The use of the fuzzy hash is much like the fuzzy logic search; it is looking for documents that are similar but not exactly the same, called homologous files. Homologous files have identical strings of binary data; however they are not exact duplicates. An example would be two identical word processor documents, with a new paragraph added in the middle of one. To locate homologous files, they must be hashed traditionally in segments to identify the strings of identical data.

I have previously experimented with a tool called ssdeep, which implements the theory behind fuzzy hashing. To use ssdeep to find files similar to known malicious files you can run ssdeep against the known samples to generate a signature hash, then run ssdeep against the files you are searching, comparing with the previously generated sample.
One scenarios I’ve used ssdeep for in the past is to try and group malware samples collected by malware honeypot systems based on functionality. In my attempts I haven’t found this to be a promising line of research, as different malware can typically have the same and similar functionality most of the samples showed a high level of comparison whether actually related or not.
Another scenario that I had developed was running ssdeep against a clean WinXP install with a malicious binary. In the tests I had run I haven’t found this to be a useful process, given the disk capacity available to modern systems running ssdeep against a large HDD can be a time consuming process. It can also generate a good number of false positives when run against the OS.
After recently reading Leon van der Eijk’s post on malware carving I have been mulling a method for combining techniques to improve fuzzy hashing’s ability to identify malicious files, while reducing the number of false positives and workload required for an investigator. The theory was that, while any unexpected files on a system are not desirable, if they aren’t running in memory then they are less threatening than those that are active.
To test the theory I infected an XP SP2 victim with a sample of Blaster that had been harvested by my Dionaea honeypot and dumped the RAM following Leon’s methodology. Once the image was dissected by foremost I ran ssdeep against extracted resources. Ssdeep successfully identified the malicious files with a 100% comparison to the maliciuos sample. So far so good.
With my previous experience with ssdeep I ran a control test, repeating the procedure against the dumped memory of a completely clean install. Unsurprisingly the comparison did not find a similar 100% match, however it did falsely flag several files and artifacts with a 90%+ comparison so there is still a significant risk of false positives.
From the process I have learnt a fair deal (reading and understanding Leon’s methodolgy was no comparison to putting it into practice) but don’t intend to utilise the methods and techniques attempted in real-world scenarios any time soon. Similar, and likely faster, results can be achieved by following Leon’s process completely and running the files carved by Foremost against an anti-virus scan.
Being able to test scenarios similar to this was the main reason for me to build up the my test and development lab which I have described previously. In particular, if I had run the investigation on physical hardware I would likely not have rebuilt the environment for the control test with a clean system, losing the additional data for comparison, virtualisation snap shots made re-running the scenario trivial.
–Andrew Waite
P.S. Big thanks to Leon for writing up the memory capture and carving process used as a foundation for testing this scenario.

Analysis: Honeypot Datasets

Earlier this week Markus released two anonymised data sets from live Dionaea installations. The full write-up and data sets can be found on the newly migrated carnivore.it news feed here. Perhaps unsurprisingly I couldn’t help but run the data through my statistics scripts to get a quick idea of  what was seen by the sensors.
This caused some immediate problems, before the data was released Markus had contacted me to point out/complain that the performance from my script is ideal. Performance wasn’t an issue I had encountered, but the database from the sensor I run is ~1MB, the smaller of the released data sets is ~300MB, with the larger being 4.1GB. I immediately tried to rectify the problem and am proud to report,…
I failed miserably. I had tried to move some of the counting and loops from the python code and migrate to more complex SQL queries, working on the theory that working with large datasets should be more efficient within databases as they are designed for working with sets of data. Theory was proved false, actually increasing run-time by about 20%, so I won’t be releasing the changes. Good job I’ve never claimed to be a developer. All this being said, the script still crunches through the raw data in 30seconds and 3minutes respectively.
Without further ado, the Berlin data-set:

Statistics engine written by Andrew Waite – www.infosanity.co.uk
Number of submissions: 2726
Number of unique samples: 133
Number of unique source IPs: 639
First sample seen: 2009-11-05 12:02:48.104760
Last sample seen: 2009-12-07 11:13:55.930130
SystemrRunning: 31 days, 23:11:07.825370
Average daily submissions: 87.935483871
Most recent submissions:
2009-12-07 11:13:55.930130, 10.48.60.253, http://zonetech.info/61.exe, ae8705a7b4bf8c13e5d8214d374e6c34
2009-12-07 11:12:59.389940, 10.13.103.23, ftp://1:1@10.101.229.251:61751/ssms.exe, 14a09a48ad23fe0ea5a180bee8cb750a
2009-12-07 11:10:27.296370, 10.13.103.23, tftp://10.13.103.23/ssms.exe, df51e3310ef609e908a6b487a28ac068
2009-12-07 10:55:24.607140, 10.183.36.128, tftp://10.183.36.128/ssms.exe, df51e3310ef609e908a6b487a28ac068
2009-12-07 10:43:48.872170, 10.183.36.128, ftp://1:1@10.20.216.112:53971/ssms.exe, 14a09a48ad23fe0ea5a180bee8cb750a

And Paris:

Statistics engine written by Andrew Waite – www.infosanity.co.uk
Number of submissions: 749518
Number of unique samples: 2064
Number of unique source IPs: 30808
First sample seen: 2009-11-30 03:10:24.591650
Last sample seen: 2009-12-07 08:46:23.657530
SystemrRunning: 7 days, 5:35:59.065880
Average daily submissions: 107074.0
Most recent submissions:
2009-12-07 08:46:23.657530, 10.46.210.146, http://10.9.0.30:3682/udqk, d45895e3980c96b077cb4ed8dc163db8
2009-12-07 08:46:20.985190, 10.98.174.44, http://10.200.78.235:2708/lzhffhai, 94e689d7d6bc7c769d09a59066727497
2009-12-07 08:46:21.000540, 10.204.219.219, http://10.38.56.49:6968/tyhxqm, 908f7f11efb709acac525c03839dc9e5
2009-12-07 08:46:18.398500, 10.174.62.175, http://10.108.210.203:3058/pghux, ed12bcac6439a640056b4795d22608da
2009-12-07 08:46:15.753080, 10.39.96.46, http://10.132.244.66:3255/dhti, 94e689d7d6bc7c769d09a59066727497

Still need to dig further into the data, they’ll be another post in the making if I uncover anything interesting…
— Andrew Waite

Starting out with Glastopf

I’ve been lax in writing up my initial experience with Glastopf. For those new to Glastopf, initially created by Lukas Rist as part of the Google summer of code program in collaboration with the Honeynet Project and Thorsten Holz.
I must admit that I found the installation of Glastopf to be a complete nightmare. Although this is mostly due to my systems lack of some of the Python pre-requisites that I needed to compile from source, which in turn had other unmet pre-requisites, which in turn… you get the idea. But I did manage to get my install complete eventually, and have learnt a few things in the process, so it can’t be all bad.
At this point I also need to thank the guys from the #glastopf irc channel on freenode. The advice and suggestions provided made the job much easier than it could have been, and simplified my initial testing of the system once working.
My Glastopf system has been running a couple of weeks and I’m starting to take a closer look logs being recorded. I’m not entirely sure what I was expecting as a result of the install, I must confess to being a little disappointed so far, but as I’m no expert in the realm of web applications the findings may mean more to those with more insight.
Overall I have logged several scans for various resources, I’m assuming looking for vulnerabilities in installed services. Nothing too unexpected for example scans for Roundcube mail or phpMyAdmin installations.
I have also found some links to inocious, legitimate online resources. Again I am no expert with web attacks (one of my motivations for installing a web honeypot in the first place was to learn more about them), but I am assuming that this was to test the effect of a particular attack vector before providing host systems with malicious URLs in the logs for an unsuccessful attack. If anyone knows I’m wrong, or can provide a better explanation I’d appreciate a heads up.
With this installation the InfoSanity honeytrap environment is slowly expanding to show a wider and more indepth understanding of live attack vectors targetting production systems.
— Andrew Waite

New dionaea statistics script

Following on from my work with gathering statistics from the Honeypot systems that I run I have released a limited alpha of a new script/tool that I am working on. The tool provides access to common result sets from the sqlite database, without the requirement for remembering the database architecture  and entering lengthy SQL statements by hand.
Disclaimer first: the tool doesn’t do anything outrageously new, and most of the SQL queries have been borrowed from Markus’ post on SQL logging with Dionaea when the feature was first introduced. However I have found the script makes my analysis of the honeypot logs simpler and quicker, and I’ve a positive reaction from a limited few that have had a copy of the script before this post. Hopefully it will be of use others.
Usage is relatively simple, shown below:

Dionaea database query collection
Author: Andrew Waite – www.InfoSanity.co.uk
Inspiration from carnivore.it article:
http://carnivore.it/2009/11/06/dionaea_sql_logging
Usage:
/path/to/python dionaea-sqlquery.py –query #
Where # is:
1:      Port Attack Frequency
2:      Attacks over a day
3:      Popular Malware Downloads
4:      Busy Attackers
5:      Popular Download Locations
6:      Connections in last 24 hours

The script can be found here. There is still a good level of work to be undertaken to tidy up the output, potentially allowing for output in different formats, and I also want to add additional and more complex queries as time progresses. If you have any success,  failure, comments or suggests please let me know.
— Andrew Waite

Expert speaker session at Northumbria University

Last week I had the pleasure of being asked to speak at Northumbria University, presenting to students of the Computer Forensics and Ethical Hacking for Computer Security programmes. As I graduated from Northumbria a few years ago it was interesting to come back to see some familiar faces and have a look at how the facilities had developed.
Despite the nerves of having to speak in front of a crowd I really enjoyed the event, especially as the other speakers were excellent and I enjoyed their sessions. The event kicked off with Dave Kennedy, a soon to retire member of Durham Police’s computer crime unit. Dave’s talked about his personal experience with a couple of high profile cases, explaining some of the groundwork and behind the scenes activity that isn’t known to the general public. I found the information interesting; but also disturbing, given the nature of the material that is handled by Dave and his department I can safely state that I wouldn’t want to have much experience in the area.
Next up was Phil Byrne, an internal auditor for HM Revenue and Customs (HMRC). For those that don’t know, HMRC were/are at the centre of one of the UK’s largest data loss stories in 2007 after CDs containing approximately 25 million child benefit records were sent, unencrypted, by standard post and did not reach their intended destination (some backstory here). Phil talked openly about the incident, discussing both the incident itself and the changes made in response. One of Phil’s comments has stayed with me (if I’m mis-quoting someone let me know):

If you put people into the process, something will go wrong at some time

Third to the stand was Gary Witts, owner of a manage services company specialising in on-line backups. The talk was very indepth and had some interesting content, but from my perspective I felt it was more of a sales pitch than a technical discussion of the secure backup’s place within a security standing.
I took the fourth and final slot of the day, which left me with the unenviable position of being between around 100 students and the pub, which didn’t help my usual rapid-fire presentation style. My presentation took a different focus from the previous sessions, discussing some of the real-world security incidents that can regularly be encountered, and some advice on handling the incidents in question. I also discussed my findings from honeypot systems, introducing a less common method for monitoring an environment for malicious activity. Assuming the feedback I’ve recieved is genuine the presentation seems to have been well-recieved.
From a student’s perspective; Tom was in the audience and has been writing up his take on the event in a series of blog postings. Tom also recorded the talks, for any one interested a direct link to my session is available here.
Andrew Waite

Rise of explo.it database

The team from Offensive Security have just announced the opening of explo.it (re-directs to exploits.offensive-security.com, just more memorable). The site is designed as a successor to milw0rm. If you’ve ever browsed the milw0rm site the layout will be instantly familiar.
I think this is great news for the infosec community, not only does the OffSec team always produce high quality output, but it helps provide some stability in the wake of milw0rms recent uncertainty.
At this point the site’s content volume is growing rapidly, when I looked this morning the archives exploits numbered around 9000, already it has reach 10000+, and a refresh of the front page has this number increase a good percentage of the time.
One feature of the site that I do like is a link (where available) to the vulnerable version of the application or code. I believe this will make testing much easier as it removes the need to trawl the web for an often unsupported and unavailable old version of an application. I really hope that this feature will become popular and all/most of the published exploits will link to a download location for retrieving the vulnerable code where possible.
Happy exploiting (in your lab, obviously)
Andrew Waite

Article Review: Carving malware from memory

I’ve recently had the pleasure of talking with Leon van der Eijk which resulted in me getting the opportunity to review an article he had been working on. The focus of the article is to identify and collect malware samples from running processes within volatile memory. Given my predilection for malware collection and analysis Leon correctly guessed that I would enjoy the article, which does a great job of describing a method for collecting and analysing malware (and other files and processes) from RAM on a live Windows system
Leon’s method utilises Meterpreter’s memdump.rb script to collect the a snapshot of an infected system’s memory, then utilises Foremost to carve up the collected memory image into individual files which can then be analysed as normal. As the article has just been published today I won’t try to improve on the work already, but I would suggest giving it a read here.
My own forensics skills aren’t yet up to the level that I would like, but I was able to replicate Leon’s process relatively easily within my own lab environment, and without too many problems. This, along with my experience at Northumbria University last week (more later), has re-ignited my interest in improving my forensic skills, and has proved to me that some of the basic skills and techniques involved with the forensic process isn’t all black magic.
The article is definitely worth a read if you have an interest in either computer forensics and/or malware analysis. In case you missed it above, link to article: Carving malware from live memory. Keep up the good work Leon.
Andrew Waite