Archive

Archive for the ‘Malware’ Category

Cuckoo Sandbox 101

It’s a while since I’ve found time to add a new tool to my malware environment, so when an ISC post highlighted a new update to Cuckoo sandbox it served as a good reminder that I hadn’t got around to trying Cuckoo, something that has now changed. For those that don’t know, from its own site:

[...] Cuckoo Sandbox is a malware analysis system.

Its goal is to provide you a way to automatically analyze files and collect comprehensive results describing and outlining what such files do while executed inside an isolated environment.

It’s mostly used to analyze Windows executables, DLL files, PDF documents, Office documents, PHP scripts, Python scripts, Internet URLs and almost anything else you can imagine.

Considering Cuckoo is the combined product of several tools, mostly focused around VirtualBox, I found install and setup was largely trouble free, mostly thanks to the detailed installation instructions from the tool’s online documentation. I only encountered a couple of snags.

No VMs

[2011-12-29 17:21:56,470] [Core.Init] INFO: Started.
[2011-12-29 17:21:56,686] [VirtualMachine.Check] INFO: Your VirtualBox version is: “4.1.2_Ubuntu”, good!
[2011-12-29 17:21:56,688] [Core.Init] INFO: Populating virtual machines pool…
[2011-12-29 17:21:56,703] [VirtualMachine] ERROR: Virtual machine “cuckoo1″ not found: 0x80bb0001 (Could not find a registered machine named ‘cuckoo1′)
[2011-12-29 17:21:56,704] [VirtualMachine.Infos] ERROR: No virtual machine handle.
[2011-12-29 17:21:56,705] [Core.Init] CRITICAL: None of the virtual machines are available. Please review the errors.

The online documentation specifies creating a dedicated user for the cuckoo process. Sound advice, but if you create your virtual guest machines under a different user (like I did, under a standard user account), then the cuckoo process cannot interact with the virtualbox guests. Either changing ownership of cuckoo, or specifically creating the guest VMs as the cuckoo user will solve the issue.
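A quick way to confirm this is to list the machines VirtualBox has registered for whichever account runs cuckoo.py. The following is a minimal sketch, assuming VBoxManage is on the PATH and that `VBoxManage list vms` prints one `"name" {uuid}` pair per line; the helper names are made up for illustration and are not part of Cuckoo.

```python
import re
import subprocess

# Matches lines such as: "cuckoo1" {9a9dddd8-f7d6-40ea-aed3-9a0dc0f30e79}
VM_LINE = re.compile(r'^"(?P<name>.+)"\s+\{(?P<uuid>[0-9a-fA-F-]+)\}\s*$')


def parse_vm_list(output):
    """Return the VM names found in `VBoxManage list vms` style output."""
    names = []
    for line in output.splitlines():
        match = VM_LINE.match(line.strip())
        if match:
            names.append(match.group("name"))
    return names


def registered_vms():
    """Ask VirtualBox which VMs are registered for the *current* user."""
    result = subprocess.run(
        ["VBoxManage", "list", "vms"],
        capture_output=True, text=True, check=True,
    )
    return parse_vm_list(result.stdout)
```

Calling registered_vms() as the cuckoo user and getting a list without cuckoo1 in it would point to the guests having been registered under a different account.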

Creating Database

Last problem encountered was Cuckoo’s database. If it doesn’t exist when the process starts, Cuckoo will create a blank database, which (obviously, in hindsight) will fail if the running user doesn’t have permission to write to Cuckoo’s base directory.
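The permission problem can be checked before starting Cuckoo at all. This is a small sketch rather than anything Cuckoo ships with: the function name is hypothetical, and /opt/cuckoo is simply the install path used in the examples here.

```python
import os


def can_create_database(base_dir):
    """Return True if the current user could create a new file in base_dir."""
    # Creating a file needs write *and* search (execute) permission
    # on the directory itself.
    return os.path.isdir(base_dir) and os.access(base_dir, os.W_OK | os.X_OK)
```

Run as the cuckoo user, can_create_database("/opt/cuckoo") returning False would suggest the blank database cannot be created.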

cuckoo.py

With problems out of the way, Cuckoo runs quite nicely, with three main parts. The cuckoo.py script does the bulk of the heavy lifting and needs to be running before doing anything else. If all is well it should run through some initialisation and wait for further instructions:

/opt/cuckoo $ ./cuckoo.py
_
____ _ _ ____| | _ ___ ___
/ ___) | | |/ ___) |_/ ) _ \ / _ \
( (___| |_| ( (___| _ ( |_| | |_| |
\____)____/ \____)_| \_)___/ \___/ v0.3.1

www.cuckoobox.org
Copyright (C) 2010-2011

[2011-12-29 20:27:17,120] [Core.Init] INFO: Started.
[2011-12-29 20:27:17,719] [VirtualMachine.Check] INFO: Your VirtualBox version is: “4.1.2_Ubuntu”, good!
[2011-12-29 20:27:17,720] [Core.Init] INFO: Populating virtual machines pool…
[2011-12-29 20:27:17,779] [VirtualMachine.Infos] INFO: Virtual machine “cuckoo1″ information:
[2011-12-29 20:27:17,780] [VirtualMachine.Infos] INFO: \_| Name: cuckoo1
[2011-12-29 20:27:17,781] [VirtualMachine.Infos] INFO: | ID: 9a9dddd8-f7d6-40ea-aed3-9a0dc0f30e79
[2011-12-29 20:27:17,782] [VirtualMachine.Infos] INFO: | CPU Count: 1 Core/s
[2011-12-29 20:27:17,783] [VirtualMachine.Infos] INFO: | Memory Size: 512 MB
[2011-12-29 20:27:17,783] [VirtualMachine.Infos] INFO: | VRAM Size: 16 MB
[2011-12-29 20:27:17,784] [VirtualMachine.Infos] INFO: | State: Saved
[2011-12-29 20:27:17,785] [VirtualMachine.Infos] INFO: | Current Snapshot: “cuckoo1_base”
[2011-12-29 20:27:17,785] [VirtualMachine.Infos] INFO: | MAC Address: 08:00:27:BD:9C:4F
[2011-12-29 20:27:17,786] [Core.Init] INFO: 1 virtual machine/s added to pool.

submit.py

The submit.py script is one of the ways of getting cuckoo to analyse files:

python submit.py --help
Usage: submit.py [options] filepath

Options:
-h, --help                        show this help message and exit
-t TIMEOUT, --timeout=TIMEOUT     Specify analysis execution time limit
-p PACKAGE, --package=PACKAGE     Specify custom analysis package name
-r PRIORITY, --priority=PRIORITY  Specify an analysis priority expressed in integer
-c CUSTOM, --custom=CUSTOM        Specify any custom value to be passed to postprocessing
-d, --download                    Specify if the target is an URL to be downloaded
-u, --url                         Specify if the target is an URL to be analyzed
-m MACHINE, --machine=MACHINE     Specify a virtual machine you want to specifically use for this analysis

Most of the options above are self-explanatory, just make sure to select the relevant analysis package depending on what you’re working with; possibilities are listed here.
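For scripted submissions the same options can be assembled programmatically. The sketch below only builds and runs the command line from the help text, writing the long options with a double hyphen as optparse prints them. The function names, the sample path and the "pdf" package name are assumptions for illustration; check the package list for the names your version actually supports.

```python
import subprocess


def build_submit_command(target, package=None, timeout=None, is_url=False):
    """Assemble a submit.py command line from the options in the help text."""
    command = ["python", "submit.py"]
    if package is not None:
        command.append("--package=%s" % package)
    if timeout is not None:
        command.append("--timeout=%d" % timeout)
    if is_url:
        command.append("--url")
    command.append(target)
    return command


def submit(target, **options):
    """Run submit.py from the Cuckoo directory; cuckoo.py must already be running."""
    return subprocess.call(build_submit_command(target, **options), cwd="/opt/cuckoo")
```

For example, build_submit_command("/tmp/sample.pdf", package="pdf", timeout=120) yields the argument list for a two-minute analysis of a (hypothetical) PDF sample.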

web.py

Finally, web.py provides a web interface for reviewing the results of all analyses performed by cuckoo, bound to localhost:8080.

I’d like to thank the team that developed and continue to develop the cuckoo sandbox. I look forward to getting more automated results going forward and hopefully getting to a point where I’m able to add back to the project; until then I’d recommend getting your hands dirty, as from my initial experiments I doubt you’ll be disappointed. But if you won’t take my word for it, watch Cuckoo in action analysing Zeus here.

– Andrew Waite

AVG & FUD?

Like most techies I get the job of fixing and maintaining relatives’ PCs. As part of this after fixing whatever is broken I have some common clean-up and install routines that I go through to both help the system run faster and to extend the period before I’m called back, and I’ve used AVG free as part of this for many years to keep costs down for my users.

During a recent job I came across a new (I’m assuming, hadn’t noticed it before) feature of AVG free, the PC Analyzer component. Being the curious sort I hit the go button, scan ran for around 5 minutes and I was presented with this:

PCAnaylzer-results

Ouch, I was surprised with the number of errors as this is a machine I keep a regular eye on, and in some cases use myself (it’s the missus’). Time to panic? Let’s see:

  • Registry errors: Errors affect system stability: (125)

That doesn’t sound good. Checking the ‘Details…’ link presented me with a long list of Registry keys, which to a standard end-user would result in turning on BofH’s Dummy Mode. In reality, it found a lot of keys to set the ‘open with’ right-click function depending on file extension. ‘Affect system stability’? Not so much, and I find the links useful enough that I’ve previously researched how to add my own.

  • Junk Files: These files take up disk space: (599)

Again checking the details: a long list of randomly named files, in the temporary folder. All ~600 took a total of less than 300MB, and the machine has more than 200GB free. Something to correct come next house cleaning session, but not really a problem.

  • Fragmentation: Reduces disk access speed

In fairness to the tool, it did come back clean and we know that fragmentation can be an issue. But that’s why every machine I’ve ever used has come with a defrag utility, as standard, for free. (OK, my BBC Micro B didn’t, but then it also had a cassette deck rather than a hard disk).

  • Broken Shortcuts: Reduces explorer browsing speed (42)

Ok, so I forgot a folder of shortcuts to junk that came pre-installed with the system. I’d deleted the junk, forgot the shortcuts. Thanks for the reminder, fixed.

Summary

Plenty of ‘problems’ highlighted, time to run out and drop £25 for an annual subscription to the clean-up tool? Nope, ignoring the fact that many of these issues are system settings that actually aid the end user, the remaining issues won’t have any negative impact that the end-user will notice.

In my own opinion, AVG is taking a leaf out of the fake AV scams and scaring non-techies into parting with their hard earned coin in a bid to keep the computer running and bank details away from the scary hackers that the nice lady on the news keeps talking about. Presenting a list of meaningless (to most) information and saying it’s bad is exactly the tactic I encountered with cold call scammers earlier in the year.

As a final side note, I’ve lost two of my ‘users’ this year to AVG simply because when the AVG free license I’d installed expired, they couldn’t find a link to download the latest free version, only MANY links to the paid version. As my users are nice people (latest ‘victim’ was my grandfather), they decided themselves that it was better for them to pay the small fee than have to call me and interrupt my life.

Can anyone recommend a free AV suite that doesn’t con the unwitting into unnecessary purchases to perform a cleanup that could be performed manually with around 5 minutes and half a clue? AVG Free is a great tool, and for free I shouldn’t really complain, but I draw the line when the sales tactics change to making money by selling things people don’t need to those that don’t know any better.

–Andrew Waite

Categories: InfoSec, Malware, Tool-Kit

Cold calling IT Support

I’m sure by now most people are aware of a new round of scams where victims are being called by a ‘support company’ suggesting that the victim’s computer has malware installed which they can fix. If you need it, this BBC article covers the basics. Well, I just got the call ;)

First up, the caller seemed to be auto-dialling large volumes of numbers looking for someone to pick up, as the caller (male, poor line quality meant I missed the name given) was unprepared when I answered. The caller was clearly reading from a script. I may have over-played the ‘Sorry, I’m just a dumb user that knows nothing about computers’ card, but despite telling him I was clueless and willing to accept everything he told me I was still presented with a long-winded ‘if you don’t believe us this is how I’ll prove it’ speech.

Unfortunately I wasn’t able to go through the full process as, despite telling my new friend otherwise, I wasn’t able to get to a Windows machine to work through it. The only laptop to hand was my netbook running Ubuntu, and my landline isn’t mobile so I couldn’t head upstairs. (My landline never rings; everything I do is via mobile and I only have the landline for the ADSL connection. I’m suspicious of all landline calls before I even pick up the phone.)

After ensuring I was looking at the system wallpaper, I was instructed to press the ‘key on bottom left of keyboard with four squares that looks like the Microsoft logo’ and with another finger press the ‘r’ key. This is where I was given ‘proof’ that my system was infected, using a ‘hidden’ command that will list all infections. What is the magic command? inf (for ‘infections’), which opens Windows Explorer in C:\Windows\inf; screenshot below shows the infections on my system. I’m guessing that at this point the average user may have just entered dummy mode.

At this point I lost the caller, whether through a technical fault or because he’d guessed something wasn’t right (I can’t act for toffee). I’m hoping that I’ll get a second bite at the cherry at some point; my missus took a similar call a few weeks back and, having spent too long listening to my security rants, she immediately spotted the scam, pointed out that I was a ‘security guy’ and hung up. Information that they clearly didn’t have when ringing back (could be more than one cold calling organisation).

Unfortunately, despite my usual laughing at people who fall for these scams, I can see how those with less knowledge could fall for the premise. Computers and software regularly phone home to check for updates etc., so using this information to identify infected systems would/could make sense, and from an end user perspective I struggled to tell the difference between the sorts of actions I was asked to take by my ‘friend’ and those I regularly instruct friends and family members to take when I’m trying to provide remote support.

Be safe and spread the word to those less knowledgeable about computers that this is an active scam. Bottom line is: no legit IT company will call you to fix a problem that you weren’t aware of.

–Andrew Waite

Categories: InfoSec, Malware

Mercury – Live Honeypot DVD

Mercury Live DVD was initially (I believe) announced in a post to the Nepenthes Mailing list. It is a remastered Ubuntu distribution with pre-installed honeypot applications and malware analysis tools created by John Moore. From the ReadMe:

This live DVD is a remastered version of Ubuntu 10.0 Beta LTS x86_32. It was designed due to my being disappointed with another reverse engineering malware live CD that was released recently. I have decided to call my creation MERCURY, which is an acronym for Malware Enumeration, Capture, and Reverse Engineering.

The Mercury live DVD contains tools used for digital forensics, data recovery, network monitoring, and spoofing. It should primarily be used as a honeypot or network monitoring platform as well as a laboratory and teaching aid. There are three honeypots installed – honeyd, nepenthes, and dionaea. Four, if you include netcat.

The majority of the additional applications reside in /opt:

  • Dionaea (0.1.0) – Dionaea is a malware collection honeypot focusing primarily on SMB emulation, covered on InfoSanity numerous times before.
  • FFP – Fuzzy Fingerprinting is a util to aid SSH MitM attacks.
  • jsunpack-n – A Javascript unpacker, perfect for analysing captured or potentially malicious URLs in more depth.
  • Kippo (svn rev.169) – Kippo is a low-medium interaction SSH honeypot, also covered.
  • mitm-ssh – Unsurprisingly, a utility for aiding man in the middle attacks against SSH connections.
  • Origami & pdftools – Two frameworks for analysing malicious PDF files.
  • Volatility – an excellent memory analysis toolkit
  • Zerowine-vm – A malware behavior analysis platform. I’ve covered ZeroWine here before, and whilst I find it useful for initial analysis I found it a pain to setup and get running. The fact this works out of the box on Mercury is enough reason alone to keep the .iso handy.

Other tools are installed on the system as standard, accessible from the usual locations (/etc, /usr/bin, etc.). I won’t try to list them all, but some highlights include:

  • Nepenthes – Dionaea’s predecessor
  • Honeyd – Honeypot system, perfect for emulating multiple different systems from one platform. Covered in more depth here.
  • John – John the Ripper, password cracker
  • ircd-hybrid – irc server daemon, useful for analysing irc-based malware’s interaction with command and control systems.
  • Snort – de-facto intrusion detection system.
  • Wireshark – Packet capture and network analysis tools.

I could go on, but I’m sure you get the idea.

Setting up a honeypot, and analysing the results, has never been easier. And I’m sure the toolkit’s functionality will also be useful in other scenarios; incident response, general network administration or as a safe learning platform. So what are you waiting for?

–Andrew Waite

N.B. there have been several mirrors and downloads established; the most reliable download source I’ve used is Markus’ mirror at carnivore.it

Example of post exploit utilities (SSH scanners)

2010/07/21

So far my Kippo honeypot installation has received a number of successful log ins from malicious users, some of which have been helpful enough to provide some tools for further analysis. A lot of the archives which have been downloaded show that the kits have been in use for a while, with some archive timestamps going back as far as 2004 (of course this could simply be an incorrect clock on the machine that created the archive). Picking on the most recent download (2010-07-18) I’ve taken a look at the archive containing gosh.tgz.

The archive was downloaded from linux<dot>hostse<dot>com<slash>gosh<tgz>, system is down at time of writing but take care if attempting to investigate yourself. Before downloading the user checked around the system with commands: w, uname -a and cat /proc/cpuinfo, and archive was downloaded and extracted in /dev/shm/.

Once extracted, the archive contains a number of files:

1: ISO-8859 English text, with CRLF line terminators
2: ASCII text
3: ASCII C++ program text, with CRLF line terminators
4: ASCII text
5: ASCII text
a: ISO-8859 text, with CRLF line terminators
common: ASCII C++ program text
gen-pass.sh: Bourne-Again shell script text executable
go.sh: ASCII text
mfu.txt: ASCII text
pass_file: ASCII text
pscan2: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.2.5, not stripped
scam: Bourne-Again shell script text executable
secure: Bourne-Again shell script text executable
ss: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, for GNU/Linux 2.0.0,stripped
ssh-scan: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, for GNU/Linux 2.0.0, stripped
vuln.txt: empty
Interesting files:

  • Files 1 to 5, common and pass_file are password lists, totalling 235,523 potential passwords.
  • mfu.txt is a list of IP addresses, mostly in the 38.99.0.0/16 address space.
  • pscan2 is a fairly common and generic port scanner.
  • scam is a shell script that appears to be the core brains of the toolkit. It essentially loops through scanning different ranges of IP addresses while periodically emailing the contents of vuln.txt back to its master (mafia89tm@yahoo.co.uk).
  • ss: appears to be another scanner used for looking for potential targets.
  • ssh-scan: appears to be a Romanian tool from the message provided if run without arguments, according to Google Translate (possibly NSFW), and as you would guess from the file name is a scanner for SSH services.
  • vuln.txt is blank in the archive, and will be the output of vulnerable systems located by the scanners.

All told this appears to be a kit for performing further scans for unsecured SSH sessions, and it is likely that a similar kit hosted on a different compromised machine was responsible for identifying my installation in the first place. Kits like this also quickly show the problem with tracking down the malicious user behind a compromise or attempt: it is rare for attacks to be launched from systems that can easily be traced back to the malicious user.

A quick Google search confirms that this kit (and user) has been seen in the wild attacking other systems, this posting on the Shell Person blog writes up the aftermath after a production system was compromised by the same kit.

–Andrew Waite

Categories: Honeypot, InfoSec, Kippo, Malware

mimic-nepstats_v1-1.py

I’ve been a bit lax in writing this post; around a month ago Miguel Jacq got in contact to let me know about a couple of errors he encountered when running InfoSanity’s mimic-nepstats.py with a small data set. Basically, if your log file did not include any submissions, or was for a period shorter than 24 hours, the script would crash out. Not the biggest problem, as most will be working with larger data sets, but annoying nonetheless.

Not only did Miguel let me know about the issues, he was also gracious enough to provide a fix; the updated script can be found here. An example of the script in action is below:

cat /opt/dionaea/var/log/dionaea.log| python mimic-nepstats_v1-1.py

Statistics engine written by Andrew Waite – www.infosanity.co.uk

Number of submissions: 84
Number of unique samples: 39
Number of unique source IPs: 65

First sample seen: 2010-06-08 08:25:39.569003
Last sample seen: 2010-06-21 15:24:37.105594
System Uptime: 13 days, 6:58:57.536591
Average daily submissions: 6

Most recent submissions:
2010-06-21 15:24:37.105594, 113.37.56.28, emulate://, 56b8047f0f50238b62fa386ef109174e
2010-06-21 15:18:08.347568, 195.205.5.71, tftp://195.205.5.71/ssms.exe, fd28c5e1c38caa35bf5e1987e6167f4c
2010-06-21 15:17:08.391267, 195.117.74.62, tftp://195.117.74.62/ssms.exe, bb39f29fad85db12d9cf7195da0e1bfe
2010-06-21 06:29:03.565988, 195.160.222.101, tftp://195.160.222.101/ssms.exe, fd28c5e1c38caa35bf5e1987e6167f4c
2010-06-20 23:34:15.967299, 195.242.145.40, http://208.53.183.164/trying.exe, 094e2eae3644691711771699f4947536
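
The small-data-set crash is easy to picture with a sketch. This is not Miguel’s actual fix, and the function name is hypothetical. It assumes the average is the submission count divided by whole days of uptime, which is consistent with 84 submissions over 13 days giving 6 in the output above, so an empty log or a window under 24 hours would divide by zero unless guarded.

```python
from datetime import datetime


def average_daily_submissions(count, first_seen, last_seen):
    """Integer average per day, safe for empty logs and sub-24-hour windows."""
    if count == 0 or first_seen is None or last_seen is None:
        return 0
    # timedelta.days is 0 for anything under 24 hours, so clamp to one day
    # rather than dividing by zero.
    days = max((last_seen - first_seen).days, 1)
    return count // days


# Timestamps taken from the sample output above.
first = datetime(2010, 6, 8, 8, 25, 39, 569003)
last = datetime(2010, 6, 21, 15, 24, 37, 105594)
print(average_daily_submissions(84, first, last))  # 84 // 13 -> 6
```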

– Andrew Waite

Amun statistics

Amun has been running away quite happily in my lab since initial install. From a statistics perspective my work has been made really easy, as Miguel Cabrerizo has previously taken one of the InfoSanity statistics scripts written for Nepenthes and Dionaea and adapted it to parse Amun’s submission.log files.

Results generated from the script in my environment are below, if you’re wanting to get an overview of submissions from another Amun sensor the script has been uploaded alongside the other InfoSanity resources and is available here.

~$ cat /opt/amun/logs/submissions.log* | ./amun_submission_stats.py

Statistics engine written by Andrew Waite (www.infosanity.co.uk) modified by Miguel Cabrerizo (diatel.wordpress.com)

Number of submissions      : 25
Number of unique samples   : 25
Number of unique source IPs: 18

Origin of the malware:
Ukraine :     1
None :     7
Poland :     2
Romania :     1
United States :     8
Russian Federation :     2
Hungary :     1
Norway :     1
Bulgaria :     2

Vulnerabilities exploited:
MS08067 :    13
DCOM :    12

Most recent submissions:
2010-05-31, 11:37:22, 208.53.183.164, 63.exe, acf5c09d547417fe53c163ec09199cab, MS08067
2010-05-30, 19:23:09, 208.53.183.162, 63.exe, 89b578839f1c39f79d48e5f9e70b5e2f, MS08067
2010-05-28, 10:27:03, 208.53.183.162, 63.exe, f7c4f677218070ab52d422b3c018a4ba, MS08067
2010-05-27, 16:23:14, 195.34.117.180, ssms.exe, 1f8a826b2ae94daa78f6542ad4ef173b, DCOM
2010-05-24, 19:46:35, 208.53.183.163, 63.exe, 53979f1820886f089a75689ed15ecf6e, MS08067

A comment on a recent post asked for a comparison between different honeypots. While this is far from conclusive and only focuses on a single aspect of the technologies, one of InfoSanity’s Nepenthes sensors ‘saw’ more attacks in the last 24hrs than my Amun installation did in the almost three weeks shown above. As both are running within the same, small, IP allocation I think I’m safe in assuming that one IP isn’t actually receiving a disproportionate level of interest from the bad guys and bots that are out there.

– Andrew Waite

24hrs of HoneyD logs

After an initial setup and configuration of HoneyD I took a snapshot of the honeyd.log file after running for a 24hr period.

Running honeydsum against the log file generated some good overview information. There were over 12000 connections made to the emulated network, averaging one connection every 7 seconds. Despite the volume of connections, each source generally only initiated a handful of connections, likely looking for a single particular service before moving on.

Top 10 Source Hosts
Rank  Source IP        Connections
1     124.207.85.200   3066
2     203.113.137.181  984
3     121.23.82.216    65
4     79.114.107.90    65
5     61.156.31.20     57
6     62.215.178.163   48
7     193.6.48.210     39
8     24.161.18.4      37
9     190.58.213.249   30
10    195.8.36.144     30

The summaries from honeydsum also suggest that the rate of incoming connections is generally constant. The only real variation to this was between 17:00 and 18:00, but the spike coincides with the source IP 124.207.85.200 running an ordered port sweep against a single target IP address, starting at TCP 1042 and running up to around TCP 1300. Not sure why anyone is scanning this particular port range (if anyone can provide any additional information to slake my curiosity I’d appreciate it), but this event explains the outliers in both the above and below summary tables, highlighting the dangers of working with a small data set.

Connections per Hour
Hour  Connections
00:00      329
01:00      325
02:00      281
03:00      366
04:00      360
05:00      322
06:00      300
07:00      299
08:00      258
09:00      369
10:00      317
11:00      324
12:00      423
13:00      367
14:00      351
15:00      479
16:00      486
17:00     3590
18:00      498
19:00      515
20:00      576
21:00      441
22:00      397
23:00      311
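
The headline figures can be sanity-checked directly from the hourly table. A short Python 3 sketch, with the counts copied from the table above:

```python
# Hourly connection counts copied from the "Connections per Hour" table,
# index 0 being the 00:00 hour.
hourly = [329, 325, 281, 366, 360, 322, 300, 299, 258, 369, 317, 324,
          423, 367, 351, 479, 486, 3590, 498, 515, 576, 441, 397, 311]

total = sum(hourly)
seconds_per_connection = 24 * 60 * 60 / total
peak_hour = max(range(24), key=lambda hour: hourly[hour])

print(total)                                # 12284
print(round(seconds_per_connection, 1))     # 7.0
print("%02d:00" % peak_hour)                # 17:00
print(round(hourly[peak_hour] / total, 2))  # 0.29
```

So the single 17:00 hour accounts for a little under a third of the day’s connections, which is why one port sweep is enough to distort the summaries.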

The below table summarises the targeted resources within the environment. It shouldn’t come as a surprise that the most popular targets were tcp ports 445 and 135, but this is the case even though the honeyd configuration does not have any services listening on those ports. From this I would suggest that if you are trying to gather data on a particular port or service, you employ a filter (firewall/ACL/etc.) to block the noise before it reaches honeyd to keep the log files relevant.

Top 10 Accessed Resources
Rank  Resource   Connections
1     445/tcp    7349
2     135/tcp    1086
3     8/icmp     123
4     22/tcp     102
5     1433/tcp   95
6     8080/tcp   73
7     4899/tcp   52
8     5900/tcp   39
9     10000/tcp  39
10    3/icmp     38

In addition to running honeydsum, the data set was run through InfoSanity’s honeyd-geoip.py script; the top 10 sources are listed below. The results are likely skewed, as the largest ‘location’ for the results is ‘none’ according to the GeoIP Country Lite database being used. One feature of the result set is that the country linked to the public IP addresses used by the honeyd environment did not feature in the list. As infrastructure improves and botnets become more prevalent, today’s malware no longer needs to target ‘closer’ IP addresses to remain efficient.

None:   692
United States:  196
Russian Federation:     123
Taiwan: 118
Brazil: 109
Germany:        99
Australia:      99
China:  90
Romania:        86
Italy:  82
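
The tallying behind a list like this is simple to sketch. The code below is not honeyd-geoip.py itself: the function names are hypothetical, and lookup_country stands in for whatever GeoIP library call is actually used. It takes any callable that returns a country name, or nothing when the address isn’t in the database, which is where a ‘None’ bucket comes from.

```python
from collections import Counter


def tally_countries(source_ips, lookup_country):
    """Count entries per country; unresolved addresses are grouped under 'None'."""
    counts = Counter()
    for ip in source_ips:
        country = lookup_country(ip)
        counts[country if country else "None"] += 1
    return counts


def print_top(counts, limit=10):
    """Print the busiest locations, largest first."""
    for country, total in counts.most_common(limit):
        print("%s:\t%d" % (country, total))
```

With a plain dictionary standing in for the database, tally_countries(ips, table.get) is enough to try it out; whether the input holds one entry per connection or per unique source decides what the totals mean.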

– Andrew Waite

Categories: Honeypot, InfoSec, Malware

Book Review: Virtual Honeypots

It took longer than I had wanted, but I have just finished reading through Virtual Honeypots: From Botnet Tracking to Intrusion Detection. The book is written by Niels Provos, creator of HoneyD (among other things) and Thorsten Holz.

Given the authors I had high expectations when the delivery came through; thankfully it didn’t disappoint. Unsurprisingly the first chapter provides an overview of honeypotting in general, covering high and low interaction systems over both physical and virtual systems; additionally the chapter introduces some core tools for your toolkit.

The next two chapters cover high and low interaction honeypots respectively. I really liked the coverage of hi-int honeypots; it was this idea that drew me towards honeypots in the first place, as the idea of watching an attacker carefully exploit and utilise a dummy system always appealed. The material provided gives a great foundation for starting with a high interaction honeypot and some best practice advice for how to do so securely and safely. While I have read many reports and case studies that involved honeypots, I have had difficulty finding in-depth setup information and advice, leaving high interaction honeypots feeling a bit like black magic. The authors’ information cuts through all the mystery, allowing the reader to get a firm understanding of the topic. The discussion of low-interaction honeypots was equally well covered, although as I’ve spent some time with low-int systems in the past this chapter was more of a refresher than the source of unknown information I had found the hi-int section to be.

Given that Niels is one of the book’s authors, it shouldn’t be too much of a surprise that HoneyD is covered in depth. For me, this was the most useful section of the book. As honeyd is one of the older publicly available low-int systems I had mistakenly assumed that one of the newer systems would provide more functionality; after reading through the material and regularly going ‘ooh’ out loud, honeyd is now firmly at the top of my ‘need to implement’ list.

The book also covers honeypot systems that are designed for specialised purposes. For malware collection, the authors mainly focus on Nepenthes, but also touch on Honeytrap among others. This was the only section that I found to be slightly dated, as Nepenthes’ newly released spiritual successor Dionaea was not covered. But the fundamental material is very well explained, Nepenthes is still a very functional system, and given the inherent similarities between Nepenthes and Dionaea the material is still useful, so the chapter provides an excellent foundation if you’re wanting to start collecting malware.

An interesting chapter covers the idea of hybrid honeypots, which is the idea of using low-int systems to monitor and handle the bulk of traffic, while forwarding anything unknown or unusual to a high-int system for more in-depth analysis of the attack traffic. Unfortunately at this point openly available hybrid systems are limited, with the more functional systems being kept closed by the researchers and companies that build them (but while looking for a good link for hybrid systems I have just found Honeybrid, which I wasn’t aware of. Looks promising…)

The last chapter covering honeypot systems looks at client-side honeypots, designed to look for client-side attacks. As client-side attacks have become more prominent over the last few years this is an evolving area of research, but as the attack vector is newer than traditional attacks, the honeypot systems aren’t as mature as more traditional systems. This isn’t an area that I’m experienced with so I can’t comment too much on the systems detailed by the authors, but they cover several honeyclient systems in great detail, and I’m intending to use the chapter as a foundation for implementing the systems and techniques proposed.

As well as detailing the use of honeypot systems, the authors also provide a brilliant discussion of ways that attackers (or users) can determine that they are interacting with a honeypot system. While the detailed descriptions of ways to identify a honeypot system are interesting and important from a theoretical standpoint, from previous experience running honeypot systems there are more than enough attackers and automated threats that blindly assume the system is legitimate for honeypots to still provide plenty of benefit to their administrators.

The book finishes up with a fairly detailed discussion of both tracking botnets using the information gathered from honeypot systems (this chapter is available as a sample PDF download thanks to InformIT, here) and analysing the malware sample reports provided by CWSandbox. While both chapters are useful in the context of honeypot systems, I didn’t think there was enough room to provide the reader with anything beyond a general overview of the topics, and if you were interested in the topic enough to purchase the book, you will likely already have a similar level of understanding to the information provided.

There is also a chapter covering case studies of actual incidents that were captured by the book’s authors during their research. I’ve always been a fan of case studies, so enjoyed this chapter; it definitely helps whet the appetite to implement the technologies covered by the book.

Overall I really enjoyed the book; if you’re interested in systems and network monitoring, honeypots or malware then this book should probably be on your bookshelf.

Andrew Waite

Categories: honeyd, Honeypot, InfoSec, Malware

Fuzzy hashing, memory carving and malware identification

I’ve recently been involved in a couple of discussions about different ways of identifying malware. One of the possibilities that has been brought up a couple of times is fuzzy hashing, intended to locate files based on similarities to known files. I must admit that I don’t fully understand the maths and logic behind creating fuzzy hash signatures or comparing them. If you’re curious, Dustin Hurlbut has released a paper on the subject; Hurlbut’s abstract does a better job of explaining the general idea behind fuzzy hashing:

Fuzzy hashing allows the discovery of potentially incriminating documents that may not be located using traditional hashing methods. The use of the fuzzy hash is much like the fuzzy logic search; it is looking for documents that are similar but not exactly the same, called homologous files. Homologous files have identical strings of binary data; however they are not exact duplicates. An example would be two identical word processor documents, with a new paragraph added in the middle of one. To locate homologous files, they must be hashed traditionally in segments to identify the strings of identical data.
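
To make the idea of hashing in segments concrete, here is a minimal Python sketch. It is not how ssdeep works internally: ssdeep picks its segment boundaries with a rolling hash over the content, whereas this sketch simply assumes that paragraphs separated by blank lines are the segments. The function names and the sample text are made up for illustration.

```python
import hashlib


def whole_hash(text: str) -> str:
    """Traditional hash: one digest for the entire document."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def segment_hashes(text: str) -> set:
    """Hash each blank-line-separated paragraph on its own."""
    return {
        hashlib.sha256(p.strip().encode("utf-8")).hexdigest()
        for p in text.split("\n\n")
        if p.strip()
    }


def shared_fraction(a: str, b: str) -> float:
    """Fraction of all distinct segments that both documents contain."""
    seg_a, seg_b = segment_hashes(a), segment_hashes(b)
    union = seg_a | seg_b
    return len(seg_a & seg_b) / len(union) if union else 1.0


# Two "homologous" documents: the second has a paragraph added in the middle.
original = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
edited = (
    "First paragraph.\n\nA new paragraph.\n\n"
    "Second paragraph.\n\nThird paragraph."
)

print(whole_hash(original) == whole_hash(edited))  # False
print(shared_fraction(original, edited))           # 0.75
```

The whole-document hashes differ as soon as the paragraph is added, yet three of the four distinct paragraphs are still shared, which is the kind of overlap a fuzzy hash is designed to surface.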

I have previously experimented with a tool called ssdeep, which implements the theory behind fuzzy hashing. To use ssdeep to find files similar to known malicious files you can run ssdeep against the known samples to generate signature hashes, then run ssdeep against the files you are searching, comparing them with the previously generated signatures.
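
The two-step workflow above can be sketched in Python. To keep the example self-contained it does not call ssdeep at all; as a stand-in it builds a signature from SHA-256 hashes of overlapping 8-byte windows and scores two signatures by their overlap, which is much cruder than ssdeep’s own algorithm. Every name here (signature, build_known, scan, the 50% threshold, the demo files) is an assumption for illustration only.

```python
import hashlib
import tempfile
from pathlib import Path

WINDOW = 8  # bytes per window; an arbitrary choice for this sketch


def signature(data: bytes) -> frozenset:
    """Stand-in signature: truncated hashes of every overlapping window."""
    return frozenset(
        hashlib.sha256(data[i:i + WINDOW]).digest()[:4]
        for i in range(max(len(data) - WINDOW + 1, 1))
    )


def score(sig_a: frozenset, sig_b: frozenset) -> int:
    """Similarity from 0 to 100, based on how many window hashes are shared."""
    union = sig_a | sig_b
    return round(100 * len(sig_a & sig_b) / len(union)) if union else 0


def build_known(sample_dir: Path) -> dict:
    """Step 1: generate one signature per known-malicious sample."""
    return {
        p.name: signature(p.read_bytes())
        for p in sorted(sample_dir.iterdir())
        if p.is_file()
    }


def scan(search_dir: Path, known: dict, threshold: int = 50) -> list:
    """Step 2: compare every file under search_dir with each known signature."""
    hits = []
    for path in sorted(search_dir.rglob("*")):
        if not path.is_file():
            continue
        sig = signature(path.read_bytes())
        for name, known_sig in known.items():
            similarity = score(sig, known_sig)
            if similarity >= threshold:
                hits.append((path.name, name, similarity))
    return hits


def demo() -> list:
    """Build throwaway sample and suspect folders, then scan them."""
    payload = bytes(range(256)) * 2  # placeholder bytes, not real malware
    with tempfile.TemporaryDirectory() as tmp:
        samples, suspects = Path(tmp, "samples"), Path(tmp, "suspects")
        samples.mkdir()
        suspects.mkdir()
        (samples / "known.bin").write_bytes(payload)
        (suspects / "copy.bin").write_bytes(payload)
        (suspects / "patched.bin").write_bytes(
            payload[:200] + b"XXXX" + payload[200:]
        )
        (suspects / "unrelated.bin").write_bytes(b"nothing to see here " * 16)
        return scan(suspects, build_known(samples))


print(demo())
```

In the demo the exact copy scores 100, the patched copy still scores highly, and the unrelated file falls below the threshold and is not reported.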

One scenario I’ve used ssdeep for in the past is trying to group malware samples collected by malware honeypot systems based on functionality. In my attempts I haven’t found this to be a promising line of research: as different malware typically has the same or similar functionality, most of the samples showed a high level of similarity whether actually related or not.

Another scenario that I had developed was running ssdeep against a clean WinXP install with a malicious binary. In the tests I ran I didn’t find this to be a useful process: given the disk capacity available to modern systems, running ssdeep against a large HDD can be time consuming. It can also generate a good number of false positives when run against the OS.

After recently reading Leon van der Eijk’s post on malware carving I have been mulling over a method for combining techniques to improve fuzzy hashing’s ability to identify malicious files, while reducing the number of false positives and the workload required of an investigator. The theory was that, while any unexpected files on a system are not desirable, if they aren’t running in memory then they are less threatening than those that are active.

To test the theory I infected an XP SP2 victim with a sample of Blaster that had been harvested by my Dionaea honeypot and dumped the RAM following Leon’s methodology. Once the image was dissected by foremost I ran ssdeep against the extracted resources. Ssdeep successfully identified the malicious files with a 100% comparison to the malicious sample. So far so good.

Given my previous experience with ssdeep, I ran a control test, repeating the procedure against the dumped memory of a completely clean install. Unsurprisingly the comparison did not find a 100% match; however, it did falsely flag several files and artifacts with a 90%+ comparison, so there is still a significant risk of false positives.
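
One way to read the infected and control runs together, sketched in Python, is to treat the highest score seen on the clean system as a noise floor and only keep hits from the infected image that beat it. This is an assumption about how the two result sets could be combined, not part of the test described above; the file names and scores below are invented placeholders rather than results from these runs.

```python
def triage(infected_scores: dict, clean_scores: dict) -> list:
    """Return (name, score) hits from the infected run that exceed the
    highest score observed in the clean control run, best match first."""
    noise_floor = max(clean_scores.values(), default=0)
    hits = [(n, s) for n, s in infected_scores.items() if s > noise_floor]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical scores (0-100) for carved files, not measured data.
infected_run = {
    "carved_0001.exe": 100,
    "carved_0002.dll": 93,
    "carved_0003.dll": 88,
}
clean_run = {"carved_0001.dll": 93, "carved_0002.dll": 91}

print(triage(infected_run, clean_run))  # [('carved_0001.exe', 100)]
```

With a clean run that already reaches 90%+, anything short of a near-exact match gets filtered out, so in practice this leaves little more than the 100% hits to act on.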

From the process I have learnt a fair deal (reading and understanding Leon’s methodology was no comparison to putting it into practice), but I don’t intend to use the methods and techniques attempted in real-world scenarios any time soon. Similar, and likely faster, results can be achieved by following Leon’s process completely and running an anti-virus scan against the files carved by Foremost.

Being able to test scenarios similar to this was the main reason for me to build up my test and development lab, which I have described previously. In particular, if I had run the investigation on physical hardware I would likely not have rebuilt the environment for the control test with a clean system, losing the additional data for comparison; virtualisation snapshots made re-running the scenario trivial.

–Andrew Waite

P.S. Big thanks to Leon for writing up the memory capture and carving process used as a foundation for testing this scenario.
