I’ve recently been involved in a couple of discussions for different ways for identifying malware. One of the possibilities that has been brought up a couple of times is fuzzy hashing, intended to locate files based on similarities to known files. I must admit that I don’t fully understand the maths and logic behind creating fuzzy hash signatures or comparing them. If you’re curious Dustin Hurlbut has released a paper on the subject, Hurlbut’s abstract does a better job of explaining the general idea behind fuzzy hashing.
Fuzzy hashing allows the discovery of potentially incriminating documents that may not be located using traditional hashing methods. The use of the fuzzy hash is much like the fuzzy logic search; it is looking for documents that are similar but not exactly the same, called homologous files. Homologous files have identical strings of binary data; however they are not exact duplicates. An example would be two identical word processor documents, with a new paragraph added in the middle of one. To locate homologous files, they must be hashed traditionally in segments to identify the strings of identical data.
I have previously experimented with a tool called ssdeep, which implements the theory behind fuzzy hashing. To use ssdeep to find files similar to known malicious files you can run ssdeep against the known samples to generate a signature hash, then run ssdeep against the files you are searching, comparing with the previously generated sample.
One scenarios I’ve used ssdeep for in the past is to try and group malware samples collected by malware honeypot systems based on functionality. In my attempts I haven’t found this to be a promising line of research, as different malware can typically have the same and similar functionality most of the samples showed a high level of comparison whether actually related or not.
Another scenario that I had developed was running ssdeep against a clean WinXP install with a malicious binary. In the tests I had run I haven’t found this to be a useful process, given the disk capacity available to modern systems running ssdeep against a large HDD can be a time consuming process. It can also generate a good number of false positives when run against the OS.
After recently reading Leon van der Eijk’s post on malware carving I have been mulling a method for combining techniques to improve fuzzy hashing’s ability to identify malicious files, while reducing the number of false positives and workload required for an investigator. The theory was that, while any unexpected files on a system are not desirable, if they aren’t running in memory then they are less threatening than those that are active.
To test the theory I infected an XP SP2 victim with a sample of Blaster that had been harvested by my Dionaea honeypot and dumped the RAM following Leon’s methodology. Once the image was dissected by foremost I ran ssdeep against extracted resources. Ssdeep successfully identified the malicious files with a 100% comparison to the maliciuos sample. So far so good.
With my previous experience with ssdeep I ran a control test, repeating the procedure against the dumped memory of a completely clean install. Unsurprisingly the comparison did not find a similar 100% match, however it did falsely flag several files and artifacts with a 90%+ comparison so there is still a significant risk of false positives.
From the process I have learnt a fair deal (reading and understanding Leon’s methodolgy was no comparison to putting it into practice) but don’t intend to utilise the methods and techniques attempted in real-world scenarios any time soon. Similar, and likely faster, results can be achieved by following Leon’s process completely and running the files carved by Foremost against an anti-virus scan.
Being able to test scenarios similar to this was the main reason for me to build up the my test and development lab which I have described previously. In particular, if I had run the investigation on physical hardware I would likely not have rebuilt the environment for the control test with a clean system, losing the additional data for comparison, virtualisation snap shots made re-running the scenario trivial.
P.S. Big thanks to Leon for writing up the memory capture and carving process used as a foundation for testing this scenario.
I’ve recently had the pleasure of talking with Leon van der Eijk which resulted in me getting the opportunity to review an article he had been working on. The focus of the article is to identify and collect malware samples from running processes within volatile memory. Given my predilection for malware collection and analysis Leon correctly guessed that I would enjoy the article, which does a great job of describing a method for collecting and analysing malware (and other files and processes) from RAM on a live Windows system
Leon’s method utilises Meterpreter’s memdump.rb script to collect the a snapshot of an infected system’s memory, then utilises Foremost to carve up the collected memory image into individual files which can then be analysed as normal. As the article has just been published today I won’t try to improve on the work already, but I would suggest giving it a read here.
My own forensics skills aren’t yet up to the level that I would like, but I was able to replicate Leon’s process relatively easily within my own lab environment, and without too many problems. This, along with my experience at Northumbria University last week (more later), has re-ignited my interest in improving my forensic skills, and has proved to me that some of the basic skills and techniques involved with the forensic process isn’t all black magic.
The article is definitely worth a read if you have an interest in either computer forensics and/or malware analysis. In case you missed it above, link to article: Carving malware from live memory. Keep up the good work Leon.