Archive for the ‘Tool-Kit’ Category

Honeydsum: HoneyD log analyser

2010/04/20 1 comment

Honeydsum is a script created by Lucio Henrique Franco and Carlos Henrique Peixoto Caetano Chaves for the Brazilian Honeynet project. As described by its authors, it is:

a tool written in Perl designed to generate a text summary from Honeyd logs. The summaries may be produced using different parameters as filters, such as ports, protocols, IP addresses or networks. It shows the top source and port access and the number of connections per hour, and supports input from multiple log files. The script can also correlate events from several honeypots.

Using the script from the commandline is straightforward: simply invoke it with a config file and pass the honeyd log(s) to be analysed. In addition to the usual textual output, honeydsum is also capable of generating HTML results, providing a quick and easy visual summary. The download site also includes some sample output files, both text and HTML (tgz archive).

$ /usr/share/honeyd/scripts/honeydsum-v0.3/

Usage: -c honeydsum.conf [-hVw] log-file1 log-file2 … log-filen
-c   honeydsum.conf file.
-h   display this help and exit.
-V   display version number and exit.
-w   display output as web page (HTML).

The bulk of the text-based output provides a list of connections made from external sources to the systems emulated by the HoneyD instance. The provided sample output yields the information below; on a live and publicly accessible system this output will be significantly longer:

Source IP        Resource  Connections
                 21/tcp         1
                 21/tcp         1
                 11/icmp        1
                 11/icmp        1

IPs    Resources  Connections
  4            2            4

The end of the output contains the information that I find most useful. It provides several different summaries of all the traffic captured by the whole HoneyD environment. Summaries include:

The most frequent remote sources:

Top 10 Source Hosts

Rank  Source IP       Connections
1                          3
2                          2
3                          1
4                          1
5                          1
6                          1
7                          1
Most requested emulated services/resources:

Top 10 Accessed Resources

Rank Resource    Connections
1    21/tcp             4
2    11/icmp            4
3    53/udp             2

— Andrew Waite

Categories: honeyd, Honeypot, InfoSec, Tool-Kit

Determining connection source from honeyd.log

After getting a working HoneyD environment I wanted to dig deeper into the information provided by the system. First up was a quick script to get a feel for where the attacks/connections originate from. For location functionality GeoIP is the package for the job, and as we’re using both Debian and Python, installing the required tools is as simple as ‘apt-get install python-geoip’.

At first glance I really like the log format used by honeyd.log; it is nice and easy to parse. From this I quickly knocked up a Python script to parse the honeyd.log file, collect a list of unique source addresses and finally use GeoIP to determine (and count) the country of origin. The script (below) is basic, and most likely full of bugs, but shows the ease with which tools can be forged to quickly gain full value from the information collected by the HoneyD environment.

Version 0.01 is below. Ignoring any likely bugs that need fixing, the one thing it definitely needs is ordered output, although I’m undecided whether this should be alphabetical by country or by hit-count. Source code:

import GeoIP
import sys

# log file location hard-coded, change to suit environment
logfile = open('/var/log/honeypot/honeyd.log', 'r')
source = []
for line in logfile:
    # the fourth space-separated field of honeyd.log is the source IP
    source.append(line.split(' ')[3])

# standard country-database lookup from the python-geoip bindings
gi = GeoIP.new(GeoIP.GEOIP_STANDARD)

src_country = []
src_count = []
for src in set(source):
    country = gi.country_name_by_addr(src)
    if country in src_country:
        src_count[src_country.index(country)] += 1
    else:
        src_country.append(country)
        src_count.append(1)

for i in range(len(src_country)):
    sys.stdout.write("%s:\t%i\n" % (src_country[i], src_count[i]))

Sample output:

# ./
Uruguay: 12
None:   249
Australia:      17
Lithuania:      11
Austria: 4
Russian Federation:     43
Jordan: 3
Taiwan: 29
Hong Kong:      3
Brazil: 39
United States:  54
Hungary: 23
Latvia: 10
Morocco: 1
Macedonia:      3
Serbia: 4
Romania: 44
Argentina:      23
United Kingdom: 12
India:  10
Egypt:  2
Italy:  33
Switzerland:    2
Germany: 43
France: 13
Poland: 27
Canada: 12
China:  23
Malaysia:       8
Panama: 1
Colombia:       6
Japan:  14
Israel: 9
Bulgaria:       9
Turkey: 6
Vietnam: 2
Mexico: 1
Chile:  1
Pakistan:       1
Spain:  7
Portugal:       4
Moldova, Republic of:   1
Antigua and Barbuda:    1
Venezuela:      3
Singapore:      1
United Arab Emirates:   1
Philippines:    2
Croatia: 1
Korea, Republic of:     4
Ukraine: 5
Georgia: 2
Bahamas: 1
Ecuador: 1
South Africa:   1
Peru:   1
Kazakhstan:     2
Costa Rica:     1
Bolivia: 1
Iran, Islamic Republic of:      2
Greece: 1
Bahrain: 1
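Either ordering the script currently lacks is a one-line change with Python's sorted(); a quick sketch (the counts dict here holds illustrative values lifted from the sample output above):

```python
# illustrative subset of the country counts produced by the script
counts = {"Uruguay": 12, "Australia": 17, "Lithuania": 11}

# alphabetically by country name
for country, hits in sorted(counts.items()):
    print("%s:\t%i" % (country, hits))

# or by hit-count, busiest sources first
for country, hits in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print("%s:\t%i" % (country, hits))
```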

— Andrew Waite

Categories: honeyd, Honeypot, InfoSec, Tool-Kit

New Projects Section

2010/02/17 Comments off

The core InfoSanity site has just (in the last 24 hours) had the first of several planned refreshes go live. In this case it is a section of the site dedicated to the code and tools released as part of the research carried out by InfoSanity. No new content yet, but it has served as a nice reminder of some of the intended features still incomplete in existing projects; hopefully updates should be coming soon.

The start of the section can be found here, alternatively just navigate from the site’s menu. For those feeling lazy, a sneak peek:

— Andrew Waite

Categories: Tool-Kit

I was recently pointed towards Report Spammers, which is a good resource for all things spam related and is steadily increasing the quantity and quality of the information available. As much as I like the statistics that can be gathered from honeypot systems, live and real stats are even better, and the data utilised by Report Spammers is taken from the email clusters run by Email Cloud.

One of the first resources released was the global map showing active spam sources (static image below); it is updated hourly, and the fully interactive version can be found here.

Where are spammers global map

In addition to the global map, Report Spammers also lists the most recent spamvertised sites seen on its mail clusters. I’m undecided about the ‘name and shame’ methodology due to the risk of false positives, but if you’re looking for examples of spamvertised sites it will prove a good resource (and one I intend to delve deeper into next time I’m bored). Just beware: sites that actively advertise via spam are rarely places that you want to point your home browser at. You have been warned.

If you want a resource to explain spam and the business model behind it, Report Spammers could be a good starting point. It can even explain spam to non-infosec types who still think spam comes in tins. Keep this in mind next time you need to run an information security awareness campaign.

— Andrew Waite

Categories: InfoSec, Tool-Kit

Starting with HoneyD

Since reading Virtual Honeypots I’ve been wanting to implement a HoneyD system, developed by Niels Provos. From its own site, HoneyD is:

a small daemon that creates virtual hosts on a network. The hosts can be configured to run arbitrary services, and their personality can be adapted so that they appear to be running certain operating systems. Honeyd enables a single host to claim multiple addresses – I have tested up to 65536 – on a LAN for network simulation. Honeyd improves cyber security by providing mechanisms for threat detection and assessment. It also deters adversaries by hiding real systems in the middle of virtual systems.

My initial experience getting HoneyD running was frustrating, to say the least. Going with Debian to provide a stable OS, the install process should have been as simple as apt-get install honeyd. While keeping up to date with a Debian system can sometimes be difficult, the honeyd package is as current as it gets, at version 1.5c.

For reasons that I can’t explain, this didn’t work the first (or second) time, so I reverted to compiling from source. The process could have been worse; the only real stumbling block I hit was a naming clash within Debian’s package names. HoneyD requires the ‘dumb network’ package libdnet, but if you apt-get install libdnet you get Debian’s DECnet libraries. On Debian and derivatives you need libdumbnet1.

HoneyD’s configuration can get very complex depending on what you are looking to achieve. Thankfully a sample configuration is provided that includes examples of some of the most common configuration directives. Once you’ve got a config sorted (the sample works perfectly for testing), starting honeyd is simple: honeyd -f /path/to/config-file. There are plenty of other runtime options available, but I haven’t had time to fully experiment with them all; check the honeyd man pages for more information.

As well as emulating hosts and network topologies, HoneyD can be configured to run what it terms ‘subsystems’. Basically these are scripts that can be used to provide additional functionality on the emulated systems for an attacker/user to interact with. Some basic (and not so basic) subsystems are included with HoneyD, and some additional service emulation scripts contributed to the HoneyD project can be found here. As part of the configuration, HoneyD can also pass specified IPs/ports through to live systems, either a more in-depth/specialised honeypot or a full ‘real’ system, combining low- and high-interaction honeypots.
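As a rough sketch of the directives involved (the personality string, script path and addresses below are illustrative, not taken from the shipped sample config), a template combining an emulated personality, a scripted service and a proxied port might look like:

```
create windows
set windows personality "Microsoft Windows XP Professional SP1"
set windows default tcp action reset
add windows tcp port 135 open
add windows tcp port 80 "sh /usr/share/honeyd/scripts/web.sh"
add windows tcp port 22 proxy 192.168.0.10:22
bind 10.0.0.51 windows
```

The proxy directive on port 22 is what hands a connection off to a live or high-interaction system as described above.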

I’m still barely scratching the surface of what HoneyD is capable of, and haven’t yet transferred my system to a live network to generate any statistics, but from my reading, research and experimentation I have high expectations.

— Andrew Waite

Categories: honeyd, Honeypot, InfoSec, Lab, Tool-Kit

Fuzzy hashing, memory carving and malware identification

2009/12/15 Comments off

I’ve recently been involved in a couple of discussions on different ways of identifying malware. One possibility that has been brought up a couple of times is fuzzy hashing, intended to locate files based on similarities to known files. I must admit that I don’t fully understand the maths and logic behind creating fuzzy hash signatures or comparing them. If you’re curious, Dustin Hurlbut has released a paper on the subject; Hurlbut’s abstract does a better job of explaining the general idea behind fuzzy hashing.

Fuzzy hashing allows the discovery of potentially incriminating documents that may not be located using traditional hashing methods. The use of the fuzzy hash is much like the fuzzy logic search; it is looking for documents that are similar but not exactly the same, called homologous files. Homologous files have identical strings of binary data; however they are not exact duplicates. An example would be two identical word processor documents, with a new paragraph added in the middle of one. To locate homologous files, they must be hashed traditionally in segments to identify the strings of identical data.
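The segment-hashing idea in the abstract can be illustrated with a toy sketch. This is my own simplification, not ssdeep's actual algorithm: real context-triggered piecewise hashing uses a rolling hash to pick block boundaries, so it also survives insertions that aren't block-aligned.

```python
import hashlib

def piecewise_hashes(data: bytes, block: int = 64) -> set:
    """MD5 each fixed-size segment; homologous files share segment hashes."""
    return {hashlib.md5(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)}

def similarity(a: bytes, b: bytes) -> float:
    """Jaccard overlap of the two files' segment-hash sets (0.0 to 1.0)."""
    ha, hb = piecewise_hashes(a), piecewise_hashes(b)
    return len(ha & hb) / len(ha | hb) if ha | hb else 0.0

original = bytes(range(256)) * 4                        # a 1024-byte 'document'
# the same document with a 64-byte 'paragraph' inserted on a block boundary
modified = original[:512] + b"\xff" * 64 + original[512:]
```

Because the insertion here sits on a block boundary, most segment hashes still match and the two files score as homologous, while an unrelated file scores near zero.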

I have previously experimented with a tool called ssdeep, which implements the theory behind fuzzy hashing. To use ssdeep to find files similar to known malicious files, run ssdeep against the known samples to generate signature hashes, then run ssdeep against the files you are searching, comparing with the previously generated signatures.

One scenario I’ve used ssdeep for in the past is trying to group malware samples collected by malware honeypot systems based on functionality. I haven’t found this to be a promising line of research: as different malware can have the same or similar functionality, most of the samples showed a high level of similarity whether actually related or not.

Another scenario I tried was running ssdeep against a clean WinXP install alongside a malicious binary. In the tests I ran I didn’t find this to be a useful process: given the disk capacity available to modern systems, running ssdeep against a large HDD can be time consuming, and it can also generate a good number of false positives when run against the OS.

After recently reading Leon van der Eijk’s post on malware carving I have been mulling a method for combining techniques to improve fuzzy hashing’s ability to identify malicious files, while reducing the number of false positives and workload required for an investigator. The theory was that, while any unexpected files on a system are not desirable, if they aren’t running in memory then they are less threatening than those that are active.

To test the theory I infected an XP SP2 victim with a sample of Blaster that had been harvested by my Dionaea honeypot and dumped the RAM following Leon’s methodology. Once the image was dissected by foremost, I ran ssdeep against the extracted resources. Ssdeep successfully identified the malicious files with a 100% match to the malicious sample. So far so good.

Given my previous experience with ssdeep I ran a control test, repeating the procedure against the dumped memory of a completely clean install. Unsurprisingly the comparison did not find a similar 100% match; however, it did falsely flag several files and artifacts with a 90%+ match, so there is still a significant risk of false positives.

From the process I have learnt a fair deal (reading and understanding Leon’s methodology was no comparison to putting it into practice), but I don’t intend to use the methods and techniques attempted here in real-world scenarios any time soon. Similar, and likely faster, results can be achieved by following Leon’s process completely and running the files carved by Foremost through an anti-virus scan.

Being able to test scenarios like this was the main reason for building up my test and development lab, which I have described previously. In particular, had I run the investigation on physical hardware I would likely not have rebuilt the environment for the control test with a clean system, losing the additional data for comparison; virtualisation snapshots made re-running the scenario trivial.

— Andrew Waite

P.S. Big thanks to Leon for writing up the memory capture and carving process used as a foundation for testing this scenario.

Rise of the exploit database

2009/11/17 Comments off

The team from Offensive Security have just announced the opening of (re-directs to, just more memorable). The site is designed as a successor to milw0rm. If you’ve ever browsed the milw0rm site, the layout will be instantly familiar.

I think this is great news for the infosec community: not only does the OffSec team always produce high-quality output, but it helps provide some stability in the wake of milw0rm’s recent uncertainty.

At this point the site’s content volume is growing rapidly. When I looked this morning the archived exploits numbered around 9000; already it has reached 10000+, and a refresh of the front page sees the number increase a good percentage of the time.

One feature of the site that I do like is a link (where available) to the vulnerable version of the application or code. I believe this will make testing much easier, as it removes the need to trawl the web for an often unsupported and unavailable old version of an application. I really hope this feature becomes popular and that all or most of the published exploits will link to a download location for the vulnerable code where possible.

Happy exploiting (in your lab, obviously)

Andrew Waite

Categories: Exploit, InfoSec, Tool-Kit