Archive for the ‘Tool-Kit’ Category

Amun statistics

Amun has been running away quite happily in my lab since the initial install. From a statistics perspective my work has been made really easy, as Miguel Cabrerizo has previously taken one of the InfoSanity statistics scripts written for Nepenthes and Dionaea and adapted it to parse Amun’s submissions.log files.

Results generated from the script in my environment are below. If you want an overview of submissions from another Amun sensor, the script has been uploaded alongside the other InfoSanity resources and is available here.

~$ cat /opt/amun/logs/submissions.log* | ./amun_submission_stats.py

Statistics engine written by Andrew Waite (www.infosanity.co.uk) modified by Miguel Cabrerizo (diatel.wordpress.com)

Number of submissions      : 25
Number of unique samples   : 25
Number of unique source IPs: 18

Origin of the malware:
Ukraine :     1
None :     7
Poland :     2
Romania :     1
United States :     8
Russian Federation :     2
Hungary :     1
Norway :     1
Bulgaria :     2

Vulnerabilities exploited:
MS08067 :    13
DCOM :    12

Most recent submissions:
2010-05-31, 11:37:22, 208.53.183.164, 63.exe, acf5c09d547417fe53c163ec09199cab, MS08067
2010-05-30, 19:23:09, 208.53.183.162, 63.exe, 89b578839f1c39f79d48e5f9e70b5e2f, MS08067
2010-05-28, 10:27:03, 208.53.183.162, 63.exe, f7c4f677218070ab52d422b3c018a4ba, MS08067
2010-05-27, 16:23:14, 195.34.117.180, ssms.exe, 1f8a826b2ae94daa78f6542ad4ef173b, DCOM
2010-05-24, 19:46:35, 208.53.183.163, 63.exe, 53979f1820886f089a75689ed15ecf6e, MS08067
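
The 'Most recent submissions' lines above are comma-separated, so the headline counts can be rebuilt from them directly. The following is a minimal sketch and not the code of amun_submission_stats.py: the field order (date, time, source IP, filename, MD5, vulnerability) is assumed from the output shown, and tally_submissions is a hypothetical helper name.

```python
from collections import Counter

# The five summary lines printed above, used here as sample input.
SAMPLE = """\
2010-05-31, 11:37:22, 208.53.183.164, 63.exe, acf5c09d547417fe53c163ec09199cab, MS08067
2010-05-30, 19:23:09, 208.53.183.162, 63.exe, 89b578839f1c39f79d48e5f9e70b5e2f, MS08067
2010-05-28, 10:27:03, 208.53.183.162, 63.exe, f7c4f677218070ab52d422b3c018a4ba, MS08067
2010-05-27, 16:23:14, 195.34.117.180, ssms.exe, 1f8a826b2ae94daa78f6542ad4ef173b, DCOM
2010-05-24, 19:46:35, 208.53.183.163, 63.exe, 53979f1820886f089a75689ed15ecf6e, MS08067
"""


def tally_submissions(lines):
    """Hypothetical helper: count submissions, samples, sources and exploits.

    Assumes six comma-separated fields per line, in the order seen in the
    summary output: date, time, source IP, filename, MD5, vulnerability.
    """
    sources = set()
    samples = set()
    vulns = Counter()
    total = 0
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 6:
            continue  # skip blank or unexpected lines
        _date, _time, src_ip, _name, md5, vuln = fields
        total += 1
        sources.add(src_ip)
        samples.add(md5)
        vulns[vuln] += 1
    return {
        "submissions": total,
        "unique_samples": len(samples),
        "unique_sources": len(sources),
        "vulnerabilities": vulns,
    }


if __name__ == "__main__":
    stats = tally_submissions(SAMPLE.splitlines())
    print("Number of submissions      : %i" % stats["submissions"])
    print("Number of unique samples   : %i" % stats["unique_samples"])
    print("Number of unique source IPs: %i" % stats["unique_sources"])
    for vuln, hits in stats["vulnerabilities"].most_common():
        print("%s : %5i" % (vuln, hits))
```

Keeping the unique sources and samples in sets means repeat visitors, such as 208.53.183.162 above, are only counted once.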

A comment on a recent post asked for a comparison between different honeypots. While this is far from conclusive and only focuses on a single aspect of the technologies, one of InfoSanity’s Nepenthes sensors ‘saw’ more attacks in the last 24hrs than my Amun installation did in the almost three weeks shown above. As both are running within the same small IP allocation, I think I’m safe in assuming that one IP isn’t receiving a disproportionate level of interest from the bad guys and bots that are out there.

– Andrew Waite

Starting with Amun

No single technology can handle every situation; the same holds true of honeypot sensors, which is why I’m always interested in finding new systems to add to my environment. I’d had Amun on my list of potentials for a while, but after reading a short blog post by Miguel Cabrerizo that suggested install and setup was relatively quick and painless, it got moved up the to-do list.

As suggested, the install was quick and easy, with no real problems. Since being installed the system has done what it says on the tin, emulating vulnerabilities and logging interaction with attacking sources. The sensor has been active for around 5 days and has collected 14 unique malware samples to date. While this is not a meaningful comparison on its own, three of these samples have not been caught by the Nepenthes or Dionaea sensors running within the same IP space.

The Amun log directory shows some interesting information, with logging being split between several different files. One aspect of the logging that I’m unsure I like is that Amun rotates its log files on a daily basis; so far this is resulting in my log directory getting cluttered with rotated files. For the curious, the available log files are:

  • amun_request_handler.log
  • amun_server.log
  • download.log
  • exploits.log
  • logging.log
  • shellcode_manager.log
  • shellemulator.log
  • submissions.log
  • successfull_downloads.log
  • unknown_downloads.log
  • vulnerabilities.log
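
To get a feel for how much clutter the daily rotation produces, the files can be grouped by their base log name. This is only a sketch: it assumes rotated copies keep the base name as a prefix and add a suffix after '.log' (which is what the submissions.log* glob used elsewhere implies), and the suffix shown in the example is made up. group_rotated_logs and summarise_log_dir are hypothetical names.

```python
import os
from collections import defaultdict


def group_rotated_logs(filenames):
    """Group file names by base log name (everything up to and including '.log').

    Assumption: a rotated file is named '<base>.log<some suffix>'.
    """
    groups = defaultdict(list)
    for name in filenames:
        head, sep, _suffix = name.partition(".log")
        if not sep:
            continue  # no '.log' in the name, so not one of the log files
        groups[head + sep].append(name)
    return {base: sorted(names) for base, names in groups.items()}


def summarise_log_dir(log_dir):
    """Return {base log name: number of files, current plus rotated}."""
    grouped = group_rotated_logs(os.listdir(log_dir))
    return {base: len(names) for base, names in grouped.items()}


if __name__ == "__main__":
    # Illustrative names only; the date suffix is an assumed rotation format.
    example = ["submissions.log", "submissions.log.2010-05-30", "exploits.log"]
    for base, names in sorted(group_rotated_logs(example).items()):
        print("%s: %i file(s)" % (base, len(names)))
```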

Going forward there are a number of installation and configuration options available in Amun that I intend to experiment with. High up this list is the ability to log to a MySQL database; I’m hoping that this will provide both a convenient and powerful way to search and analyse the information collected by the sensor. In the meantime Miguel has extended one of InfoSanity’s submission_stats scripts to gather similar statistics from Amun sensors; Miguel’s work is available here.

– Andrew Waite

amun01:/opt/amun# ls -l malware/md5sum/
total 2512
-rw-r--r-- 1 root root 155648 2010-05-13 10:53 0cc3c16497214997a9aca72e387c9d9b.bin
-rw-r--r-- 1 root root 444416 2010-05-12 15:43 146d61fca77d748f5a5ecff53afd30e4.bin
-rw-r--r-- 1 root root 158720 2010-05-11 07:43 14a09a48ad23fe0ea5a180bee8cb750a.bin
-rw-r--r-- 1 root root 159744 2010-05-11 00:29 1d419d615dbe5a238bbaa569b3829a23.bin
-rw-r--r-- 1 root root 153600 2010-05-15 13:41 53098aa3e420a1be0a5e6a992dc30f3b.bin
-rw-r--r-- 1 root root 176128 2010-05-10 23:35 5a951d625eb10b900eb7001892edfa77.bin
-rw-r--r-- 1 root root 159744 2010-05-13 19:16 6366b14ed66bf79d6ece8ed8cb116838.bin
-rw-r--r-- 1 root root 153600 2010-05-12 13:36 98eb0fdadf8a403c013a8b1882ec986d.bin
-rw-r--r-- 1 root root 172032 2010-05-13 06:22 9b1bec8e5fbc9696c60422a031147d07.bin
-rw-r--r-- 1 root root 159744 2010-05-13 19:16 a7b197e90b2c5d63b19dfb4797ef7710.bin
-rw-r--r-- 1 root root 147456 2010-05-14 07:04 b407982b9eea8c8af3ff4f52ee71c44a.bin
-rw-r--r-- 1 root root 147456 2010-05-11 07:09 b786ad96a1dfb330e05595e4657d8a61.bin
-rw-r--r-- 1 root root 160768 2010-05-12 14:46 bb39f29fad85db12d9cf7195da0e1bfe.bin
-rw-r--r-- 1 root root 152576 2010-05-11 00:00 fd28c5e1c38caa35bf5e1987e6167f4c.bin
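
The directory name suggests each .bin file is named after the MD5 of its contents. I haven't confirmed that against the Amun source, so treat it as an assumption; under that assumption, a quick sanity check of the sample store could look like the sketch below (find_mismatches and md5_of_file are hypothetical helper names).

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(sample_dir):
    """List .bin files whose name does not match the MD5 of their contents.

    Assumption: files are stored as '<md5 of contents>.bin'.
    """
    mismatches = []
    for name in sorted(os.listdir(sample_dir)):
        stem, ext = os.path.splitext(name)
        if ext != ".bin":
            continue  # ignore anything that is not a stored sample
        if md5_of_file(os.path.join(sample_dir, name)) != stem.lower():
            mismatches.append(name)
    return mismatches


if __name__ == "__main__":
    import sys
    # Pass the sample directory on the command line, e.g. malware/md5sum/
    for bad in find_mismatches(sys.argv[1]):
        print("name/content mismatch: %s" % bad)
```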

Categories: Honeypot, InfoSec, Tool-Kit

Determining connection source from honeyd.log – cymruwhois version

2010/05/03 1 comment

InfoSanity’s honeyd-geoip.py script has been useful for analysing the initial findings from a HoneyD installation, but one of the weaknesses identified in the geolocation database used by the script was that a large proportion of the source IP addresses connecting to the honeypot environment weren’t known within the database. Markus pointed me in the direction of the cymruwhois Python module (discussed previously) as an alternative. I’ve re-written the initial script, below:

#!/usr/bin/python
from cymruwhois import Client
import sys

# log file location hard coded, change to suit environment
source = []
with open('/var/log/honeypot/honeyd.log', 'r') as logfile:
    for line in logfile:
        # the source address is the fourth space-separated field
        source.append(line.split(' ')[3])

src_country = []
src_count = []
c = Client()

# one bulk lookup covering every unique source address
results = c.lookupmany_dict(set(source))

for res in results:
    country = results[res].cc
    try:
        pos = src_country.index(country)
        src_count[pos] += 1
    except ValueError:
        # first time this country code has been seen
        src_country.append(country)
        src_count.append(1)

# print every country code with its count
for i in range(len(src_country)):
    sys.stdout.write("%s:\t%i\n" % (src_country[i], src_count[i]))

So far this has resulted in far fewer unknown source locations: 249 using geoip compared to 3 using cymruwhois. The downside, unfortunately, is performance: cymruwhois communicates with a remote host to gather information, whereas the geolocation database is already stored locally on the machine. Both perform some local caching of results/data, however, so I would expect the performance difference to decrease as larger datasets are analysed.

Using the newer script, based on the same 24hr data set, the top ten host countries communicating with InfoSanity’s honeyd environment are:

RU:     397
US:     234
TW:     179
BR:     158
CN:     123
RO:     107
DE:     101
IT:     96
JP:     91
AR:     86
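
The script itself prints countries in whatever order it met them, so a top ten needs a sort on hit-count. One way to get that, sketched here with collections.Counter rather than the parallel lists used in the script, is shown below. top_countries is a hypothetical helper, and the demo codes are made up for illustration rather than taken from the results above.

```python
from collections import Counter


def top_countries(country_codes, limit=10):
    """Return the `limit` most frequent country codes as (code, hits) pairs.

    `country_codes` is any iterable of codes, for example the .cc value of
    each record returned by the bulk lookup.
    """
    return Counter(country_codes).most_common(limit)


if __name__ == "__main__":
    # Made-up input purely for illustration, not real honeypot data.
    demo = ["RU", "US", "RU", "TW", "US", "RU"]
    for code, hits in top_countries(demo, limit=2):
        print("%s:\t%i" % (code, hits))
```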

– Andrew Waite

Team Cymru Whois

2010/05/03 1 comment

Since posting my Python whois class it has led to a (relatively) high volume of search hits pointing people to it, so I’d like to apologise for inflicting my code on other people. After a recent post with the honeyd-geoip.py script I was pointed in the direction of Team Cymru’s whois service and accompanying Python script. If you’ve not come across the stuff released by Team Cymru I would strongly suggest that you take a look. I always manage to find some interesting new info across the three overall sections: Monitoring, Services and Reading Room.

Making my life easier, Justin Azoff has released a Python module hosted on github for the whois.cymru.com service. Using the client is incredibly simple, as the sample code included in the package shows:

>>> import socket
>>> ip = socket.gethostbyname("www.google.com")
>>> from cymruwhois import Client
>>> c=Client()
>>> r=c.lookup(ip)
>>> print r.asn
15169
>>> print r.owner
GOOGLE - Google Inc.

Overall Justin’s client works faster than my own attempt, especially as it has functions specifically designed for bulk lookups. If you’re working with IP, whois or geolocation data I’d suggest giving the cymruwhois utility a look. Thanks to Justin and the Team Cymru people for releasing tools and info that make my work easier.

–Andrew Waite

Categories: InfoSec, Python, Tool-Kit

Honeydsum: HoneyD log analyser

2010/04/20 1 comment

Honeydsum is a script created by Lucio Henrique Franco and Carlos Henrique Peixoto Caetano Chaves for the Brazilian Honeynet project. As described by its authors, it is:

a tool written in Perl designed to generate a text summary from Honeyd logs. The summaries may be produced using different parameters as filters, such as ports, protocols, IP addresses or networks. It shows the top source and port access and the number of connections per hour, and supports input from multiple log files. The script can also correlate events from several honeypots.

Using the script from the command line is straightforward; simply invoke it with a config file and pass the honeyd log to be analysed. In addition to the usual textual output, honeydsum is also capable of generating HTML results, providing a quick and easy visual. The download site also includes some sample output files, both text and html (tgz archive).

$ /usr/share/honeyd/scripts/honeydsum-v0.3/honeydsum.pl

Usage: honeydsum.pl -c honeydsum.conf [-hVw] log-file1 log-file2 … log-filen
-c   honeydsum.conf file.
-h   display this help and exit.
-V   display version number and exit.
-w   display output as web page (HTML).

The bulk of the text-based output provides a list of connections made from external sources to the systems emulated by the HoneyD instance. The provided sample output gives the information below; on a live and publicly accessible system this output will be significantly longer:

--------------------------------------
Honeypot: 10.0.0.70
--------------------------------------
Source IP        Resource  Connections
192.168.50.20        21/tcp       1
192.168.100.130      21/tcp       1
192.168.177.253     11/icmp       1
192.168.139.133     11/icmp       1
--------------------------------------
IPs             Resources  Connections
4                       2        4
--------------------------------------

The end of the output contains the information that I find most useful. It provides several different summaries of all the traffic captured by the whole HoneyD environment. Summaries include:

The most frequent remote sources:

Top 10 Source Hosts

Rank  Source IP       Connections
1      192.168.100.130        3
2      192.168.139.133        2
3      192.168.50.20           1
4      192.168.131.157        1
5      192.168.217.41         1
6      192.168.207.84         1
7      192.168.177.253        1

Most requested emulated services/resources:

Top 10 Accessed Resources

Rank Resource    Connections
1    21/tcp             4
2    11/icmp            4
3    53/udp             2
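
For a rough idea of how summaries like these can be derived, the sketch below tallies source hosts and resources from honeyd.log lines in Python. It is not how honeydsum.pl works internally. The field layout is an assumption: timestamp, protocol such as tcp(6), a flag, source IP, source port, destination IP and destination port. It also counts every matching log line, which may not equal honeydsum's notion of a connection, and the sample lines in the demo are invented.

```python
from collections import Counter


def summarise(lines, limit=10):
    """Return (top source hosts, top resources) from honeyd log lines.

    Assumed whitespace-separated layout for tcp/udp lines:
    timestamp proto(n) flag src_ip src_port dst_ip dst_port ...
    """
    sources = Counter()
    resources = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue
        proto = fields[1].split("(")[0]
        if proto not in ("tcp", "udp"):
            continue  # icmp lines carry no ports in the assumed layout
        sources[fields[3]] += 1
        # the destination port may carry a trailing colon
        resources["%s/%s" % (fields[6].rstrip(":"), proto)] += 1
    return sources.most_common(limit), resources.most_common(limit)


if __name__ == "__main__":
    # Invented lines in the assumed format, for illustration only.
    demo = [
        "2010-04-20-10:00:00.0001 tcp(6) S 192.168.100.130 1045 10.0.0.70 21",
        "2010-04-20-10:00:01.0001 tcp(6) S 192.168.50.20 1046 10.0.0.70 21",
        "2010-04-20-10:00:02.0001 udp(17) - 192.168.100.130 1047 10.0.0.70 53: 60",
    ]
    top_sources, top_resources = summarise(demo)
    for rank, (ip, hits) in enumerate(top_sources, 1):
        print("%-5i %-18s %i" % (rank, ip, hits))
    for rank, (resource, hits) in enumerate(top_resources, 1):
        print("%-5i %-12s %i" % (rank, resource, hits))
```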

– Andrew Waite

Categories: honeyd, Honeypot, InfoSec, Tool-Kit

Determining connection source from honeyd.log

After getting a working HoneyD environment I wanted to better dig into the information provided by the system. First up was a quick script to get a feel for where the attacks/connections originate from. For location functionality GeoIP is the package for the job; as we’re using both Debian and Python, installing the required tools is as simple as ‘apt-get install python-geoip’.

At first glance I really like the log format that is used by honeyd.log; it is nice and easy to parse. From this I quickly knocked up a Python script to parse the honeyd.log file, collect a list of unique source addresses and finally use GeoIP to determine (and count) the country of origin. The script (below) is basic, and most likely full of bugs, but shows the ease with which tools can be forged to quickly gain the full value from the information collected by the HoneyD environment.

Version 0.01 is below; ignoring any likely bugs that need fixing, the one thing that it definitely needs is to order the output, although I’m undecided if this should be alphabetically by country or by hit-count. Source Code:

#!/usr/bin/python
import GeoIP
import sys

# log file location hard coded, change to suit environment
source = []
with open('/var/log/honeypot/honeyd.log', 'r') as logfile:
    for line in logfile:
        # the source address is the fourth space-separated field
        source.append(line.split(' ')[3])

# http://www.maxmind.com/app/python
# http://code.google.com/p/pygeoip/
gi = GeoIP.new(GeoIP.GEOIP_MEMORY_CACHE)

src_country = []
src_count = []
for src in set(source):
    country = gi.country_name_by_addr(src)
    try:
        pos = src_country.index(country)
        src_count[pos] += 1
    except ValueError:
        # first time this country has been seen
        src_country.append(country)
        src_count.append(1)

# print every country with its count
for i in range(len(src_country)):
    sys.stdout.write("%s:\t%i\n" % (src_country[i], src_count[i]))

Sample output:

# ./honeyd-geoip.py
Uruguay: 12
None:   249
Australia:      17
Lithuania:      11
Austria: 4
Russian Federation:     43
Jordan: 3
Taiwan: 29
Hong Kong:      3
Brazil: 39
United States:  54
Hungary: 23
Latvia: 10
Morocco: 1
Macedonia:      3
Serbia: 4
Romania: 44
Argentina:      23
United Kingdom: 12
India:  10
Egypt:  2
Italy:  33
Switzerland:    2
Germany: 43
France: 13
Poland: 27
Canada: 12
China:  23
Malaysia:       8
Panama: 1
Colombia:       6
Japan:  14
Israel: 9
Bulgaria:       9
Turkey: 6
Vietnam: 2
Mexico: 1
Chile:  1
Pakistan:       1
Spain:  7
Portugal:       4
Moldova, Republic of:   1
Antigua and Barbuda:    1
Venezuela:      3
Singapore:      1
United Arab Emirates:   1
Philippines:    2
Croatia: 1
Korea, Republic of:     4
Ukraine: 5
Georgia: 2
Bahamas: 1
Ecuador: 1
South Africa:   1
Peru:   1
Kazakhstan:     2
Costa Rica:     1
Bolivia: 1
Iran, Islamic Republic of:      2
Greece: 1
Bahrain: 1
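
Both of the orderings under consideration are a one-line sort once the counts are held in a dictionary. The sketch below assumes the tallies have been collected as {country: count}; by_country and by_hits are hypothetical helper names, and the demo uses a handful of rows copied from the sample output.

```python
def by_country(counts):
    """Sort (country, count) pairs alphabetically by country name.

    str() is applied to the key so that an unknown country stored as
    None sorts as the text 'None' instead of raising an error.
    """
    return sorted(counts.items(), key=lambda item: str(item[0]))


def by_hits(counts):
    """Sort (country, count) pairs by count, highest first.

    Ties are broken alphabetically so the output is stable.
    """
    return sorted(counts.items(), key=lambda item: (-item[1], str(item[0])))


if __name__ == "__main__":
    # A few rows taken from the sample output; None is the unknown bucket.
    counts = {"Uruguay": 12, None: 249, "Australia": 17, "United States": 54}
    for country, hits in by_hits(counts):
        print("%s:\t%i" % (country, hits))
```

Sorting by hit-count pushes the unknown bucket to the top of this data set, which makes the size of the gap in the geolocation database hard to miss.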

– Andrew Waite

Categories: honeyd, Honeypot, InfoSec, Tool-Kit

New Projects Section

The core InfoSanity site has just (last 24hours) had the first of several planned refreshes go live. In this case it is a section of the site dedicated to the code and tools released as part of the research carried out by InfoSanity. No new content yet, but it has served as a nice reminder of some of the intended features still incomplete in existing projects, hopefully updates should be coming soon.

The start of the section can be found here, alternatively just navigate from the site’s menu. For those feeling lazy, a sneak peek:

– Andrew Waite

Categories: Tool-Kit

ReportSpammers.net

I was recently pointed towards www.reportspammers.net, which is a good resource for all things spam related and is steadily increasing the quantity and quality of the information available. As much as I like the statistics that can be gathered from honeypot systems, live and real stats are even better, and the data utilised by Report Spammers is taken from the email clusters run by Email Cloud.

One of the first resources released was the global map showing active spam sources (static image below), it is updated hourly and the fully interactive version can be found here.

Where are spammers global map

In addition to the global map, Report Spammers also lists the most recent spamvertised sites seen on its mail clusters. I’m undecided about the ‘name and shame’ methodology due to the risk of false positives, but if you’re looking for examples of spamvertised sites it will prove a good resource (and one I intend to delve deeper into next time I’m bored). Just beware: sites that actively advertise via spam are rarely places that you want to point your home browser at. You have been warned.

If you want a resource to explain spam and the business model behind it, Report Spammers could be a good starting point. It even has the ability to explain spam to non-infosec types that still think spam comes in tins. Keep this in mind next time you need to run another information security awareness campaign.

– Andrew Waite

Categories: InfoSec, Tool-Kit

Starting with HoneyD

Since reading Virtual Honeypots I’ve been wanting to implement a HoneyD system, developed by Niels Provos. From its own site, HoneyD is:

a small daemon that creates virtual hosts on a network. The hosts can be configured to run arbitrary services, and their personality can be adapted so that they appear to be running certain operating systems. Honeyd enables a single host to claim multiple addresses – I have tested up to 65536 – on a LAN for network simulation. Honeyd improves cyber security by providing mechanisms for threat detection and assessment. It also deters adversaries by hiding real systems in the middle of virtual systems.

My initial experience getting HoneyD running was frustrating to say the least. Going with Debian to provide a stable OS, the install process should have been as simple as apt-get install honeyd. While keeping up to date with a Debian system can sometimes be difficult, the honeyd package is as current as it gets with version 1.5c.

For reasons that I can’t explain, this didn’t work first (or second) time so I reverted to compiling from source. The process could have been worse; the only real stumbling block I hit was a naming clash within Debian’s package names. HoneyD requires the ‘dumb network’ package libdnet, but if you apt-get install libdnet you get Debian’s DECnet libraries. On Debian and derivatives you need libdumbnet1.

HoneyD’s configuration has the ability to get very complex depending on what you are looking to achieve. Thankfully a sample configuration is provided that includes examples of some of the most common configuration directives. Once you’ve got a config sorted (the sample works perfectly for testing), starting honeyd is simple: honeyd -f /path/to/config-file. There are plenty of other runtime options available, but I haven’t had time to fully experiment with all of them; check the honeyd man pages for more information.

As well as emulating hosts and network topologies, HoneyD can be configured to run what it terms ‘subsystems’. Basically these are scripts that can be used to provide additional functionality on the emulated systems for an attacker/user to interact with. Some basic (and not so basic) subsystems are included with HoneyD. Some additional service emulation scripts that have been contributed to the HoneyD project can be found here. As part of the configuration, HoneyD can also pass specified IP/Ports through to live systems, either a more in-depth/specialised honeypot system or a full ‘real’ system, to combine low and high interaction honeypots.

I’m still barely scratching the surface of what HoneyD is capable of, and haven’t yet transferred my system to a live network to generate any statistics, but from my reading, research and experimentation I have high expectations.

– Andrew Waite

Categories: honeyd, Honeypot, InfoSec, Lab, Tool-Kit

Fuzzy hashing, memory carving and malware identification

I’ve recently been involved in a couple of discussions about different ways of identifying malware. One of the possibilities that has been brought up a couple of times is fuzzy hashing, intended to locate files based on similarities to known files. I must admit that I don’t fully understand the maths and logic behind creating fuzzy hash signatures or comparing them. If you’re curious, Dustin Hurlbut has released a paper on the subject; Hurlbut’s abstract does a better job of explaining the general idea behind fuzzy hashing:

Fuzzy hashing allows the discovery of potentially incriminating documents that may not be located using traditional hashing methods. The use of the fuzzy hash is much like the fuzzy logic search; it is looking for documents that are similar but not exactly the same, called homologous files. Homologous files have identical strings of binary data; however they are not exact duplicates. An example would be two identical word processor documents, with a new paragraph added in the middle of one. To locate homologous files, they must be hashed traditionally in segments to identify the strings of identical data.

I have previously experimented with a tool called ssdeep, which implements the theory behind fuzzy hashing. To use ssdeep to find files similar to known malicious files you can run ssdeep against the known samples to generate a signature hash, then run ssdeep against the files you are searching, comparing with the previously generated sample.
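
To make the 'hashed traditionally in segments' idea from the abstract concrete, here is a deliberately simplified sketch. It is not ssdeep's algorithm: it cuts the data into fixed-size blocks, hashes each with MD5 and scores two inputs by how many distinct block hashes they share. The function names, the 64-byte block size and the scoring formula are all my own choices for illustration.

```python
import hashlib


def block_hashes(data, block_size=64):
    """Return the set of MD5 digests of each fixed-size block of `data`."""
    return {
        hashlib.md5(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    }


def similarity(a, b, block_size=64):
    """Score two byte strings from 0 to 100 by shared block hashes.

    The score is shared distinct hashes over all distinct hashes
    (a Jaccard ratio), truncated to a whole percentage.
    """
    hashes_a = block_hashes(a, block_size)
    hashes_b = block_hashes(b, block_size)
    if not hashes_a and not hashes_b:
        return 100  # two empty inputs are identical
    shared = len(hashes_a & hashes_b)
    return int(100 * shared / len(hashes_a | hashes_b))


if __name__ == "__main__":
    # Ten distinct 64-byte blocks, then a copy with the last block changed.
    original = b"".join(bytes([i]) * 64 for i in range(10))
    modified = original[:64 * 9] + b"\xff" * 64
    print("identical: %i" % similarity(original, original))
    print("one block changed: %i" % similarity(original, modified))
```

Fixed blocks have an obvious weakness: inserting a few bytes near the start shifts every later block and destroys the match. As I understand it, ssdeep avoids this by letting the content itself decide where each segment ends (context-triggered piecewise hashing), which is why it copes with the 'new paragraph in the middle' example.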

One scenario I’ve used ssdeep for in the past is to try to group malware samples collected by malware honeypot systems based on functionality. In my attempts I haven’t found this to be a promising line of research: as different malware can typically have the same or similar functionality, most of the samples showed a high level of comparison whether actually related or not.

Another scenario that I had developed was running ssdeep against a clean WinXP install containing a malicious binary. In the tests I ran I didn’t find this to be a useful process: given the disk capacity available to modern systems, running ssdeep against a large HDD can be time consuming. It can also generate a good number of false positives when run against the OS.

After recently reading Leon van der Eijk’s post on malware carving I have been mulling a method for combining techniques to improve fuzzy hashing’s ability to identify malicious files, while reducing the number of false positives and workload required for an investigator. The theory was that, while any unexpected files on a system are not desirable, if they aren’t running in memory then they are less threatening than those that are active.

To test the theory I infected an XP SP2 victim with a sample of Blaster that had been harvested by my Dionaea honeypot and dumped the RAM following Leon’s methodology. Once the image was dissected by foremost I ran ssdeep against the extracted resources. Ssdeep successfully identified the malicious files with a 100% comparison to the malicious sample. So far so good.

With my previous experience with ssdeep I ran a control test, repeating the procedure against the dumped memory of a completely clean install. Unsurprisingly the comparison did not find a similar 100% match, however it did falsely flag several files and artifacts with a 90%+ comparison so there is still a significant risk of false positives.

From the process I have learnt a fair deal (reading and understanding Leon’s methodology was no comparison to putting it into practice) but don’t intend to utilise the methods and techniques attempted in real-world scenarios any time soon. Similar, and likely faster, results can be achieved by following Leon’s process completely and running an anti-virus scan against the files carved by Foremost.

Being able to test scenarios similar to this was the main reason for me to build up my test and development lab, which I have described previously. In particular, if I had run the investigation on physical hardware I would likely not have rebuilt the environment for the control test with a clean system, losing the additional data for comparison. Virtualisation snapshots made re-running the scenario trivial.

–Andrew Waite

P.S. Big thanks to Leon for writing up the memory capture and carving process used as a foundation for testing this scenario.
