BOTNET DETECTION BASED ON DNS RECORDS
AND ACTIVE PROBING
Iria Prieto, Eduardo Magaña, Daniel Morató and Mikel Izal
Public University of Navarre, Campus Arrosadia, 31006 Pamplona, Spain
Keywords:
Botnet, Command and control, Domain name, DNS record, WHOIS.
Abstract:
Computers connected to the Internet are constantly threatened by different types of malware. Among the most
important types of malware are botnets, which convert infected computers into agents that carry out actions instructed by a
command-and-control server. A botmaster can control thousands of agents, which gives a significant capacity
to carry out network attacks (such as DoS), email spam or phishing. In this paper, peculiarities of the communication
with the command-and-control server are used to identify computers infected
by a botnet. This identification is based mainly on the DNS records of the registered domains where
command-and-control servers are hosted. Processing overhead is therefore reduced by avoiding per-packet or per-flow network
supervision.
1 INTRODUCTION
Computers, and especially computers connected to the
Internet, are becoming an essential tool in working
and entertainment environments. It is usual to send
confidential information through e-mail, make online
bank transactions, shop online, etc. Unfortunately,
the popularity of the Internet has been accompanied
by the growth of network attacks which try to
profit from this information. Some attacks
originate from personal computers that have been infected
with unwanted software known as malware.
Malware covers a large range of software such as
viruses, worms, trojans, spyware, loggers and botnets.
Recently, the type of malware growing at the fastest rate
is botnets (Zhaosheng et al., 2008). A botnet is
characterized by having a set of compromised computers
called bots. These bots are controlled remotely by a
command-and-control (C&C) server managed by the
botmaster. They use a special protocol that is known
as the C&C channel. Through this channel, the botmaster
can send instructions to bots to perform new attacks,
infect other machines or update the botnet software. This
channel can use well-known protocols such as IRC,
HTTP or P2P protocols in order to hide itself from
any identification attempt (John et al., 2009)(Zhaosheng
et al., 2008).
Currently there is a large collection of active botnets
in the Internet. Some of them are Rustock (Chiang
and Lloyd, 2007)(John et al., 2009), Zeus (Binsalleeh
et al., 2010), Conficker (Porras et al., 2009)
and Kraken (Jae-Seo et al., 2008)(Stone-Gross et al.,
2009). Botnets can propagate attacks through
networks quickly and, furthermore, those attacks can
have a high impact because of the large number of
controlled agents. An example of this impact is given
in (Zeljka, 2009), which describes how in 2009
89.5 billion unsolicited emails were sent every day by
compromised computers participating in a botnet.
Early detection of botnets is very important as it
can provide a certain degree of trust in network
services. Even Internet Service Providers are interested
in their identification because of the large percentage
of unwanted traffic generated. Antivirus and antispyware
programs try to identify botnet software in
infected computers with traditional schemes based on
code signatures. However, botnet software mutates
quickly and therefore those schemes are not useful.
Similar identification schemes can be performed by
firewalls or intrusion detection systems, this time
applying signature-based schemes over network traffic
in the C&C channel. Again, these protocols change
continuously or are even encrypted, so identification
results are not good enough. Besides, processing
overhead is significant in high-speed networks
as signature checks have to be performed per packet.
Usually C&C servers are identified by one or sev-
eral domain names that have to be known a priori by
bots. This will allow bots to contact the C&C server
and check for its availability. Therefore, prior
to any C&C communication, bots have to resolve the IP
addresses of already known domain names for C&C
servers. These domain names have specific
characteristics that can be used to flag domain
names suspected of belonging to a C&C server and, therefore, to
identify computers participating in a botnet.
Prieto I., Magaña E., Morató D. and Izal M.
BOTNET DETECTION BASED ON DNS RECORDS AND ACTIVE PROBING.
DOI: 10.5220/0003522903070316
In Proceedings of the International Conference on Security and Cryptography (SECRYPT-2011), pages 307-316
ISBN: 978-989-8425-71-3
Copyright © 2011 SCITEPRESS (Science and Technology Publications, Lda.)
In this paper, a new method to identify computers
infected by botnets is proposed. This method
combines in-depth analysis of DNS records with extra
information obtained from active probing in order
to obtain a suspicion indicator for domain names.
Detection capabilities are demonstrated in a real
scenario.
The rest of the paper is organized as follows. Sec-
tion 2 presents the state of the art in botnet detection
techniques using C&C network traffic. In section 3,
the network scenario and the traffic traces used are
presented. Section 4 introduces the metrics selected
for the identification, based on experimental analysis.
In section 5, the architecture of the botnet identification
system is presented. Evaluation and results of the
proposal are presented in section 6. Finally, conclusions
and future work are presented.
2 STATE OF THE ART IN
BOTNET DETECTION BASED
ON NETWORK TRAFFIC
Detection of botnets at an early stage of infection is
a challenging task. The great majority of techniques
in the state of the art take into account the main char-
acteristic of botnets: the C&C channel which allows
owners to update and control bots. Botnet detec-
tion techniques are usually oriented to discover those
channels.
Botnet detection techniques can be classified
based on how and which data is processed into
signature-based, anomaly-based and DNS-based
(Feily et al., 2009).
Signature-based detection methods look for cer-
tain patterns in network traffic like recognizable pro-
tocol headers, payloads, packet sizes or interarrival
times. The C&C protocol of well-known botnets can
be characterized and this characterization can be used
to identify their traffic. This characterization can be
implemented as rules in an intrusion detection system
(IDS). One specific characteristic of botnets is
the existence of concrete C&C servers whose IP
addresses can be known a priori through this
characterization. Therefore, any computer exchanging traffic
with those IP addresses can be identified as a bot.
In (Goebel and Holz, 2007), detection of IRC bots
is made applying data mining techniques over nick-
names, IRC servers and used ports.
Modifications of botnet software, protocol or
C&C servers are usual. For example, the mutation
speed of STORM botnet is estimated as once every 30
minutes (Grizzard et al., 2007). Therefore, signature-
based detection methods have practical limitations.
Another approach to botnet detection consists of
characterizing normal traffic and, later, identifying
deviations that reveal the presence of a botnet infection. This
approach is called anomaly-based detection. Botnets
can use application protocols implemented over stan-
dard IRC or HTTP protocols, so it is not easy to iden-
tify C&C communication from a normal chat or web
traffic. In (Binkley and Singh, 2006), bots connected
to an IRC channel are identified by their specific ac-
tivity: IRC messages used, communication profile,
number of sent/received packets, number of shared
channels, etc. In (Gu et al., 2008b), extensions are
made to support IRC and HTTP-based protocols, this
time using correlation of communications from mul-
tiple bots and level of network activity. Part of the
identification is also based on signatures as packet
payloads are analyzed.
Data mining techniques are used in Botminer (Gu
et al., 2008a). It uses K-means algorithm to clus-
ter data with metrics corresponding to normal traf-
fic and metrics corresponding to malicious activity.
Anomaly-based detection schemes have to be tuned
up for specific scenarios and this is one of their main
disadvantages. Also, the rate of false positives can be
quite high depending on the percentage of total
identification to be achieved. Some a priori
knowledge about C&C protocols is also needed, but
to a lesser extent than with signature-based methods.
The last group of botnet detection methods
in the state of the art is DNS-based methods. Botnets
are controlled by one or more C&C servers whose IP
addresses have to be known by bots. Usually, domain
names are known instead of IP addresses. Those do-
main names can be hardcoded in the bot code or can
be updated online via some configuration file. There-
fore, before contacting C&C servers, a DNS resolu-
tion request has to be made to map the known domain
name into the corresponding IP address. In (Dagon,
2005), bots are detected with the hypothesis that bot-
net domain name requests are concentrated in a time
window for several infected computers. Therefore,
correlations in DNS requests are supervised. Also,
the time-to-live (TTL) of domain names is consid-
ered. The TTL indicates the number of seconds for
which the mapping of domain name to IP address is
valid. Botnets usually use TTL values of a few
seconds in order to be able to change the mapping
dynamically and prevent C&C servers from being discovered.
However, short TTL values also appear in other domain
names, such as those registered by Content Delivery
Networks (CDN), so the results are not conclusive.
Whitelists can be used, but again extensive a priori
knowledge is required.
In order not to be easily identified, botnets usually
register a multitude of domain names for the same C&C
server. Bots usually check these domain names,
and most of those mappings do not exist at any given
moment. Even unreliable and temporary DNS servers
are used. In those cases, an “NXDOMAIN (Non-Existent
Domain)” answer is obtained for the domain
name request. In (Villamarín-Salomón and Brustoloni,
2008), the rate of NXDOMAIN responses is used as
an indicator of botnet presence. However, recent
versions of botnets do not show this behavior (Feily et al.,
2009). In (Feily et al., 2009), time proximity in DNS
requests between different bots is used to identify
botnets. The traffic of a large number of computers has to
be analyzed in order to find correlations in
DNS requests. This could be the case of an Internet
trunk of an Internet Service Provider. However, the
infection needs to propagate to several hosts
before it becomes detectable.
Domain names used in C&C servers sometimes
follow a pattern because they are generated algorith-
mically. In (Yadav et al., 2010), a methodology to
detect botnet domain names is presented looking for
those patterns in domain names that are different to
those generated by humans.
3 NETWORK SCENARIO
For the proposal and evaluation, real traffic traces
have been obtained from the Public University of Navarre
(UPNA, Spain). Its Internet access link has been monitored
specifically for DNS request/response packets.
The most significant results have been obtained from
a traffic trace captured on September 15-17, 2010.
This trace contains 4,807,719 DNS requests
that correspond to 452,601 different domain
names. There are 3,962,032 DNS responses, corresponding
to 405,338 different domains, and 67,671 domain
names returned NXDOMAIN at least once.
A testbed with the Zeus, Conficker and Kraken
botnets has also been deployed. This testbed has been
secured with a honeywall (Jones and Romney, 2004)
and it has provided direct information about
domain names requested by infected hosts. A blacklist
of domain names corresponding to C&C channels
has been compiled this way. It contains 100,108
domains.
4 CHARACTERIZATION OF DNS
METRICS TO BE USED IN
BOTNETS IDENTIFICATION
An in-depth study of the relation between botnets and
domain names has been made in order to improve current
proposals in the state of the art. Existing metrics
have been evaluated: DNS
TTL, DNS NXDOMAIN and DNS pattern. New metrics
of significant importance have been identified
and evaluated: DNS record age, DNS e-mail record,
authoritative DNS server, DNS MX record and web
presence.
4.1 DNS TTL
Several works (Dagon, 2005) (Holz et al., 2008)
(Perdisci et al., 2009) have stated the relation between
short TTLs in DNS name definitions and the presence
of botnets behind those names. However, in our review,
TTLs with zero value have been found
in a multitude of domains, most of them because of
misconfiguration of authoritative DNS servers. Short
TTLs have also been found in successful services
such as Google (TTL=46, 68, 300, etc.), YouTube
(TTL=66, 70, 89, etc.) or Facebook (TTL=1, 6, 7,
8, etc.). Those short TTL values are chosen in order
to use DNS as a load balancer (as, for example, in Content
Delivery Networks), so that users can be steered
according to the performance and availability of end-servers.
In figure 1, the cumulative distribution functions
of TTL values for normal and botnet domain names
are plotted. As explained before, a large percentage
of normal domains use short TTLs. For example,
60% of normal domains use TTL values less than or equal
to 500. Therefore, DNS TTL is not a discriminating
metric for botnet identification nowadays.
Figure 1: Cumulative distribution of TTL for normal and botnet domains. (Axes: TTL, 0 to 90,000, against cumulative probability P(x < X); series: normal domains, botnet domains.)
4.2 NXDOMAIN
The NXDOMAIN technique proposed in (Villamarín-Salomón
and Brustoloni, 2008) is based on rating
the number of non-existent answers in domain name
requests. This technique has been evaluated on our
traces, and only 172 domains returned NXDOMAIN
responses together with at least one correct answer. This
correct answer is necessary to validate the existence of
the domain. All those 172 domains correspond to normal
domains, so the NXDOMAIN rate is not significant
in botnet evaluation.
The number of domains that always answered
with NXDOMAIN responses was 67,427. Only 311
of them correspond to botnet domains, so it is not
possible to separate botnet and normal domains with
this ratio. Therefore, results are not as good as
expected.
4.3 Domain Name Pattern
Metrics based on domain name patterns (Yadav et al.,
2010) have been discarded because they need a previous
analysis of how DNS names registered for botnets
are generated. Our goal is to provide a generic
scheme of botnet identification without previous individual
characterization because those characteristics
can change easily in different versions of the software.
4.4 DNS Record Age
This metric is related to the creation date of the
domain name under study. When a domain name is
registered, information about the domain owner, creation
date and other characteristics is stored. This information
is accessible via the WHOIS service. Therefore, a
WHOIS request is enough to obtain the creation date
of a given domain. It has been observed that botnet
domains are usually very young, with an age of one year or
less. This is reasonable because sooner or later those domains
are blacklisted, old domains are not reused and
new ones have to be registered continuously to allow
normal botnet operation. Therefore, DNS record age
can be used to identify domains suspected of being assigned
to botnets. In (Passerini et al., 2008), record
age is used to characterize suspicious domain names
extracted from emails. In our proposal, domain names
are extracted directly from all DNS requests in the
network.
For weighting this metric, domains less than
one year old are given more importance. Our proposal
is reflected in equation 1. This ratio is bounded
between 0 and 1. CaptureDate indicates the date when
the DNS request was intercepted, and CreationDate is the
date when the domain name record was created. Both
dates, expressed in Unix epoch format, can be subtracted
and the result converted to days.
\[
\mathrm{DNS\ record\ age} = \frac{1}{1 + \dfrac{(\mathit{CaptureDate} - \mathit{CreationDate}) - 1}{365}} \qquad (1)
\]
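As an illustration, the metric can be sketched in Python. This is a minimal sketch that assumes equation 1 is read as 1 / (1 + (age in days - 1) / 365); the function name and the clamping of ages below one day are assumptions made for the example, not part of the original system. Under this reading, a domain registered one day before capture scores 1 and a domain registered 366 days before capture scores 0.5.

```python
SECONDS_PER_DAY = 86_400


def dns_record_age_score(capture_epoch: float, creation_epoch: float) -> float:
    """Return the DNS record age metric; higher values mean a younger domain."""
    # Unix epoch timestamps are in seconds, so the difference is converted to days.
    age_days = (capture_epoch - creation_epoch) / SECONDS_PER_DAY
    # Assumption: ages below one day are clamped to one day so that the
    # score never exceeds 1 (the text states the ratio is bounded in [0, 1]).
    age_days = max(age_days, 1.0)
    return 1.0 / (1.0 + (age_days - 1.0) / 365.0)


if __name__ == "__main__":
    one_year_old = dns_record_age_score(366 * SECONDS_PER_DAY, 0)
    ten_years_old = dns_record_age_score(3650 * SECONDS_PER_DAY, 0)
    print(one_year_old, ten_years_old)
```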
Figure 2 shows the cumulative distribution function
of the number of days since a domain name was
registered. Botnet domains are concentrated in the
first hundreds of days. Normal domain names are
distributed linearly over the first 5-6 years as new domain
names are created continuously. Beyond that point, some
older domain names have been unregistered and linearity is
lost. DNS record age is, therefore, a good indicator
of how suspicious a domain name is.
However, it is not enough by itself because
normal domain names are created every day and most new
domains do not correspond to botnets.
Figure 2: Cumulative distribution function of the number of days since a domain name was registered. (Axes: number of days since creation date, 0 to 10,000, against cumulative probability P(x < X); series: normal domains, botnet domains.)
4.5 DNS e-Mail Record
WHOIS information also provides details about the domain
owner, such as names, addresses, telephone
numbers or emails of administrative and technical contacts.
A general pattern has been observed in
the email hosting used for botnet domains.
Those email addresses are mainly hosted by free providers
such as hotmail, yahoo, gmail, live, etc. Therefore, the
presence of this type of free email account
can be used to identify botnet domains. This metric
will be 1 for domains registered with those types of
emails and 0 otherwise. Those emails can also be present
in normal domains, so the metric is not conclusive by itself.
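A possible way to compute this binary metric is sketched below in Python. The provider names come from the text, but the exact mail domains in the set (for instance `hotmail.com`) and the function name are assumptions for illustration; a deployment would need a broader, maintained list.

```python
# Assumption: illustrative mail domains for the free providers named in the text.
FREE_MAIL_DOMAINS = {"hotmail.com", "yahoo.com", "gmail.com", "live.com"}


def email_record_score(contact_email: str) -> int:
    """Return 1 if the WHOIS contact e-mail uses a free provider, else 0."""
    _local, separator, mail_domain = contact_email.strip().rpartition("@")
    if not separator:
        # Not an e-mail address (for example, an empty or redacted field).
        return 0
    return int(mail_domain.lower() in FREE_MAIL_DOMAINS)


if __name__ == "__main__":
    print(email_record_score("owner@GMAIL.com"))
    print(email_record_score("hostmaster@unavarra.es"))
```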
For our data, figure 3 presents the percentage of
free-hosting based emails for normal and botnet domains.
For some of those free-hosting services, the differences
are significant and therefore usable in botnet identification.
The spenglers service corresponds to spenglers.biz,
a specific domain name registered to generate
emails. In this case, 197 Conficker domain names
were registered using spenglers emails.
Figure 3: Percentage of free-hosting based emails for normal and botnet domains. (Axes: Hotmail, Yahoo, Gmail, Live and Spenglers against percentage of number of domains on a logarithmic scale, 0.01 to 100; series: botnet domains, normal domains.)
4.6 Authoritative DNS server
An authoritative DNS server is the last DNS server
responsible for resolving a domain name and its subdomains.
Those servers can be any host running
DNS server software. However, botnets tend to use a
specific set of authoritative DNS servers. Therefore,
a blacklist of suspected authoritative DNS servers,
where previously registered botnet domains have been
detected, can be created. Those authoritative DNS
servers can also host normal domain names, so the
metric is not conclusive. This metric will be 1 for
servers in the blacklist and 0 otherwise.
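The blacklist construction and the resulting score can be sketched as follows. This is an illustrative Python sketch: the function names, the `ns_by_domain` mapping and the example host names are hypothetical, and the name normalization (lower case, no trailing dot) is an assumption.

```python
from typing import Dict, Iterable, List, Set


def _normalize(name: str) -> str:
    # Assumption: server names are compared case-insensitively, without the root dot.
    return name.lower().rstrip(".")


def build_ns_blacklist(botnet_domains: Iterable[str],
                       ns_by_domain: Dict[str, List[str]]) -> Set[str]:
    """Collect the authoritative servers of already known botnet domains."""
    blacklist: Set[str] = set()
    for domain in botnet_domains:
        blacklist.update(_normalize(ns) for ns in ns_by_domain.get(domain, []))
    return blacklist


def authoritative_ns_score(domain: str,
                           ns_by_domain: Dict[str, List[str]],
                           blacklist: Set[str]) -> int:
    """Return 1 if any authoritative server of the domain is blacklisted."""
    servers = ns_by_domain.get(domain, [])
    return int(any(_normalize(ns) in blacklist for ns in servers))
```

Because normal domains can share a name server with botnet domains, a score of 1 here is only a hint and is meant to be combined with the other metrics.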
Analyzing the traffic trace under study, 477 botnet
domain names and 3,108 normal domain names are
hosted on DNS servers in the blacklist. This means that
48.4% of botnet domains (477 of 985) and less than
1% of normal domains are hosted on those specific
DNS servers. Therefore, this metric is also significant
in botnet identification.
4.7 DNS MX Record
Domain names can register A address records or
CNAME canonical name records, but also MX mail
exchange records which map a domain name to a list
of message transfer agents (email servers) for that do-
main. Botnet domain names usually do not regis-
ter MX records because they are not used, but nor-
mal domain names (mainly for web services) usually
have associated MX records. Therefore, the absence
of MX records is another hint to find botnet domain
names. Again, for this metric, the absence of MX
records will be scored as 1 and 0 otherwise.
In figure 4 results are summarized. 95.96% of
botnet domain names and 38.71% of normal domain
names do not have an MX record.
Figure 4: Percentage of domains with MX records for normal and botnet domains. (Axes: Without MX and With MX against percentage of domains, 0 to 100; series: botnet domains, normal domains.)
4.8 Web Presence
As stated before, the great majority of domain names
are registered to be used in web hosting services, and
at the same time, email service is provided or subdomains
are defined for different tasks. As those web
hosting services are accessible through the standard port
80, the presence of this web service can be checked
actively. Therefore, the absence of a response to a standard
HTTP request directed to that domain is another
hint to locate domains suspected of being botnets. In
figure 5 web presence results are presented for normal
and botnet domains. Around 65% of botnet domains
do not have web presence.
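The active check can be sketched with the Python standard library. This is a minimal sketch, not the wget-based probe of the original system: the function name, the use of a HEAD request and the 3-second timeout are assumptions. Any HTTP response, whatever its status code, is treated as web presence.

```python
import http.client


def web_presence_score(host: str, port: int = 80, timeout: float = 3.0) -> int:
    """Return 1 when no HTTP response is obtained (suspicious), 0 otherwise."""
    connection = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        connection.request("HEAD", "/")
        connection.getresponse()
        return 0
    except (OSError, http.client.HTTPException):
        # Covers resolution failures, refused connections, timeouts and
        # peers that do not speak HTTP.
        return 1
    finally:
        connection.close()
```

The score follows the convention of the other binary metrics: 1 marks the suspicious case, here the absence of a web server.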
Figure 5: Web presence for normal and botnet domains. (Axes: web presence, Don't answer and Answer, against percentage of number of domains, 0 to 70; series: botnet domains, normal domains.)
5 BOTNET DETECTION SYSTEM
(BDS): STRUCTURE AND
OPERATIONS
A system called BDS (Botnet Detection System)
has been implemented using the previous metrics.
Those metrics have the advantage of requiring low
processing power, as only DNS requests have
to be analyzed. These DNS requests can be less than
1% of total traffic in a network, so the approach is
suitable for high-speed networks. Moreover, botnet
detection based on DNS requests makes it possible to detect
botnets at an early infection stage and to perform active
countermeasures such as instructing a firewall to block
suspicious traffic.
BDS has been programmed in C language using
WHOIS client tool (Net-Whois, 2010), wget (Wget-
tool, 2009), DNS lookup utility (DiG, 2009), and perl
script (DNSDUMP, 2010). However, to make the
evaluation easier, the input is fed with traffic traces
captured previously.
Figure 6 shows modules used in BDS implementa-
tion: DNS filter, DNS processor, database, extra data
collector and evaluator. The first two modules work
in real time extracting DNS requests from network
traffic and storing this information in the database.
The last module works asynchronously, extending
the information about new domains available in the
database.
Figure 6: Botnet detection system (BDS) block diagram.
First, the input traffic trace with all packets captured
in an Internet connection is filtered looking
for DNS traffic. Basic filtering based on UDP/TCP
port 53 is applied. This is the “DNS Filter” module.
Second, DNS traffic is passed to the “DNS
Processor” module, which identifies which domain name is
requested each time and what response information is
returned. This module parses DNS
requests/responses, obtaining information for each domain
name seen. In request packets, the extracted information
is the full domain name, source IP address (host requesting
DNS resolution), destination IP address (DNS
server answering the request) and timestamp. In response
packets, the extracted information is the resolved IP
addresses and domain name aliases. All this information
is stored in the “Database” module. Frequently
used domain names can be requested several times,
so all these requests are stored in the database
with different timestamps and specific values if they
change.
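The two real-time steps can be illustrated with a short Python sketch that applies the port 53 filter and extracts the queried name from a raw DNS message, following the message layout of RFC 1035. The function names are hypothetical, the sketch reads only the first question, and it does not handle compressed names, which is a simplification of what a full DNS processor needs.

```python
import struct
from typing import Tuple

DNS_PORT = 53


def is_dns_packet(source_port: int, destination_port: int) -> bool:
    """Basic "DNS Filter" rule: keep traffic to or from port 53."""
    return DNS_PORT in (source_port, destination_port)


def parse_question(message: bytes) -> Tuple[bool, int, str]:
    """Return (is_response, rcode, queried_name) for a raw DNS message."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    _identifier, flags, question_count = struct.unpack("!HHH", message[:6])
    if question_count < 1:
        raise ValueError("message has no question section")
    labels = []
    offset = 12  # The question section starts right after the header.
    while True:
        length = message[offset]
        if length == 0:
            break
        if length & 0xC0:
            raise ValueError("compressed names are not handled in this sketch")
        offset += 1
        labels.append(message[offset:offset + length].decode("ascii"))
        offset += length
    is_response = bool(flags & 0x8000)  # QR bit
    rcode = flags & 0x000F              # 3 means NXDOMAIN
    return is_response, rcode, ".".join(labels).lower()
```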
Asynchronously, “Extra data collector” monitors
the database detecting when a new domain name is
added in order to obtain extra information about the
domain needed in the proposed metrics. This extra
information proceeds from the following queries:
- WHOIS query: each time a new domain is detected,
  a WHOIS query is performed in order
  to get extra information about the domain name
  record. This extra information is the DNS record age,
  DNS e-mail record and authoritative DNS server,
  which are stored in the database.
- DNS MX request: a specific DNS request asking
  for MX servers is performed. This provides
  the DNS MX record, which is also stored in
  the database.
- Web checking at port 80: the web presence metric is
  obtained by checking the presence of a web server
  listening at port 80 of the IP address resolved in the
  original DNS request. This metric is stored in
  the database.
- Blacklists update: when new botnet domain
  names are detected, their authoritative DNS
  servers are marked in a blacklist to be considered
  suspicious.
This extra information is costly to obtain (for example,
several seconds are usually needed to perform
a WHOIS query), but this process only happens when
a new domain is detected or after a timer in the range
of days expires, to detect changes in these metrics over time.
All information is stored in the database to speed up
later repetitions of the same domain names. During the first working
hours, new domain names are added continuously, but
after some hours or days very few domain names are
added per hour. In a new deployment, pre-calculated
data with metrics about the most common domain names
can be provided in the database, because this
information can be shared between different network
scenarios.
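The collect-once, refresh-after-days behavior can be sketched as a small in-memory cache in Python. The class name, the seven-day default and the `collect` callback (standing in for the WHOIS, MX and web queries) are assumptions for illustration; the original system stores this information in a database.

```python
import time
from typing import Callable, Dict, Tuple


class DomainMetricsCache:
    """Run the costly collection only for new domains or stale entries."""

    def __init__(self,
                 collect: Callable[[str], dict],
                 refresh_seconds: float = 7 * 86_400,  # assumed timer: 7 days
                 clock: Callable[[], float] = time.time) -> None:
        self._collect = collect
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, domain: str) -> dict:
        key = domain.lower().rstrip(".")
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or now - entry[0] >= self._refresh_seconds:
            # New domain, or the refresh timer expired: query again.
            entry = (now, self._collect(key))
            self._entries[key] = entry
        return entry[1]
```

Pre-calculated metrics for common domains could be loaded into `_entries` at start-up, which mirrors the sharing between deployments described above.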
For a given domain, once its extra data has been
collected, the “Evaluator” module
is in charge of computing a suspicion rate that
measures how likely the domain name is to belong to a
botnet. The suspicion rate is calculated by applying
weights to a combination of the previous metrics. Details
are given in the following section. This rate makes it
possible to determine whether a domain name is suspected of
belonging to a botnet or not.
6 EVALUATION
In the measured scenario, only DNS requests with
successful A record or NXDOMAIN responses are
considered, discarding unanswered requests. This
leaves 198,357 domain names under consideration.
A botnet domain name blacklist has been compiled
using information obtained from the botnet testbed
with infections of the Zeus, Conficker and Kraken
botnets, and specific online databases such as Zeus tracker
(ZeusTracker, 2011). The compiled botnet domain
name blacklist contains over 1,500 domains, and 985 of
them have been identified in the traffic traces of the
scenario.
Metrics proposed in this paper are evaluated in table 1.
This table shows identification rates of botnet
domain names based on each of the proposed metrics
applied independently. True positives indicate the
percentage of botnet domain names correctly identified.
False positives indicate the percentage of normal domain
names misidentified as botnets. False negatives indicate the
percentage of botnet domain names not identified by the
metric. For DNS record age, a
threshold of 0.5 has been considered to identify botnets.
Although metrics like DNS MX record have
a high percentage of true positives, their rate of
false positives is also high. The most interesting metrics
are those with a larger difference between true
positives and (false negatives + false positives). Therefore,
DNS record age, DNS e-mail record or authoritative
DNS server are better metrics.
The suspect ratio is composed of a weighted sum
of the proposed metrics: DNS record age, DNS e-mail
record, authoritative DNS server, DNS MX record
and web presence. These weights are obtained from
the differences in identification shown in table 1. The
resulting suspect ratio has been normalized. The suspect
ratio has been calculated for all domain names observed
in the captured traffic trace under study. Then, domain
names have been sorted by this suspect
ratio, with the expectation that botnet domain names
occupy the first positions.
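The weighted sum and the ranking can be sketched in Python as follows. The metric keys and function names are hypothetical, and the weights are left as a parameter because the paper does not list their values; normalization by the sum of weights is one possible reading of "normalized".

```python
from typing import Dict, List, Mapping, Tuple

# Hypothetical keys for the five metrics combined by the suspect ratio.
METRICS = ("record_age", "email_record", "authoritative_ns",
           "mx_record", "web_presence")


def suspect_ratio(scores: Mapping[str, float],
                  weights: Mapping[str, float]) -> float:
    """Weighted sum of metric scores, normalized to the range [0, 1]."""
    total_weight = sum(weights[name] for name in METRICS)
    weighted = sum(weights[name] * scores[name] for name in METRICS)
    return weighted / total_weight


def rank_by_suspect_ratio(scores_by_domain: Dict[str, Mapping[str, float]],
                          weights: Mapping[str, float]) -> List[Tuple[str, float]]:
    """Sort domains so that the most suspicious ones come first."""
    ranked = [(domain, suspect_ratio(scores, weights))
              for domain, scores in scores_by_domain.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

Each score is expected in [0, 1], with 1 marking the suspicious case, so a domain that triggers every metric reaches a ratio of 1.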
Figure 7 shows the percentage of botnet domain
names identified correctly as more domain names are
considered in the domain name list sorted by
suspect ratio. It can be observed that approximately the first
400 domain names are classified correctly as botnets,
and later the proportion of botnets decreases, reaching
around 55% botnets in the first 800 domain names.
Besides botnet domain names, domain names associated
with other types of malware are detected, but they
are negligible, around 2% of the total number of botnet
domain names.
Considering the total number of 674 botnet domain
names and 198,357 normal domains, the percentage
of total identification shown in figure 8 represents
the percentage of domain names identified
correctly/wrongly as more domain names are considered
in the domain name list sorted by suspect ratio. As
seen before, for approximately the first 400 domain
names, 70% of the total number of botnet domain names
in the trace are identified with only 3% of false positives.
Figure 7: Botnet identification percentages in domain name list sorted by suspect ratio. (Axes: number of domain names, 0 to 800, against % identification, 0 to 100.)
The remaining 30% of botnet domain names have
suspect ratios indistinguishable from those of normal
domains. This is due to the lack of extra information
for those botnet domain names: there is no WHOIS
response and the only available metrics are DNS MX
record and web presence. Improvements in WHOIS
querying could improve results, for example, balancing
queries between several WHOIS servers.
Figure 8: Percentages of total identification in domain name list sorted by suspect ratio. (Axes: number of domain names, 0 to 2,000, against percentage of total identification, 0 to 100; series: botnet true positives, botnet false positives.)
A decision threshold has to be defined to decide
whether a given suspect ratio identifies a botnet or not. If
this threshold is too high, only a small percentage
of botnet domain names will be identified. If this
threshold is too low, a large percentage of false positives
will result from the identification process. Figure 9
presents the cumulative distribution function of
suspect ratio for botnet domain names and normal domain
names. Botnet domain names are concentrated
in higher suspect ratios. Most of them have a suspect
ratio larger than 0.75. Normal domain names are
concentrated in lower values of suspect ratio. Considering
the full set of domain names, a suspect ratio threshold
around 0.75 can be used to identify over 95%
of botnet domain names with a low percentage of false
positives (normal domain names considered as botnets).
Table 1: Botnet domain names identification for each metric independently.
Metric                     True positives   False positives    False negatives
DNS record age             69.2% (464)      1.228% (3,070)     30.1% (208)
DNS e-mail record          30.6% (206)      0.954% (1,892)     69.1% (466)
Authoritative DNS server   69.1% (466)      1.57% (3,119)      30.7% (207)
DNS MX record              96.1% (648)      25.4% (50,328)     3.71% (25)
Web presence               95.1% (641)      27.03% (53,609)    4.75% (32)
Figure 9: Cumulative distribution function for suspect ratio. (Axes: suspect ratio, 0 to 1, against percentage of total identification, 0 to 1; series: normal domain names, botnet domain names.)
The proposed BDS is compared with results obtained
by the NXDOMAIN technique (Villamarín-Salomón and
Brustoloni, 2008). In this case, a ratio is calculated
for each domain as the number of NXDOMAIN
responses divided by the total number of DNS
requests. Two cases are distinguished. The first, called
NXDOMAIN-1, considers the ratio only if at least one
correct answer is received. With this limitation, only
active C&C channels are considered and non-active
domain names are ignored. The second,
NXDOMAIN-2, considers all domain names, active
or not.
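Both variants of the baseline ratio can be sketched in Python. In this sketch the input is a list of hypothetical (domain, RCODE) pairs, one per answered request; treating RCODE 0 as the "correct answer" and counting only answered requests in the denominator are assumptions of the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

NOERROR = 0   # DNS RCODE for a successful answer (RFC 1035)
NXDOMAIN = 3  # DNS RCODE for a non-existent domain (RFC 1035)


def nxdomain_ratios(responses: Iterable[Tuple[str, int]],
                    require_correct_answer: bool) -> Dict[str, float]:
    """Per-domain NXDOMAIN ratio.

    require_correct_answer=True corresponds to NXDOMAIN-1,
    require_correct_answer=False corresponds to NXDOMAIN-2.
    """
    totals: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for domain, rcode in responses:
        totals[domain] += 1
        if rcode == NXDOMAIN:
            failures[domain] += 1
        elif rcode == NOERROR:
            successes[domain] += 1
    ratios: Dict[str, float] = {}
    # Only domains with at least one NXDOMAIN response can be rated.
    for domain, failed in failures.items():
        if require_correct_answer and successes[domain] == 0:
            continue
        ratios[domain] = failed / totals[domain]
    return ratios
```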
Table 2 shows the number of domains that can
be analyzed. In BDS, all domains can be processed,
but with NXDOMAIN-1 and NXDOMAIN-2 only
domains that return at least one NXDOMAIN
response can be processed. In NXDOMAIN-1, only
172 normal domains and 0 botnet domains also
return at least one correct DNS response.
Therefore, NXDOMAIN-1 is not useful.
NXDOMAIN-2 increases the number of processed
botnet domains to 311, but this is about half of the
total number of botnet domain names present in the
traffic trace. NXDOMAIN-2 also yields a very high rate
of false positives, making this technique unusable.
Our proposal, BDS, improves identification percentages
significantly, achieving 68% of botnet identification
with only 3.18% of false positives.
7 CONCLUSIONS
Botnets have expanded over the last years, so their
identification has become very important. A new
technique for botnet identification has been
presented in this paper. It is based on analyzing DNS
requests and responses for the domain names used to
locate command-and-control (C&C) servers. Extra
information is obtained for each domain from the
WHOIS service, by checking for MX server
availability, and by checking for the presence of web
services.
As only DNS queries have to be processed, very
high speed links can be monitored for the presence of
botnets with low processing overhead. Most domain
names are requested repeatedly by different users
over time because they identify the most common
Internet services or web pages. Information about
domain names is stored for reuse, so in normal
operation only new domains have to be checked for
botnets. Detection occurs at an early stage of
infection because DNS queries are the first action
that an infected computer performs.
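The reuse of stored domain information can be sketched as a simple verdict cache. This is only an illustration under stated assumptions: the class name `DomainCache` and the `check` callable (standing in for the full WHOIS, MX and web probing of a domain) are hypothetical and are not part of the system described in this paper.

```python
class DomainCache:
    """Remember the verdict for each domain so it is probed only once."""

    def __init__(self, check):
        # check: hypothetical callable, domain -> bool (True if suspected botnet)
        self._check = check
        self._verdicts = {}

    def lookup(self, domain):
        # Normalize so "Example.com." and "example.com" share one cache entry.
        key = domain.rstrip(".").lower()
        if key not in self._verdicts:
            # Only domains not seen before trigger the expensive probing.
            self._verdicts[key] = self._check(key)
        return self._verdicts[key]
```

With such a cache, the cost of active probing depends on the number of distinct domain names seen on the link rather than on the number of DNS queries.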
A suspect ratio is defined based on a set of
metrics: DNS record age, DNS e-mail record,
authoritative DNS server, DNS MX record and web
presence. The results achieved are promising. In an
evaluation over a university Internet link, 65% of
botnet domain names are identified with only 3% of
false positives. These results outperform
state-of-the-art techniques such as the
NXDOMAIN-based one.
Improvements are possible by correlating DNS
requests from the same IP address. Once a computer is
identified as being part of a botnet, subsequent DNS
requests from the same computer are more likely to be
related to the botnet. Even if only part of the
botnet domain names are identified, all infected
computers can be identified because each of them will
request dozens of domain names and at least one of
them can be identified. Correlation of DNS queries
between different computers can also be used to
improve the identification rate.
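A minimal sketch of the per-host correlation suggested above is given next. It was not implemented or evaluated in this work; the function name `correlate_by_host` and the `(client_ip, domain)` input format are assumptions made for the example.

```python
from collections import defaultdict


def correlate_by_host(queries, flagged_domains):
    """Find hosts that queried a flagged domain and the other names they asked for.

    queries: iterable of (client_ip, domain) pairs (assumed input format).
    flagged_domains: domains already identified as botnet-related.
    Returns (infected_hosts, related_domains); related_domains are only
    candidates with a higher probability of belonging to the botnet.
    """
    flagged = set(flagged_domains)
    by_host = defaultdict(set)
    for client_ip, domain in queries:
        by_host[client_ip].add(domain)

    # A host is considered infected if it requested at least one flagged domain.
    infected = {ip for ip, domains in by_host.items() if domains & flagged}

    # Other domains requested by infected hosts become candidates for review.
    related = set()
    for ip in infected:
        related |= by_host[ip] - flagged
    return infected, related
```

The candidate set will also contain ordinary domains requested by the infected hosts, so it would have to be combined with the suspect ratio rather than used alone.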
SECRYPT 2011 - International Conference on Security and Cryptography
Table 2: Number of analyzed domains and botnet identification rate for different techniques.
Technique     Number of normal domains   Number of botnet domains   % true positives   % false positives
BDS           250,062                    674                        68%                3.18%
NXDOMAIN-1    172                        0                          0%                 0%
NXDOMAIN-2    67,427                     311                        20%                99.87%
The number of queries for each domain name can
also be used to improve the suspect ratio. However,
we have not been able to use it because of the low
number of infections in the network under study. It
would allow, for example, misspelled domain names to
be ignored, because they would be requested only
once.
ACKNOWLEDGEMENTS
This work was supported by S21sec labs through
the research project SEGUR@, funded by the Spanish
Ministry of Industry, Tourism and Trade, within the
framework of the CENIT programme, with reference
CENIT-2007 2004.
REFERENCES
Binkley, R. and Singh, S. (2006). An Algorithm for
Anomaly-based Botnet Detection. Computer Science,
PSU, USENIX SRUTI ’06: 2nd Workshop on Steps to
Reducing Unwanted Traffic on the Internet.
Binsalleeh, H., Ormerod, T., Boukhtouta, A., Sinha, P.,
Youssef, A., Debbabi, M., and Wang, L. (2010). On
the analysis of the zeus botnet crimeware toolkit. In
Privacy Security and Trust (PST), 2010 Eighth Annual
International Conference on, pages 31–38.
Chiang, K. and Lloyd, L. (2007). A case study of the rus-
tock rootkit and spam bot. In Proceedings of the first
conference on First Workshop on Hot Topics in Under-
standing Botnets, pages 10–10, Berkeley, CA, USA.
USENIX Association.
Dagon, D. (2005). Botnet Detection and Response, The
Network is the Infection. In 1st DNS-OARC Work-
shop, Santa Clara, CA.
DiG (2009). Tool from the package dnsutils.
http://www.ubuntuupdates.org/packages/show/105545.
DNSDUMP (2010). Perl script that captures and displays
DNS messages.
http://dns.measurement-factory.com/tools/dnsdump/.
Feily, M., Shahrestani, A., and Ramadass, S. (2009). A Sur-
vey of Botnet and Botnet Detection. In Third Interna-
tional Conference on Emerging Security Information,
Systems and Technologies, Athens/Glyfada, Greece.
Goebel, J. and Holz, T. (2007). Rishi: Identify bot con-
taminated hosts by irc nickname evaluation. In First
USENIX Workshop on Hot Topics in Understanding
Botnets (HotBots’07), Cambridge, MA.
Grizzard, J., Sharma, V., Nunnery, C., Kang, B. B., and
Dagon, D. (2007). Peer-to-peer botnets: Overview and
case study. In First USENIX Workshop on Hot Topics
in Understanding Botnets (HotBots’07), Cambridge,
MA.
Gu, G., Perdisci, R., Zhang, J., and Lee, W. (2008a).
BotMiner: Clustering Analysis of Network Traffic for
Protocol- and Structure-Independent Botnet Detection.
In 17th USENIX Security Symposium (Security’08),
San Jose, CA.
Gu, G., Zhang, J., and Lee, W. (2008b). Botsniffer: Detect-
ing botnet command and control channels in network
traffic. In 15th Annual Network and Distributed Sys-
tem Security Symposium (NDSS’08), San Diego, CA.
Holz, T., Gorecki, C., Rieck, K., and Freiling, F. C. (2008).
Measuring and detecting fast-flux service networks. In
15th Annual Network and Distributed System Security
Symposium (NDSS’08), San Diego, CA.
Jae-Seo, L., HyunCheol, J., Jun-Hyung, P., Minsoo, K., and
Bong-Nam, N. (2008). The activity analysis of mali-
cious http-based botnets using degree of periodic re-
peatability. In Security Technology, 2008. SECTECH
’08. International Conference on, pages 83–86.
John, J. P., Moshchuk, A., Gribble, S. D., and
Krishnamurthy, A. (2009). Studying spamming botnets
using botlab. In Proceedings of the 6th USENIX
symposium on Networked systems design and
implementation, pages 291–306, Berkeley, CA, USA.
USENIX Association.
Jones, J. K. and Romney, G. W. (2004). Honeynets: an
educational resource for IT security. In Proceedings of the
5th conference on Information technology education,
CITC5 ’04, pages 24–28, New York, NY, USA. ACM.
Net-Whois (2010). Module for parsing WHOIS informa-
tion. http://search.cpan.org/ivsokolov/Net-Whois-
Parser-0.05/.
Passerini, E., Paleari, R., Martignoni, L., and Bruschi, D.
(2008). Fluxor: Detecting and monitoring fast-flux
service networks. In Proceedings of the 5th inter-
national conference on Detection of Intrusions and
Malware, and Vulnerability Assessment, DIMVA ’08,
pages 186–206, Berlin, Heidelberg. Springer-Verlag.
Perdisci, R., Corona, I., Dagon, D., and Lee, W. (2009). De-
tecting malicious flux service networks through pas-
sive analysis of recursive dns traces. In Computer
Security Applications Conference, 2009. ACSAC ’09.
Annual, pages 311–320.
Porras, P., Saïdi, H., and Yegneswaran, V. (2009). A
foray into Conficker’s logic and rendezvous points.
In USENIX Workshop on Large-Scale Exploits and
Emergent Threats.
Stone-Gross, B., Cova, M., Cavallaro, L., Gilbert, B., Szyd-
lowski, M., Kemmerer, R., Kruegel, C., and Vigna, G.
(2009). Your botnet is my botnet: analysis of a botnet
takeover. In Proceedings of the 16th ACM conference
on Computer and communications security, CCS ’09,
pages 635–647, New York, NY, USA. ACM.
Villamarín-Salomón, R. and Brustoloni, J. (2008).
Identifying Botnets Using Anomaly Detection Techniques
Applied to DNS Traffic. In 5th Annual IEEE Consumer
Communications and Networking Conference
(CCNC 2008).
Wget-tool (2009). GNU Wget package for retrieving
files using HTTP, HTTPS and FTP.
http://www.gnu.org/software/wget/.
Yadav, S., Reddy, A. K. K., Reddy, A. N., and Ranjan,
S. (2010). Detecting algorithmically generated ma-
licious domain names. In Proceedings of the 10th an-
nual conference on Internet measurement (IMC2010),
IMC ’10, pages 48–61, New York, NY, USA. ACM.
Zeljka, Z. (2009). Top 10 botnets and their impact.
http://www.net-security.org/secworld.php?id=8599.
ZeusTracker (2011). The ZeuS Tracker tracks
ZeuS Command and Control servers.
https://zeustracker.abuse.ch/.
Zhaosheng, Z., Guohan, L., Yan, C., Fu, Z., Roberts, P.,
and Keesook, H. (2008). Botnet research survey. In
Computer Software and Applications, 2008. COMPSAC
’08. 32nd Annual IEEE International, pages 967–972.