Malicious DNS Traffic in Tor: Analysis and Countermeasures

Michael Sonntag

Institute of Networks and Security, Johannes Kepler University, Altenbergerstr 69, A-4040 Linz, Austria

Keywords: Anonymization, Tor, DNS, Malicious Behaviour.

Abstract: Anonymization is commonly seen as useful only for people that have something to hide. Tor exit nodes are

therefore associated with malicious behaviour and especially the so-called “darknet”. While the Tor network

supports hidden services, and a large share of these serve illegal purposes, most of the traffic in the Tor net-

work exits to the normal Internet and could be, and probably is, legal. We investigate this by taking a look at

the DNS requests of a high-bandwidth exit node. We observe some malicious behaviour (especially DNS

scans), questionable targets (both widely seen as immoral as well as very likely illegal in most countries),

and careless usage. However, all these, while undoubtable undesirable, make up only a small share of the

exit traffic. We then propose some additions to reduce the detected malicious use.

1 INTRODUCTION

It is commonly claimed that the Tor anonymisation

network (Dingledine/Mathewson/Syverson, 2004) is

used for undesirable/illegal activities - but so is the

“normal” Internet. The Tor network routes traffic over

three nodes with multiple layers of encryption to

anonymize the IP address of the source. While it can

be used for any kind of TCP connection, it is over-

whelmingly used for web surfing. In this way, visitors

of websites may remain anonymous to the sites (un-

less they log in) and avoid blocks to them by their

ISP. This definitely has appeal for illegal activities -

but so it has for content which is officially labelled as

“undesirable”, e.g. in countries with strong censor-

ship.

Inspecting the Tor traffic was done e.g. by

Ling/Luo/Wu/Yu/Fu (2015), which discovered a large

amount of malicious traffic. However, only 9 % of

their alerts were related to actual malware. As we

operate a high-bandwidth exit node, we investigated

its exit traffic for signs of such undesirable (according

to several ways of classifying it as such) traffic. In

this paper we report on the results from observing the

DNS traffic of the exit node regarding malicious

behaviour, as opposed to Sonntag (2018), where we

investigated the use by country of destination and

categorization of second-level domains. Investigating

DNS traffic is especially useful, as it would allow

blocking undesirable behaviour before expending

bandwidth, which is usually in low supply for exit

nodes of the Tor network. Additionally, the DNS

traffic is public anyway to a large degree: what cannot

be answered immediately from the cache is sent to

some external DNS resolver and is observable from

the outside, e.g. the ISP of the user and the operator of

the DNS server. This could lead to additional prob-

lems for exit nodes, e.g. complaints or blocks based

on scans exiting from it.

2 DATA COLLECTION METHOD

Data was collected during five month, from 1.2.2018

until 30.6.2018 in one-hour periods. The method for

collection was to install our own DNS caching serv-

er and use this as the DNS server of the exit node.

As that computer is not used for anything else, all

DNS queries can be attributed to the exit node. The

cache logs all queries to disk. The logs are rotated

hourly and investigation takes place per hour to

better preserve privacy. To ensure as detailed data

on exit traffic as possible, the timeout this caching

server returns to the exit node is set to a very low

value of 1 minute. Note that this is not directly effec-

tive, as Tor itself sets the timeout to 5 minutes for

very small timeouts it receives (and 60 minutes for

longer ones) to protect against attacks (DefecTor:

Greschbach/Pulls/Roberts/Winter/Feamster 2017).

Because of this, we did not modify these settings.On

the Internet side of the cache, no changes are

made- whatever the upstream servers send is used.

536

Sonntag, M.

Malicious DNS Trafﬁc in Tor: Analysis and Countermeasures.

DOI: 10.5220/0007471205360543

In Proceedings of the 5th International Conference on Information Systems Security and Privacy (ICISSP 2019), pages 536-543

ISBN: 978-989-758-359-9

Figure 1: Histogram of DNS requests per hour.

3 SUSPICIOUS BEHAVIOUR

The simplest statistics is the number of DNS re-

quests occurring per hour. For normal traffic this

should correlate to the traffic, i.e. for each outgoing

TCP connection one DNS lookup can be expected.

Because of internal caching this number must be

reduced significantly, as e.g. a web page does not

consist of a single file only, but e.g. of several

HTML pages, stylesheets, script files, multiple pic-

tures etc. On average there were 66,542 requests,

which translates to approx. 18.5 requests per second

(see also at the end under ethical considerations). In

sum there were 237,953,608 DNS requests during

the whole observation period. However, this number

varies significantly over time. The minimum number

per hour encountered was 3,698, while the maxi-

mum was 291,472. To better understand these varia-

tions, a histogram of classes of counts was created

(see Figure 1). From this it is apparent that the varia-

tions are much lower than it appears at first, as ex-

treme outliers influence especially the maximum

value. Regarding the number of connections, in total

2,429,411,680 connections were observed during the

same period, which translates to 680,126 flows per

hour. This produces 10.2 flows per logged DNS

request (note the DNS caching; but some connec-

tions are established directly to IP addresses too). As

most of the traffic is web surfing, this looks correct.

3.1

Reverse

DNS Scanning

The single extreme outlier in Figure 1 was investi-

gated individually: this hour had 221,621 .arpa PTR

requests (i.e. about 70,000 other requests, which is

perfectly average for a single period). Generally,

very few reverse lookups are to be expected, as e.g.

web traffic (taking up almost all of the traffic), does

not need this at all. This was a reverse scan of sever-

al large networks (the names in the parentheses stem

from the WhoIS database): 158.172.0.0/16 (OR-

GANISMO AUTONOMO DE CORREOS Y; this

seems to be the Spanish postal service),

158.227.0.0/16 (Universidad del Pais Vasco),

158.42.0.0/16 (Universitat Politecnica de Valencia),

158.49.0.0/16 (Universidad de Extremadura). It was

further investigated whether there exists an associat-

ed spike in traffic: we do not have any information

on individual targets, but the whole traffic during

this hour was not different from other hours at all,

neither in number of connections nor the amount of

data transferred. Therefore, this scan was not ac-

companied by actual connections to these IP ad-

dresses, it was “merely” a reverse DNS scan.

The reason for performing such a scan via Tor is

not obvious: the targeted institutions would not note

such a scan, unless they operate their nameservers

themselves (or were specifically informed of it). As

these are class B subnets, that is however likely the

case - and was in this instance. The nameservers for

the Spanish post (193.148.159.170, .171) are within

a different network, but these addresses also belong

to the post. For the universities, at least some of the

nameservers lie within the address area scanned

(158.227.82.16; 158.42.1.5; 158.49.8.2). We can

therefore conclude that such large scans would likely

have been noticed by the targets and potentially

traced back. Performing them via Tor avoids that

possibility as any trace back to the origin would stop

at our exit node. As we did not discover any legiti-

mate (or business) cause (e.g. checking for rogue

Malicious DNS Trafﬁc in Tor: Analysis and Countermeasures

537

computers can be done from any IP address not

affiliated to the institution), less than honourable

intentions can be surmised, e.g. discovering which

systems exist and gathering information on them.

3.2 DNS Scanning

Domains asked for but not existing are a significant

portion of the queries: on average 6,577 domain

names are asked for each hour, which do not exist.

This translates to 10% of all requests. As it is unlike-

ly that humans enter that many incorrect domain

names and even notoriously non-specification-

conforming HTML usually gets the host part of links

right, a different explanation is required. After man-

ual investigation of these errors, we could identify

the following subgroups:

 These seems to be a lot of checking for existing

(or not) domains going on with very good lists or

sensible automation. Few nonsensical names are

tried, as almost all do make (some) sense. For exam-

ple, these contained (beside numerous similar oth-

ers) the following series of queries: worldwidere-

veal.com, worldwiderevenue.net, world-

widescort.com, worldwidescubatravel.com, world-

wideshoponline.com, worldwideshopspot.com,

worldwidetomatosociety.com, worldwide-

towers.com, worldwidetowinginc.com, world-

widetravelmembership.com, worldwideunderstand-

ing.com, world-wide-web-host.com, worldwideweb-

stersonline.com, worldwidewebtec.com, world-

wideweed.xyz. However there seems to be no obvi-

ous generator being used, as definitely many more

“worldwide*” names exist, multiple TLDs are used

(but with different second-level names), and e.g.

typos are perhaps also part of it (worldwidescort

should probably be worldwideescort). Also, if being

merely dictionary-based, many more combinations

than the ones above would be tried. A possible ex-

planation for this is that multiple exit nodes might be

used, so we only saw a portion of all queries. Note

that unlike the examples below, all these domain

names were only queried for a single time over the

whole observation period. This therefore seems

unlikely to be a prelude to attacks, but more search-

ing for opportunities to buy domain names, or creat-

ing respectively maintaining a list of existing top-

/second-level domains.

 Numerous non-existing domain names are que-

ried for multiple times: for example, the top one is

“geo.mozilla.org” with 37,395

queries in total over

all five months, a domain name that however did

exist in the past. The next most common one (15,929

times) is cdn.api.twitter.com, which seems to have

been a working (but non-official) server which has

since been shut down. A small amount of queries are

mistakes of websites, at least partly because of

changing/removed server names not followed by

changes in the websites.

 Some domain names are obviously simply erro-

neous, like “index.php” (3,778 queries) or “wp-

as a path and not as a host. Or “web.archive.orghttp”

(2,433 queries), “web.archive.org.https” (occurred

19 times) or “web.archive.org.localhost” (4 queries),

which are typos or signs of misconfigurations or

mistakes. Even aggregated these do not constitute a

significant number of queries in total.

 Not directly explainable are the huge amount of

queries for domains of the form “forum.*”. 714,174

such non-existing domain names were queried for.

And as each of the top names (“fo-

rum.eurostimul.com”, “forum.zawya.com”, “fo-

rum.roots-archives.com” etc) occur more than 2,900

times this cannot be merely a scan. According to

Google searches, these domain names do not exist or

existed, although there might have been forums on

these sites (e.g. “eurostimul.com/forum/

memberlist.php” is in the result list). As it is unlikely

that several thousand scans with the same lists occur,

this is looking more like an error while performing

scans.

 Apart from non-existing domain names, also

many queries receive a “no-data” reply. The tech-

nical reason is when a specific type of DNS record is

queried for and the domain name does exist, but not

this kind of record. Because of the limitation of Tor

in DNS queries (only A=IPv4, AAAA=IPv6, and

PTR=reverse lookup; are possible), the explanation

is simple: these are queries for IPv6 addresses,

where only IPv4 data exists (or potentially the re-

verse). This can be exemplified by the most common

name in this category: e13829.x.akamaiedge.net was

queried for 1,111,778 times! This domain name does

exist, but only serves IPv4, but was often queried for

its non-existing IPv6 record. The same applies to the

second largest count in this category: shops.myshop-

ify.com (363,117 queries; IPv4 data only). These

requests are therefore legitimate and not signs of a

scan, but of the increasing share of IPv6 being used.

DNS scans can also be used as attacks: little out-

going traffic causes large return traffic. Together

with falsifying the source address a DoS attack be-

comes possible. As the exit node determines the

source address of query packets,

this is not relevant

here. But the fact remains, that a DNS server must

produce a large answer (and expend computing time

for producing it), thereby, although not allowing

ICISSP 2019 - 5th International Conference on Information Systems Security and Privacy

538

Figure 2: Measurements against the hour of the day.

reflection attacks, potentially supporting DoS attacks

against name servers. This would be especially

prominent in reverse scans, as these are all going to

the same nameserver(s) if a single/few TLDs are

chosen.

3.3 Ad-/Malware Domains

What is surprising too is the number of domains

queried, which are on a malware/adware blacklist

(Black). This list is a compilation of several other

lists with duplicates removed and contains slightly

below 60,000 domain names. Variants with addi-

tional categories like fake news, gambling, porn etc

exist, but these were not used as many of these ex-

tensions are legal in most countries.

Comparing all queries to this list results in 3,403

matches per hour, so about 5.1% of all requests are

on this list (again: merely containing ca. 60,000

domain names!). However, there is a possible expla-

nation for this: beside just obtaining anonymity

people use Tor also for getting around restrictions,

e.g. state or company censorship. Such measures are

typically implemented on firewalls and use similar

lists (for security purposes or to restrict non-

business-related Internet use). So while “normal”

websites can be visited directly, “forbidden” ones

are more likely to be visited through Tor - and more

likely to end up on such lists. Therefore the share of

such websites would be larger.

Another element is, that despite its name, the list

not only contains “bad” sites (malware/adware), but

also many sites which are merely advertisements or

user tracking (for example, 125 domains of the form

*.oewabox.at are on the list; this is the “Austria Web

analysis” used by most Austrian newspapers, online

shops etc).

Therefore, this rather large share of domains

found on the list is not solely a measure of ille-

gal/dangerous activity, but still noteworthy. Addi-

tionally, these sites are “problematic” in the sense of

posing dangers to visitors, so the “criminal behav-

iour” is at least often not on the party using Tor for

visiting them, but on the website operators.

3.4 Result Validation via Time-of-Day

The total DNS traffic depends on the hour of the

day, which is unsurprising as so does the total traf-

fic. The maximum is at 23 o’clock local time (Aus-

tria), i.e. 22 UTC, while the minimum is between 5

and 7 UTC (see Figure 2 ; “Total”). We can compare

this with the normal “European” traffic as evidenced

by the throughput at DE-CIX (https://www.de-

cix.net/de/locations/germany/frankfurt/statistics).

There the minimum is at 4 o’clock and the maxi-

mum at approximately 21 hours UTC. From this

comparison it is evident, that our traffic is (on aver-

age) shifted 2 hours later. This would imply that the

“average” user is slightly east of our location. If exit

nodes are selected randomly and not deliberately,

then a completely “flat” curve would have been

expected, as humans are distributed across the whole

world (except the oceans). Another factor to take

Malicious DNS Trafﬁc in Tor: Analysis and Countermeasures

539

into account is, that people might use Tor differently

than other web traffic, e.g. predominantly in the

evening or preferably during work. No definite con-

clusion is possible, but either there are proportion-

ately more users located in the western part of Rus-

sia and the middle east (or generally in Asia than

America, which seems more likely), or users prefer

Tor in the evening and shun in during the day.

As can be seen from Figure 2, the columns “NX”

are practically independent of the time. This not only

looks like this, the correlation between the total

traffic and not-existing domains is merely 0.145.

From this we can reinforce the discovery of scans

going on - these are independent of actual end-user

traffic and therefore do not rise/fall with it. Normal

users are unlikely as being their source, as these

would type wrong names in the same ratio all day.

Independence is even further reinforced when com-

paring it to “ND”, the replies that the domain exists,

but no data is present. This does vary with the hour

of the day and the correlation factor to traffic is

0.995, so IPv4/IPv6 issues are directly related to the

traffic of users.

Regarding the lines in Figure 2 it is important to

note, that these are individually scaled to be better

visible and comparable, so their values are not ac-

cording to the left axis. But what can be seen from

them is, that the number of domains found in the

Malware/Advertisement list is similar to the total

traffic. This can be explained by the fact, that many

sites use advertisements for commercialization. The

correlation between those two is however not that

strong (Malware/Ad vs Total traffic is 0,797). This

leads to the conclusion, that “problematic” sites are

visited in a larger share during the evening than

during the day (see gaps/touching lines in Figure 2).

3.5 WhoIs Scans

Domain name queries were classified according to

their third-level domain. Domain names may consist

of up to 63 labels, and often the third from the right

tells what service is being accessed (e.g.

www.company.com  “www”  website). Today

however many queries do not contains such a third

level element any more at all (like in “google.com”;

114,475,510 such queries occurred in total).

What becomes apparent from these results is,

that WWW traffic is by far the most prominent one,

especially as the classes “Server” and “CDN” will in

many cases be web elements too. But what is sur-

prising is the large number (656,752) of “WhoIs”

queries. This ties in with a previous finding showing

significant such traffic based on ports accessed

(Sonntag/Mayrhofer, 2017). One possible explana-

tion is, that this is related to the reverse domain

name requests: checking whether an IP address is

associated to a domain name and then asking for its

owner. However, verification would require detailed

investigation of individual traffic content (which

website was queried for in the WhoIs connection)

and correlation with domain queries and was there-

fore not performed. Whether this kind of traffic will

continue in the future is unclear, as e.g. according to

the EU GDPR much less data will be contained in

the WhoIs databases, and even less immediately

publicly accessible, so queries might be of less use.

3.6 Dangerous Usage

Also noteworthy are the smaller but still significant

counts of queries regarding mail servers

(mail./smtp./imap./…): 262,220 queries. Although

traffic with many of them will be encrypted, this is

not guaranteed. Also note that we do not allow port

25 (=SMTP) on our exit node, so this must be mail

retrieval, not sending. Even more surprising and

potentially dangerous are queries regarding FTP

servers (13,004). While small on comparison, this is

still a very large absolute number, where the trans-

mission of credentials would take place unencrypted.

These could be “secure” in the sense that only ano-

nymous logins to public servers are used, but wheth-

er this is the case cannot be determined without

inspecting the actual traffic.

3.7 Illegal Content

What people are looking for via Tor has been inves-

tigated via categorization of the domain names re-

quested. Categorization was performed through

“Shalla’s Blacklists” (Shalla’s Blacklists). These

lists provide categorizations or URLs and is with a

count of 1.7 million entries quite comprehensive.

This list contains both domains and URLs. While we

could easily extract the domains from the URLs, this

would be problematic, as e.g. the download link for

the microsoft.com website (classified as “Down-

loads”) does not mean that the whole of mi-

crosoft.com is purely a download site. Unfortunately

we were able to categorize only 10 % of all traffic

(89,99% is not in the classification list). But for the

10% found the results are as follows (only categories

with at least 1% are listed individually):

ICISSP 2019 - 5th International Conference on Information Systems Security and Privacy

540

Table 1: Successful categorization of DN queries.

Cate

DN re

uests Share

Porn 3,400,700 14.3%

Socialnet 3,001,751 12.6%

Shopping 2,765,896 11.6%

Adv 2,074,556 8.71%

News 2,063,338 8.55%

Forums 1,504,763 6.32%

Movies 1,385,006 5.82%

Tracke

1,339,224 5.62%

Searchengine 1,264,996 5.31%

Imagehosting 797,450 3.35%

Downloads 637,854 2.68%

ISP 520,920 2.19%

Chat 355,395 1.49%

Government 352,259 1.48%

Webmail 239,451 1.01%

Othe

8.87%

While this list does not directly show “problem-

atic” or “illegal” traffic, it clearly shows that many

visits are likely legal: shopping, social networks,

news, forums etc are predominantly legal, as are

advertisements. Potentially problematic content is

porn (depending on kind and country this can be

illegal), forums/chat (depends on topic) and mov-

ies/imagehosting/downloads (a significant share of

information about files violating copyright is to be

expected - less so the files themselves because of the

limited bandwidth).

Also interesting is the large share of webmail:

using Tor to access a mail account does not guaran-

tee anonymity for the E-Mail address at all, this

requires different anonymization methods. Tor

brings here only one advantage: the association

between the user of a (free - anonymous paying is

complicated) account and an E-Mail address remains

hidden. So it seems there is a significant desire for

not only using an “anonymous” E-Mail address, but

also ensuring that this E-Mail address cannot be

traced back to the computer accessing it. But see

also above for directly accessing E-Mail servers in

section 3.6.

Potentially “problematic” categories are compar-

atively rare: downloads (2.68% of queries that could

be categorized), spyware (0.82%), warez (0.81%),

gamble (0.49%), anonvpn (0.09%; i.e. another anon-

ymisation layer on top of Tor!), hacking (0,08%),

drugs (0.07%). While not common, these are still a

relevant amount, e.g. “drugs” refers to 15,924 of 238

million queries (=0,0067% of all queries, so one in

14,946). No numbers for the “normal” internet could

be found, but this tiny part looks not very extraordi-

nary and is definitely not a major share of the total

Tor usage.

4 POSSIBLE

COUNTERMEASURES

AGAINST MALICIOUS USE

What can be done against such attacks? We are

discussing here only measures to be implemented on

exit nodes. Educating users, securing their browsers

etc are out of our scope. Similarly, existing coun-

termeasures, like removing the WhoIs Port from the

exit policy to prevent such connections completely

(countering section 3.5), are not discussed.

4.1 DNS Queries without Traffic

DNS scans (sections 3.1 and 3.2) are either trivial to

detect or very hard. If a single Tor circuit issues

numerous DNS queries but does not open any con-

nection to them, then this is technically easy to de-

tect. This would merely require defining a “mini-

mum” of actual content traffic per DNS request, as it

should be very uncommon to ask for a specific do-

main name and then not even try to send any data to

it. So a limit of 2-5 requests without data traffic

(=RELAY_RESOLVE as opposed to RE-

LAY_BEGIN; see src/feature/relay/dns.c of the Tor

source code) could be easily enforced. This comes

with a potential problem however: state storage. The

exit node would have to store this additional infor-

mation for each Tor circuit until a data connection is

at least tried, potentially allowing DoS attacks

against the exit node.

This approach would not completely prevent

DNS scans, but at least render them much more

difficult to perform as a new Tor circuit would have

to be established every few requests, creating a sig-

nificant slow-down. This would work even better for

reverse scans (PTR queries), as these are so uncom-

mon in normal traffic that any even slightly in-

creased use is very likely a misuse.

A potential problem, however, could be web

browser prefetching: requesting a DNS lookup for

domains of links on the current page, which the user

might click on later to reduce latency and browsing

speed (see Nidd/Kunz/Arik, 2000). But see above:

an average of 10 flows per DNS request point rather

in the opposite direction.

Still, a permanent prevention of scans is impos-

sible. This would require either to correlate multiple

Tor circuits (all going to the same subnet or “simi-

lar” domain names - technically difficult and requir-

ing lots of resources) or identifying that they are

originating from the same system - something the

Tor system is specifically designed to prevent.

Malicious DNS Trafﬁc in Tor: Analysis and Countermeasures

541

4.2 Delaying Responses

A softer approach would be to artificially introduce

delays. The first query of a Tor circuit is answered

immediately, but each further query without data

traffic is delayed by an additional e.g. one second

query: 2 seconds and so on) before the response

is sent back. In this way scans would be similarly

discouraged, but the countermeasure would be hard-

er to detect (which is less useful than it sounds, as

this fact would very soon become public knowledge,

both generally and specific to exit nodes).

4.3 Blacklist Filtering

Filtering with blacklists is another countermeasure

that would be possible to reduce illegal usage, espe-

cially as discussed in sections 3.6 and 3.7. However,

the problem is to define what is illegal. The exit

node can of course ban what is not allowed at its

location, but this need not be identical to illegality

where the end-user is. Additionally, blacklists are

notoriously problematic regarding their mainte-

nance: adding new sites to block and removing old

ones with changed content. There exists another

issue here: blocking can only in some cases be per-

formed based on DNS, as e.g. a site might contain

legal as well as illegal content under different URLs.

Differentiating them would only be possible by

investigating the content of the exit traffic and no

longer by DNS queries alone. As now most exit

traffic is encrypted, this is impossible anyway.

Blocking based on lists should therefore (and as well

based on general considerations about censoring,

too) be avoided.

5 ETHICAL CONSIDERATIONS

What we are investigating here is Tor exit node

traffic, i.e. intended to be anonymous. The most

important priority of research is therefore to keep it

like this. A DNS name, other than the full URL,

usually does not tell anything about the user visiting

this site by itself. However, that is not guaranteed,

like websites about specific medical problems. To-

gether with the exact time of the DN query it could

potentially be useful to deanonymize specific users

through correlation attacks. To avoid any reduction

in anonymity, even though the exit node alone will

not help without the other two nodes, the recorded

data is stored and evaluated in one-hour chunks. The

exact time of the requests, resp. replies, is removed

immediately after evaluation (and not used anyway,

but cannot be avoided in the DNS cache’s log).

We observed a minimum of 3,698 DNS requests

per hour, resulting in approx. one DNS query per

second. The average over all one-hour periods are 18

requests/second, with a maximum of 81 queries. The

timestamp precision is typically one second, there-

fore the lower boundary is close to supporting indi-

vidual identification.

Note that DNS information is not confidential:

iterative DNS requests are typically sent to the next

server in full, not merely the necessary subpart (see

QNAME minimisation for privacy improvements:

RFC 7816; Bortzmeyer 2016). Therefore, third par-

ties may observe parts or all of the information any-

way, as it is not encrypted at all (DNSSec is not

widely used and would have to be added to Tor exit

nodes via a proxy anyway). As the IP addresses of

exit nodes are publicly known, if they perform name

resolution themselves, this is obvious. In our case

the same, solely dedicated to Tor services, network

is employed, so queries can still be identified as

related to the exit node.

6 CONCLUSIONS

While we detected malicious behaviour in the DNS

traffic, it is on a very low level. Specifically, DNS-

only behaviour is scanning, both forward (asking for

IP addresses of multiple domain names) and reverse

(asking for the domain name of many IP addresses).

For both we have identified potential countermeas-

ures, where the most promising seems to be limiting

such queries per Tor circuit and/or delaying them.

While this would not prevent such scans, it would

make them more costly (continuously creating new

Tor circuits) or more suspicious (actually initiating a

connection to these hosts). Drawbacks from such

measures are not apparent but should be tested. In

this way malicious behaviour through Tor could be

reduced to some degree.

ACKNOWLEDGEMENTS

We would like to thank both the Johannes Kepler

University Linz as well as the AcoNet for supporting

this project by granting permission respectively

providing the necessary bandwidth.

ICISSP 2019 - 5th International Conference on Information Systems Security and Privacy

542

REFERENCES

Black, S.: Unified hosts file with base extensions.

https://github.com/StevenBlack/hosts.

Bortzmeyer, S.: DNS Query Name Minimisation to Im-

prove Privacy. RFC 7816, 2016. https://tools.ietf.org/

html/rfc7816.

Dingledine, R., Mathewson, N, Syverson, P.: Tor: The

Second-Generation Onion Router, In: Proceedings of

the 13th conference on USENIX Security Symposium -

Volume 13 (SSYM'04), Vol. 13. USENIX Association,

Berkeley (2004).

Greschbach, B., Pulls, T., Roberts, L. M., Winter, P.,

Feamster, N.: The Effect of DNS on Tor’s Anonymity.

NDSS ’17, Internet Society, San Diego (2017).

Ling, Z., Luo, J., Wu, K., Yu, W., Fu, X.: TorWard: Dis-

covery, Blocking, and Traceback of Malicious Traffic

Over Tor, IEEE Transactions on Information Foren-

sics and Security, Vol 10/12, 2515 - 2530 (2015).

Nidd, M., Kunz, T., Arik, E.: Prefetching DNS Lookup for

Efficient Wireless WWW Browsing. Proceedings of

Wireless 97, 409-414.

Shalla’s Blacklists. http://www.shallalist.de/

Sonntag, M., Mayrhofer, R.: Traffic Statistics of a High-

Bandwidth Tor Exit Node. In: Mori, P., Furnell, S.,

Camp, O. (eds.) Proceedings of 3rd International

Conference on Information Systems Security and Pri-

vacy, 270-277. SCITEPRESS (2017).

Sonntag, M.: DNS Traffic of a Tor Exit Node - An Analy-

sis. In: Wang G., Chen J., Yang L. (Eds): Security,

Privacy, and Anonymity in Computation, Communica-

tion, and Storage. SpaCCS 2018. Springer, Lecture

Notes in Computer Science, vol 11342, 33-45 (2018).

Malicious DNS Trafﬁc in Tor: Analysis and Countermeasures

543