Unlike conventional World Wide Web technologies, the Tor Darknet onion routing technologies give users a real chance to remain anonymous. Many users have jumped at this chance – some did so to protect themselves or out of curiosity, while others developed a false sense of impunity, and saw an opportunity to do clandestine business anonymously: selling banned goods, distributing illegal content, etc. However, further developments, such as the detention of the maker of the Silk Road site, have conclusively demonstrated that these businesses were less anonymous than most assumed.
Intelligence services have not disclosed any technical details of how they detained cybercriminals who created Tor sites to distribute illegal goods; in particular, they are not giving any clues how they identify cybercriminals who act anonymously. This may mean that the implementation of the Tor Darknet contains some vulnerabilities and/or configuration defects that make it possible to unmask any Tor user. In this research, we will present practical examples to demonstrate how Tor users may lose their anonymity and will draw conclusions from those examples.
How are Tor users pinned down?
The history of the Tor Darknet has seen many attempts – theoretical and practical – to identify anonymous users. All of them can be conditionally divided into two groups: attacks on the client’s side (the browser), and attacks on the connection.
Problems in Web Browsers
The leaked NSA documents tell us that intelligence services have no qualms about using exploits to Firefox, which was the basis for Tor Browser. However, as the NSA reports in its presentation, using vulnerability exploitation tools does not allow permanent surveillance over Darknet users. Exploits have a very short life cycle, so at a specific moment of time there are different versions of the browser, some containing a specific vulnerability and other not. This enables surveillance over only a very narrow spectrum of users.
The leaked NSA documents, including a review of how Tor users can be de-anonymized (Source: www.theguardian.com)
As well as these pseudo-official documents, the Tor community is also aware of other more interesting and ingenuous attacks on the client side. For instance, researchers from the Massachusetts Institute of Technology established that Flash creates a dedicated communication channel to the cybercriminal’s special server, which captures the client’s real IP address, and totally discredits the victim. However, Tor Browser’s developers reacted promptly to this problem by excluding Flash content handlers from their product.
Flash as a way to find out the victim’s real IP address (Source: http://web.mit.edu)
Another more recent method of compromising a web browser is implemented using the WebRTC DLL. This DLL is designed to arrange a video stream transmission channel supporting HTML5, and, similarly to the Flash channel described above, it used to enable the victim’s real IP address to be established. WebRTC’s so-called STUN requests are sent in plain text, thus bypassing Tor and all the ensuing consequences. However, this “shortcoming” was also promptly rectified by Tor Browser developers, so now the browser blocks WebRTC by default.
Attacks on the communication channel
Unlike browser attacks, attacks on the channel between the Tor client and a server located within or outside of the Darknet seem unconvincing. So far most of the concepts were presented by researchers in laboratory conditions and no ‘in-the-field’ proofs of concept have been yet presented.
Among these theoretical works, one fundamental text deserve a special mention – it is based on analyzing traffic employing the NetFlow protocol. The authors of the research believe that the attacker side is capable of analyzing NetFlow records on routers that are direct Tor nodes or are located near them. A NetFlow record contains the following information:
- Protocol version number;
- Record number;
- Inbound and outgoing network interface;
- Time of stream head and stream end;
- Number of bytes and packets in the stream;
- Address of source and destination;
- Port of source and destination;
- IP protocol number;
- The value of Type of Service;
- All flags observed during TCP connections;
- Gatway address;
- Masks of source and destination subnets.
In practical terms all of these identify the client.
De-anonymizing a Tor client based on traffic analysis
This kind of traffic analysis-based investigation requires a huge number of points of presence within Tor, if the attacker wants to be able to de-anonymize any user at any period of time. For this reason, these studies are of no practical interest to individual researchers unless they have a huge pool of computing resources. Also for this reason, we will take a different tack, and consider more practical methods of analyzing a Tor user’s activity.
Passive monitoring system
Every resident of the network can share his/her computing resources to arrange a Node server. A Node server is a nodal element in the Tor network that plays the role of an intermediary in a network client’s information traffic. In this Darknet, there several types of nodes: relay nodes and exit nodes. An exit node is an end link in traffic decryption operation, so they are an end point which may become the source of leaking interesting information.
Our task is very specific: we need to collect existing and, most importantly, relevant onion resources. We cannot solely rely on internal search engines and/or website catalogs, as these leave much to be required in terms of the relevance and completeness of contained information.
However, there is a straightforward solution to the problem of aggregation of relevant websites. To make a list of the onion resources that were recently visited by a Darknet user, one needs to track each instance of accessing them. As we know, an exit node is the end point of the path that encrypted packets follow within the Darknet, so we can freely intercept HTTP/HTTPS protocol packets at the moment when they are decrypted at the exit node. In other words, if the user uses the Darknet as an intermediary between his/her browser and a web resource located in the regular Internet, then Tor’s exit node is the location where packets travel in unencrypted format and can be intercepted.
We know that an HTTP packet may contain information about web resources, including onion resources that were visited earlier. This data is contained in the ‘Referrer’ request header, which may contain the URL address of the source of the request. In the regular Internet, this information helps web masters determine which search engine requests or sites direct users towards the web resource they manage.
In our case, it is enough to scan the dump of intercepted traffic with a regular expression containing the string ‘onion’.
There is a multitude of articles available on configuring exit nodes, so we will not spend time on how to configure an exit node, but instead point out a few details.
First of all, it is necessary to set an Exit Policy that allows traffic communication across all ports; this should be done in the configuration file torrc, located in the Tor installation catalog. This configuration is not a silver bullet but it does offer a chance of seeing something interesting at a non-trivial port.
>> ExitPolicy accept *:*
The field ‘Nickname’ in the torrc file does not have any special meaning to it, so the only recommendation in this case is not to use any conspicuous (e.g. ‘WeAreCapturingYourTraffic’) node names or those containing numbers (‘NodeNumber3′) that might suggest an entire network of such nodes.
After launching a Tor server, it is necessary to wait until it uploads its coordinates to the server of directories – this will help our node declare itself to all Darknet participants.
An exit node in operation
After we launched the exit mode and it began to pass Tor users’ traffic through itself, we need to launch a traffic packet sniffer and intercept the passing traffic. In this case, tshark acts as a sniffer, listening to interface #1 (occupied by Tor) and putting the dump into the file ‘dump.pcap':
>>tshark –i 1 –w dump.pcap
Tshark intercepts packets that pass through the exit node in unencrypted format
All the above actions must be done on as many servers as possible to collect as much information of interest as possible. It should be noted that the dump grows quite quickly, and it should be regularly collected for analysis.
Thus, once you receive a huge dump, it should be analyzed for onion resources. Skimming through the dump helps to categorize all resources visited by Tor users by content type.
It should be noted that 24 hours of uninterrupted traffic interception (on a weekday) produce up to 3GB of dump for a single node. Thus, it cannot be simply opened with Wireshark – it won’t be able to process it. To analyze the dump, it must be broken into smaller files, no larger than 200 MB (this value was determined empirically). To do so, the utility ‘editcamp’ is used alongside with Wireshark:
>> editcap – c 200000 input.pcap output.pcap
In this case, 200,000 represents the number of packets in a single file.
While analyzing a dump, the task is to search for strings containing the substring “.onion”. This very likely will find Tor’s internal resources. However, such passive monitoring does not enable us to de-anonymize a user in the full sense of the word, because the researcher can only analyze those data network packets that the users make available ‘of their own will’.
Active monitoring system
To find out more about a Darknet denizen we need to provoke them into giving away some data about their environment. In other words, we need an active data collection system.
An expert at Leviathan Security discovered a multitude of exit nodes and presented a vivid example of an active monitoring system at work in the field. The nodes were different from other exit nodes in that they injected malicious code into that binary files passing through them. While the client downloaded a file from the Internet, using Tor to preserve anonymity, the malicious exit node conducted a MITM-attack and planted malicious code into the binary file being downloaded.
This incident is a good illustration of the concept of an active monitoring system; however, it is also a good illustration of its flipside: any activity at an exit node (such as traffic manipulation) is quickly and easily identified by automatic tools, and the node is promptly blacklisted by the Tor community.
Let’s start over with a clean slate
- Various graphics drivers and hardware components installed on the client’s side;
- Various sets of software in the operating system and various configurations of the software environment.
The parameters of rendered images can uniquely identify a web-browser and its software and hardware environment. Based on this peculiarity, a so-called fingerprint can be created. This technique is not new – it is used, for instance, by some online advertising agencies to track users’ interests. However, not all of its methods can be implemented in Tor Browser. For example, supercookies cannot be used in Tor Browser, Flash and Java is disabled by default, font use is restricted. Some other methods display notifications that may alert the user.
Thus, our first attempts at canvas fingerprinting with the help of the getImageData()function that extracts image data, were blocked by Tor Browser:
However, some loopholes are still open at this moment, with which fingerprinting in Tor can be done without inducing notifications.
By their fonts we shall know them
Tor Browser can be identified with the help of the measureText()function, which measures the width of a text rendered in canvas:
Using measureText() to measure a font size that is unique to the operating system
If the resulting font width has a unique value (it is sometimes a floating point value), then we can identify the browser, including Tor Browser. We acknowledge that in some cases the resulting font width values may be the same for different users.
It should be noted that this is not the only function that can acquire unique values. Another such function is getBoundingClientRect(),which can acquire the height and the width of the text border rectangle.
When the problem of fingerprinting users became known to the community (it is also relevant to Tor Browser users), an appropriate request was created. However, Tor Browser developers are in no haste to patch this drawback in the configuration, stating that blacklisting such functions is ineffective.
Tor developer’s official reply to the font rendering problem
This approach was applied by a researcher nicknamed “KOLANICH”. Using both functions, measureText() and getBoundingClientRect(), he wrote a script, tested in locally in different browsers and obtained unique identifiers.
Using the same methodology, we arranged a test bed, aiming at fingerprinting Tor Browser in various software and hardware environments.
A fragment of a web-server’s log with a visible Tor Browser fingerprint
At this time, we are collecting the results of this script operating. To date, all the returned values are unique. We will publish a report about the results in due course.
Possible practical implications
- Internal onion resources and external web sites controlled by the attackers. For example, an attacker launches a ‘doorway’, or a web page specially crafted with a specific audience in view, and fingerprints all visitors.
- Internal and external websites that are vulnerable to cross-site scripting (XSS) vulnerabilities (preferably stored XSS, but this is not essential).
Objects that could fingerprint a Tor user
The last item is especially interesting. We have scanned about 100 onion resources for web vulnerabilities (these resources were in the logs of the passive monitoring system) and filtered out ‘false positives’. Thus, we have discovered that about 30% of analyzed Darknet resources are vulnerable to cross-site scripting attacks.
The process of de-anonymizing a Tor user
Following this approach, the attacker could, in theory, find out, for instance, sites on which topics are of interest to the user with the unique fingerprint ‘c2c91d5b3c4fecd9109afe0e’, and on which sites that user logs in. As a result, the attacker knows the user’s profile on a web resource, and the user’s surfing history.
In place of a conclusion