How is a digital footprint used to track you online

One of the main appeals of the internet is the possibility of anonymity that it offers. Although some may believe that anonymity and privacy only matter to those who are trying to hide something, the truth is that everyone has the right to keep their online conversations and activities protected from prying eyes. Now that we know that government organizations have implemented sophisticated and extensive surveillance programs that can target anyone, the need to protect our privacy has increased significantly, even if all we want is to stop them from seeing our family photos.

The challenges of anonymity footprints

Both anonymity and security have multiple layers. Although your IP address can be used to trace you, it is possible to hide it using a variety of technologies. Those who try to defeat your attempts to remain anonymous have found other ways to do so. There are different methods to uncover your true identity, and they rely on the footprints that are left behind when you use the internet. They include browser fingerprinting, IP address leakage, temporary and cache files, in-the-clear DNS and WebRTC queries, traffic correlation and stylometry. Operating systems like TAILS, Whonix and Qubes may offer protection against IP address leakage, traffic correlation and other issues that could put your privacy at risk. However, they won’t be effective against stylometric analysis.

Temporary and cache files

Web browsers store a large amount of data, including images and even entire pages, so that content loads faster when you revisit a website. Cookies are also kept in order to provide authentication and preference data to websites. As long as this data remains on the device you used, it can be used to identify the sites that you have visited. A computer has two main kinds of memory. One is temporary memory, or RAM (Random Access Memory), which the operating system uses to run applications and process data. When the computer is shut down, the contents of RAM are lost.

The other kind is permanent memory, which is required to store user files, web browser temporary files and other data. It is provided by hard disk drives, memory cards or USB drives, and its contents remain in place even after a restart. To prevent files from remaining in permanent memory, you can rely on a Live CD. This is an operating system that is booted from a CD, instead of being installed on the system. A Live CD uses a RAM disk, which is simply a disk in RAM: the Live CD takes some RAM from the computer and sets up a virtual hard drive in it. It then uses that disk like a hard disk drive to store data that would normally be kept permanently. Still, given that it is actually just RAM, the contents disappear once the system is shut down.

You can configure some Live CDs to store data on truly persistent devices like USB sticks. However, it is not advisable to do this if you are concerned about anonymity. Furthermore, the data in RAM is generally not deliberately erased when a Live CD is shut down. Instead, the memory is simply released so that the operating system can reassign it. This means that whatever was last in that memory space will remain there until the space is used again. In theory, the data can still be read, in spite of the fact that it is no longer addressed. TAILS is an example of a Live CD distribution that deliberately overwrites the RAM it used before releasing it.

IP address leakage

On the internet, every single request includes an IP address that lets the recipient know where to send the response. Those IP addresses can be logged and traced back to individual users without major hassle. This is why any technology that focuses on anonymity should be capable of disguising your IP address. Secure operating systems like Whonix, TAILS and Qubes support Tor, which allows you to mask your real IP address.

In-the-clear DNS and WebRTC queries

One of the tasks your computer carries out on the internet is requesting data from remote servers. The majority of online communications use domain names, rather than IP addresses. However, since IP addresses are key to how the internet works, there has to be a process that relates a domain name to an IP address. This is what DNS, or the Domain Name System, takes care of. When your computer issues a DNS query in the clear, other parties can find out what website you are going to visit, even if the actual communication has been encrypted using a technology such as a VPN.
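To illustrate why an unencrypted DNS query reveals the site you are about to visit, here is a minimal sketch that builds a standard DNS question packet and then reads the domain name straight back out of the raw bytes, as an on-path observer could. The function names `build_dns_query` and `visible_labels` and the query ID are illustrative assumptions, and nothing is sent over the network.

```python
import struct


def build_dns_query(hostname, query_id=0x1234):
    """Build a minimal DNS query for an A record in the standard wire format."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is stored as length-prefixed labels, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (internet).
    return header + qname + struct.pack("!HH", 1, 1)


def visible_labels(packet):
    """Recover the queried name from the raw packet, with no key needed."""
    labels = []
    pos = 12  # the DNS header is always 12 bytes long
    while packet[pos] != 0:
        length = packet[pos]
        labels.append(packet[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return labels


packet = build_dns_query("example.com")
print(".".join(visible_labels(packet)))
```

The domain name sits in the packet as plain ASCII, which is why the query needs to travel inside the encrypted tunnel rather than alongside it.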

WebRTC stands for Web Real-Time Communication and refers to a set of protocols that enable real-time communication on the web. The issue is that these protocols may also leak data such as your IP address, even if your DNS is being routed through a secure channel. To enhance the security of your browser, it is important to ensure that it refuses WebRTC connections. You may also route your DNS through Tor. To confirm that everything is secured, you can carry out DNS leak and WebRTC leak tests.

Traffic correlation

Traffic correlation is a technique that uses traffic patterns to tie multiple activities to a single user. It should be noted that it requires a significant investment of time and resources. In the Tor network, for instance, each request passes through at least three nodes. To correlate an encrypted request at the Tor entry node with the same request at a Tor exit node, an observer would need to monitor several Tor nodes. While this is a time-consuming and challenging process, it can still be done. In fact, correlation may be easier when different applications are run through the same Tor circuit. Each individual request keeps its own anonymity, but when the traffic is seen in its entirety, the large amount of disparate traffic can help to identify the user. To make this kind of analysis more difficult, you can rely on Tor Stream Isolation.
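As a rough illustration of the idea, the sketch below compares packet counts per second seen at an entry point with several flows seen at exit points, and picks the exit flow whose timing pattern matches best. The numbers are made up, the flow names are hypothetical, and a plain Pearson correlation stands in for the far more elaborate statistics a real adversary would use.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical packets-per-second counts observed at the entry node.
entry_flow = [5, 0, 12, 3, 0, 9, 1, 14]

# Hypothetical flows observed at exit nodes during the same time window.
exit_candidates = {
    "flow_a": [2, 7, 1, 8, 6, 0, 9, 3],
    "flow_b": [6, 1, 11, 3, 0, 10, 1, 13],
    "flow_c": [4, 4, 5, 4, 5, 4, 4, 5],
}

# The candidate whose traffic rises and falls with the entry flow wins.
best = max(
    exit_candidates,
    key=lambda name: pearson(entry_flow, exit_candidates[name]),
)
print(best)
```

The content of the traffic is never decrypted here. Only the timing pattern is compared, which is why encryption alone does not defeat this kind of analysis.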

Browser fingerprinting

Traffic pattern analysis can be used in combination with browser fingerprinting, which is a technique that can be applied to identify specific users. Web browser requests include additional data such as the browser that is in use, known as the User Agent, as well as the link you clicked to get to the site, known as the referrer. If JavaScript is enabled, a lot more information about your browser and operating system can be obtained. Although at first sight this information seems harmless, the truth is that several browser characteristics can be matched up to establish that multiple requests come from the same user. To find out how vulnerable you are to fingerprinting, you can try Panopticlick (since renamed Cover Your Tracks), a project run by the EFF (Electronic Frontier Foundation).
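The matching step can be sketched in a few lines. In this simplified example, a handful of browser characteristics are combined into one hash, so that two visits with the same characteristics produce the same identifier without any cookie being involved. The attribute names and values are hypothetical, and real fingerprinting scripts collect many more signals than this.

```python
import hashlib
import json


def fingerprint(attributes):
    """Reduce a set of browser characteristics to a short, stable identifier."""
    # Sorting the keys makes the result independent of collection order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical values a site might read from request headers and JavaScript.
visit_1 = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "language": "en-GB",
    "screen": "1920x1080",
    "timezone": "Europe/London",
    "fonts": ["DejaVu Sans", "Liberation Serif"],
}

# A later visit by the same browser, with the values gathered in another order.
visit_2 = {
    "timezone": "Europe/London",
    "fonts": ["DejaVu Sans", "Liberation Serif"],
    "screen": "1920x1080",
    "language": "en-GB",
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
}

# A different browser that differs in only one characteristic.
other_visitor = dict(visit_1, screen="1366x768")

print(fingerprint(visit_1) == fingerprint(visit_2))
print(fingerprint(visit_1) == fingerprint(other_visitor))
```

The more unusual the combination of characteristics, the fewer other users share the same identifier, which is what makes an individually harmless detail such as screen size useful for tracking.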

Accidental clearnet leakage

When you use a Live CD such as TAILS, your traffic is less likely to leak onto the clearnet, that is, the regular internet outside an anonymity network. All traffic is routed via Tor or I2P, and this routing is difficult to bypass.

Stylometry

Stylometry is a technique that aims to identify someone by analyzing their writing style and grammar. Anonymity is crucial for journalists, whistleblowers and activists who use the internet, but their written communications or published works can be compared against other texts in order to identify them. Even authors of anonymous writings could be identified using this technique, and secure operating systems and other technologies can’t do much to prevent this.
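A very simplified sketch of how such a comparison might work is shown below. It measures how often each text uses a few common function words and compares the resulting profiles with cosine similarity. The word list and the sample texts are assumptions made for illustration, and real stylometric tools rely on far richer features and much longer samples.

```python
import math
import re
from collections import Counter

# A small, hypothetical set of function words; real analyses use hundreds.
FUNCTION_WORDS = [
    "the", "of", "and", "to", "a", "in",
    "that", "is", "it", "but", "however", "which",
]


def style_vector(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


known = ("However, it is the case that the report, which was leaked, "
         "is of great value to the public.")
anonymous = ("However, it is clear that the memo, which was copied, "
             "is of interest to the press.")
unrelated = "Got the files. Sending now. Call me when done, ok?"

sim_anonymous = cosine_similarity(style_vector(known), style_vector(anonymous))
sim_unrelated = cosine_similarity(style_vector(known), style_vector(unrelated))
print(sim_anonymous > sim_unrelated)
```

Because the profile is built from the words themselves, it travels with the text no matter which network or operating system was used to publish it.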