How the commercial mass surveillance companies collect your data and map your life.

Commercial mass surveillance: The technology behind the data collection

The tech giants follow every step you take regardless of whether or not you use their services. But how does it actually work when they steal your behavior and place it in huge AI and machine learning systems to build a profile of you? Here are the methods behind the surveillance.

What techniques do the tech giants like Meta and Google use to collect data on essentially all of the world’s internet users? Before we answer that question, we need to make a couple of observations. 1) If you use the tech giants’ services, it equates to voluntarily giving your data away. For example, if you use Facebook, Meta collects your activity there. If you use Chrome, Google registers every step you take in the web browser. And no, incognito mode doesn’t save you. 2) You don’t even need to use the tech giants’ services for them to keep track of how you behave online. They reach far beyond their own user base when they collect data. Here you can read more about the extent to which they track you on sites you visit and how much data they have on the searches you do.

Now let’s take a look at how the data is collected. And it’s point 2 we’ll be focusing on. Because this type of mass surveillance takes place without people being conscious of it, and without them having given their consent to it.

We’ll go through the technologies used to check that it’s you visiting a certain site or doing a particular search. These tools are essential for the tech giants to collect data. They have to keep track of the fact that it’s you and nobody else who comes to a particular site, and they have to be certain it was you who did that last Google search so they can add it to the right pile. Identification is the key to being able to build a profile of you. Once they know it’s you out there browsing, they start up the heavy machinery: everything you do goes into huge AI and machine learning systems that register, categorize and analyze your behavior. That way they can predict what you will do next and try to influence you in a particular direction for commercial or political gain. Let’s start with the most commonly used identification technique: your IP address.

Your IP address: the most common and simplest way of identifying you.

Everyone who has internet access has been allocated an IP address by their internet provider. This is part of the internet’s basic structure. Every website you visit also has an IP address, and it’s the IP addresses that make sure the traffic goes to the right place when it’s sent back and forth. This is good (you want the internet to work), but it also means each of us has a digital ID card that the internet service providers can use to register all the sites we visit. They are forced to carry out this logging by law in many countries. The idea is that it should be possible to reveal details about internet traffic and information about who is behind a particular IP address in case an authority asks for it (for example if the police require it during an investigation). But it doesn’t stop there. Depending on what country you’re in, it’s more or less likely that in practice the internet service providers give the authorities continuous access to traffic regardless of whether or not a crime has been committed, or even sell your online behavior to make money.

There are also other reasons for concealing your IP address (via a VPN) because IP addresses are used in several other contexts to identify, track and map your activity. The state uses IP addresses when it eavesdrops on all of our traffic by quite simply connecting to the large internet cables that physically run between countries. And of course there are always the tech companies that use IP addresses when they carry out mass surveillance for commercial purposes.

When tech giants and data brokers employ different techniques to pursue you from one site to another and map your movement patterns on the internet, one of the things they use to identify you is your IP address. The same thing applies when they study in detail what you do on each site (which texts you read, which images you stop at, which purchases you make, which products you quickly skim past, which videos you watch and so on). IP addresses are used to link the activity and the person.
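To make the linking step concrete, here is a minimal sketch of how log entries from different sites could be grouped by IP address. The log format, the site names and the addresses are all made up for illustration; real data collectors combine far more signals than this.

```python
from collections import defaultdict

# Hypothetical log entries: (ip_address, site, action). All values are invented.
LOG = [
    ("203.0.113.7", "news.example", "read article"),
    ("198.51.100.4", "shop.example", "viewed shoes"),
    ("203.0.113.7", "shop.example", "bought headphones"),
    ("203.0.113.7", "video.example", "watched trailer"),
]


def group_by_ip(entries):
    """Collect every (site, action) pair seen for each IP address."""
    profiles = defaultdict(list)
    for ip, site, action in entries:
        profiles[ip].append((site, action))
    return dict(profiles)


profiles = group_by_ip(LOG)
for ip, activity in profiles.items():
    print(ip, activity)
```

Three unrelated sites end up in one list simply because the same address appears in all of them. Behind a VPN, those entries would carry the VPN server's address instead, shared with many other users.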

We can’t be sufficiently clear here: Your IP address equates to sticking up your hand and shouting ‘Here I am’. It’s the easiest way to track you on the internet. And the only way to conceal your IP address, and to discard your digital ID card, is to use a trustworthy VPN (or the Tor Network). This is the reason why Mullvad was started once upon a time (in 2009, to be precise).

Third-party cookies: tracking that you accept (because you actually have no choice).

Just like IP addresses, cookies have long been part of how the internet is constructed. Cookies are on websites so the site can remember things about you – and in fact so that the site works at all. For example: you visit an e-retailer and add a product to your shopping cart. A cookie remembers the product is there when you click to go to the checkout. It’s thanks to a cookie that you can stay logged into a site over time. When you choose a language on a website it’s the same thing; tiny text files (which is what cookies are) are saved locally on your computer or phone and make sure the same language is used next time you visit. Cookies make the internet a comfortable place to visit. So why is there such a fuss about cookies? Well, because there are different types of cookies.

There are cookies placed on the site by whoever owns it, so that the website is user-friendly. This type of cookie is known as a first-party cookie. They’re there to give functionality to the visitor. But then there are cookies that are placed on the site for another purpose: to register your visit for somebody other than the site owner. These are called third-party cookies and they’re often linked to the tech giants such as Meta and Google (or to data brokers). And because these third-party cookies are placed on the majority of websites in the world, this type of cookie makes it possible for them to monitor your movement patterns. When you hop from a news site to an e-commerce site to a streaming service, the tech giants are there every time with their cookies. And that’s all they need to be able to build a single huge list of the sites you visit, and then, with the help of AI and machine learning, to build a profile of your online behavior. This type of cookie is why ads stalk you online. This type of cookie is what maps your life.
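The mechanism can be sketched in a few lines. This is a toy model, not how any particular company's system is built: the `ThirdPartyTracker` class, its method names and the example sites are assumptions made for illustration. The key point is that the browser sends the tracker's cookie back to the tracker no matter which site embeds it.

```python
import uuid


class ThirdPartyTracker:
    """Toy model of a tracker embedded on many unrelated sites (hypothetical)."""

    def __init__(self):
        self.history = {}  # cookie id -> list of sites where it was seen

    def handle_request(self, embedding_site, cookie=None):
        """Called whenever a page that embeds the tracker loads in a browser.

        The browser includes the tracker's own cookie (if it has one),
        regardless of which site the visitor is currently on.
        """
        if cookie is None:
            cookie = uuid.uuid4().hex  # first sighting: issue a new identifier
        self.history.setdefault(cookie, []).append(embedding_site)
        return cookie  # the browser stores this for the tracker's domain


tracker = ThirdPartyTracker()
browser_cookie = None  # what one browser has stored for the tracker's domain
for site in ["news.example", "shop.example", "stream.example"]:
    browser_cookie = tracker.handle_request(site, browser_cookie)

print(tracker.history[browser_cookie])
```

One identifier, three sites, one list. Blocking or clearing the cookie forces the tracker to issue a new identifier, which breaks the chain.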

You can say No to cookies, but sometimes that doesn't even help. There are what are called 'essential cookies' that work even if you click 'Reject all'. These include cookies from the tech giants.

You can say No to cookies. Everybody who’s ever been online knows that you have to click Accept, Manage or Reject cookies the first time you visit a site. The problem is that the infrastructure is constructed in such a way that you actually don’t have a choice. There’s widespread cookie fatigue that means we routinely click Accept to move on. Nobody can be bothered to read the almost endless user terms and conditions that appear when you click Manage cookies. And the cookie warnings are also designed for you to press Accept. The concept of dark patterns means that Accept is often a large, bold green button and that Manage cookies and Reject cookies are more or less hidden or incredibly complicated to use.

Still worse, even if you click Reject cookies, you can’t be sure your visit won’t be registered by a third party. There are cookies that are ‘necessary’. You’ve undoubtedly seen the choice Accept only essential cookies. You may think ‘essential cookies’ are the same thing as functional cookies, but that’s not true. If you click through and start to read the apparently endless terms and conditions, you often find big tech companies listed under ‘essential cookies’. And in the small print, you can also see that cookies of this type can often kick in even if you choose Reject all cookies. Because the site owner has an essential collaboration with the tech giants that you don’t even have the option to reject. And here’s one more detail before we move on: if a website only uses functional cookies, the ones the website needs to work as it’s supposed to work, the site owner doesn’t even need to show a cookie warning. And so the visitor never needs to click Accept. That’s why you don’t have to go through that process when you visit Mullvad’s site.

So what can you do to prevent third-party cookies from following you wherever you go? The easiest thing is to run a web browser like Mullvad Browser, which handles that and many other things for you (cookies and IP addresses are, as you’ll see if you read on, not the only way to track you). But otherwise, all you can do is be persistent and clear out your cookies (and cache) every time you’ve used your web browser. You can also use many different plug-ins and extensions that block third-party cookies.

Third-party cookies have become the very symbol of how big tech and data brokers map a whole world of internet users. And the focus on this particular type of data collection has led to Google being sued for hundreds of millions of euros for violating the GDPR. Now Google has finally begun to look for an alternative. In early 2024, they rolled out their new tracking system (for one percent of Chrome’s users) that isn’t based on third-party cookies but on data collection via the Chrome browser. Google has been working on this for several years and has postponed the rollout over and over again. When they launched the new solution for one percent of their users, they talked about completing their new ecosystem by the end of 2024. Just a few months later, they admitted that this timeline won’t hold either. It’s doubtful whether Google will ever be able to fully launch this, but even if they replaced all third-party data with first-party data (since they own the world’s most widely used browser, Chrome), the fundamental problem remains.

Because the data collection is the problem, not exactly how it’s done. The fact that the world’s largest data collector removes third-party cookies from the world’s most used browser and replaces them with a system where only they (once again, the world’s largest data collector) have access to the data collection – well, it’s not exactly leading to a world where people can enjoy their privacy in peace. Google can continue to feed its huge advertising network and to make money from internet behaviors. But this is certainly a move that could potentially shake up the infrastructure that’s all about monetizing personal data. There may be some other players who don’t want to agree to Google’s new business model, where Google has all the power. Perhaps this will lead to more actors wanting to own their own data. Perhaps it will lead us to replacing cookie-hell (where we’re forced to accept cookies on every website we visit) with a login-hell instead (where all services require a login and thus gain access to first-hand data that they then share with each other). It’s worth considering why a site would suddenly start asking its users to log in if it isn’t a paid-for subscription service that naturally requires a login. If they aren’t charging you, it’s obvious that you and your behavior on the site are the payment. Even subscription services are worth looking at more closely. If you pay for a service, why should you agree to pay extra in the form of your behavioral data being collected and sold on or shared with so-called ‘business partners’?

Regardless of the direction in which the development is moving, it’s worth remembering that third-party cookies aren’t the only way for big tech and data brokers to collect data. A major problem with today’s data collection is that it isn’t enough to mask your IP address and make sure you block cookies. It makes no difference if third-party cookies disappear unless the business model on which the internet is now based is fundamentally rebuilt from the ground up. As long as the collection of behavioral data is permitted, as long as it isn’t illegal for companies to collect data about people and to share it with others, no change will take place – the only thing that will change is how the data is collected.

Because even if you mask your IP address and make sure you block or clear all of your cookies from time to time, there are other ways to track you via your web browser. Even if third-party cookies are banned, this is just one of many tracking technologies. When cookies disappear as a tracking method, it’s not unthinkable that what’s known as browser fingerprinting will take over.

What makes fingerprinting a threat to online privacy? It is pretty simple. There is no need to ask for permissions to collect all this information.

The Tor Project

Browser fingerprinting: tracking technology that works in the shadows.

When you visit a website, the site uses technology to ask a number of questions of your web browser: this could be the version of web browser you’re using, whether you’re visiting on mobile or desktop, which language you have set, the time zone you’re in, the different plug-ins and fonts you have installed, your screen resolution and so on. Many of the questions are also about your hardware: for example how fast your processor is and what graphics card you have installed. These are questions asked to allow the web browser to present the site in the best possible way. Just like cookies, this is part of the basic fabric of the internet that allows it to be as user-friendly as it is. But the problem is that questions are also asked that have nothing to do with functionality, but which are only there to identify and track you. The number of questions asked and the combination of answers makes it possible to take a unique fingerprint of you as a visitor. You can read more about how browser fingerprinting works here. Let’s conclude by saying that in a time where third-party cookies are under legal pressure, browser fingerprinting plays by completely different rules. It’s quite simply technology that you can’t dismiss by clicking Reject all. Because the tracking takes place completely in the shadows. And when the world begins to set restrictions on how the tech giants monitor people via cookies and IP addresses, it’s not a wild guess to expect them to use fingerprinting to an even greater extent in the future. You can read about how Mullvad Browser counteracts fingerprinting here.
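As a rough sketch of the idea, the answers your browser gives can be combined and hashed into a single identifier. The attribute names and values below are invented, and real fingerprinting scripts collect many more signals (canvas rendering, audio, hardware details), but the principle is the same: no cookie is stored, and the same answers always produce the same identifier.

```python
import hashlib
import json


def fingerprint(attributes):
    """Hash a dict of browser attributes into a short, stable identifier."""
    # Sorting the keys makes the result independent of collection order.
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical answers from one visitor's browser.
visitor_a = {
    "user_agent": "ExampleBrowser/1.0",
    "language": "sv-SE",
    "timezone": "Europe/Stockholm",
    "screen": "2560x1440",
    "fonts": ["Arial", "Helvetica", "Fira Code"],
}
# A second visitor who differs only in screen resolution.
visitor_b = dict(visitor_a, screen="1920x1080")

print(fingerprint(visitor_a))
print(fingerprint(visitor_b))
```

A single differing answer is enough to tell the two visitors apart. This is also why a common defense is to make as many browsers as possible give identical answers, so that they all share one fingerprint.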

Surveillance via third-party scripts: how they keep track of exactly what you do online.

Most websites use scripts (tiny fragments of JavaScript code) to work. These scripts mean that the websites work very well, but they can also be used to monitor visitors. Just like third-party cookies, this is a major problem when somebody other than the site owner is involved. If a website uses Google Analytics, there’s a script on the site from Google. If a site uses a special font, there’s a script from the font developer. If the site you visit uses Meta Pixel to maximize its ad revenues via Facebook, Meta has placed a script there. And when there are external scripts on the site, that’s when these actors can work out exactly what you’re doing.

A cookie can only identify you when you visit a site. If a cookie from the same third-party actor turns up on the next site you visit, they can start to follow you online and build a profile of how you move. The same is true with the IP address. It’s a unique ID card to make sure it’s you on the site. When it comes to scripts, things are a little different. They can be used to construct a browser fingerprint of you and so identify you. But above all, they can be used to take a closer look at exactly what you’re doing on the site. Scripts can find out exactly which minutes of the video you watch (and not just that you’re visiting YouTube again). Scripts can read how you scroll on a site, which ads you stop at and whether you’ve read the whole article or moved on after just half of it. It was scripts that Facebook used to collect what people had written in comment fields but then deleted and never posted. Just collecting metadata – in other words the data that, together, build a profile of how you move online – is enough to map someone’s life. But scripts add an extra layer. You can read more about how much the tech giants record using this technology here.

As we mentioned above, you can block third-party scripts, and Mullvad Browser has technology to do just that. But it’s important to remember that if a data collector succeeds in recording exactly what you’re doing on a site via scripts, they still need to identify that it’s you visiting for it to have any effect. If you mask your IP address using a trustworthy VPN and use a web browser that makes sure it’s hard to identify you via cookies and fingerprints, it doesn’t matter how accurately they can measure which parts of a YouTube video you most enjoyed – they still don’t know that it’s you.

Sophisticated AI technology poses new threats

Using a trustworthy VPN and a privacy-focused browser is an easy way to counteract the data collection that takes place through the methods we have mentioned in this text. But you should remember that things are developing quickly and that those interested in mass surveillance are constantly working on new technologies. One growing threat is traffic analysis.

When you visit a website, network packets are exchanged. These data packets are sent back and forth between you and the websites you visit. This is how the internet is fundamentally constructed. And the fact that packets are sent, how often they are sent, and the actual size of those packets are all visible to your ISP, whether or not you’re using a VPN (or Tor).

Every website generates a specific pattern of data packets that are sent back and forth (depending on how the site is constructed with images, text blocks, and videos). This means that your internet service provider (or anyone who has access to your internet service provider) can analyze this pattern to try to work out which websites you visit. They can also try to find out who you are communicating with by using what’s known as a correlation attack: you send a message with a particular traffic pattern at a given time, and someone receives that same traffic pattern at the same time.
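Here is a deliberately simple sketch of the first part, guessing a site from its packet pattern. The packet sizes and site names are invented, and the nearest-match comparison stands in for the machine learning models a real attacker would train; it only illustrates that the pattern alone, without any decrypted content, can be enough.

```python
def distance(trace_a, trace_b):
    """Sum of absolute differences between two packet-size sequences."""
    length = max(len(trace_a), len(trace_b))
    # Pad the shorter trace with zeros so both have the same length.
    a = trace_a + [0] * (length - len(trace_a))
    b = trace_b + [0] * (length - len(trace_b))
    return sum(abs(x - y) for x, y in zip(a, b))


def guess_site(observed, known_traces):
    """Return the known site whose recorded trace is closest to the observed one."""
    return min(known_traces, key=lambda site: distance(observed, known_traces[site]))


# Hypothetical reference traces (packet sizes in bytes) recorded in advance.
KNOWN = {
    "news.example": [1500, 1500, 400, 1500, 200],
    "video.example": [1500] * 8,
    "chat.example": [120, 90, 150],
}

# What an observer on the network sees: sizes only, no content.
observed = [1500, 1480, 420, 1500, 180]
print(guess_site(observed, KNOWN))  # prints "news.example"
```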

These are advanced attacks, but given the speed with which artificial intelligence is evolving, and its ability to analyze large amounts of data, they are a growing threat.

In response, Mullvad has developed DAITA (Defense against AI-guided Traffic Analysis), which, as the name implies, is a defense against this type of traffic analysis using AI. We have worked together with researchers at Karlstad University to develop a technology that can be turned on in our VPN app and which makes sure the data packets sent are always the same size – and also sends out fake packets.
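The two ideas named here, constant packet sizes and fake packets, can be illustrated with a short sketch. This is not DAITA's actual implementation: the cell size, the function names and the random insertion rule are assumptions chosen to keep the example small.

```python
import random

CELL_SIZE = 1500  # assumed fixed packet size in bytes (illustrative)


def pad_trace(packet_sizes, cell_size=CELL_SIZE):
    """Re-send every packet as one or more cells of identical size."""
    padded = []
    for size in packet_sizes:
        cells = max(1, -(-size // cell_size))  # ceiling division, at least one cell
        padded.extend([cell_size] * cells)
    return padded


def add_cover_traffic(padded, rate=0.5, rng=None):
    """After each real cell, insert a dummy cell with probability `rate`."""
    rng = rng or random.Random()
    out = []
    for size in padded:
        out.append(size)
        if rng.random() < rate:
            out.append(size)  # dummy cell, indistinguishable by size
    return out


news = pad_trace([1500, 1500, 400, 1500, 200])
chat = pad_trace([120, 90, 150])
print(set(news) == set(chat))  # sizes no longer differ between the two sites
print(len(add_cover_traffic(chat, rng=random.Random(1))))
```

Padding alone hides the sizes but not the number of packets, which is why the dummy packets matter: they blur the count and timing as well.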

We have also worked with researchers to develop VPN tunnels that can withstand the quantum computers of the future, which potentially could be able to crack encryption. We don’t know how this type of technology will be used for mass surveillance of entire populations in the future, and so we must work on countermeasures today.

Okay, so the technology exists. But what do they actually use it for? Read about a business model that involves carrying out mass surveillance of every human on the planet.

How much do they actually know about you using this technology? Take a look at the absurd quantity of data the tech giants collect.

Already tired of the tech giants following every step you take? Read more about how a credible VPN + Mullvad Browser stop them at the door.

The tech companies say metadata is anonymous. We say your phone only needs to reveal four places you’ve been to identify you.