Learning more about the GFW's active probing system

This blog post is also available in Chinese, translated by our friends from GreatFire.org.

Roya, David, Nick, nweaver, Vern, and I just finished a research project in which we revisited the Great Firewall of China's (GFW) active probing system. This system was brought to life several years ago to reactively probe and block circumvention proxies, including Tor. You might remember an earlier blog post that gave us some first insight into how the active probing system works. Several questions, however, remained. For example, we were left wondering what the system's physical infrastructure looked like. Is the GFW using dedicated machines behind their thousands of probing IP addresses? Does the GFW even "own" all these IP addresses? Rumour had it that the GFW was hijacking IP addresses for a short period of time, but there was no conclusive proof. As a result, we teamed up and set out to answer these, and other questions.

Because this was a network measurement project, we started by compiling datasets. We created three datasets, comprising hours (a Sybil-like experiment to attract many probes), months (an experiment to measure reachability for clients in China), and even years (log files of a long-established server) worth of active probing data. Together, these datasets allow us to look at the GFW's active probing system from different angles, illuminating aspects we wouldn't be able to observe with just a single dataset. We are able to share two of our datasets, so you are very welcome to reproduce our work, or do your own analysis.

We now want to give you an overview of our most interesting findings.

Generally, once a bridge is detected and blocked by the GFW, it remains blocked. But does this mean that the bridge is entirely unreachable? We measured the blocking effectiveness by continuously making a set of virtual private systems in China connect to a set of bridges under our control. We found that every 25 hours, for a short period of time, our Tor clients in China were able to connect to our bridges. This is illustrated in the diagram shown below. Every point represents one connection attempt, meaning that our client in China was trying to connect to our bridge outside of China. Note the curious periodic availability pattern for both Unicom and CERNET (the two ISPs in China we measured from). Sometimes, network security equipment goes into "fail open" mode while it updates its rule set, but it is not clear if this is happening here.

We were able to find patterns in the TCP headers of active probes that suggest that all these thousands of IP addresses are, in fact, controlled by a single source. Check out the initial sequence number (ISN) pattern in the diagram below. It shows the value of ISNs (y-axis) over time (x-axis). Every point in the graph represents the SYN segment of one active probing connection. If all probing connections would have come from independent computers, we would have expected a random distribution of points. That's because ISNs are typically chosen randomly to protect against off-path attackers. Instead, we see a clear linear pattern across IP addresses. We believe that active probes derive their ISN from the current time.

We discovered that Tor is not the only victim of active probing attacks; the GFW is targeting other circumvention systems, namely SoftEther and GoAgent. This highlights the modular nature of the active probing system. It appears to be easy for GFW engineers to add new probing modules to react to emerging, proxy-based circumvention tools.
The GFW is able to (partially) speak the vanilla Tor protocol, obfs2, and obfs3 to probe bridges. Interestingly, node-Tor—a JavaScript implementation of the Tor protocol—is immune to active probing because it implements the Tor protocol differently, which seems to confuse active probes. We were also able to resist active probes by modifying a bridge of ours to ignore old VERSIONS Tor cells. This is unlikely to be a sustainable circumvention technique, though.
Back in 2012, the system worked in 15-minute-queues. These days, it seems to be able to scan bridges in real-time. On average, it takes only half a second after a bridge connection for an active probe to show up.
Using a number of traceroute experiments, we could show that the GFW's sensor is stateful and seems unable to reassemble TCP streams.

Luckily, we now have several pluggable transports that can defend against active probing. ScrambleSuit and its successor, obfs4, defend against probing attacks by relying on a shared secret that is distributed out of band. Meek tunnels traffic over cloud infrastructure, which does not prevent active probing, but greatly increases collateral damage when blocked. While we keep developing and maintaining circumvention tools, we need to focus more on usability. A powerful and carefully-engineered circumvention tool is of little use if folks find it too hard to use. That's why projects like the UX sprint are so important.

Finally, you can find our research paper as well as our datasets and code on our project page. And don't hesitate to get in touch with us if you have any questions or feedback!

What security problem to TBB

What security problem to TBB I should care about if I get TBB running in the command line:

sh -c '"/home/uuss/tor-browser_en-US/Browser/start-tor-browser" --detach || ([ ! -x "/home/uuss/tor-browser_en-US/Browser/start-tor-browser" ] && "$(dirname "$*")"/Browser/start-tor-browser --detach)' dummy %k

This is a good question for

This is a good question for the Q&A site: https://tor.stackexchange.com/.

So wait, you're wanting to

So wait, you're wanting to run it as root? Don't do that. If the browser gets comprised, which is really not that difficult, it will be able to fuck with your system badly. A non exhaustive list of how root can take over the kernel:
- write to MSRs that allow arbitrary memory access
- load kernel modules
- exploit privileged subsystems like perf
- boot into a malicious kernel with kexec
- replace the kernel image in /boot
- issue iopems and iopls that allow arbitrary write access
- send code to the GPU, which has direct access to memory
- disable many access controls, and bypass permissions
- and much more

Do you REALLY want JavaScript or anything else you come across on the web to run in a context which can do that? Unless you are a masochist, I should think not.

Judging by your understanding of command line though, I should think you would already know that running things like web browsers as root is a bad idea...

Recent Updates

Removing End-Of-Life Relays from the Network

Today, the Tor network is composed of more than 6000 relays.

New alpha release: Tor 0.4.2.2-alpha

There's a new alpha release available for download. If you build Tor from source, you can download the source code for 0.4.2.2-alpha from the download page on the website. Packages should be available over the coming weeks, with a new alpha Tor Browser release likely in the next couple of weeks.

Remember, this is an alpha release: you should only run this if you'd like to find and report more bugs than usual.

This release fixes several bugs from the previous alpha release, and from earlier versions. It also includes a change in authorities, so that they begin to reject the currently unsupported release series.

Changes in version 0.4.2.2-alpha - 2019-10-07

Major features (directory authorities):
- Directory authorities now reject relays running all currently deprecated release series. The currently supported release series are: 0.2.9, 0.3.5, 0.4.0, 0.4.1, and 0.4.2. Closes ticket 31549.
Major bugfixes (embedded Tor):
- Avoid a possible crash when restarting Tor in embedded mode and enabling a different set of publish/subscribe messages. Fixes bug 31898; bugfix on 0.4.1.1-alpha.

New Release: Tor Browser 9.0a7

Tor Browser 9.0a7 is now available from the Tor Browser Alpha download page and also from our

Honoring Translators

Photo: Thai Translation Sprint with Localization Lab