The Home Office constantly insists that traffic data is not about the content of the pages you look at, but about the sites you visit.
That distinction would have made some sense in 1999 when RIPA was first being debated, but technology has moved on and new open data sources are now available. Together, these allow for vastly more invasive tracking in 2012 than was envisaged in 2000. We’ve done a little bit of work on how…
The English Wikipedia contains 4 million articles, which contain 18 million links out to other websites.
We’ve run an analysis on those articles and links, and looked at how many of the outbound hostnames uniquely identify the page that you were looking at (a hostname is the www.privacyinternational.org part of a web address). Of the 4 million articles on Wikipedia, 1.3m of them - i.e. roughly one in three - contain a link that is enough to identify the Wikipedia page you were looking at, simply because only one page on Wikipedia contains a link to that hostname.
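To make the idea concrete, here is a minimal sketch of how such an analysis could be run. It is not the code we used; it assumes the links have already been extracted into (article title, outbound URL) pairs, and the function name and the toy data are invented purely for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def uniquely_identified_articles(links):
    """Find hostnames linked from exactly one article.

    `links` is an iterable of (article_title, outbound_url) pairs
    (a hypothetical input format, assumed for this sketch).
    Returns (identifying_hosts, exposed_articles).
    """
    articles_by_host = defaultdict(set)
    for article, url in links:
        host = urlsplit(url).hostname  # None when the URL has no host part
        if host:
            articles_by_host[host].add(article)

    # A hostname is identifying when only one article links to it:
    # seeing that hostname in traffic data reveals the article.
    identifying_hosts = {
        host: next(iter(articles))
        for host, articles in articles_by_host.items()
        if len(articles) == 1
    }
    exposed_articles = set(identifying_hosts.values())
    return identifying_hosts, exposed_articles


# Toy data, invented for illustration only.
sample_links = [
    ("Article A", "http://www.example.org/report"),
    ("Article A", "http://news.example.com/story"),
    ("Article B", "http://news.example.com/other"),
    ("Article C", "http://news.example.com/third"),
]
identifying_hosts, exposed_articles = uniquely_identified_articles(sample_links)
```

In this toy data, news.example.com is linked from three articles and so reveals nothing on its own, while www.example.org appears in only one article and therefore gives that article away. Dividing the number of exposed articles by the total number of articles gives the kind of proportion quoted above.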