"The Web never forgets: Persistent tracking mechanisms in the wild" is the first large-scale study of three advanced web tracking mechanisms: canvas fingerprinting, evercookies, and the use of "cookie syncing" in conjunction with evercookies.


The study is a collaboration between researchers Gunes Acar¹, Christian Eubank², Steven Englehardt², Marc Juarez¹, Arvind Narayanan², and Claudia Diaz¹.
¹ KU Leuven, ESAT/COSIC and iMinds, Leuven, Belgium {gunes.acar, marc.juarez, claudia.diaz}@esat.kuleuven.be
² Princeton University {cge,ste,arvindn}@cs.princeton.edu

Reference: G. Acar, C. Eubank, S. Englehardt, M. Juarez, A. Narayanan, C. Diaz. The Web never forgets: Persistent tracking mechanisms in the wild. In Proceedings of CCS 2014, Nov. 2014. (Forthcoming)

Canvas Fingerprinting

Different images printed to canvas by various canvas fingerprinting scripts

Background

Canvas fingerprinting is a type of browser or device fingerprinting technique that was first presented by Mowery and Shacham in 2012. The authors found that by using the Canvas API of modern browsers, one can exploit subtle differences in how the same text is rendered to extract a consistent fingerprint, which can be obtained in a fraction of a second without the user's awareness.
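The core idea can be sketched in a few lines: a script draws fixed text to a canvas, reads the result back with toDataURL(), and hashes the returned string. The Python sketch below only simulates that pipeline under stated assumptions: the pixel buffers are made-up stand-ins for what two machines might render, raw bytes are base64-encoded in place of a real PNG, and SHA-256 (truncated) is an arbitrary choice of hash, not necessarily what any real script uses.

```python
import base64
import hashlib


def to_data_url(pixels: bytes) -> str:
    """Stand-in for HTMLCanvasElement.toDataURL().

    A real browser encodes the canvas as PNG; here the raw bytes are
    base64-encoded directly, which is enough to illustrate the idea.
    """
    return "data:image/png;base64," + base64.b64encode(pixels).decode("ascii")


def canvas_fingerprint(data_url: str) -> str:
    """Reduce the canvas read-back to a short, stable identifier."""
    return hashlib.sha256(data_url.encode("utf-8")).hexdigest()[:16]


# Hypothetical renderings of the same text on two machines: identical except
# for one anti-aliased pixel value, the kind of subtle difference that fonts,
# GPUs, and graphics drivers can introduce.
device_a = bytes([255, 255, 255, 128, 64, 0, 0, 0])
device_b = bytes([255, 255, 255, 129, 64, 0, 0, 0])

fp_a = canvas_fingerprint(to_data_url(device_a))
fp_a_again = canvas_fingerprint(to_data_url(device_a))
fp_b = canvas_fingerprint(to_data_url(device_b))

print(fp_a == fp_a_again)  # same rendering, same fingerprint
print(fp_a == fp_b)        # one differing pixel, different fingerprint
```

The point of the example is that the fingerprint is stable for a given rendering stack yet sensitive to tiny rendering differences, which is what makes it usable as an identifier.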


Results

By crawling the homepages of the top 100,000 Alexa sites, we found that more than 5.5% of them include canvas fingerprinting scripts. Although the overwhelming majority (95%) of these scripts belong to a single provider (addthis.com), we discovered a total of 20 canvas fingerprinting provider domains, active on 5542 of the top 100,000 sites.

The figure in this section is a collage of the images drawn to the canvas by the various fingerprinting scripts discovered during the study. The images were intercepted with a modified browser that instruments the toDataURL method. Some blank space was cropped from the images to save space.
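Intercepting the images amounts to hooking toDataURL so that every call is logged, together with its result, before the value is handed back to the page unchanged. The Python sketch below mirrors that idea with a hypothetical Canvas class and a wrapper function. The class, its method names, and the log layout are illustrative assumptions, not the study's actual instrumentation; in particular, the calling script's URL is passed in explicitly here, whereas a real browser hook would have to derive it from the call stack.

```python
import base64
import functools
from typing import Callable, List, Tuple


class Canvas:
    """Hypothetical stand-in for an HTML canvas element."""

    def __init__(self) -> None:
        self.drawn_text: List[str] = []

    def fill_text(self, text: str) -> None:
        self.drawn_text.append(text)

    def to_data_url(self, script_url: str) -> str:
        # Placeholder payload: base64 of the drawn text instead of a real PNG.
        payload = "|".join(self.drawn_text).encode("utf-8")
        return "data:image/png;base64," + base64.b64encode(payload).decode("ascii")


# Each entry records (calling script, text drawn, data URL returned).
read_back_log: List[Tuple[str, str, str]] = []


def instrument(method: Callable[..., str]) -> Callable[..., str]:
    """Wrap to_data_url so every call is logged before returning normally."""

    @functools.wraps(method)
    def wrapper(self: Canvas, script_url: str) -> str:
        result = method(self, script_url)
        read_back_log.append((script_url, " ".join(self.drawn_text), result))
        return result  # the page still receives the unmodified value

    return wrapper


# Patch the method on the class, analogous to overriding it in the browser.
Canvas.to_data_url = instrument(Canvas.to_data_url)

canvas = Canvas()
canvas.fill_text("Cwm fjordbank glyphs vext quiz")
canvas.to_data_url("ct1.addthis.com/static/r07/core130.js")
print(read_back_log[0][:2])
```

Because the wrapper returns the original result, the fingerprinting script behaves exactly as it would in an unmodified browser, while the log captures which script drew what.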


Canvas Fingerprinting Scripts

The table below summarizes the canvas fingerprinting scripts found on the homepages of the top 100K Alexa sites.


| Fingerprinting script | Number of including sites | Text drawn into the canvas |
|---|---|---|
| ct1.addthis.com/static/r07/core130.js (and 17 others) | 5282 | Cwm fjordbank glyphs vext quiz |
| i.ligatus.com/script/fingerprint.min.js | 115 | http://valve.github.io |
| src.kitcode.net/fp2.js | 68 | http://valve.github.io |
| admicro1.vcmedia.vn/fingerprint/figp.js | 31 | http://admicro.vn/ |
| amazonaws.com/af-bdaz/bquery.js | 26 | Centillion |
| *.shorte.st/js/packed/smeadvert-intermediate-ad.js | 14 | http://valve.github.io |
| stat.ringier.cz/js/fingerprint.min.js | 4 | http://valve.github.io |
| cya2.net/js/STAT/89946.js | 3 | ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ |
| images.revtrax.com/RevTrax/js/fp/fp.min.jsp | 3 | http://valve.github.io |
| pof.com | 2 | http://www.plentyoffish.com |
| *.rackcdn.com/mongoose.fp.js | 2 | http://api.gonorthleads.com |
| 9 others* | 9 | (Various) |
| TOTAL | 5559 (5542 unique¹) | |

*: Some URLs are truncated or omitted for brevity.
¹: Some sites include canvas fingerprinting scripts from more than one domain.
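The headline figures follow directly from the table. Summing the per-script counts gives 5559 inclusions; since only 5542 distinct sites are involved, 17 inclusions come from sites that load scripts from more than one fingerprinting domain. The short check below redoes that arithmetic using the counts as listed; treating "9 others" as nine additional provider domains (one site each) is an assumption read off the row label.

```python
# Per-script site counts copied from the table above.
counts = {
    "addthis.com": 5282,
    "ligatus.com": 115,
    "kitcode.net": 68,
    "vcmedia.vn": 31,
    "amazonaws.com": 26,
    "shorte.st": 14,
    "ringier.cz": 4,
    "cya2.net": 3,
    "revtrax.com": 3,
    "pof.com": 2,
    "rackcdn.com": 2,
    "9 others": 9,
}
unique_sites = 5542
crawled_sites = 100_000

total_inclusions = sum(counts.values())
overlap = total_inclusions - unique_sites  # sites counted under 2+ domains
provider_domains = (len(counts) - 1) + 9   # 11 named rows plus 9 others
addthis_share = counts["addthis.com"] / total_inclusions
site_share = unique_sites / crawled_sites

print(total_inclusions, overlap, provider_domains)
print(f"{addthis_share:.1%} of inclusions are addthis.com")
print(f"{site_share:.2%} of crawled sites include a fingerprinting script")
```

This prints 5559 17 20, then 95.0% and 5.54%, matching the "95%", "20 provider domains", and "more than 5.5%" figures in the text.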

Evercookies & Respawning

Steps of respawning

Background

Evercookies are designed to overcome the "shortcomings" of traditional tracking mechanisms. By utilizing multiple storage vectors that are less transparent to users and may be more difficult to clear, evercookies provide an extremely resilient tracking mechanism, and many popular sites have been found to use them to circumvent deliberate user actions [1, 2, 3].
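The respawning loop can be illustrated with a toy model. In the sketch below, each storage vector is a plain dictionary (the vector names are illustrative, loosely modeled on HTTP cookies, Flash cookies, and localStorage, whereas real evercookies use many more), and on every page load the script copies any surviving ID back into every vector that was cleared. The function name and the "uid" key are hypothetical.

```python
import uuid
from typing import Dict, Optional

# Illustrative storage vectors; real evercookies use many more.
VECTORS = ("http_cookie", "flash_lso", "local_storage")


def load_page(storage: Dict[str, Dict[str, str]], key: str = "uid") -> str:
    """Return the tracking ID, respawning it into any cleared vector."""
    surviving: Optional[str] = None
    for vector in VECTORS:
        if key in storage[vector]:
            surviving = storage[vector][key]
            break
    uid = surviving or uuid.uuid4().hex  # mint a new ID only if none survived
    for vector in VECTORS:
        storage[vector][key] = uid  # write (or rewrite) the ID everywhere
    return uid


browser: Dict[str, Dict[str, str]] = {v: {} for v in VECTORS}
first_id = load_page(browser)

# The user clears HTTP cookies, but the Flash cookie is left untouched.
browser["http_cookie"].clear()
second_id = load_page(browser)

print(first_id == second_id)                      # the same ID comes back
print(browser["http_cookie"]["uid"] == first_id)  # and is respawned as an HTTP cookie
```

The user's only escape in this model is to clear every vector at once; clearing any strict subset leaves a copy from which the ID is restored.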

Results

We detected respawning by Flash cookies on 10 of the 200 most popular sites, and found that 33 different Flash cookies were used to respawn over 175 HTTP cookies on 107 of the top 10,000 sites. The table below shows the 10 top-ranked websites found to use respawning based on Flash cookies.
Country: The country where the website is based.
3rd*: The respawning domain differs from the first-party domain but is registered to the same company in the WHOIS database.

| Global rank | Site | Country | Respawning (Flash) domain | Flash cookie name | 1st/3rd party |
|---|---|---|---|---|---|
| 16 | sina.com.cn | China | simg.sinajs.cn | stonecc_suppercookie.sol | 3rd* |
| 17 | yandex.ru | Russia | kiks.yandex.ru | fuid01.sol | 1st |
| 27 | weibo.com | China | simg.sinajs.cn | stonecc_suppercookie.sol | 3rd* |
| 41 | hao123.com | China | ar.hao123.com | $hao123$.sol | 1st |
| 52 | sohu.com | China | tv.sohu.com | vmsuser.sol | 1st |
| 64 | ifeng.com | Hong Kong | y3.ifengimg.com | www.ifeng.com.sol | 3rd* |
| 69 | youku.com | China | irs01.net | mt_adtracker.sol | 3rd |
| 178 | 56.com | China | irs01.net | mt_adtracker.sol | 3rd |
| 196 | letv.com | China | irs01.net | mt_adtracker.sol | 3rd |
| 197 | tudou.com | China | irs01.net | mt_adtracker.sol | 3rd |
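A simplified heuristic for flagging Flash-based respawning is sketched below. It is an illustration under stated assumptions, not the study's exact detection procedure: record HTTP cookies and Flash cookie contents on a first visit, clear only the HTTP cookies, revisit, and flag any HTTP cookie that comes back with the same value when that value also appears inside a Flash cookie. The function name, input layout, and sample values are hypothetical.

```python
from typing import Dict, List


def find_respawned(
    before: Dict[str, str],
    flash_contents: Dict[str, str],
    after: Dict[str, str],
) -> List[str]:
    """Return names of HTTP cookies that look respawned from Flash.

    before: HTTP cookies (name -> value) from the first visit.
    flash_contents: Flash cookie file name -> its stored content.
    after: HTTP cookies observed after the HTTP cookies were cleared.
    """
    respawned = []
    for name, value in before.items():
        came_back = after.get(name) == value
        stored_in_flash = any(value in content for content in flash_contents.values())
        if came_back and stored_in_flash:
            respawned.append(name)
    return respawned


before = {"uid": "a1b2c3d4e5f6", "session": "xyz"}
flash = {"example_tracker.sol": "id=a1b2c3d4e5f6"}
after = {"uid": "a1b2c3d4e5f6", "session": "new-session"}

print(find_respawned(before, flash, after))
```

Here only "uid" is flagged: "session" received a fresh value, so it was recreated rather than respawned. A practical version would also need to ignore short or common values that could match a Flash cookie by coincidence.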

Cookie Syncing

Background

Cookie synchronization, or cookie syncing, is the practice of tracker domains passing pseudonymous IDs associated with a given user, typically stored in cookies, among each other.
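In its simplest form, a sync happens when one tracker embeds its own ID for the user in a request to another tracker, typically as a URL parameter, and the receiving tracker records the pairing with its own ID for the same browser. The sketch below models that exchange; the domains tracker-a.example and tracker-b.example, the /sync path, and the partner and partner_uid parameter names are all hypothetical.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qs, urlencode, urlparse


def build_sync_url(partner_endpoint: str, my_domain: str, my_id: str) -> str:
    """Tracker A: pass its ID for this user to tracker B inside a URL."""
    return partner_endpoint + "?" + urlencode({"partner": my_domain, "partner_uid": my_id})


class ReceivingTracker:
    """Tracker B: keeps a mapping from partner IDs to its own IDs."""

    def __init__(self) -> None:
        self.match_table: Dict[Tuple[str, str], str] = {}

    def handle_sync(self, url: str, own_cookie_id: str) -> None:
        params = parse_qs(urlparse(url).query)
        partner = params["partner"][0]
        partner_uid = params["partner_uid"][0]
        # From now on B can translate A's ID for this user into its own.
        self.match_table[(partner, partner_uid)] = own_cookie_id


url = build_sync_url("https://tracker-b.example/sync", "tracker-a.example", "A-7f3e91")
b = ReceivingTracker()
b.handle_sync(url, own_cookie_id="B-22c0d4")
print(url)
print(b.match_table)
```

Once both sides hold such a mapping, the two trackers can exchange data about the user keyed on each other's IDs, which is why syncing combined with respawning is a concern: a respawned ID can re-link a fresh cookie to the old profile at every partner.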

Results

The table below shows the number of IDs known by the top 10 parties involved in cookie syncing, both when all cookies are allowed and when third-party cookies are blocked.

| All cookies allowed: domain | # IDs | No 3P cookies: domain | # IDs |
|---|---|---|---|
| gemius.pl | 33 | gemius.pl | 36 |
| doubleclick.net | 32 | 2o7.net | 27 |
| 2o7.net | 27 | omtrdc.net | 27 |
| rubiconproject.com | 25 | cbsi.com | 26 |
| omtrdc.net | 24 | parsely.com | 16 |
| cbsi.com | 24 | marinsm.com | 14 |
| adnxs.com | 22 | gravity.com | 14 |
| openx.net | 19 | cxense.com | 13 |
| cloudfront.net | 18 | cloudfront.net | 10 |
| rlcdn.com | 17 | doubleclick.net | 10 |
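Given a list of observed events, each saying that a particular party holds a particular ID value (because it set the ID itself or received it through a sync), the "# IDs" column is a count of distinct IDs per party. The sketch below assumes such events have already been extracted (how ID cookies and sync requests are identified is not shown), and the domains and ID values in the sample are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def ids_known_per_party(events: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count the distinct IDs each party knows.

    Each event is (party, id_value): the party either set the ID itself or
    received it from another party through a sync.
    """
    known: Dict[str, Set[str]] = defaultdict(set)
    for party, id_value in events:
        known[party].add(id_value)
    return {party: len(ids) for party, ids in known.items()}


def top_parties(counts: Dict[str, int], n: int) -> List[Tuple[str, int]]:
    """Rank parties by number of known IDs (ties broken alphabetically)."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


# Hypothetical observations: x.example sets id1 and shares it with y.example;
# y.example sets id2, which x.example and z.example later learn.
events = [
    ("x.example", "id1"),
    ("y.example", "id1"),
    ("y.example", "id2"),
    ("x.example", "id2"),
    ("z.example", "id2"),
    ("x.example", "id1"),  # repeated observation, counted once
]

print(top_parties(ids_known_per_party(events), 2))
```

Using a set per party means that an ID seen many times across the crawl still counts once, which is what a "number of IDs known" column should measure.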

The table below compares high-level cookie syncing statistics when third-party cookies are allowed and when they are blocked (top 3,000 Alexa domains).

| Statistic | Third-party cookies allowed | Third-party cookies blocked |
|---|---|---|
| # IDs | 1308 | 938 |
| # ID cookies | 1482 | 953 |
| # IDs in sync | 435 | 347 |
| # ID cookies in sync | 596 | 353 |
| # (First*) Parties in sync | (407) 730 | (321) 450 |
| # IDs known per party | 1 / 2.0 / 1 / 33 | 1 / 1.8 / 1 / 36 |
| # Parties knowing an ID | 2 / 3.4 / 2 / 43 | 2 / 2.3 / 2 / 22 |

The format of the bottom two rows is minimum/mean/median/maximum.
*Here we define a first party as a site that was visited in the first-party context at any point in the crawl.
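The minimum/mean/median/maximum summaries in the bottom two rows can be produced with Python's standard statistics module once per-party (or per-ID) counts are available. The input list below is a small made-up example, not data from the study, and the formatting helper is hypothetical.

```python
import statistics
from typing import List


def summarize(values: List[int]) -> str:
    """Format values as minimum / mean / median / maximum, as in the table."""
    return "{} / {:.1f} / {:g} / {}".format(
        min(values),
        statistics.mean(values),
        statistics.median(values),
        max(values),
    )


# Hypothetical "# IDs known per party" values for six parties.
ids_known = [1, 1, 1, 2, 3, 10]
print(summarize(ids_known))
```

This prints 1 / 3.0 / 1.5 / 10. As in the study's rows, a single party that knows many IDs pulls the mean well above the median, so reporting both shows how skewed the distribution is.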