ABSTRACT
This work presents a systematic study of UID smuggling, an emerging tracking technique designed to evade browsers' privacy protections. Browsers are increasingly attempting to prevent cross-site tracking by partitioning the storage where trackers keep user identifiers (UIDs). UID smuggling allows trackers to synchronize UIDs across sites by inserting UIDs into users' navigation requests, for example as query parameters in the destination URL. Trackers can thus regain the ability to aggregate users' activities and behaviors across sites, despite these browser protections.
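To make the mechanism concrete, the following is a toy model, not code from this work. It assumes storage partitioned per (top-level site, tracker) pair and a hypothetical query parameter named `tracker_uid`; the class, function, and domain names are all illustrative. Partitioning gives the same tracker different UIDs on two sites, and decorating a navigation URL with the UID is enough to re-link them.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

# Hypothetical query parameter name used to carry the UID (an assumption).
UID_PARAM = "tracker_uid"


class PartitionedStorage:
    """Toy model of partitioned storage: one jar per (top-level site, tracker)."""

    def __init__(self):
        self._jars = {}

    def get_or_create_uid(self, top_level_site, tracker):
        # A fresh random UID is minted the first time a tracker runs under a
        # given top-level site, so one tracker sees different UIDs on different sites.
        key = (top_level_site, tracker)
        if key not in self._jars:
            self._jars[key] = secrets.token_hex(8)
        return self._jars[key]

    def set_uid(self, top_level_site, tracker, uid):
        self._jars[(top_level_site, tracker)] = uid


def smuggle_uid(link, uid, param=UID_PARAM):
    """Decorate an outgoing navigation URL with the UID as a query parameter."""
    parts = urlparse(link)
    query = parse_qs(parts.query)
    query[param] = [uid]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def receive_smuggled_uid(url, param=UID_PARAM):
    """Read the smuggled UID back out of the landing page's URL, if present."""
    values = parse_qs(urlparse(url).query).get(param)
    return values[0] if values else None


# Demo: the same tracker embedded on two (hypothetical) sites.
storage = PartitionedStorage()
tracker = "tracker.example"
uid_on_a = storage.get_or_create_uid("site-a.example", tracker)
uid_on_b = storage.get_or_create_uid("site-b.example", tracker)  # differs: storage is partitioned

# The user clicks a link on site A; the tracker decorates it before the navigation.
nav_url = smuggle_uid("https://site-b.example/article?id=7", uid_on_a)

# On site B, the tracker reads the UID from the URL and adopts it locally.
storage.set_uid("site-b.example", tracker, receive_smuggled_uid(nav_url))
print(storage.get_or_create_uid("site-b.example", tracker) == uid_on_a)  # True
```

In this model, partitioning alone keeps `uid_on_a` and `uid_on_b` unlinkable; the single decorated navigation is what lets the tracker merge the two identities.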
In this work, we introduce CrumbCruncher, a system for measuring UID smuggling in the wild by crawling the Web. CrumbCruncher provides several improvements over prior work on identifying UIDs and measuring tracking via Web crawling, including better distinguishing UIDs from session IDs, handling dynamic Web content, and synchronizing multiple crawlers. We use CrumbCruncher to measure the frequency of UID smuggling on the Web, and find that UID smuggling is present on more than eight percent of all navigations that we made. Furthermore, we analyze the entities involved in UID smuggling and discuss their methods and possible motivations. Finally, we discuss how our findings can be used to protect users from UID smuggling, and release both our complete dataset and our measurement pipeline to aid in protection efforts.
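The abstract does not spell out CrumbCruncher's classification rules, so the sketch below is a simplified, hypothetical heuristic for the UID-versus-session-ID distinction it mentions. It assumes three synchronized observations of the same navigation: one by profile A, a later one by the same profile A (same stored state), and one by a different profile B. A value that stays fixed for A but differs for B behaves like a UID; a value that changes for A behaves like a session ID. The function names, label strings, and the `min_len=8` threshold are illustrative assumptions.

```python
from urllib.parse import parse_qsl, urlparse


def query_params(url):
    """Map query parameter names to values (the last value wins on repeats)."""
    return dict(parse_qsl(urlparse(url).query))


def classify_params(url_a, url_a_later, url_b, min_len=8):
    """Hypothetical heuristic: label each query parameter seen by profile A."""
    a1 = query_params(url_a)
    a2 = query_params(url_a_later)
    b = query_params(url_b)
    labels = {}
    for name, value in a1.items():
        if len(value) < min_len:
            labels[name] = "too short"           # unlikely to identify a user
        elif name not in a2 or name not in b:
            labels[name] = "missing"             # cannot compare across crawls
        elif a2[name] != value:
            labels[name] = "session ID"          # changes for the same profile
        elif b[name] == value:
            labels[name] = "not user-specific"   # identical for different profiles
        else:
            labels[name] = "candidate UID"       # stable per profile, differs across profiles
    return labels


# Illustrative URLs (made up) for one navigation seen by three crawls.
a1 = "https://dest.example/landing?uid=3f9a1c2b7d&sid=aa11bb22cc&lang=en-US&campaign=spring2022"
a2 = "https://dest.example/landing?uid=3f9a1c2b7d&sid=dd33ee44ff&lang=en-US&campaign=spring2022"
b = "https://dest.example/landing?uid=9e8d7c6b5a&sid=0123456789&lang=en-US&campaign=spring2022"
print(classify_params(a1, a2, b))
```

With these inputs, `uid` is labeled a candidate UID, `sid` a session ID, `lang` too short, and `campaign` not user-specific. Requiring the repeated profile-A observation is what separates per-user identifiers from per-visit ones; a single pair of profiles could not tell them apart.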