Privacy Badger was created to protect users from pervasive non-consensual tracking, and to do so automatically, without relying on human-edited lists of known trackers. While our goals remain the same, our approach is changing. It is time for Privacy Badger to evolve.
Thanks to disclosures from Google Security Team, we are changing the way Privacy Badger works by default in order to protect you better. Privacy Badger used to learn about trackers as you browsed the Web. Now, we are turning “local learning” off by default, as it may make you more identifiable to websites or other actors. If you wish, you can still choose to opt in to local learning and have the exact same Badger experience as before. Regardless, all users will continue to benefit from Privacy Badger’s up-to-date knowledge of trackers in the wild, as well as its other privacy-preserving features like outgoing link protection and widget replacement.
Google Security Team reached out to us in February with a set of security disclosures related to Privacy Badger’s local learning function. The first was a serious security issue; we removed the relevant feature immediately. The team also alerted us to a class of attacks that were enabled by Privacy Badger’s learning. Essentially, since Privacy Badger adapts its behavior based on the way that sites you visit behave, a dedicated attacker could manipulate the way Privacy Badger acts: what it blocks and what it allows. In theory, this can be used to identify users (a form of fingerprinting) or to extract some kinds of information from the pages they visit. This is similar to the set of vulnerabilities in Safari’s Intelligent Tracking Prevention feature that were disclosed and patched late last year.
To be clear: the disclosures Google’s team shared with us are purely proof-of-concept, and we have seen no evidence that any Privacy Badger users have had these techniques used against them in the wild. But as a precaution, we have decided to turn off Privacy Badger’s local learning feature by default.
From now on, Privacy Badger will rely solely on its “Badger Sett” pre-trained list of tracking domains to perform blocking by default. Furthermore, Privacy Badger’s tracker database will be refreshed periodically with the latest pre-trained definitions. This means, moving forward, all Privacy Badgers will default to relying on the same learned list of trackers for blocking.
How does Privacy Badger learn?
From the beginning, Privacy Badger has recognized trackers by their sneaky, privacy-invading behavior. Privacy Badger is programmed to look for tracking heuristics—specific actions that indicate someone is trying to identify and track you. Currently, the things Privacy Badger looks for are third-party cookies, HTML5 local storage “supercookies” and canvas fingerprinting. When local learning is enabled, Privacy Badger looks at each site you visit as you browse the Web and asks itself, “Does anything here look like a tracker?” If so, it logs the domain of the tracker and the domain of the website where the tracker was seen. If Privacy Badger sees the same tracker on three different sites, it starts blocking that tracker.
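As a rough illustration of the three-site rule described above, here is a minimal Python sketch. It is a toy model, not Privacy Badger's actual code: the class name LocalLearner, its methods, and the looks_like_tracker flag (which stands in for the cookie, local storage, and canvas fingerprinting checks) are all assumptions made for this example.

```python
from collections import defaultdict

BLOCK_THRESHOLD = 3  # the "three different sites" rule from the text


class LocalLearner:
    """Toy model of strike-based learning (hypothetical, for illustration only)."""

    def __init__(self, threshold=BLOCK_THRESHOLD):
        self.threshold = threshold
        # tracker domain -> set of websites where it was seen tracking
        self.seen_on = defaultdict(set)

    def observe(self, site, third_party, looks_like_tracker):
        """Log one third-party domain seen while visiting `site`."""
        if looks_like_tracker and third_party != site:
            self.seen_on[third_party].add(site)

    def is_blocked(self, third_party):
        """A domain is blocked once it was seen tracking on enough distinct sites."""
        return len(self.seen_on.get(third_party, ())) >= self.threshold
```

Because the sketch stores a set of websites per tracker, seeing the same tracker twice on one site still counts as a single strike; only the third distinct site triggers blocking.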
But for some time now, Privacy Badger hasn’t just learned in your browser: it also came preloaded with data about common trackers on the Web. Badger Sett is an automated version of Privacy Badger that we use daily to visit thousands of the most popular sites on the Web. Each new installation of Privacy Badger comes with the list of trackers collected from the latest Badger Sett scan. This way, when you install it for the first time, it immediately starts blocking known trackers.
What were the disclosures?
The first Google Security Team disclosure was a security vulnerability based on a feature we added in July 2019: detection of first-to-third-party cookie sharing (pixel cookie sharing). Because of the way Privacy Badger checked first-party cookie strings against outgoing third-party request URLs, it would have been possible in certain circumstances for an attacker to extract first-party cookie values by issuing thousands of consecutive requests to a set of attacker-controlled third-party domains. We immediately removed the first-to-third-party cookie heuristic from Privacy Badger’s local learning in order to patch the vulnerability. (We have continued using that heuristic for pre-training in Badger Sett, where it does not expose any sensitive information.)
The second set of disclosures described a set of attacks that can be carried out against any kind of heuristic learning blocker. These attacks hinge on an adversary having the ability to force a particular user’s instance of Privacy Badger to identify arbitrary domains as trackers (setting state), as well as the ability to determine which domains a user’s Privacy Badger has learned to block (reading back the state). The disclosures were similar to the ones Google previously reported about Apple’s Intelligent Tracking Prevention (ITP) feature.
One attack could go something like this: a Privacy Badger user visits a malicious webpage. The attacker then uses a script to cause the user’s Privacy Badger to learn to block a unique combination of domains like fp-1-example.com and fp-24-example.com. If the attacker can embed code on other websites, they can read back this fingerprint to track the user on those websites.
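To make the "setting state" and "reading back the state" steps concrete, the following Python sketch models the fingerprint as a bit pattern over numbered domains. The fp-N-example.com naming follows the example above; the 32-bit width, the function names, and the is_blocked callback are assumptions made for this illustration, and nothing here reflects how a real attack script or Privacy Badger itself is written.

```python
from typing import Callable, Set

NUM_BITS = 32  # hypothetical fingerprint width


def domains_for_id(user_id: int, num_bits: int = NUM_BITS) -> Set[str]:
    """Setting state: the domains that would need to be blocked to encode user_id."""
    if not 0 <= user_id < 2 ** num_bits:
        raise ValueError("user_id does not fit in num_bits")
    return {
        f"fp-{bit}-example.com"
        for bit in range(num_bits)
        if (user_id >> bit) & 1
    }


def id_from_blocked(is_blocked: Callable[[str], bool], num_bits: int = NUM_BITS) -> int:
    """Reading back state: rebuild the ID by probing which domains are blocked."""
    return sum(
        1 << bit
        for bit in range(num_bits)
        if is_blocked(f"fp-{bit}-example.com")
    )
```

In this model, the combination of fp-1-example.com and fp-24-example.com corresponds to an ID with bits 1 and 24 set. A browser in which none of the fp domains are blocked decodes to 0, which is why a shared block list makes users indistinguishable under this scheme.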
In some cases, the ability to detect whether a particular domain has been blocked (like a dedicated content server for a particular bank) could reveal whether a user has visited particular sites, even if the attacker doesn’t run code on those sites.
More information on this style of attack can be found in the researchers’ paper. Since Privacy Badger learns in much the same way that Safari’s ITP did, it was vulnerable to the same class of attack.
What is changing?
Since the act of blocking requests is inherently observable by websites (it’s just how the Web works), the best way to prevent this class of attacks is for Privacy Badger to disable local learning by default and use the same block list for all of its users. Websites will always be able to detect whether a given domain was blocked or not during your visit. However, websites should not be able to set Privacy Badger state, nor should they be able to distinguish between individual Privacy Badger users by default.
Before today, every Privacy Badger user would start with a set of known trackers (courtesy of Badger Sett), then continue finding information about new trackers over time. A new installation of Privacy Badger would start with data from the most recent Badger Sett scan before its release, but future updates would not modify the tracker list in any way.
Now, by default, Privacy Badger will no longer learn about new trackers based on your browsing. All users (with the default settings) will use the same tracker-blocking list, generated by Badger Sett. To accomplish this, in future updates to Privacy Badger we plan to replace these users’ tracker lists with new data compiled by Badger Sett. That means users who do not opt in to local learning will share the same block list and will continue receiving information about new trackers we discover, keeping their Badgers up-to-date.
For anyone who opts back in to local learning, Privacy Badger will work exactly as it has in the past. In addition, to improve protection for users who opt back in, we are looking into continually combining their tracker lists with new data from Badger Sett. To opt back in to local learning, visit Privacy Badger’s options page and look for the “Learn to block new trackers from your browsing” checkbox in the Advanced section.
The trackers included in the pre-trained Badger Sett list are compiled using the same techniques Privacy Badger has always used: browsing to real websites, observing the behavior of third-party domains on those sites, and logging the trackers among them. Regardless of how you choose to use Privacy Badger, it will continue to adapt to the state of trackers in the wild.
Why is local learning still an option?
Privacy Badger is meant to be a no-configuration-necessary, mostly install-and-forget kind of tool. We feel comfortable turning off local learning because we believe the majority of Privacy Badger’s protection is already captured by the pre-trained list, and we don’t want to expose users to any potential risk without informed opt-in. But we’re leaving local learning as an option because we think it presents a reasonable tradeoff that users should be able to make for themselves.
The main risk of enabling local learning is that a bad actor can manipulate Privacy Badger’s state in order to create a unique identifier, a kind of Privacy Badger-specific fingerprint. A tracker that does this can then identify the user across sites where the tracker can run JavaScript. Additionally, local learning enables a limited form of history sniffing where the attacker can try to determine whether a Privacy Badger user had previously visited a particular website by seeing how many strikes it takes for Privacy Badger to learn to block a (legitimate) third-party domain that appears only on that website. We see these as serious concerns but not showstoppers to local learning altogether.
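The strike-counting idea behind this form of history sniffing can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the threshold of three comes from the learning rule described earlier, while the function names and the idea of passing in the sites as plain lists are hypothetical and chosen only for the example.

```python
from typing import Iterable, Optional

BLOCK_THRESHOLD = 3  # strikes needed before a domain is blocked


def strikes_until_blocked(
    prior_sites: Iterable[str],
    attacker_sites: Iterable[str],
    threshold: int = BLOCK_THRESHOLD,
) -> Optional[int]:
    """Count how many attacker-induced strikes it takes to block a domain.

    prior_sites: websites where the domain was already seen tracking.
    attacker_sites: websites the attacker uses to add further strikes.
    Returns None if the domain never reaches the threshold.
    """
    seen = set(prior_sites)
    if len(seen) >= threshold:
        return 0
    for strikes, site in enumerate(attacker_sites, start=1):
        seen.add(site)
        if len(seen) >= threshold:
            return strikes
    return None


def probably_visited(strikes: Optional[int], threshold: int = BLOCK_THRESHOLD) -> bool:
    """Fewer strikes than the threshold suggests an earlier strike already existed."""
    return strikes is not None and strikes < threshold
```

If the domain appears only on one website and the user has been there, the attacker needs two strikes instead of three. The inference is only a guess: an earlier strike could also come from pre-training rather than from the user's own browsing.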
There are already many other kinds of information the browser discloses that can be used for fingerprinting. Most common fingerprinters use a combination of techniques, often wrapped up in a single script (such as FingerprintJS). Detecting any one of the techniques in use is enough for Privacy Badger to flag the domain as a fingerprinter. Compared with existing methods available to bad actors, fingerprinting Privacy Badger’s local learning is likely to be less reliable, more resource-intensive, and more visible to users. Going forward, it will only apply to the small subset of Web users who have Privacy Badger installed and local learning enabled. Furthermore, if caught, companies will face reputational damage for exploiting users’ privacy protections.
The risk of history sniffing is also not unique to Privacy Badger. Known history sniffing attacks remain in both Firefox and Chrome. Exploiting Privacy Badger to ascertain bits of users’ history will be limited to Privacy Badger users with local learning enabled, and to websites which use unique third-party domains. This is then further limited by Privacy Badger’s pre-training (did the user visit the domain, or was the domain visited in pre-training?) and Privacy Badger’s list of domains that belong to the same entity (domains on that list will always be seen as first party by Privacy Badger and thus immune to this exploit). Existing browser history sniffing attacks are not bound by these limitations.
Some users might want to opt back in to local learning. The pre-trained list is designed to learn about the trackers present on thousands of the most popular sites on the Web, but it does not capture the “long tail” of tracking on websites that are less popular. If you regularly browse websites overlooked by ad/tracker blocker lists, or if you prefer a more hands-on approach, you may want to visit your Badger’s options page and mark the checkbox for learning to block new trackers from your browsing.
The future
Privacy Badger still comes with all of its existing privacy benefits like outgoing link tracking protections on Google and Facebook and click-to-activate replacements for potentially useful third-party widgets.
In the coming months, we will work on expanding the reach of Badger Sett beyond U.S.-centric websites to capture more trackers in our pre-trained lists. We will keep improving widget replacement, and we will add new tracker detection mechanisms.
In the longer term, we will be looking into privacy-preserving community learning. Community learning would allow users to share the trackers their Badgers learn about locally to improve the tracker list for all Privacy Badger users.
Thanks again to Artur Janc, Krzysztof Kotowicz, Lukas Weichselbaum and Roberto Clapis of Google Security Team for responsibly disclosing these issues.
This article was updated on October 14, 2020, to clarify what will happen to tracker lists going forward.