Opened 5 months ago
Last modified 31 hours ago
#32117 new project
Understand and document BridgeDB bot scraping attempts
Reported by: | cohosh | Owned by: | |
---|---|---|---|
Priority: | Medium | Milestone: | |
Component: | Circumvention/BridgeDB | Version: | |
Severity: | Normal | Keywords: | |
Cc: | dcf, phw, cohosh | Actual Points: | |
Parent ID: | | Points: | |
Reviewer: | | Sponsor: | |
Description
We are aware of automated attempts to enumerate bridges in BridgeDB, but lack a more rigorous understanding of the problem.
We have detected bot requests to BridgeDB's web interface and deployed some defences: forbidding requests with headers that are commonly associated with bots, and handing out fake bridges to suspected bot requests (#31252).
We also suspect that these bots are solving our CAPTCHAs more accurately than human users (#24607).
After a recent campaign to get more volunteer bridges, we set up an experiment to test the reachability of a subset of these new bridges from a probe site in Beijing, and found all new bridges in our sample to be blocked; most were blocked from the very start of the experiment (#31701).
This ticket is for documenting bot behaviour and brainstorming ways to detect and analyze the automated scraping of BridgeDB by censor-owned bots.
We should dig deeper into the analysis over here. In particular, why is the CAPTCHA success rate for users from the U.S. higher for vanilla bridges than for obfs4 bridges?
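As a starting point for that comparison, here is a minimal sketch of how CAPTCHA success rates could be broken down by country and transport. It assumes the data is already available as `(country, transport, solved)` tuples; that record layout, the function name `captcha_success_rates`, and the sample rows are all hypothetical and made up for illustration. They are not BridgeDB's actual log format or real measurements.

```python
from collections import defaultdict


def captcha_success_rates(records):
    """Return the CAPTCHA success rate per (country, transport) pair.

    `records` is an iterable of (country, transport, solved) tuples,
    where `solved` is True if the CAPTCHA was answered correctly.
    This layout is an assumption, not BridgeDB's real log format.
    """
    # Maps (country, transport) -> [solved_count, attempt_count].
    totals = defaultdict(lambda: [0, 0])
    for country, transport, solved in records:
        entry = totals[(country, transport)]
        if solved:
            entry[0] += 1
        entry[1] += 1
    return {key: solved / attempts for key, (solved, attempts) in totals.items()}


# Synthetic sample rows, for illustration only (not real data).
sample = [
    ("us", "vanilla", True),
    ("us", "vanilla", True),
    ("us", "vanilla", False),
    ("us", "obfs4", True),
    ("us", "obfs4", False),
]

rates = captcha_success_rates(sample)
for (country, transport), rate in sorted(rates.items()):
    print(f"{country} {transport}: {rate:.2f}")
```

Splitting the same breakdown further (for example by day or by request headers) might help show whether the vanilla/obfs4 gap comes from a small set of clients or is spread evenly across requests.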