
CAPTCHA: Difference between revisions

From Wikipedia, the free encyclopedia
Replaced error.
No edit summary
 
(43 intermediate revisions by 30 users not shown)
Line 1: Line 1:
{{Short description|Challenge–response test to determine whether a user is human}}
{{Short description|Test to determine whether a user is human}}
{{Pp-vandalism|small=yes}}
{{Cleanup rewrite|date=November 2022|It feels like an essay criticising CAPTCHA|section=no}}
{{Use dmy dates|date=October 2022}}
[[File:captcha.jpg|upright=1.35|thumb|This CAPTCHA ([[reCAPTCHA v1]]) of "smwm" obscures its message from computer interpretation by twisting the letters and adding a slight background color gradient.]]
{{pp-vandalism|small=yes}}
[[File:captcha.jpg|upright=1.35|thumb|This CAPTCHA (Version 1) of "smwm" obscures its message from computer interpretation by twisting the letters and adding a slight background color gradient.]]
{{Multiple issues|{{Cleanup rewrite|date=November 2022|it feels like an essay criticising CAPTCHA|section=no}}
{{Split|date=November 2022|Bypassing of CAPTCHA|Accessibility of CAPTCHA|CAPTCHA}}}}
A '''CAPTCHA''' ({{IPAc-en|ˈ|k|æ|p|.|tʃ|ə}} {{respell|KAP|chə}}, a [[contrived acronym]] for "Completely Automated Public [[Turing test]] to tell Computers and Humans Apart"<ref>{{Cite web |title=What is CAPTCHA? |url=https://support.google.com/a/answer/1217728 |website=Google Support |publisher=[[Google]] |access-date=2022-09-09 |quote=CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a [...]}}</ref>) is a type of [[challenge–response authentication|challenge–response]] test used in [[computing]] to determine whether the user is human.<ref>{{Cite web|title=The reCAPTCHA Project – Carnegie Mellon University CyLab|url=https://www.cylab.cmu.edu/partners/success-stories/recaptcha.html|url-status=dead|archive-url=https://web.archive.org/web/20171027203659/https://www.cylab.cmu.edu/partners/success-stories/recaptcha.html|archive-date=2017-10-27|access-date=2017-01-13|website=www.cylab.cmu.edu}}</ref>


A '''CAPTCHA''' ({{IPAc-en|ˈ|k|æ|p|.|tʃ|ə}} {{respell|KAP|chə}}) is a type of [[challenge–response authentication|challenge–response]] test used in [[computing]] to determine whether the user is human in order to deter bot attacks and spam.<ref>{{Cite web|title=The reCAPTCHA Project – Carnegie Mellon University CyLab|url=https://www.cylab.cmu.edu/partners/success-stories/recaptcha.html|url-status=dead|archive-url=https://web.archive.org/web/20171027203659/https://www.cylab.cmu.edu/partners/success-stories/recaptcha.html|archive-date=2017-10-27|access-date=2017-01-13|website=www.cylab.cmu.edu}}</ref>
The term was coined in 2003 by [[Luis von Ahn]], [[Manuel Blum]], Nicholas J. Hopper, and [[John Langford (computer scientist)|John Langford]].<ref name="abhl" /> The most common type of CAPTCHA (displayed as Version 1.0) was first invented in 1997 by two groups working in parallel. This form of CAPTCHA requires entering a sequence of letters or numbers in a distorted image. Because the test is administered by a computer, in contrast to the standard Turing test that is administered by a human, a CAPTCHA is sometimes described as a [[reverse Turing test]].<ref>{{cite web|url=http://isyou.info/jowua/papers/jowua-v4n3-3.pdf|title=Reverse Turing Test using Touchscreens and CAPTCHA|author1=Mayumi Takaya|author2=Yusuke Tsuruta|author3=Akihiro Yamamura|publisher=Akita University}}</ref> The test has been criticized, particularly by people with disabilities, but many websites use it to prevent bot spamming and raiding, and its usage is widespread. Widely used services include hCaptcha and reCAPTCHA.<ref>{{Cite web |title=Websites using hCaptcha |url=https://trends.builtwith.com/websitelist/hCaptcha |access-date=2022-11-10 |website=trends.builtwith.com}}</ref><ref>{{Cite web |last=Sulgrove |first=Jonathan |date=2022-07-07 |title=reCAPTCHA: What It Is and Why You Should Use It on Your Website - TSTS |url=https://www.tsts.com/blog/recaptcha-what-it-is-and-why-you-should-use-it-on-your-website/ |access-date=2022-11-10 |website=Twin State Technical Services |language=en-US}}</ref> It takes the average person approximately 10 seconds to solve a typical CAPTCHA.<ref>{{cite journal|last1=Bursztein|first1=Elie|last2=Bethard|first2=Steven|last3=Fabry|first3=Celine|last4=Mitchell|first4=John C.|last5=Jurafsky|first5=Dan|access-date=March 30, 2018|url=https://web.stanford.edu/~jurafsky/burszstein_2010_captcha.pdf|year=2010|title=How Good are Humans at Solving CAPTCHAs? A Large Scale Evaluation|journal=Proceedings of the 2010 IEEE Symposium on Security and Privacy|pages=399–413|doi=10.1109/SP.2010.31|citeseerx=10.1.1.164.7848|isbn=978-1-4244-6894-2|s2cid=14204454}}</ref>

The term was coined in 2003 by [[Luis von Ahn]], [[Manuel Blum]], Nicholas J. Hopper, and [[John Langford (computer scientist)|John Langford]].<ref name="abhl" /> It is a [[contrived acronym]] for "Completely Automated Public [[Turing test]] to tell Computers and Humans Apart."<ref>{{Cite web |title=What is CAPTCHA? |url=https://support.google.com/a/answer/1217728 |url-status=live |archive-url=https://web.archive.org/web/20200806173938/https://support.google.com/a/answer/1217728 |archive-date=6 August 2020 |access-date=2022-09-09 |website=Google Support |publisher=Google Inc. |quote=CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a [...]}}</ref> A historically common type of CAPTCHA (displayed as [[reCAPTCHA v1]]) was first invented in 1997 by two groups working in parallel. This form of CAPTCHA requires entering a sequence of letters or numbers in a distorted image. Because the test is administered by a computer, in contrast to the standard Turing test that is administered by a human, CAPTCHAs are sometimes described as [[reverse Turing test|reverse Turing tests]].<ref>{{Cite journal |last=Mayumi Takaya |last2=Yusuke Tsuruta |last3=Akihiro Yamamura |date=2013-09-30 |title=Reverse Turing Test using Touchscreens and CAPTCHA |url=http://isyou.info/jowua/papers/jowua-v4n3-3.pdf |journal=Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications |volume=4 |issue=3 |pages=41–57 |doi=10.22667/JOWUA.2013.09.31.041|archive-date=22 August 2017|archive-url=https://web.archive.org/web/20170822001858/http://isyou.info/jowua/papers/jowua-v4n3-3.pdf|url-status=live}}</ref>

Two widely used CAPTCHA services are [[Google]]'s [[reCAPTCHA]]<ref>{{Cite web |title=What is reCAPTCHA? – reCAPTCHA Help |url=https://support.google.com/recaptcha/answer/6080904?hl=en |access-date=2023-07-20 |website=support.google.com |archive-date=20 July 2023 |archive-url=https://web.archive.org/web/20230720192427/https://support.google.com/recaptcha/answer/6080904?hl=en |url-status=live }}</ref><ref>{{Cite web |last=Sulgrove |first=Jonathan |date=2022-07-07 |title=reCAPTCHA: What It Is and Why You Should Use It on Your Website – TSTS |url=https://www.tsts.com/blog/recaptcha-what-it-is-and-why-you-should-use-it-on-your-website/ |access-date=2022-11-10 |website=Twin State Technical Services |language=en-US |archive-date=10 November 2022 |archive-url=https://web.archive.org/web/20221110020410/https://www.tsts.com/blog/recaptcha-what-it-is-and-why-you-should-use-it-on-your-website/ |url-status=live }}</ref> and the independent hCaptcha.<ref>{{Cite web |title=Websites using hCaptcha |url=https://trends.builtwith.com/websitelist/hCaptcha |access-date=2022-11-10 |website=trends.builtwith.com |archive-date=10 November 2022 |archive-url=https://web.archive.org/web/20221110020408/https://trends.builtwith.com/websitelist/hCaptcha |url-status=live }}</ref><ref>{{Cite web |title=hCaptcha – About Us |url=https://www.hcaptcha.com/about |access-date=2023-07-20 |website=www.hcaptcha.com |language=en |archive-date=20 July 2023 |archive-url=https://web.archive.org/web/20230720192429/https://www.hcaptcha.com/about |url-status=live }}</ref> It takes the average person approximately 10 seconds to solve a typical CAPTCHA.<ref>{{cite book|last1=Bursztein|first1=Elie|last2=Bethard|first2=Steven|last3=Fabry|first3=Celine|last4=Mitchell|first4=John C.|last5=Jurafsky|first5=Dan|title=2010 IEEE Symposium on Security and Privacy |chapter=How Good Are Humans at Solving CAPTCHAs? A Large Scale Evaluation |access-date=March 30, 2018|chapter-url=https://web.stanford.edu/~jurafsky/burszstein_2010_captcha.pdf|year=2010|pages=399–413|doi=10.1109/SP.2010.31|citeseerx=10.1.1.164.7848|isbn=978-1-4244-6894-2|s2cid=14204454|archive-date=8 August 2018|archive-url=https://web.archive.org/web/20180808033552/https://web.stanford.edu/~jurafsky/burszstein_2010_captcha.pdf|url-status=live}}</ref>

== Purpose ==
CAPTCHAs are intended to prevent automated abuse of websites, such as promotional spam, registration spam, and data scraping; bots are less likely to abuse websites that use CAPTCHAs. Many websites use CAPTCHA effectively to prevent bot raiding. CAPTCHAs are designed so that humans can complete them, while most robots cannot.<ref>{{Cite web |last=Stec |first=Albert |date=2022-06-12 |title=What is CAPTCHA and How Does It Work? |url=https://www.baeldung.com/cs/captcha-intro |url-status=live |archive-url=https://web.archive.org/web/20221101005730/https://www.baeldung.com/cs/captcha-intro |archive-date=1 November 2022 |access-date=2022-11-01 |website=Baeldung on Computer Science |language=en-US}}</ref> Newer CAPTCHAs look at the user's behaviour on the internet to determine whether they are human.<ref>{{Cite web |date=November 1, 2022 |title=What is a CAPTCHA? |url=https://www.cloudflare.com/learning/bots/how-captchas-work/ |url-status=live |archive-url=https://web.archive.org/web/20221027061629/https://www.cloudflare.com/learning/bots/how-captchas-work/ |archive-date=27 October 2022 |access-date=November 1, 2022 |website=Cloudflare}}</ref> With such systems, a conventional CAPTCHA test appears only if the user acts like a bot, for example by requesting webpages or clicking links too quickly.
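The idea of showing a challenge only to clients that request pages too quickly can be illustrated with a sliding-window counter. The following Python sketch is a simplified assumption, not the method of any particular CAPTCHA service: the class name <code>RequestRateMonitor</code>, the thresholds, and the use of a plain client identifier are all hypothetical, and real systems combine many more behavioural signals.

```python
from collections import defaultdict, deque


class RequestRateMonitor:
    """Hypothetical helper: decide whether a client should be shown a CAPTCHA.

    A client is challenged when it makes more than `max_requests`
    requests within the last `window_seconds` seconds.
    """

    def __init__(self, max_requests=5, window_seconds=1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps a client identifier to the timestamps of its recent requests.
        self._history = defaultdict(deque)

    def should_challenge(self, client_id, now):
        """Record a request at time `now` (seconds) and report whether to challenge."""
        times = self._history[client_id]
        times.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        return len(times) > self.max_requests
```

A client that loads one page every few seconds never exceeds the threshold, while a script issuing several requests within a fraction of a second does and would be presented with a challenge.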


== History ==
Since the early days of the [[Internet]]{{Until when|date=October 2022}}, users have wanted to make text illegible to computers.<ref name=":1" /> The first such people were [[Hacker culture|hackers]], posting about sensitive topics to [[Internet forum]]s they thought were being automatically monitored on keywords. To circumvent such filters, they replaced a word with look-alike characters. ''HELLO'' could become {{nowrap|<code>{{!}}-{{!}}3{{!}}_{{!}}_()</code>}} or {{nowrap|<code>)-(3££0</code>}}, as well as numerous other variants, such that a filter could not detect ''all'' of them. This later became known as [[leet]]speak.<ref>{{Cite web|title=h2g2 – An Explanation of l33t Speak – Edited Entry|url=http://www.bbc.co.uk/dna/h2g2/A787917|access-date=2015-06-03|website=h2g2}}</ref>
Since the 1980s–1990s, users have wanted to make text illegible to computers.<ref name=":1" /> The first such people were [[Hacker culture|hackers]], posting about sensitive topics to [[Internet forum]]s they thought were being automatically monitored on keywords. To circumvent such filters, they replaced a word with look-alike characters. ''HELLO'' could become {{nowrap|<code>{{!}}-{{!}}3{{!}}_{{!}}_()</code>}} or {{nowrap|<code>)-(3££0</code>}}, and others, such that a filter could not detect ''all'' of them. This later became known as [[leet]]speak.<ref>{{Cite web|title=h2g2 – An Explanation of l33t Speak – Edited Entry|url=http://www.bbc.co.uk/dna/h2g2/A787917|access-date=2015-06-03|website=h2g2|date=16 August 2002 |archive-date=6 September 2011|archive-url=https://web.archive.org/web/20110906114613/http://www.bbc.co.uk/dna/h2g2/A787917|url-status=live}}</ref>
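The effect of look-alike substitution on a keyword filter can be shown with a short Python sketch. The substitution table below is an illustrative assumption built from the two spellings of ''HELLO'' given above, and <code>naive_filter</code> is a hypothetical stand-in for the kind of verbatim keyword matching such forums were thought to use.

```python
import random

# Illustrative look-alike table (an assumption), derived from the two
# spellings of HELLO quoted in the text: |-|3|_|_() and )-(3££0.
LOOKALIKES = {
    "H": ["|-|", ")-("],
    "E": ["3"],
    "L": ["|_", "£"],
    "O": ["()", "0"],
}


def obfuscate(word, rng):
    """Replace each letter with a randomly chosen look-alike, if one is known."""
    return "".join(rng.choice(LOOKALIKES.get(ch, [ch])) for ch in word.upper())


def naive_filter(text, blocked_words):
    """Return True if any blocked word appears verbatim (case-insensitive)."""
    upper = text.upper()
    return any(word.upper() in upper for word in blocked_words)
```

With this table each ''L'' and the ''H'' and ''O'' have two variants, so ''HELLO'' alone has 16 possible spellings; a filter that lists keywords verbatim would need to enumerate every one of them.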


One of the earliest commercial uses of CAPTCHAs was in the '''Gausebeck–Levchin test'''<!-- per [[WP:MOSBOLD]] since the redirect points to this article -->. In 2000, idrive.com began to protect its signup page<ref>{{Cite web|title=idrive turing signup page|url=https://drive.google.com/open?id=0BzbOLm20p6CrUE1SSXp5Zjl2MW8|access-date=2017-05-19|website=Google Drive}}</ref> with a CAPTCHA and prepared to file a patent.<ref name=":1">{{Cite web|url=https://drive.google.com/open?id=0BzbOLm20p6CrOS1mWEhITGJ4d2s|title=idrive turing patent application|access-date=2017-05-19}}</ref> In 2001, [[PayPal]] used such tests as part of a fraud prevention strategy in which they asked humans to "retype distorted text that programs have difficulty recognizing."<ref name=stringham2015>{{cite book |last1=Stringham|first1=Edward P |title=Private Governance : creating order in economic and social life |publisher=[[Oxford University Press]] |year=2015 |page=105 |isbn=978-0-19-936516-6 |oclc=5881934034 }}</ref> PayPal cofounder and CTO [[Max Levchin]] helped commercialize this use.
One of the earliest commercial uses of CAPTCHAs was in the Gausebeck–Levchin test. In 2000, idrive.com began to protect its signup page<ref>{{Cite web|title=idrive turing signup page|url=https://drive.google.com/open?id=0BzbOLm20p6CrUE1SSXp5Zjl2MW8|access-date=2017-05-19|website=Google Drive|archive-date=15 March 2023|archive-url=https://web.archive.org/web/20230315233241/https://accounts.google.com/v3/signin/identifier?dsh=S-569764738%3A1678923161301090&continue=https%3A%2F%2Fdrive.google.com%2Fopen%3Fid%3D0BzbOLm20p6CrUE1SSXp5Zjl2MW8&followup=https%3A%2F%2Fdrive.google.com%2Fopen%3Fid%3D0BzbOLm20p6CrUE1SSXp5Zjl2MW8&ifkv=AWnogHfb-QQLSi-KGh4vgzje6iZGJ1BZZvpaKSlXZLsXVSfSHlafPjo8v6B9qJTV2nuxzahDQYGTtw&osid=1&passive=1209600&service=wise&flowName=GlifWebSignIn&flowEntry=ServiceLogin|url-status=live}}</ref> with a CAPTCHA and prepared to file a patent.<ref name=":1">{{Cite web|url=https://drive.google.com/open?id=0BzbOLm20p6CrOS1mWEhITGJ4d2s|title=idrive turing patent application|access-date=2017-05-19|archive-date=15 March 2023|archive-url=https://web.archive.org/web/20230315233244/https://accounts.google.com/v3/signin/identifier?dsh=S956306740%3A1678923164278227&continue=https%3A%2F%2Fdrive.google.com%2Fopen%3Fid%3D0BzbOLm20p6CrOS1mWEhITGJ4d2s&followup=https%3A%2F%2Fdrive.google.com%2Fopen%3Fid%3D0BzbOLm20p6CrOS1mWEhITGJ4d2s&ifkv=AWnogHfWh9qH38C8IGelcYVq9WSJcqP6Q30eP1Bba6t1EcfIlDb1n3eZMtAJSv1IRxdTdxgTsu8r0A&osid=1&passive=1209600&service=wise&flowName=GlifWebSignIn&flowEntry=ServiceLogin|url-status=live}}</ref> In 2001, [[PayPal]] used such tests as part of a fraud prevention strategy in which they asked humans to "retype distorted text that programs have difficulty recognizing."<ref name=stringham2015>{{cite book |last1=Stringham|first1=Edward P |title=Private Governance : creating order in economic and social life |publisher=[[Oxford University Press]] |year=2015 |page=105 |isbn=978-0-19-936516-6 |oclc=5881934034 }}</ref> PayPal co-founder and CTO [[Max Levchin]] helped commercialize this use.


A popular deployment of CAPTCHA technology, [[reCAPTCHA]], was acquired by Google in 2009.<ref>{{cite web|title=Teaching computers to read: Google acquires reCAPTCHA|url=https://googleblog.blogspot.com/2009/09/teaching-computers-to-read-google.html|website=Google Official Blog}}</ref> In addition to preventing bot fraud for its users, Google used reCAPTCHA and CAPTCHA technology to digitize the archives of ''[[The New York Times]]'' and books from Google Books in 2011.<ref>{{cite news|title=Deciphering Old Texts, One Woozy, Curvy Word at a Time|url=https://www.nytimes.com/2011/03/29/science/29recaptcha.html|website=The New York Times|date=28 March 2011|last1=Gugliotta|first1=Guy}}</ref>
A popular deployment of CAPTCHA technology, [[reCAPTCHA]], was acquired by Google in 2009.<ref>{{cite web |title=Teaching computers to read: Google acquires reCAPTCHA |url=https://googleblog.blogspot.com/2009/09/teaching-computers-to-read-google.html |url-status=live |archive-url=https://web.archive.org/web/20190831195346/https://googleblog.blogspot.com/2009/09/teaching-computers-to-read-google.html |archive-date=31 August 2019 |access-date=29 October 2018 |website=Google Official Blog}}</ref> In addition to preventing bot fraud for its users, Google used reCAPTCHA and CAPTCHA technology to digitize the archives of ''[[The New York Times]]'' and books from Google Books in 2011.<ref>{{cite news |last1=Gugliotta |first1=Guy |date=28 March 2011 |title=Deciphering Old Texts, One Woozy, Curvy Word at a Time |website=The New York Times |url=https://www.nytimes.com/2011/03/29/science/29recaptcha.html |url-status=live |access-date=29 October 2018 |archive-url=https://web.archive.org/web/20171117172409/http://www.nytimes.com/2011/03/29/science/29recaptcha.html |archive-date=17 November 2017}}</ref>


=== Invention ===
Eran Reshef, [[Gili Raanan]] and Eilon Solan<ref name=":0">{{cite patent |country=US|number=2005/0114705 A1|status= |title= Method and system for discriminating a human action from a computerized action|pubdate=26 May 2005|gdate= |pridate=11 December 1997|fdate=1 March 2004|inventor1-first=Eran|inventor1-last=Reshef|inventor2-first=Gil|inventor2-last=Raanan|inventorlink2=Gili Raanan|inventor3-first=Eilon|inventor3-last=Solan|url=https://patentimages.storage.googleapis.com/9c/fc/21/1188d59d94d268/US20050114705A1.pdf}}</ref> who worked at [[Sanctum (company)|Sanctum]] on [[Web application firewall|Application Security Firewall]] first patented CAPTCHA in 1997. Their patent application details that "The invention is based on applying human advantage in applying sensory and cognitive skills to solving simple problems that prove to be extremely hard for computer software. Such skills include, but are not limited to processing of sensory information such as identification of objects and letters within a noisy graphical environment".
Eran Reshef, [[Gili Raanan]] and Eilon Solan, who worked at [[Sanctum (company)|Sanctum]] on [[Web application firewall|Application Security Firewall]], first patented CAPTCHA in 1997. Their patent application details that "The invention is based on applying human advantage in applying sensory and cognitive skills to solving simple problems that prove to be extremely hard for computer software. Such skills include, but are not limited to processing of sensory information such as identification of objects and letters within a noisy graphical environment, signals and speech within an auditory signal, patterns and objects within a video or animation sequence".<ref name=":0">{{cite patent|country=US|number=2005/0114705 A1|status=|title=Method and system for discriminating a human action from a computerized action|pubdate=26 May 2005|gdate=|pridate=11 December 1997|fdate=1 March 2004|inventor1-first=Eran|inventor1-last=Reshef|inventor2-first=Gil|inventor2-last=Raanan|inventorlink2=Gili Raanan|inventor3-first=Eilon|inventor3-last=Solan|url=https://patentimages.storage.googleapis.com/9c/fc/21/1188d59d94d268/US20050114705A1.pdf}} {{Webarchive|url=https://web.archive.org/web/20190224001924/https://patentimages.storage.googleapis.com/9c/fc/21/1188d59d94d268/US20050114705A1.pdf |date=24 February 2019 }}</ref>

== Positives ==
CAPTCHAs' purpose is to prevent spam on websites, such as promotion spam, registration spam, and data scraping, and bots are less likely to abuse websites with spamming if those websites use CAPTCHA. Many websites use CAPTCHA to prevent bot raiding, and it works effectively. CAPTCHA's design is that humans can complete CAPTCHAs, while most robots can't.<ref>{{Cite web |last=Stec |first=Albert |date=2022-06-12 |title=What is CAPTCHA and How Does It Work? {{!}} Baeldung on Computer Science |url=https://www.baeldung.com/cs/captcha-intro |access-date=2022-11-01 |website=www.baeldung.com |language=en-US}}</ref> New CAPTCHAs look at the user's behaviour on the internet, to prove that they are a human.<ref>{{Cite web |date=November 1, 2022 |title=What is a CAPTCHA? |url=https://www.cloudflare.com/learning/bots/how-captchas-work/ |url-status=live |access-date=November 1, 2022 |website=Cloudflare}}</ref> A normal CAPTCHA test only appears if the user acts like a bot, such as when they request webpages, or click links too fast.


== Characteristics ==
CAPTCHAs are automated, requiring little human maintenance or intervention to administer, producing benefits in cost and reliability.<ref>{{Cite web |title=How CAPTCHAs work {{!}} What does CAPTCHA mean? |url=https://www.cloudflare.com/learning/bots/how-captchas-work/ |url-status=live |access-date=October 27, 2022 |website=Cloudflare}}</ref>
CAPTCHAs are automated, requiring little human maintenance or intervention to administer, producing benefits in cost and reliability.<ref>{{Cite web |title=How CAPTCHAs work {{!}} What does CAPTCHA mean? |url=https://www.cloudflare.com/learning/bots/how-captchas-work/ |url-status=live |access-date=October 27, 2022 |website=Cloudflare |archive-date=27 October 2022 |archive-url=https://web.archive.org/web/20221027061629/https://www.cloudflare.com/learning/bots/how-captchas-work/ }}</ref>

The algorithm used to create the CAPTCHA must be public, though it may be covered by a patent. This is done to demonstrate that breaking it requires the solution to a difficult problem in the field of artificial intelligence (AI) rather than just the discovery of the (secret) algorithm, which could be obtained through [[reverse engineering]] or other means.<ref name="Justie2020">{{Cite journal |last1=Justie |first1=Brian |year=2020 |title=Little history of CAPTCHA |journal=Internet Histories |volume=5 |pages=30–47 |doi=10.1080/24701475.2020.1831197 |s2cid=228834122}}</ref>


Modern text-based CAPTCHAs are designed such that they require the simultaneous use of three separate abilities—invariant recognition, [[image segmentation|segmentation]], and parsing to complete the task.<ref>{{cite journal|last1=Chellapilla|first1=Kumar|first2=Kevin|last2=Larson|first3=Patrice|last3=Simard|first4=Mary|last4=Czerwinski|title=Designing Human Friendly Human Interaction Proofs (HIPs)|journal=Microsoft Research|url=https://research.microsoft.com/pubs/101726/HIPSCHI2005.pdf|archive-url=https://web.archive.org/web/20150410195118/http://research.microsoft.com/pubs/101726/HIPSCHI2005.pdf|archive-date=10 April 2015}}</ref>


* Invariant recognition refers to the ability to recognize letters despite a large amount of variation in their shapes.<ref>{{Cite journal |last=Karimi-Rouzbahani |first=Hamid |last2=Bagheri |first2=Nasour |last3=Ebrahimpour |first3=Reza |date=2017-10-31 |title=Invariant object recognition is a personalized selection of invariant features in humans, not simply explained by hierarchical feed-forward vision models |url=https://www.nature.com/articles/s41598-017-13756-8 |journal=Scientific Reports |language=en |volume=7 |issue=1 |pages=14402 |doi=10.1038/s41598-017-13756-8 |issn=2045-2322}}</ref>
* Invariant recognition refers to the ability to recognize letters despite a large amount of variation in their shapes.<ref>{{Cite journal |last1=Karimi-Rouzbahani |first1=Hamid |last2=Bagheri |first2=Nasour |last3=Ebrahimpour |first3=Reza |date=2017-10-31 |title=Invariant object recognition is a personalized selection of invariant features in humans, not simply explained by hierarchical feed-forward vision models |journal=Scientific Reports |language=en |volume=7 |issue=1 |pages=14402 |doi=10.1038/s41598-017-13756-8 |pmid=29089520 |pmc=5663844 |bibcode=2017NatSR...714402K |issn=2045-2322}}</ref>
* Segmentation is the ability to separate one letter from another, made difficult in CAPTCHAs.
* Context: The CAPTCHA must be understood holistically to correctly identify each character.<ref>{{Cite web |title=Making CAPTCHAs Expensive Again: If You're Using Text-Based CAPTCHAs, You're Doing It Wrong {{!}} Tripwire |url=https://www.tripwire.com/state-of-security/youre-using-text-based-captchas-youre-wrong-making-captchas-expensive |access-date=2022-10-28 |website=www.tripwire.com}}</ref>
* Parsing refers to the ability to understand the CAPTCHA holistically, in order to correctly identify each character.<ref>{{Cite web |title=Making CAPTCHAs Expensive Again: If You're Using Text-Based CAPTCHAs, You're Doing It Wrong {{!}} Tripwire |url=https://www.tripwire.com/state-of-security/youre-using-text-based-captchas-youre-wrong-making-captchas-expensive |access-date=2022-10-28 |website=www.tripwire.com |archive-date=28 October 2022 |archive-url=https://web.archive.org/web/20221028040010/https://www.tripwire.com/state-of-security/youre-using-text-based-captchas-youre-wrong-making-captchas-expensive |url-status=live }}</ref>


Each of these problems poses a significant challenge for a computer, even in isolation. These three techniques make CAPTCHAs hard.<ref name=bursz>{{cite book|last1=Bursztein|first1=Elie|first2=Matthieu|last2=Martin|first3=John C.|last3=Mitchell |chapter= Text-based CAPTCHA Strengths and Weaknesses |title=ACM Computer and Communication Security 2011 (CSS'2011)|year=2011 |chapter-url=https://www.elie.net/publication/text-based-captcha-strengths-and-weaknesses}}</ref><!-- What about image CAPTCHAs? -->
Each of these problems poses a significant challenge for a computer, even in isolation. Therefore, these three techniques in tandem make CAPTCHAs difficult for computers to solve.<ref name=bursz>{{cite book|last1=Bursztein|first1=Elie|first2=Matthieu|last2=Martin|first3=John C.|last3=Mitchell|chapter=Text-based CAPTCHA Strengths and Weaknesses|title=ACM Computer and Communication Security 2011 (CSS'2011)|year=2011|chapter-url=https://www.elie.net/publication/text-based-captcha-strengths-and-weaknesses|access-date=5 April 2016|archive-date=24 November 2015|archive-url=https://web.archive.org/web/20151124055747/https://www.elie.net/publication/text-based-captcha-strengths-and-weaknesses|url-status=live}}</ref>
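Why segmentation in particular is made difficult can be illustrated with a toy example. The Python sketch below is an assumption for illustration only and is not taken from any cited CAPTCHA solver: it splits a small binary image, given as rows of <code>#</code> (ink) and <code>.</code> (background), wherever a column contains no ink, which is one of the simplest conceivable segmentation strategies.

```python
def segment_by_blank_columns(rows):
    """Split a binary image at ink-free columns.

    `rows` is a list of equal-length strings in which '#' marks ink.
    Returns a list of (start, end) column ranges, end exclusive,
    one per connected run of columns that contain ink.
    """
    width = len(rows[0])
    has_ink = [any(row[x] == "#" for row in rows) for x in range(width)]

    segments = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a new glyph begins here
        elif not ink and start is not None:
            segments.append((start, x))    # a blank column ends the glyph
            start = None
    if start is not None:
        segments.append((start, width))    # glyph runs to the right edge
    return segments


# Two glyphs separated by blank columns are easy to split.
SPACED = [
    "#.#..###",
    "###...#.",
    "#.#...#.",
]

# A single stroke drawn through the gap defeats this simple strategy.
CROWDED = [
    "#.#..###",
    "########",
    "#.#...#.",
]
```

Applied to <code>SPACED</code> the function finds two separate glyph ranges, whereas for <code>CROWDED</code> it returns a single range spanning the whole image. Overlapping characters and lines drawn through the text exploit the same weakness in more capable segmentation methods.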


Whilst primarily used for security reasons, CAPTCHAs can also serve as a benchmark task for artificial intelligence technologies. According to an article by Ahn, Blum and Langford,<ref name=Ahn2003>{{Cite book | chapter-url=https://link.springer.com/content/pdf/10.1007/3-540-39200-9_18.pdf | doi=10.1007/3-540-39200-9_18| chapter=CAPTCHA: Using Hard AI Problems for Security| title=Advances in Cryptology — EUROCRYPT 2003| volume=2656| pages=294–311| series=Lecture Notes in Computer Science| year=2003| last1=von Ahn| first1=Luis| last2=Blum| first2=Manuel| last3=Hopper| first3=Nicholas J.| last4=Langford| first4=John| isbn=978-3-540-14039-9}}</ref> "any program that passes the tests generated by a CAPTCHA can be used to solve a hard unsolved AI problem."<ref>Moy G, N Jones and C Harkless (2004) "[http://www.cs.duke.edu/courses/cps296.3/spring07/breaking_captchas.pdf Distortion estimation techniques in solving visual CAPTCHAs]", Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.</ref> They argue that the advantages of using [[AI-complete|hard AI]] problems as a means for security are twofold. Either the problem goes unsolved and there remains a reliable method for distinguishing humans from computers, or the problem is solved and a difficult AI problem is resolved along with it.<ref name="Ahn2003" />
Whilst primarily used for security reasons, CAPTCHAs can also serve as a benchmark task for artificial intelligence technologies. According to an article by Ahn, Blum and Langford,<ref name=Ahn2003>{{Cite book| chapter-url=https://link.springer.com/content/pdf/10.1007/3-540-39200-9_18.pdf| doi=10.1007/3-540-39200-9_18| chapter=CAPTCHA: Using Hard AI Problems for Security| title=Advances in Cryptology—EUROCRYPT 2003| volume=2656| pages=294–311| series=Lecture Notes in Computer Science| year=2003| last1=von Ahn| first1=Luis| last2=Blum| first2=Manuel| last3=Hopper| first3=Nicholas J.| last4=Langford| first4=John| isbn=978-3-540-14039-9| s2cid=5658745| access-date=30 August 2019| archive-date=4 May 2019| archive-url=https://web.archive.org/web/20190504115630/https://link.springer.com/content/pdf/10.1007%2F3-540-39200-9_18.pdf| url-status=live}}</ref> "any program that passes the tests generated by a CAPTCHA can be used to solve a hard unsolved AI problem."<ref>{{Cite conference |last=Moy |first=Gabriel |last2=Jones |first2=Nathan |last3=Harkless |first3=Curt |last4=Potter |first4=Randall |date=2004 |title=Distortion estimation techniques in solving visual CAPTCHAs |url=http://www.cs.duke.edu/courses/cps296.3/spring07/breaking_captchas.pdf |publisher=IEEE |volume=2 |pages=23–28 |doi=10.1109/CVPR.2004.1315140 |isbn=978-0-7695-2158-9|archiveurl=https://web.archive.org/web/20200729175253/https://www2.cs.duke.edu/courses/cps296.3/spring07/breaking_captchas.pdf |archivedate=29 July 2020|conference=[[Conference on Computer Vision and Pattern Recognition|Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition]]}}</ref> They argue that the advantages of using [[AI-complete|hard AI]] problems as a means for security are twofold. 
Either the problem goes unsolved and there remains a reliable method for distinguishing humans from computers, or the problem is solved and a difficult AI problem is resolved along with it.<ref name="Ahn2003" />


== Accessibility ==
{{See also|Web accessibility}}
[[File:FancyCaptcha screenshot.png|left|thumb|260px|Many websites require typing a CAPTCHA when creating an account to prevent spam.]]
[[File:FancyCaptcha screenshot.png|left|thumb|260px|Many websites require typing a CAPTCHA when creating an account to prevent spam. This image shows a user typing the CAPTCHA word "sepalbeam" to protect against automated spam.]]


CAPTCHAs based on reading text — or other visual-perception tasks — prevent [[blindness|blind]] or [[visual impairment|visually impaired]] users from accessing the protected resource.<ref name="w3c_inaccessibility">{{cite web |url=http://www.w3.org/TR/turingtest/ |title=Inaccessibility of CAPTCHA |date=2005-11-23 |access-date=2015-04-27 |publisher=[[W3C]] |author=May, Matt}}</ref> However, CAPTCHAs do not have to be visual. Any hard [[artificial intelligence]] problem, such as [[speech recognition]], can be used as CAPTCHA. Some implementations of CAPTCHAs permit users to opt for an audio CAPTCHA, such as reCAPTCHA, though a 2011 paper demonstrated a technique for defeating the popular schemes at the time.<ref>{{cite journal|last1=Bursztein|first1=Elie|first2=Romain|last2=Beauxis|first3=Hristo|last3=Perito|first4=Daniele|last4=Paskov|last5=fabry|first5=Celine|last6=Mitchell|first6=John C.|title=The failure of noise-based non-continuous audio captchas|journal= IEEE Symposium on Security and Privacy (S&P), 2011|pages=19–31|year=2011|url=https://www.elie.net/publication/the-failure-of-noise-based-non-continuous-audio-captchas|doi=10.1109/SP.2011.14|isbn=978-1-4577-0147-4|s2cid=6933726}}</ref>
CAPTCHAs based on reading text—or other visual-perception tasks—prevent [[blindness|blind]] or [[visual impairment|visually impaired]] users from accessing the protected resource.<ref name="w3c_inaccessibility">{{cite web |url=http://www.w3.org/TR/turingtest/ |title=Inaccessibility of CAPTCHA |date=2005-11-23 |access-date=2015-04-27 |publisher=[[W3C]] |author=May, Matt |archive-date=21 May 2012 |archive-url=https://web.archive.org/web/20120521023537/http://www.w3.org/TR/turingtest/ |url-status=live }}</ref><ref>{{cite magazine |author=Shea, Michael |url=http://www.theskinny.co.uk/tech/features/captcha-spambots-ebooks-and-the-turing-test |title=CAPTCHA: Spambots, eBooks and the Turing Test |date=19 November 2015 |magazine=[[The Skinny (magazine)|The Skinny]] |access-date=9 January 2016 |archive-date=27 January 2016 |archive-url=https://web.archive.org/web/20160127043239/http://www.theskinny.co.uk/tech/features/captcha-spambots-ebooks-and-the-turing-test |url-status=live }}</ref> Because CAPTCHAs are designed to be unreadable by machines, common [[assistive technology]] tools such as [[screen readers]] cannot interpret them. The use of CAPTCHA thus excludes a small percentage of users from using significant subsets of such common Web-based services as PayPal, Gmail, Orkut, Yahoo!, many forum and weblog systems, etc.<ref>{{Cite web|title=Inaccessibility of CAPTCHA|url=https://www.w3.org/TR/2019/NOTE-turingtest-20191209/Overview.html|access-date=2020-10-31|website=www.w3.org|archive-date=4 November 2020|archive-url=https://web.archive.org/web/20201104011109/https://www.w3.org/TR/2019/NOTE-turingtest-20191209/Overview.html|url-status=live}}</ref> In certain jurisdictions, site owners could become targets of litigation if they are using CAPTCHAs that discriminate against certain people with disabilities. For example, a CAPTCHA may make a site incompatible with [[Section 508]] in the United States.


CAPTCHAs do not have to be visual. Any hard [[artificial intelligence]] problem, such as [[speech recognition]], can be used as a CAPTCHA. Some CAPTCHA implementations, such as reCAPTCHA, let users opt for an audio CAPTCHA, though a 2011 paper demonstrated a technique for defeating the schemes that were popular at the time.<ref>{{cite book|last1=Bursztein|first1=Elie|first2=Romain|last2=Beauxis|first3=Hristo|last3=Paskov|first4=Daniele|last4=Perito|last5=Fabry|first5=Celine|last6=Mitchell|first6=John C.|title=2011 IEEE Symposium on Security and Privacy |chapter=The Failure of Noise-Based Non-continuous Audio Captchas |pages=19–31|year=2011|chapter-url=https://www.elie.net/publication/the-failure-of-noise-based-non-continuous-audio-captchas|doi=10.1109/SP.2011.14|isbn=978-1-4577-0147-4|s2cid=6933726|access-date=5 April 2016|archive-date=16 April 2016|archive-url=https://web.archive.org/web/20160416221427/https://www.elie.net/publication/the-failure-of-noise-based-non-continuous-audio-captchas|url-status=live}}</ref>
Blind or visually impaired people have problems with CAPTCHAs.<ref>{{cite magazine |author=Shea, Michael |url=http://www.theskinny.co.uk/tech/features/captcha-spambots-ebooks-and-the-turing-test |title=CAPTCHA: Spambots, eBooks and the Turing Test |date=19 November 2015 |magazine=[[The Skinny (magazine)|The Skinny]] |access-date=9 January 2016}}</ref> Because CAPTCHAs are designed to be unreadable by machines, common [[assistive technology]] tools such as [[screen readers]] cannot interpret them. Since sites may use CAPTCHAs as part of the initial registration process, or even every login, this challenge can block access. In certain jurisdictions, site owners could become targets of litigation if they are using CAPTCHAs that discriminate against certain people with disabilities. For example, a CAPTCHA may make a site incompatible with [[Section 508]] in the United States.


ProtectWebForm proposed a method, named "Smart CAPTCHA", intended to make CAPTCHAs easier to use.<ref>{{cite web|date=2006-10-08|title=Smart Captcha|url=http://www.protectwebform.com/smartcaptcha|url-status=dead|archive-url=https://web.archive.org/web/20161104163541/http://protectwebform.com/smartcaptcha|archive-date=2016-11-04|access-date=2017-09-15|publisher=Protect Web Form .COM}}</ref> Developers are advised to combine CAPTCHA with JavaScript. Because it is hard for most bots to parse and execute JavaScript, the proposal is a combined method in which a script fills in the CAPTCHA fields and hides both the image and the field from human users.<ref>{{Cite web |title=Invisible reCAPTCHA |url=https://developers.google.com/recaptcha/docs/invisible |access-date=2022-10-28 |website=Google Developers |language=en |archive-date=16 January 2020 |archive-url=https://web.archive.org/web/20200116133416/https://developers.google.com/recaptcha/docs/invisible |url-status=live }}</ref>
The use of CAPTCHA thus excludes a small percentage of users from using significant subsets of such common Web-based services as PayPal, Gmail, Orkut, Yahoo!, many forum and weblog systems, etc.<ref>{{Cite web|title=Inaccessibility of CAPTCHA|url=https://www.w3.org/TR/2019/NOTE-turingtest-20191209/Overview.html|access-date=2020-10-31|website=www.w3.org}}</ref>


One alternative method involves displaying to the user a simple mathematical equation and requiring the user to enter the solution as verification. Although these are much easier to defeat using software, they are suitable for scenarios where graphical imagery is not appropriate, and they provide a much higher level of accessibility for blind users than the image-based CAPTCHAs. These are sometimes referred to as MAPTCHAs (M = "mathematical"). However, these may be difficult for users with a cognitive disorder, such as [[dyscalculia]].<ref>{{Cite web |title=Inaccessibility of CAPTCHA |url=https://www.w3.org/TR/turingtest/Overview.html |access-date=2022-10-27 |website=www.w3.org |archive-date=27 October 2022 |archive-url=https://web.archive.org/web/20221027095016/https://www.w3.org/TR/turingtest/Overview.html |url-status=live }}</ref>
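The arithmetic challenge described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names <code>make_challenge</code> and <code>check_answer</code>, the operand range, and the HMAC-signed token that lets the server verify a response without storing the answer are all hypothetical and are not taken from any particular CAPTCHA implementation.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; a real deployment would load it from configuration.
SECRET_KEY = secrets.token_bytes(32)


def _sign(answer: int, nonce: str) -> str:
    """Return an HMAC-SHA256 tag binding the expected answer to a one-off nonce."""
    message = f"{nonce}:{answer}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_challenge() -> tuple[str, str, str]:
    """Create a question, a nonce, and a tag that the server can verify later."""
    a = secrets.randbelow(9) + 1  # operands in 1..9 keep the sum easy for humans
    b = secrets.randbelow(9) + 1
    nonce = secrets.token_hex(8)
    return f"What is {a} + {b}?", nonce, _sign(a + b, nonce)


def check_answer(submitted: str, nonce: str, tag: str) -> bool:
    """Accept the response only if it is an integer matching the signed answer."""
    try:
        value = int(submitted.strip())
    except ValueError:
        return False
    return hmac.compare_digest(_sign(value, nonce), tag)
```

Because the question is plain text, a script can parse and answer it trivially, which is the weakness noted above. The sketch also does not track used nonces, so a practical version would need to reject replayed challenges.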
Even for individuals who aren't blind, new generations of graphical CAPTCHAs, designed to overcome sophisticated recognition software, can be very hard or impossible to read.<ref>{{Cite web |title=Why are CAPTCHAs so hard to read? |url=https://www.quora.com/Why-are-CAPTCHAs-so-hard-to-read |access-date=2022-10-27 |website=Quora |language=en}}</ref>


Challenges such as a logic puzzle or trivia question can also be used as a CAPTCHA. Their resistance to attacks has been the subject of research.<ref>{{Cite journal |last1=Gao |first1=Song |last2=Mohamed |first2=Manar |last3=Saxena |first3=Nitesh |last4=Zhang |first4=Chengcui |date=June 23, 2017 |title=Emerging-Image Motion CAPTCHAs: Vulnerabilities of Existing Designs, and Countermeasures |journal=IEEE Transactions on Dependable and Secure Computing |type=Website |language=English |edition=1st |volume=16 |issue=6 |pages=1040–1053 |doi=10.1109/TDSC.2017.2719031 |s2cid=41097185 |issn=1941-0018 |doi-access=free }}</ref>
A method of improving CAPTCHA to ease the work with it was proposed by ProtectWebForm and named "Smart CAPTCHA".<ref>{{cite web|date=2006-10-08|title=Smart Captcha|url=http://www.protectwebform.com/smartcaptcha|url-status=dead|archive-url=https://web.archive.org/web/20161104163541/http://protectwebform.com/smartcaptcha|archive-date=2016-11-04|access-date=2017-09-15|publisher=Protect Web Form .COM}}</ref> Developers are advised to combine CAPTCHA with JavaScript. Since it is hard for most bots to parse and execute JavaScript, a combinatory method which fills the CAPTCHA fields and hides both the image and the field from human eyes was proposed.<ref>{{Cite web |title=Invisible reCAPTCHA |url=https://developers.google.com/recaptcha/docs/invisible |access-date=2022-10-28 |website=Google Developers |language=en}}</ref>

One alternative method involves displaying to the user a simple mathematical equation and requiring the user to enter the solution as verification. Although these are much easier to defeat using software, they are suitable for scenarios where graphical imagery is not appropriate, and they provide a much higher level of accessibility for blind users than the image-based CAPTCHAs. These are sometimes referred to as MAPTCHAs (M = "mathematical"). However, these may be difficult for users with a cognitive disorder, such as [[dyscalculia]].<ref>{{Cite web |title=Inaccessibility of CAPTCHA |url=https://www.w3.org/TR/turingtest/Overview.html |access-date=2022-10-27 |website=www.w3.org}}</ref>

Other kinds of challenges, such as those that require understanding the meaning of some text (e.g., a logic puzzle, trivia question, or instructions on how to create a password) can also be used as a CAPTCHA. There is research into their resistance against countermeasures.<ref>{{Cite journal |last=Gao |first=Song |last2=Mohamed |first2=Manar |last3=Saxena |first3=Nitesh |last4=Zhang |first4=Chengcui |date=June 23, 2017 |title=Emerging-Image Motion CAPTCHAs: Vulnerabilities of Existing Designs, and Countermeasures |url=https://ieeexplore.ieee.org/document/7956254 |url-status=live |journal=IEEE Transactions on Dependable and Secure Computing |type=Website |language=English |edition=1st |volume=16 |issue=6 |pages=1040–1053 |doi=10.1109/TDSC.2017.2719031 |issn=1941-0018 |url-access=registration |access-date=October 27, 2022 |via=IEEEXplore}}</ref>


== Circumvention ==


Two main ways to bypass CAPTCHA include using cheap human labor to recognize them, and using [[machine learning]] to build an automated solver.<ref>{{cite book |last=Jakobsson |first=Markus |title=The death of the Internet|url=http://eu.wiley.com/WileyCDA/WileyTitle/productCd-1118062418.html|access-date=4 April 2016|date=August 2012}}</ref> According to former Google "[[click fraud]] czar" [[Shuman Ghosemajumder]], there are numerous services which solve CAPTCHAs automatically.<ref name=ai-security>{{cite news |last=Ghosemajumder |first=Shuman |title=The Imitation Game: The New Frontline of Security |url=http://www.infoq.com/presentations/ai-security |agency=InfoQ| access-date=8 December 2015 |newspaper=InfoQ |date=8 December 2015}}</ref>
Two main ways to bypass CAPTCHA include using cheap human labor to recognize them, and using [[machine learning]] to build an automated solver.<ref>{{cite book|last=Jakobsson|first=Markus|title=The death of the Internet|url=http://eu.wiley.com/WileyCDA/WileyTitle/productCd-1118062418.html|access-date=4 April 2016|date=August 2012|archive-date=15 October 2014|archive-url=https://web.archive.org/web/20141015182639/http://eu.wiley.com/WileyCDA/WileyTitle/productCd-1118062418.html|url-status=live}}</ref> According to former Google "''[[click fraud]] czar''" [[Shuman Ghosemajumder]], there are numerous services which solve CAPTCHAs automatically.<ref name=ai-security>{{cite news |last=Ghosemajumder |first=Shuman |title=The Imitation Game: The New Frontline of Security |url=http://www.infoq.com/presentations/ai-security |agency=InfoQ |access-date=8 December 2015 |newspaper=InfoQ |date=8 December 2015 |archive-date=23 March 2019 |archive-url=https://web.archive.org/web/20190323061742/https://www.infoq.com/presentations/ai-security |url-status=live }}</ref>


=== Machine learning-based attacks ===
There was not a systematic methodology for designing or evaluating early CAPTCHAs.<ref name=bursz /> As a result, there were many instances in which CAPTCHAs were of a fixed length and therefore automated tasks could be constructed to successfully make educated guesses about where segmentation should take place. Other early CAPTCHAs contained limited sets of words, which made the test much easier to game. Still others{{Example needed|date=October 2022}} made the mistake of relying too heavily on background confusion in the image. In each case, algorithms were created that were successfully able to complete the task by exploiting these design flaws. However, light changes to the CAPTCHA could thwart them. Modern CAPTCHAs like [[reCAPTCHA]] no longer rely just on fixed patterns but instead present variations of characters that are often collapsed together, making segmentation almost impossible. These newest iterations have been much more successful at warding off automated tasks.<ref name=bursz2 />
There was no systematic methodology for designing or evaluating early CAPTCHAs.<ref name=bursz /> As a result, many CAPTCHAs were of a fixed length, so automated tasks could be constructed to make educated guesses about where segmentation should take place. Other early CAPTCHAs drew their text from limited sets of words, which made the test much easier to game<!-- This sentence makes no sense! -->. Still others{{Example needed|date=October 2022}} made the mistake of relying too heavily on background confusion in the image. In each case, algorithms were created that completed the task by exploiting these design flaws. However, light changes to the CAPTCHA could thwart them. Modern CAPTCHAs like [[reCAPTCHA]] present variations of characters that are collapsed together, making them hard to segment, and they have warded off automated tasks.<ref name=bursz2 />
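The fixed-length weakness can be illustrated with a short Python sketch: if a solver knows how many characters a challenge contains, it can guess the cut points by slicing the image into equal-width columns. The image representation (a list of pixel rows) and the function name <code>segment_fixed_length</code> are assumptions chosen for illustration, not a description of any published attack.

```python
from typing import List

Image = List[List[int]]  # rows of pixels; 1 = ink, 0 = background (assumed format)


def segment_fixed_length(image: Image, n_chars: int) -> List[Image]:
    """Split an image into n_chars vertical slices of (nearly) equal width."""
    if n_chars <= 0:
        raise ValueError("n_chars must be positive")
    width = len(image[0]) if image else 0
    # Cut points are evenly spaced across the width; integer division spreads
    # any remainder so that slice widths differ by at most one column.
    bounds = [width * i // n_chars for i in range(n_chars + 1)]
    return [
        [row[bounds[i]:bounds[i + 1]] for row in image]
        for i in range(n_chars)
    ]
```

Each slice could then be passed to a single-character classifier. Collapsing characters together, as described above, defeats this kind of guess because the glyph boundaries no longer fall at predictable positions.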


[[File:Modern-captcha.jpg|thumb|An example of a [[reCAPTCHA]] challenge from 2007, containing the words "following finding". The waviness and horizontal stroke were added to increase the difficulty of breaking the CAPTCHA with a computer program.]]
[[File:Captchacat.png|thumb|A CAPTCHA usually has a text box directly underneath where the user should fill out the text that they see. In this case, "sclt ..was here".]]


In October 2013, artificial intelligence company [[Vicarious (Company)|Vicarious]] claimed that it had developed a generic CAPTCHA-solving algorithm that was able to solve modern CAPTCHAs with character recognition rates of up to 90%.<ref>{{cite web|last=Summers|first=Nick|title=Vicarious claims its AI software can crack up to 90% of CAPTCHAs offered by Google, Yahoo and PayPal|url=https://thenextweb.com/insider/2013/10/28/vicarious-claims-ai-software-can-now-crack-90-captchas-google-yahoo-paypal/ |publisher=TNW}}</ref> However, [[Luis von Ahn]], a pioneer of early CAPTCHA and founder of reCAPTCHA, said: "It's hard for me to be impressed since I see these every few months." 50 similar claims to that of Vicarious had been made since 2003.<ref>{{cite web |last=Hof |first=Robert |title=AI Startup Vicarious Claims Milestone In Quest To Build A Brain: Cracking CAPTCHA|url=https://www.forbes.com/sites/roberthof/2013/10/28/ai-startup-vicarious-claims-milestone-in-quest-to-build-a-brain-craking-captcha/|work=Forbes}}</ref>
In October 2013, artificial intelligence company [[Vicarious (Company)|Vicarious]] claimed that it had developed a generic CAPTCHA-solving algorithm that was able to solve modern CAPTCHAs with character recognition rates of up to 90%.<ref>{{cite web|last=Summers|first=Nick|title=Vicarious claims its AI software can crack up to 90% of CAPTCHAs offered by Google, Yahoo and PayPal|url=https://thenextweb.com/insider/2013/10/28/vicarious-claims-ai-software-can-now-crack-90-captchas-google-yahoo-paypal/|publisher=TNW|access-date=19 June 2018|archive-date=15 September 2018|archive-url=https://web.archive.org/web/20180915002117/https://thenextweb.com/insider/2013/10/28/vicarious-claims-ai-software-can-now-crack-90-captchas-google-yahoo-paypal/|url-status=live}}</ref> However, [[Luis von Ahn]], a pioneer of early CAPTCHA and founder of reCAPTCHA, said: "It's hard for me to be impressed since I see these every few months." 50 similar claims to that of Vicarious had been made since 2003.<ref>{{cite web|last=Hof|first=Robert|title=AI Startup Vicarious Claims Milestone In Quest To Build A Brain: Cracking CAPTCHA|url=https://www.forbes.com/sites/roberthof/2013/10/28/ai-startup-vicarious-claims-milestone-in-quest-to-build-a-brain-craking-captcha/|work=Forbes|access-date=25 August 2017|archive-date=15 September 2018|archive-url=https://web.archive.org/web/20180915002819/https://www.forbes.com/sites/roberthof/2013/10/28/ai-startup-vicarious-claims-milestone-in-quest-to-build-a-brain-craking-captcha/|url-status=live}}</ref>


In August 2014 at the USENIX WOOT conference, [[Elie Bursztein|Bursztein]] et al. presented the first generic CAPTCHA-solving algorithm based on reinforcement learning and demonstrated its efficiency against many popular CAPTCHA schemes.<ref name="bursz2" />


In October 2018 at [[Association for Computing Machinery|ACM]] CCS'18 conference, Ye et al. presented a deep learning-based attack that could solve all 11 text captcha schemes used by the top-50 popular websites in 2018 with a high success rate. An effective CAPTCHA solver can be trained using as few as 500 real CAPTCHAs.<ref>{{cite journal |title=Yet Another Text Captcha Solver: A Generative Adversarial Network Based Approach |publisher=25th ACM Conference on Computer and Communications Security (CCS), 2018|doi = 10.1145/3243734.3243754|s2cid=53106794|url=https://eprints.lancs.ac.uk/id/eprint/126984/1/ccs18.pdf}}</ref>
In October 2018 at the [[Association for Computing Machinery|ACM]] CCS '18 conference, Ye et al. presented a deep learning-based attack that could consistently solve all 11 text CAPTCHA schemes used by the top 50 popular websites in 2018. The authors reported that an effective CAPTCHA solver could be trained using as few as 500 real CAPTCHAs.<ref>{{cite journal|title=Yet Another Text Captcha Solver: A Generative Adversarial Network Based Approach|periodical=25th ACM Conference on Computer and Communications Security (CCS), 2018|doi=10.1145/3243734.3243754|s2cid=53106794|url=https://eprints.lancs.ac.uk/id/eprint/126984/1/ccs18.pdf|access-date=16 March 2020|archive-date=29 October 2020|archive-url=https://web.archive.org/web/20201029202241/https://eprints.lancs.ac.uk/id/eprint/126984/1/ccs18.pdf|url-status=live}}</ref>


=== Human labor ===
It is possible to subvert CAPTCHAs by relaying them to a [[sweatshop]] of human operators who are employed to decode CAPTCHAs. A 2005 paper from a [[W3C]] working group stated that such an operator could verify hundreds per hour.<ref name="w3c_inaccessibility" /> In 2010, the [[University of California, San Diego|University of California at San Diego]] conducted a large scale study of CAPTCHA farms and found out that the retail price for solving one million CAPTCHAs was as low as $1,000.<ref name="motoyama" />
It is possible to subvert CAPTCHAs by relaying them to a [[sweatshop]] of human operators who are employed to decode CAPTCHAs. A 2005 paper from a [[W3C]] working group said that such an operator could verify hundreds per hour.<ref name="w3c_inaccessibility" /> In 2010, the [[University of California, San Diego|University of California at San Diego]] conducted a large-scale study of CAPTCHA farms, which found that the retail price for solving one million CAPTCHAs was as low as $1,000.<ref name="motoyama" />


Another technique consists of using a script to re-post the target site's CAPTCHA as a CAPTCHA to the attacker's site, which unsuspecting humans visit and solve within a short while for the script to use.<ref>{{cite web|url=http://www.boingboing.net/2004/01/27/solving_and_creating.html |title=Solving and creating captchas with free porn |last=Doctorow |first=Cory |author-link=Cory Doctorow |date=2004-01-27 |work=Boing Boing |archive-url=https://web.archive.org/web/20060209040456/http://www.boingboing.net/2004/01/27/solving_and_creating.html |archive-date=2006-02-09 |access-date=2015-04-27 |url-status=dead }}</ref><ref>{{cite web | url = http://petmail.lothar.com/design.html#auto35 | title = Hire People To Solve CAPTCHA Challenges | access-date = 2015-04-27 | date = 2005-07-21 | work = Petmail Design}}</ref>
Another technique consists of using a script to re-post the target site's CAPTCHA as a CAPTCHA to the attacker's site, which unsuspecting humans visit and solve within a short while for the script to use.<ref>{{cite web|url=http://www.boingboing.net/2004/01/27/solving_and_creating.html |title=Solving and creating captchas with free porn |last=Doctorow |first=Cory |author-link=Cory Doctorow |date=2004-01-27 |work=Boing Boing |archive-url=https://web.archive.org/web/20060209040456/http://www.boingboing.net/2004/01/27/solving_and_creating.html |archive-date=2006-02-09 |access-date=2015-04-27 |url-status=dead }}</ref><ref>{{cite web | url = http://petmail.lothar.com/design.html#auto35 | title = Hire People To Solve CAPTCHA Challenges | access-date = 2015-04-27 | date = 2005-07-21 | work = Petmail Design | archive-date = 18 September 2020 | archive-url = https://web.archive.org/web/20200918050055/http://petmail.lothar.com/design.html#auto35 | url-status = live }}</ref>

In 2023, the generative AI chatbot [[ChatGPT]] tricked a [[Taskrabbit|TaskRabbit]] worker into solving a CAPTCHA by telling the worker that it was not a robot and had impaired vision.<ref>{{cite web |last1=Hurler |first1=Kevin |title=Chat-GPT Pretended to Be Blind and Tricked a Human Into Solving a CAPTCHA |url=https://gizmodo.com/gpt4-open-ai-chatbot-task-rabbit-chatgpt-1850227471 |website=Gizmodo |access-date=11 April 2023 |archive-date=11 April 2023 |archive-url=https://web.archive.org/web/20230411200745/https://gizmodo.com/gpt4-open-ai-chatbot-task-rabbit-chatgpt-1850227471 |url-status=live }}</ref>


=== Outsourcing to paid services ===
There are multiple Internet companies like 2Captcha and DeathByCaptcha that offer human and machine backed CAPTCHA solving services for as low as US$0.50 per 1000 solved CAPTCHAs.<ref>{{cite web | url = http://www.prowebscraper.com/blog/top-10-captcha-solving-services-compared/ | title = Top 10 Captcha Solving Services Compared | access-date = 2018-12-10}}</ref> These services offer APIs and libraries that enable users to integrate CAPTCHA circumvention into the tools that CAPTCHAs were designed to block in the first place.<ref>{{Cite web |title=How Cybercriminals Bypass CAPTCHA |url=https://www.f5.com/company/blog/how-cybercriminals-bypass-captcha |access-date=2022-10-27 |website=www.f5.com |language=en-US}}</ref>
There are multiple Internet companies like ''2Captcha'' and ''DeathByCaptcha'' that offer human and machine backed CAPTCHA solving services for as low as US$0.50 per 1000 solved CAPTCHAs.<ref>{{cite web | url = http://www.prowebscraper.com/blog/top-10-captcha-solving-services-compared/ | title = Top 10 Captcha Solving Services Compared | access-date = 2018-12-10 | archive-date = 15 December 2018 | archive-url = https://web.archive.org/web/20181215172409/http://www.prowebscraper.com/blog/top-10-captcha-solving-services-compared/ | url-status = live }}</ref> These services offer APIs and libraries that enable users to integrate CAPTCHA circumvention into the tools that CAPTCHAs were designed to block in the first place.<ref>{{Cite web |title=How Cybercriminals Bypass CAPTCHA |url=https://www.f5.com/company/blog/how-cybercriminals-bypass-captcha |access-date=2022-10-27 |website=www.f5.com |language=en-US |archive-date=27 October 2022 |archive-url=https://web.archive.org/web/20221027095027/https://www.f5.com/company/blog/how-cybercriminals-bypass-captcha |url-status=live }}</ref>


=== Insecure implementation ===
Howard Yeend has identified two implementation issues with poorly designed CAPTCHA systems:<ref>{{cite web | url = http://www.puremango.co.uk/cm_breaking_captcha_115.php | archive-url = https://web.archive.org/web/20170625165854/http://www.puremango.co.uk/2005/11/breaking_captcha_115/ | archive-date = 2017-06-25 | title = Breaking CAPTCHAs Without Using OCR | access-date = 2006-08-22 | year = 2005 | work = (pureMango.co.uk)|first=Howard |last=Yeend }}</ref>reusing the session ID of a known CAPTCHA image, and CAPTCHAs residing on shared servers.
Howard Yeend has identified two implementation issues with poorly designed CAPTCHA systems:<ref>{{cite web | url = http://www.puremango.co.uk/cm_breaking_captcha_115.php | archive-url = https://web.archive.org/web/20170625165854/http://www.puremango.co.uk/2005/11/breaking_captcha_115/ | archive-date = 2017-06-25 | title = Breaking CAPTCHAs Without Using OCR | access-date = 2006-08-22 | year = 2005 | work = (pureMango.co.uk)|first=Howard |last=Yeend }}</ref> reusing the session ID of a known CAPTCHA image, and CAPTCHAs residing on shared servers.

Sometimes, if part of the software generating the CAPTCHA is [[client-side]] (the validation is done on a server but the text that the user is required to identify is rendered on the client side), then users can modify the client to display the un-rendered text. Some CAPTCHA systems use [[MD5]] hashes stored client-side, which may leave the CAPTCHA vulnerable to a [[brute-force attack]].<ref>{{Cite web |title=CTFtime.org / #kksctf open 2019 / Kackers blockchained notes / Writeup |url=https://ctftime.org/writeup/17833 |access-date=2022-10-27 |website=ctftime.org}}</ref>


Sometimes, if part of the software generating the CAPTCHA is [[client-side]] (the validation is done on a server but the text that the user is required to identify is rendered on the client side), then users can modify the client to display the un-rendered text. Some CAPTCHA systems use [[MD5]] hashes stored client-side, which may leave the CAPTCHA vulnerable to a [[brute-force attack]].<ref>{{Cite web |title=CTFtime.org / #kksctf open 2019 / Kackers blockchained notes / Writeup |url=https://ctftime.org/writeup/17833 |access-date=2022-10-27 |website=ctftime.org |archive-date=27 October 2022 |archive-url=https://web.archive.org/web/20221027095023/https://ctftime.org/writeup/17833 |url-status=live }}</ref>
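The brute-force risk from a client-side hash can be shown with a minimal Python sketch. It assumes a hypothetical scheme in which the page exposes the unsalted MD5 hash of a short, lowercase-only answer; the function name <code>brute_force_md5</code> and its parameters are illustrative and do not describe any specific CAPTCHA product.

```python
import hashlib
import itertools
import string
from typing import Optional


def brute_force_md5(target_hex: str, length: int,
                    alphabet: str = string.ascii_lowercase) -> Optional[str]:
    """Try every string of the given length until one hashes to target_hex."""
    for candidate in itertools.product(alphabet, repeat=length):
        text = "".join(candidate)
        if hashlib.md5(text.encode()).hexdigest() == target_hex:
            return text
    return None  # no string of this length over this alphabet matches
```

For a four-letter lowercase answer there are only 26<sup>4</sup> = 456,976 candidates, so the search is quick on ordinary hardware. Keeping the expected answer on the server, rather than sending any hash of it to the client, avoids this exposure.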
=== Notable attacks ===
Mori et al. published a paper in [[Institute of Electrical and Electronics Engineers|IEEE]] CVPR'03 detailing a method for defeating one of the most popular CAPTCHAs, EZ-Gimpy, which was tested as being 92% accurate in defeating it.<ref>{{cite web |url=http://www.cs.berkeley.edu/~mori/gimpy/mori_gimpy.pdf |archive-url=https://web.archive.org/web/20050403213029/http://www.cs.berkeley.edu/~mori/gimpy/mori_gimpy.pdf |archive-date=2005-04-03 |title=Breaking a Visual CAPTCHA |publisher=Cs.berkeley.edu |date=2002-12-10 |access-date=2017-09-15 |url-status=dead }}</ref> The same method was also shown to defeat the more complex and less-widely deployed Gimpy program 33% of the time. PWNtcha has made progress in defeating commonly used CAPTCHAs, which caused them to be more sophisticated.<ref>{{cite web|url=http://sam.zoy.org/pwntcha/ |title=PWNtcha – Caca Labs |publisher=Sam.zoy.org |date=2009-12-04 |access-date=2013-09-28}}</ref> Podec, a trojan discovered by the security company Kaspersky, forwards CAPTCHA requests to an online human translation service that converts the image to text, fooling the system. Podec targets Android mobile devices.<ref>{{Cite news|url=https://www.scmagazineuk.com/kaspersky-discovers-captcha-duping-podec-malware/article/1478464|title=Kaspersky discovers CAPTCHA-duping Podec malware|date=2015-03-11|newspaper=SC Magazine UK|access-date=2016-11-18}}</ref>


== Alternative CAPTCHAs ==
Some researchers have proposed alternatives including image recognition CAPTCHAs<!-- Isn't this already being implemented in most websites? --> which require users to identify simple objects in the images presented. The argument in favor of these schemes is that tasks like object recognition are more complex to perform than text recognition and therefore should be more resilient to machine learning based attacks.


Chew et al. published their work in the 7th International Information Security Conference, ISC'04, proposing three different versions of image recognition CAPTCHAs, and validating the proposal with user studies. It is suggested that one of the versions, the anomaly CAPTCHA, is best with 100% of human users being able to pass an anomaly CAPTCHA with at least 90% probability in 42 seconds.<ref>{{cite web |url=http://www.cs.berkeley.edu/~tygar/papers/Image_Recognition_CAPTCHAs/imagecaptcha.pdf |title=Image Recognition CAPTCHAs |publisher=Cs.berkeley.edu |access-date=2013-09-28 |archive-url=https://web.archive.org/web/20130510022240/http://www.cs.berkeley.edu/~tygar/papers/Image_Recognition_CAPTCHAs/imagecaptcha.pdf |archive-date=2013-05-10 |url-status=dead }}</ref> Datta et al. published their paper in the [[Association for Computing Machinery|ACM]] [[Multimedia]] '05 Conference, named IMAGINATION (IMAge Generation for INternet AuthenticaTION), proposing a systematic way to image recognition CAPTCHAs. Images are distorted in such a way that state-of-the-art image recognition approaches (which are potential attack technologies) fail to recognize them.<ref>{{cite web|url=http://infolab.stanford.edu/~wangz/project/imsearch/IMAGINATION/ACM05/ |title=Imagination Paper |publisher=Infolab.stanford.edu |access-date=2013-09-28}}</ref>
Chew et al. published their work in the 7th International Information Security Conference, ISC'04, proposing three different versions of image recognition CAPTCHAs, and validating the proposal with user studies. It is suggested that one of the versions, the anomaly CAPTCHA, is best, with 100% of human users being able to pass an anomaly CAPTCHA with at least 90% probability in 42 seconds.<ref>{{cite web |url=http://www.cs.berkeley.edu/~tygar/papers/Image_Recognition_CAPTCHAs/imagecaptcha.pdf |title=Image Recognition CAPTCHAs |publisher=Cs.berkeley.edu |access-date=2013-09-28 |archive-url=https://web.archive.org/web/20130510022240/http://www.cs.berkeley.edu/~tygar/papers/Image_Recognition_CAPTCHAs/imagecaptcha.pdf |archive-date=2013-05-10 |url-status=dead }}</ref> Datta et al. published their paper in the [[Association for Computing Machinery|ACM]] [[Multimedia]] '05 Conference, named IMAGINATION (IMAge Generation for INternet AuthenticaTION), proposing a systematic approach to image recognition CAPTCHAs. Images are distorted so that image recognition approaches cannot recognize them.<ref>{{cite web |url=http://infolab.stanford.edu/~wangz/project/imsearch/IMAGINATION/ACM05/ |title=Imagination Paper |publisher=Infolab.stanford.edu |access-date=2013-09-28 |archive-date=2 October 2013 |archive-url=https://web.archive.org/web/20131002170726/http://infolab.stanford.edu/~wangz/project/imsearch/IMAGINATION/ACM05/ |url-status=live }}</ref>


Microsoft (Jeremy Elson, John R. Douceur, Jon Howell, and Jared Saul) claim to have developed Animal Species Image Recognition for Restricting Access (ASIRRA) which ask users to distinguish cats from dogs. Microsoft had a beta version of this for websites to use.<ref>{{cite web |url=https://www.microsoft.com/en-us/research/publication/asirra-a-captcha-that-exploits-interest-aligned-manual-image-categorization/ |archive-url=https://web.archive.org/web/20081215032402/http://research.microsoft.com/en-us/um/redmond/projects/asirra/ |archive-date=15 December 2008 |title=Asirra is a human interactive proof that asks users to identify photos of cats and dogs |website=[[Microsoft]] |url-status=dead }}</ref> They claim "Asirra is easy for users; it can be solved by humans 99.6% of the time in under 30 seconds. Anecdotally, users seemed to find the experience of using Asirra much more enjoyable than a text-based CAPTCHA." This solution was described in a 2007 paper to Proceedings of 14th ACM Conference on Computer and Communications Security (CCS).<ref>{{cite web|url=https://www.microsoft.com/en-us/research/publication/asirra-a-captcha-that-exploits-interest-aligned-manual-image-categorization/ |title=Asirra: A CAPTCHA that Exploits Interest-Aligned Manual Image Categorization|website=[[Microsoft]]}}</ref> However, this project was closed in October 2014 and is no longer available.<ref>{{cite web |url=http://research.microsoft.com/en-us/um/redmond/projects/asirra/installation.aspx |archive-url=https://web.archive.org/web/20090112032323/http://research.microsoft.com/en-us/um/redmond/projects/asirra/installation.aspx |archive-date=12 January 2009 |title=Microsoft's Asirra project closed. |url-status=dead }}</ref>
Microsoft (Jeremy Elson, John R. Douceur, Jon Howell, and Jared Saul) claim to have developed Animal Species Image Recognition for Restricting Access (ASIRRA), which asks users to distinguish cats from dogs. Microsoft had a beta version of this for websites to use.<ref>{{cite web |url=https://www.microsoft.com/en-us/research/publication/asirra-a-captcha-that-exploits-interest-aligned-manual-image-categorization/ |archive-url=https://web.archive.org/web/20081215032402/http://research.microsoft.com/en-us/um/redmond/projects/asirra/ |archive-date=15 December 2008 |title=Asirra is a human interactive proof that asks users to identify photos of cats and dogs |website=[[Microsoft]] |url-status=dead }}</ref> They claim "Asirra is easy for users; it can be solved by humans 99.6% of the time in under 30 seconds. Anecdotally, users seemed to find the experience of using Asirra much more enjoyable than a text-based CAPTCHA." This solution was described in a 2007 paper in the Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS).<ref>{{Cite conference |last=Elson |first=Jeremy |last2=Douceur |first2=John |last3=Howell |first3=Jon |last4=Saul |first4=Jared |date=October 2007 |title=Asirra: A CAPTCHA that Exploits Interest-Aligned Manual Image Categorization |url=https://www.microsoft.com/en-us/research/publication/asirra-a-captcha-that-exploits-interest-aligned-manual-image-categorization/ |conference=Proceedings of 14th ACM Conference on Computer and Communications Security |publisher=[[Microsoft]] |archive-url=https://web.archive.org/web/20081215032402/http://research.microsoft.com/en-us/um/redmond/projects/asirra/ |archive-date=15 December 2008 |access-date=15 September 2017 |url-status=live}}</ref> It was closed in October 2014.<ref>{{Cite web |title=After 8 years of operation, Asirra is shutting down effective October 1, 2014. Thank you to all of our users! 
|url=https://research.microsoft.com/en-us/projects/asirra/default.aspx |url-status=dead |archive-url=https://web.archive.org/web/20150207180225/https://research.microsoft.com/en-us/projects/asirra/default.aspx |archive-date=2015-02-07 |publisher=[[Microsoft]]}}</ref>


== See also ==
== See also ==
* [[Bot prevention]]
* [[Defense strategy (computing)]]
* [[Defense strategy (computing)]]
* [[NuCaptcha]]
* [[NuCaptcha]]
* [[Proof of personhood]]
* [[Proof of personhood]]
* [[Proof-of-work system]]
* [[Proof of work]]
* [[reCAPTCHA]]
* [[reCAPTCHA]]


Line 112: Line 108:
| last4 = Langford
| last4 = Langford
| first4 = John
| first4 = John
| title = Advances in Cryptology — EUROCRYPT 2003
| title = Advances in Cryptology—EUROCRYPT 2003
|date=May 2003
| date = May 2003
| chapter = CAPTCHA: Using Hard AI Problems for Security
| chapter = CAPTCHA: Using Hard AI Problems for Security
| series = Lecture Notes in Computer Science
| series = Lecture Notes in Computer Science
| volume = 2656
| volume = 2656
| pages = 294–311
| pages = 294–311
| conference= EUROCRYPT 2003: International Conference on the Theory and Applications of Cryptographic Techniques
| conference = EUROCRYPT 2003: International Conference on the Theory and Applications of Cryptographic Techniques
| chapter-url = https://link.springer.com/content/pdf/10.1007/3-540-39200-9_18.pdf
| chapter-url = https://link.springer.com/content/pdf/10.1007/3-540-39200-9_18.pdf
| doi = 10.1007/3-540-39200-9_18
| doi = 10.1007/3-540-39200-9_18
| isbn = 978-3-540-14039-9
| isbn = 978-3-540-14039-9
| doi-access = free
| doi-access = free
| access-date = 30 August 2019
| archive-date = 4 May 2019
| archive-url = https://web.archive.org/web/20190504115630/https://link.springer.com/content/pdf/10.1007%2F3-540-39200-9_18.pdf
| url-status = live
}}</ref>
}}</ref>
<ref name="bursz2">{{cite conference
<ref name="bursz2">{{cite conference
Line 135: Line 135:
| date = August 2014
| date = August 2014
| title = The End is Nigh: Generic Solving of Text-based CAPTCHAs
| title = The End is Nigh: Generic Solving of Text-based CAPTCHAs
| conference= WoOT 2014: Usenix Workshop on Offensive Security
| conference = WoOT 2014: Usenix Workshop on Offensive Security
| url = https://www.elie.net/publication/the-end-is-nigh-generic-solving-of-text-based-captchas
| url = https://www.elie.net/publication/the-end-is-nigh-generic-solving-of-text-based-captchas
| access-date = 5 April 2016
| archive-date = 16 April 2016
| archive-url = https://web.archive.org/web/20160416212924/https://www.elie.net/publication/the-end-is-nigh-generic-solving-of-text-based-captchas
| url-status = live
}}</ref>
}}</ref>
<ref name="motoyama">{{cite conference
<ref name="motoyama">{{cite conference
Line 153: Line 157:
| date = August 2010
| date = August 2010
| title = Re: CAPTCHAs-Understanding CAPTCHA-Solving Services in an Economic Context.s
| title = Re: CAPTCHAs-Understanding CAPTCHA-Solving Services in an Economic Context.s
| conference= USENIX Security Symposium, 2010
| conference = USENIX Security Symposium, 2010
| url = http://static.usenix.org/event/sec10/tech/full_papers/Motoyama.pdf
| url = http://static.usenix.org/event/sec10/tech/full_papers/Motoyama.pdf
| access-date = 5 April 2016
| archive-date = 29 May 2016
| archive-url = https://web.archive.org/web/20160529023244/http://static.usenix.org/event/sec10/tech/full_papers/Motoyama.pdf
| url-status = live
}}</ref>
}}</ref>
}}
}}


==Further references==
==Further references==
* von Ahn, L; M. Blum and J. Langford. (2004) "[http://www.cs.cmu.edu/afs/cs/Web/People/aladdin/papers/pdfs/y2004/captcha_cacm.pdf Telling humans and computers apart (automatically)]". ''Communications of the ACM'', '''47'''(2):57–60.
* von Ahn, L; M. Blum and J. Langford. (2004) "[http://www.cs.cmu.edu/afs/cs/Web/People/aladdin/papers/pdfs/y2004/captcha_cacm.pdf Telling humans and computers apart (automatically)]". ''Communications of the ACM'', '''47'''(2):57–60.


== External links ==
== External links ==
Line 170: Line 178:
* [https://web.archive.org/web/20170915204258/https://pdfs.semanticscholar.org/692a/31f65e29ea3667de46933245f53bda55a65b.pdf Reverse Engineering CAPTCHAs] Abram Hindle, Michael W. Godfrey, Richard C. Holt, 2009-08-24
* [https://web.archive.org/web/20170915204258/https://pdfs.semanticscholar.org/692a/31f65e29ea3667de46933245f53bda55a65b.pdf Reverse Engineering CAPTCHAs] Abram Hindle, Michael W. Godfrey, Richard C. Holt, 2009-08-24


{{CAPTCHAs}}
{{Authority control}}
{{Authority control}}



Latest revision as of 15:19, 19 June 2024

This CAPTCHA (reCAPTCHA v1) of "smwm" obscures its message from computer interpretation by twisting the letters and adding a slight background color gradient.

A CAPTCHA (/ˈkæp.tʃə/ KAP-chə) is a type of challenge–response test used in computing to determine whether the user is human in order to deter bot attacks and spam.[1]

The term was coined in 2003 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford.[2] It is a contrived acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart."[3] A historically common type of CAPTCHA (displayed as reCAPTCHA v1) was first invented in 1997 by two groups working in parallel. This form of CAPTCHA requires entering a sequence of letters or numbers in a distorted image. Because the test is administered by a computer, in contrast to the standard Turing test that is administered by a human, CAPTCHAs are sometimes described as reverse Turing tests.[4]

Two widely used CAPTCHA services are Google's reCAPTCHA[5][6] and the independent hCaptcha.[7][8] It takes the average person approximately 10 seconds to solve a typical CAPTCHA.[9]

Purpose

CAPTCHAs are intended to prevent automated abuse of websites, such as promotional spam, registration spam, and data scraping; bots are less likely to abuse a website that uses a CAPTCHA. Many websites use CAPTCHAs effectively to prevent bot raiding. CAPTCHAs are designed so that humans can complete them, while most robots cannot.[10] Newer CAPTCHAs look at the user's behaviour on the internet to judge whether they are human.[11] A normal CAPTCHA test appears only if the user acts like a bot, for example by requesting webpages or clicking links too fast.

History

Since the 1980s–1990s, users have wanted to make text illegible to computers.[12] The first such people were hackers, posting about sensitive topics to Internet forums they thought were being automatically monitored for keywords. To circumvent such filters, they replaced words with look-alike characters. HELLO could become |-|3|_|_() or )-(3££0, among other variants, so that a filter could not detect all of them. This later became known as leetspeak.[13]

One of the earliest commercial uses of CAPTCHAs was in the Gausebeck–Levchin test. In 2000, idrive.com began to protect its signup page[14] with a CAPTCHA and prepared to file a patent.[12] In 2001, PayPal used such tests as part of a fraud prevention strategy in which they asked humans to "retype distorted text that programs have difficulty recognizing."[15] PayPal co-founder and CTO Max Levchin helped commercialize this use.

A popular deployment of CAPTCHA technology, reCAPTCHA, was acquired by Google in 2009.[16] In addition to preventing bot fraud for its users, Google used reCAPTCHA and CAPTCHA technology to digitize the archives of The New York Times and books from Google Books in 2011.[17]

Invention

Eran Reshef, Gili Raanan and Eilon Solan, who worked at Sanctum on Application Security Firewall, first patented CAPTCHA in 1997. Their patent application details that "The invention is based on applying human advantage in applying sensory and cognitive skills to solving simple problems that prove to be extremely hard for computer software. Such skills include, but are not limited to processing of sensory information such as identification of objects and letters within a noisy graphical environment, signals and speech within an auditory signal, patterns and objects within a video or animation sequence".[18]

Characteristics

CAPTCHAs are automated, requiring little human maintenance or intervention to administer, producing benefits in cost and reliability.[19]

Modern text-based CAPTCHAs are designed such that they require the simultaneous use of three separate abilities (invariant recognition, segmentation, and parsing) to complete the task.[20]

  • Invariant recognition refers to the ability to recognize letters despite a large amount of variation in their shapes.[21]
  • Segmentation is the ability to separate one letter from another, which CAPTCHAs make difficult by crowding characters together.
  • Parsing refers to the ability to understand the CAPTCHA holistically, in order to correctly identify each character.[22]

Each of these problems poses a significant challenge for a computer, even in isolation. Therefore, these three techniques in tandem make CAPTCHAs difficult for computers to solve.[23]

Whilst primarily used for security reasons, CAPTCHAs can also serve as a benchmark task for artificial intelligence technologies. According to an article by von Ahn, Blum, Hopper and Langford,[24] "any program that passes the tests generated by a CAPTCHA can be used to solve a hard unsolved AI problem."[25] They argue that the advantages of using hard AI problems as a means for security are twofold. Either the problem goes unsolved and there remains a reliable method for distinguishing humans from computers, or the problem is solved and a difficult AI problem is resolved along with it.[24]

Accessibility

Many websites require typing a CAPTCHA when creating an account to prevent spam. This image contains a user trying to type the CAPTCHA word "sepalbeam" to protect against automated spam.

CAPTCHAs based on reading text—or other visual-perception tasks—prevent blind or visually impaired users from accessing the protected resource.[26][27] Because CAPTCHAs are designed to be unreadable by machines, common assistive technology tools such as screen readers cannot interpret them. The use of CAPTCHA thus excludes a small percentage of users from using significant subsets of such common Web-based services as PayPal, Gmail, Orkut, Yahoo!, many forum and weblog systems, etc.[28] In certain jurisdictions, site owners could become targets of litigation if they are using CAPTCHAs that discriminate against certain people with disabilities. For example, a CAPTCHA may make a site incompatible with Section 508 in the United States.

CAPTCHAs do not have to be visual. Any hard artificial intelligence problem, such as speech recognition, can be used as a CAPTCHA. Some implementations, such as reCAPTCHA, permit users to opt for an audio CAPTCHA, though a 2011 paper demonstrated a technique for defeating the popular schemes at the time.[29]

ProtectWebForm proposed a method, named "Smart CAPTCHA", for making CAPTCHAs easier to work with.[30] Developers are advised to combine CAPTCHA with JavaScript. Since it is hard for most bots to parse and execute JavaScript, the proposal is a combined method in which a script fills in the CAPTCHA field and hides both the image and the field from human eyes.[31]

One alternative method involves displaying to the user a simple mathematical equation and requiring the user to enter the solution as verification. Although these are much easier to defeat using software, they are suitable for scenarios where graphical imagery is not appropriate, and they provide a much higher level of accessibility for blind users than the image-based CAPTCHAs. These are sometimes referred to as MAPTCHAs (M = "mathematical"). However, these may be difficult for users with a cognitive disorder, such as dyscalculia.[32]
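The equation-based check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names make_challenge and check_answer, the single-digit addition format, and the wording of the question are all hypothetical and not taken from any particular CAPTCHA product.

```python
import hmac
import secrets


def make_challenge():
    """Return (question, answer) for a one-step arithmetic challenge.

    Hypothetical format: the sum of two operands in the range 1..9.
    """
    a = secrets.randbelow(9) + 1
    b = secrets.randbelow(9) + 1
    return f"What is {a} + {b}?", str(a + b)


def check_answer(expected, submitted):
    """Compare the stored answer with the user's input.

    Both values are encoded to bytes so compare_digest accepts any input text.
    """
    return hmac.compare_digest(expected.encode(), submitted.strip().encode())


question, answer = make_challenge()
print(question)                      # shown to the user as plain text
print(check_answer(answer, answer))  # True for a correct submission
```

Because the question is plain text, a screen reader can read it aloud, which is the accessibility benefit noted above; the same property is what makes such challenges easy for software to parse and answer.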

Challenges such as a logic puzzle or trivia question can also be used as a CAPTCHA. There is research into their resistance against countermeasures.[33]

Circumvention

The two main ways to bypass CAPTCHAs are using cheap human labor to solve them and using machine learning to build an automated solver.[34] According to former Google "click fraud czar" Shuman Ghosemajumder, there are numerous services which solve CAPTCHAs automatically.[35]

Machine learning-based attacks

There was no systematic methodology for designing or evaluating early CAPTCHAs.[23] As a result, many CAPTCHAs were of a fixed length, so automated tasks could be constructed to make educated guesses about where segmentation should take place. Other early CAPTCHAs contained limited sets of words, which made the test much easier to game. Still others made the mistake of relying too heavily on background confusion in the image. In each case, algorithms were created that could complete the task by exploiting these design flaws. However, light changes to the CAPTCHA could thwart them. Modern CAPTCHAs like reCAPTCHA rely on presenting variations of characters that are collapsed together, making them hard to segment, and they have warded off automated tasks.[36]
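To see why a fixed length helps an attacker, consider the simplest possible educated guess: cutting the image into equal-width slices, one per character. The Python sketch below is illustrative only; the function name guess_boundaries and the equal-width assumption are hypothetical, and the source does not say which heuristics real solvers used.

```python
def guess_boundaries(image_width, num_chars):
    """Split an image of the given pixel width into num_chars equal slices.

    Returns a list of (start, end) pixel columns, one pair per character.
    This only works when the number of characters is known in advance.
    """
    step = image_width / num_chars
    return [(round(i * step), round((i + 1) * step)) for i in range(num_chars)]


# A 120-pixel-wide CAPTCHA known to hold exactly four characters:
print(guess_boundaries(120, 4))  # [(0, 30), (30, 60), (60, 90), (90, 120)]
```

When the length varies and characters overlap, no such fixed grid exists, which is the property the collapsed-character designs rely on.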

An example of a reCAPTCHA challenge from 2007, containing the words "following finding". The waviness and horizontal stroke were added to increase the difficulty of breaking the CAPTCHA with a computer program.
A CAPTCHA usually has a text box directly underneath where the user should fill out the text that they see. In this case, "sclt ..was here".

In October 2013, artificial intelligence company Vicarious claimed that it had developed a generic CAPTCHA-solving algorithm that was able to solve modern CAPTCHAs with character recognition rates of up to 90%.[37] However, Luis von Ahn, a pioneer of early CAPTCHA and founder of reCAPTCHA, said: "It's hard for me to be impressed since I see these every few months." 50 similar claims to that of Vicarious had been made since 2003.[38]

In August 2014 at the USENIX WoOT conference, Bursztein et al. presented the first generic CAPTCHA-solving algorithm based on reinforcement learning and demonstrated its efficiency against many popular CAPTCHA schemes.[36]

In October 2018 at the ACM CCS '18 conference, Ye et al. presented a deep learning-based attack that could consistently solve all 11 text CAPTCHA schemes used by the top-50 popular websites in 2018. An effective CAPTCHA solver can be trained using as few as 500 real CAPTCHAs.[39]

Human labor

It is possible to subvert CAPTCHAs by relaying them to a sweatshop of human operators who are employed to decode CAPTCHAs. A 2005 paper from a W3C working group said that they could verify hundreds per hour.[26] In 2010, the University of California at San Diego conducted a large-scale study of CAPTCHA farms. The retail price for solving one million CAPTCHAs was as low as $1,000.[40]

Another technique consists of using a script to re-post the target site's CAPTCHA as a CAPTCHA to the attacker's site, which unsuspecting humans visit and solve within a short while for the script to use.[41][42]

In 2023, the generative AI chatbot ChatGPT tricked a TaskRabbit worker into solving a CAPTCHA by telling the worker it was not a robot and had impaired vision.[43]

Outsourcing to paid services

Multiple Internet companies, such as 2Captcha and DeathByCaptcha, offer human- and machine-backed CAPTCHA solving services for as low as US$0.50 per 1,000 solved CAPTCHAs.[44] These services offer APIs and libraries that enable users to integrate CAPTCHA circumvention into the tools that CAPTCHAs were designed to block in the first place.[45]

Insecure implementation

Howard Yeend has identified two implementation issues with poorly designed CAPTCHA systems:[46] reusing the session ID of a known CAPTCHA image, and CAPTCHAs residing on shared servers.
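One common way to guard against the first issue is to make every challenge single-use, so that a known pairing of identifier and answer cannot be submitted twice. The Python sketch below assumes a hypothetical in-memory ChallengeStore class; it is not taken from Yeend's write-up or from any specific CAPTCHA library.

```python
import secrets


class ChallengeStore:
    """Hypothetical in-memory map from challenge ID to expected answer."""

    def __init__(self):
        self._answers = {}

    def issue(self, answer):
        """Register a new challenge and return its unguessable ID."""
        challenge_id = secrets.token_urlsafe(16)
        self._answers[challenge_id] = answer
        return challenge_id

    def verify(self, challenge_id, submitted):
        """Check a submission and invalidate the challenge either way."""
        # pop() removes the entry whether or not the answer is right,
        # so a known (ID, answer) pair cannot be replayed.
        expected = self._answers.pop(challenge_id, None)
        return expected is not None and submitted == expected


store = ChallengeStore()
cid = store.issue("smwm")
print(store.verify(cid, "smwm"))  # True: first use succeeds
print(store.verify(cid, "smwm"))  # False: the ID has been invalidated
```

A real deployment would also expire unanswered challenges and keep the store on the server that performs validation; both are left out of this sketch.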

Sometimes, if part of the software generating the CAPTCHA is client-side (the validation is done on a server but the text that the user is required to identify is rendered on the client side), then users can modify the client to display the un-rendered text. Some CAPTCHA systems use MD5 hashes stored client-side, which may leave the CAPTCHA vulnerable to a brute-force attack.[47]
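The weakness of a client-side hash follows from the small number of possible answers. As an illustration, assume a hypothetical scheme whose answer is four lowercase letters and whose unsalted MD5 hash is sent to the browser: there are only 26^4 = 456,976 candidates, all of which can be hashed in well under a second. The function name recover_text and the four-letter format are assumptions made for this sketch.

```python
import hashlib
import itertools
import string


def recover_text(md5_hex, length, alphabet=string.ascii_lowercase):
    """Try every string of the given length until one matches the hash."""
    for candidate in itertools.product(alphabet, repeat=length):
        text = "".join(candidate)
        if hashlib.md5(text.encode()).hexdigest() == md5_hex:
            return text
    return None  # no candidate of this length and alphabet matched


# Hash as it might be exposed to the client in the hypothetical scheme:
exposed = hashlib.md5(b"smwm").hexdigest()
print(len(string.ascii_lowercase) ** 4)  # 456976 candidates in total
print(recover_text(exposed, 4))          # smwm
```

Keeping the expected answer on the server, and sending the client only an opaque challenge identifier, avoids exposing anything that can be searched offline.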

Alternative CAPTCHAs

Some researchers have proposed alternatives, including image recognition CAPTCHAs, which require users to identify simple objects in the images presented. The argument in favor of these schemes is that tasks like object recognition are more complex to perform than text recognition and therefore should be more resilient to machine-learning-based attacks.

Chew et al. published their work at the 7th International Information Security Conference, ISC'04, proposing three different versions of image recognition CAPTCHAs and validating the proposal with user studies. It is suggested that one of the versions, the anomaly CAPTCHA, is best, with 100% of human users being able to pass an anomaly CAPTCHA with at least 90% probability in 42 seconds.[48] Datta et al. published a paper at the ACM Multimedia '05 Conference, named IMAGINATION (IMAge Generation for INternet AuthenticaTION), proposing a systematic approach to image recognition CAPTCHAs. Images are distorted so that image recognition approaches cannot recognise them.[49]

Microsoft (Jeremy Elson, John R. Douceur, Jon Howell, and Jared Saul) claim to have developed Animal Species Image Recognition for Restricting Access (ASIRRA), which asks users to distinguish cats from dogs. Microsoft had a beta version of this for websites to use.[50] They claim "Asirra is easy for users; it can be solved by humans 99.6% of the time in under 30 seconds. Anecdotally, users seemed to find the experience of using Asirra much more enjoyable than a text-based CAPTCHA." This solution was described in a 2007 paper in the Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS).[51] It was closed in October 2014.[52]

See also

References

  1. ^ "The reCAPTCHA Project – Carnegie Mellon University CyLab". www.cylab.cmu.edu. Archived from the original on 27 October 2017. Retrieved 13 January 2017.
  2. ^ von Ahn, Luis; Blum, Manuel; Hopper, Nicholas J.; Langford, John (May 2003). "CAPTCHA: Using Hard AI Problems for Security" (PDF). Advances in Cryptology—EUROCRYPT 2003. EUROCRYPT 2003: International Conference on the Theory and Applications of Cryptographic Techniques. Lecture Notes in Computer Science. Vol. 2656. pp. 294–311. doi:10.1007/3-540-39200-9_18. ISBN 978-3-540-14039-9. Archived (PDF) from the original on 4 May 2019. Retrieved 30 August 2019.
  3. ^ "What is CAPTCHA?". Google Support. Google Inc. Archived from the original on 6 August 2020. Retrieved 9 September 2022. CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a [...]
  4. ^ Mayumi Takaya; Yusuke Tsuruta; Akihiro Yamamura (30 September 2013). "Reverse Turing Test using Touchscreens and CAPTCHA" (PDF). Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications. 4 (3): 41–57. doi:10.22667/JOWUA.2013.09.31.041. Archived (PDF) from the original on 22 August 2017.
  5. ^ "What is reCAPTCHA? – reCAPTCHA Help". support.google.com. Archived from the original on 20 July 2023. Retrieved 20 July 2023.
  6. ^ Sulgrove, Jonathan (7 July 2022). "reCAPTCHA: What It Is and Why You Should Use It on Your Website – TSTS". Twin State Technical Services. Archived from the original on 10 November 2022. Retrieved 10 November 2022.
  7. ^ "Websites using hCaptcha". trends.builtwith.com. Archived from the original on 10 November 2022. Retrieved 10 November 2022.
  8. ^ "hCaptcha – About Us". www.hcaptcha.com. Archived from the original on 20 July 2023. Retrieved 20 July 2023.
  9. ^ Bursztein, Elie; Bethard, Steven; Fabry, Celine; Mitchell, John C.; Jurafsky, Dan (2010). "How Good Are Humans at Solving CAPTCHAs? A Large Scale Evaluation" (PDF). 2010 IEEE Symposium on Security and Privacy. pp. 399–413. CiteSeerX 10.1.1.164.7848. doi:10.1109/SP.2010.31. ISBN 978-1-4244-6894-2. S2CID 14204454. Archived (PDF) from the original on 8 August 2018. Retrieved 30 March 2018.
  10. ^ Stec, Albert (12 June 2022). "What is CAPTCHA and How Does It Work?". Baeldung on Computer Science. Archived from the original on 1 November 2022. Retrieved 1 November 2022.
  11. ^ "What is a CAPTCHA?". Cloudflare. 1 November 2022. Archived from the original on 27 October 2022. Retrieved 1 November 2022.
  12. ^ a b "idrive turing patent application". Archived from the original on 15 March 2023. Retrieved 19 May 2017.
  13. ^ "h2g2 – An Explanation of l33t Speak – Edited Entry". h2g2. 16 August 2002. Archived from the original on 6 September 2011. Retrieved 3 June 2015.
  14. ^ "idrive turing signup page". Google Drive. Archived from the original on 15 March 2023. Retrieved 19 May 2017.
  15. ^ Stringham, Edward P (2015). Private Governance : creating order in economic and social life. Oxford University Press. p. 105. ISBN 978-0-19-936516-6. OCLC 5881934034.
  16. ^ "Teaching computers to read: Google acquires reCAPTCHA". Google Official Blog. Archived from the original on 31 August 2019. Retrieved 29 October 2018.
  17. ^ Gugliotta, Guy (28 March 2011). "Deciphering Old Texts, One Woozy, Curvy Word at a Time". The New York Times. Archived from the original on 17 November 2017. Retrieved 29 October 2018.
  18. ^ US 2005/0114705 A1, Reshef, Eran; Raanan, Gil & Solan, Eilon, "Method and system for discriminating a human action from a computerized action", published 26 May 2005  Archived 24 February 2019 at the Wayback Machine
  19. ^ "How CAPTCHAs work | What does CAPTCHA mean?". Cloudflare. Archived from the original on 27 October 2022. Retrieved 27 October 2022.
  20. ^ Chellapilla, Kumar; Larson, Kevin; Simard, Patrice; Czerwinski, Mary. "Designing Human Friendly Human Interaction Proofs (HIPs)" (PDF). Microsoft Research. Archived from the original (PDF) on 10 April 2015.
  21. ^ Karimi-Rouzbahani, Hamid; Bagheri, Nasour; Ebrahimpour, Reza (31 October 2017). "Invariant object recognition is a personalized selection of invariant features in humans, not simply explained by hierarchical feed-forward vision models". Scientific Reports. 7 (1): 14402. Bibcode:2017NatSR...714402K. doi:10.1038/s41598-017-13756-8. ISSN 2045-2322. PMC 5663844. PMID 29089520.
  22. ^ "Making CAPTCHAs Expensive Again: If You're Using Text-Based CAPTCHAs, You're Doing It Wrong | Tripwire". www.tripwire.com. Archived from the original on 28 October 2022. Retrieved 28 October 2022.
  23. ^ a b Bursztein, Elie; Martin, Matthieu; Mitchell, John C. (2011). "Text-based CAPTCHA Strengths and Weaknesses". ACM Computer and Communication Security 2011 (CCS'2011). Archived from the original on 24 November 2015. Retrieved 5 April 2016.
  24. ^ a b von Ahn, Luis; Blum, Manuel; Hopper, Nicholas J.; Langford, John (2003). "CAPTCHA: Using Hard AI Problems for Security" (PDF). Advances in Cryptology—EUROCRYPT 2003. Lecture Notes in Computer Science. Vol. 2656. pp. 294–311. doi:10.1007/3-540-39200-9_18. ISBN 978-3-540-14039-9. S2CID 5658745. Archived (PDF) from the original on 4 May 2019. Retrieved 30 August 2019.
  25. ^ Moy, Gabriel; Jones, Nathan; Harkless, Curt; Potter, Randall (2004). Distortion estimation techniques in solving visual CAPTCHAs (PDF). Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Vol. 2. IEEE. pp. 23–28. doi:10.1109/CVPR.2004.1315140. ISBN 978-0-7695-2158-9. Archived from the original (PDF) on 29 July 2020.
  26. ^ a b May, Matt (23 November 2005). "Inaccessibility of CAPTCHA". W3C. Archived from the original on 21 May 2012. Retrieved 27 April 2015.
  27. ^ Shea, Michael (19 November 2015). "CAPTCHA: Spambots, eBooks and the Turing Test". The Skinny. Archived from the original on 27 January 2016. Retrieved 9 January 2016.
  28. ^ "Inaccessibility of CAPTCHA". www.w3.org. Archived from the original on 4 November 2020. Retrieved 31 October 2020.
  29. ^ Bursztein, Elie; Beauxis, Romain; Paskov, Hristo; Perito, Daniele; Fabry, Celine; Mitchell, John C. (2011). "The Failure of Noise-Based Non-continuous Audio Captchas". 2011 IEEE Symposium on Security and Privacy. pp. 19–31. doi:10.1109/SP.2011.14. ISBN 978-1-4577-0147-4. S2CID 6933726. Archived from the original on 16 April 2016. Retrieved 5 April 2016.
  30. ^ "Smart Captcha". Protect Web Form .COM. 8 October 2006. Archived from the original on 4 November 2016. Retrieved 15 September 2017.
  31. ^ "Invisible reCAPTCHA". Google Developers. Archived from the original on 16 January 2020. Retrieved 28 October 2022.
  32. ^ "Inaccessibility of CAPTCHA". www.w3.org. Archived from the original on 27 October 2022. Retrieved 27 October 2022.
  33. ^ Gao, Song; Mohamed, Manar; Saxena, Nitesh; Zhang, Chengcui (23 June 2017). "Emerging-Image Motion CAPTCHAs: Vulnerabilities of Existing Designs, and Countermeasures". IEEE Transactions on Dependable and Secure Computing (Website). 16 (6) (1st ed.): 1040–1053. doi:10.1109/TDSC.2017.2719031. ISSN 1941-0018. S2CID 41097185.
  34. ^ Jakobsson, Markus (August 2012). The death of the Internet. Archived from the original on 15 October 2014. Retrieved 4 April 2016.
  35. ^ Ghosemajumder, Shuman (8 December 2015). "The Imitation Game: The New Frontline of Security". InfoQ. InfoQ. Archived from the original on 23 March 2019. Retrieved 8 December 2015.
  36. ^ a b Bursztein, Elie; Aigrain, Johnathan; Mosciki, Angelika; Mitchell, John C. (August 2014). The End is Nigh: Generic Solving of Text-based CAPTCHAs. WoOT 2014: Usenix Workshop on Offensive Security. Archived from the original on 16 April 2016. Retrieved 5 April 2016.
  37. ^ Summers, Nick. "Vicarious claims its AI software can crack up to 90% of CAPTCHAs offered by Google, Yahoo and PayPal". TNW. Archived from the original on 15 September 2018. Retrieved 19 June 2018.
  38. ^ Hof, Robert. "AI Startup Vicarious Claims Milestone In Quest To Build A Brain: Cracking CAPTCHA". Forbes. Archived from the original on 15 September 2018. Retrieved 25 August 2017.
  39. ^ "Yet Another Text Captcha Solver: A Generative Adversarial Network Based Approach" (PDF). 25th ACM Conference on Computer and Communications Security (CCS), 2018. doi:10.1145/3243734.3243754. S2CID 53106794. Archived (PDF) from the original on 29 October 2020. Retrieved 16 March 2020.
  40. ^ Motoyama, Marti; Levchenko, Kirill; Kanich, Chris; McCoy, Damon; Voelker, Geoffrey; Savage, Stefan (August 2010). Re: CAPTCHAs-Understanding CAPTCHA-Solving Services in an Economic Context (PDF). USENIX Security Symposium, 2010. Archived (PDF) from the original on 29 May 2016. Retrieved 5 April 2016.
  41. ^ Doctorow, Cory (27 January 2004). "Solving and creating captchas with free porn". Boing Boing. Archived from the original on 9 February 2006. Retrieved 27 April 2015.
  42. ^ "Hire People To Solve CAPTCHA Challenges". Petmail Design. 21 July 2005. Archived from the original on 18 September 2020. Retrieved 27 April 2015.
  43. ^ Hurler, Kevin. "Chat-GPT Pretended to Be Blind and Tricked a Human Into Solving a CAPTCHA". Gizmodo. Archived from the original on 11 April 2023. Retrieved 11 April 2023.
  44. ^ "Top 10 Captcha Solving Services Compared". Archived from the original on 15 December 2018. Retrieved 10 December 2018.
  45. ^ "How Cybercriminals Bypass CAPTCHA". www.f5.com. Archived from the original on 27 October 2022. Retrieved 27 October 2022.
  46. ^ Yeend, Howard (2005). "Breaking CAPTCHAs Without Using OCR". (pureMango.co.uk). Archived from the original on 25 June 2017. Retrieved 22 August 2006.
  47. ^ "CTFtime.org / #kksctf open 2019 / Kackers blockchained notes / Writeup". ctftime.org. Archived from the original on 27 October 2022. Retrieved 27 October 2022.
  48. ^ "Image Recognition CAPTCHAs" (PDF). Cs.berkeley.edu. Archived from the original (PDF) on 10 May 2013. Retrieved 28 September 2013.
  49. ^ "Imagination Paper". Infolab.stanford.edu. Archived from the original on 2 October 2013. Retrieved 28 September 2013.
  50. ^ "Asirra is a human interactive proof that asks users to identify photos of cats and dogs". Microsoft. Archived from the original on 15 December 2008.
  51. ^ Elson, Jeremy; Douceur, John; Howell, Jon; Saul, Jared (October 2007). Asirra: A CAPTCHA that Exploits Interest-Aligned Manual Image Categorization. Proceedings of 14th ACM Conference on Computer and Communications Security. Microsoft. Archived from the original on 15 December 2008. Retrieved 15 September 2017.
  52. ^ "After 8 years of operation, Asirra is shutting down effective October 1, 2014. Thank you to all of our users!". Microsoft. Archived from the original on 7 February 2015.

Further references

External links