By John Dunn for Sophos
As CAPTCHA-haters know to their frequent irritation, rumours of the imminent death of the text-based Completely Automated Public Turing test to tell Computers and Humans Apart tend to be exaggerated.
On the contrary, despite being battered by proof-of-concept attacks and more sophisticated replacements (Google’s reCAPTCHA version 2 for instance), incarnations of text CAPTCHAs made up of jumbled letters and numbers can still be found surprisingly frequently across the internet.
But perhaps text CAPTCHAs have finally met their match thanks to a group of researchers from Northwest University and Peking University in China, and Lancaster University in the UK.
Their idea, as outlined in Yet Another Text Captcha Solver: A Generative Adversarial Network Based Approach, is to attack this type of CAPTCHA using a recent development called the Generative Adversarial Network (GAN).
This is a type of neural network comprising two parts: a generative network that synthesises many examples of the target (i.e. text CAPTCHAs), and a discriminative network that assesses that output against examples from the real world.
This sets up a virtuous circle in which the first network gradually produces more convincing fakes while the second gets better at spotting them. When the discriminative network can no longer tell the simulated CAPTCHAs from real ones, the synthetic examples are used to train a ‘solver’, which is then refined against real-world CAPTCHAs.
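The adversarial loop described above can be sketched in miniature. The toy below pits a two-parameter generator against a logistic discriminator over 1-D numbers rather than CAPTCHA images; the data (real samples drawn from a Gaussian around 4), the network shapes and the learning rates are all illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    # Clip to avoid overflow warnings in np.exp for extreme inputs.
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30, 30)))

# Generator: g(z) = a*z + b, fed with noise z ~ N(0, 1).
a, b = 1.0, 0.0
# Discriminator: d(x) = sigmoid(w*x + c), scoring "how real" x looks.
w, c = 0.1, 0.0

lr, batch = 0.02, 128
for step in range(5000):
    z = rng.standard_normal(batch)
    real = 4.0 + rng.standard_normal(batch)   # the "real world" is N(4, 1)
    fake = a * z + b

    # Discriminator update: ascend log d(real) + log(1 - d(fake)).
    d_real = sigmoid(w * real + c)
    d_fake = sigmoid(w * fake + c)
    w += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator update: ascend log d(fake) (the non-saturating GAN loss),
    # i.e. push its output towards regions the discriminator scores as real.
    d_fake = sigmoid(w * (a * z + b) + c)
    a += lr * np.mean((1 - d_fake) * w * z)
    b += lr * np.mean((1 - d_fake) * w)

print(f"generator output mean is now about {b:.2f} (real mean is 4.0)")
```

With a real GAN the generator emits images and both networks are deep convolutional models, but the alternating two-player update is the same.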
Does it work?
Although GANs have been used against image-based CAPTCHAs in the past, this is apparently the first time they have been pitted against text equivalents with good results.
In total, the researchers tested their system against 11 text CAPTCHAs used by big internet companies, achieving alarmingly good results.
The easiest to beat were Sohu (92%), eBay (86.6%), JD.com (86%), Wikipedia (78%), and Microsoft (69.6%), while at the other extreme was Google (3%).
Comparing their results against 22 CAPTCHAs that have been attacked in other studies, the researchers’ system outperformed rivals by a significant margin.
Most impressive of all is the ease with which the researchers were able to do all this using only 500 genuine CAPTCHAs to refine the solver instead of the millions previously needed.
It also did its work at a rate of 0.05 seconds per CAPTCHA from a humble desktop computer and GPU.
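The reason so few genuine examples suffice is that most of the learning happens on synthetic CAPTCHAs, with the real samples only correcting the residual mismatch between simulation and reality. The sketch below illustrates that transfer idea with logistic regression on toy 2-D Gaussian data standing in for the paper’s convolutional solver; the distributions, sample counts and learning rates are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_logreg(X, y, w=None, steps=500, lr=0.1):
    """Plain full-batch gradient descent on the logistic loss."""
    if w is None:
        w = np.zeros(X.shape[1] + 1)
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    for _ in range(steps):
        p = 1 / (1 + np.exp(-np.clip(Xb @ w, -30, 30)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.mean((Xb @ w > 0) == y)

def sample(mean0, mean1, n_per_class):
    X = np.vstack([rng.normal(mean0, 1.0, (n_per_class, 2)),
                   rng.normal(mean1, 1.0, (n_per_class, 2))])
    y = np.r_[np.zeros(n_per_class), np.ones(n_per_class)]
    return X, y

# "Synthetic" data from an imperfect simulator: plentiful, but its class
# means deviate from reality.
X_syn, y_syn = sample([0, 0], [2, -2], 5000)
# A small pool of "real" examples (the paper's 500), plus a held-out test set.
X_real, y_real = sample([0, 0], [2, 2], 250)
X_test, y_test = sample([0, 0], [2, 2], 1000)

w_pre = fit_logreg(X_syn, y_syn)                      # pre-train on synthetic
w_fine = fit_logreg(X_real, y_real, w=w_pre.copy())   # refine on 500 real ones

print(f"accuracy before refinement: {accuracy(w_pre, X_test, y_test):.2f}")
print(f"accuracy after refinement:  {accuracy(w_fine, X_test, y_test):.2f}")
```

The pre-trained model carries most of the structure; the handful of real samples only needs to nudge it, which is why the refinement step is so cheap.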
Lancaster University’s Dr Zheng Wang:
We show for the first time that an adversary can quickly launch an attack on a new text-based captcha scheme with very low effort. This is scary because it means that this first security defence of many websites is no longer reliable.
Google, for one, has put a lot of effort into new types of CAPTCHA (or reCAPTCHA, as Google calls its technology), most recently doing away with visible challenges altogether in favour of a system that models a user’s (or bot’s) interaction with websites in a more general way to sift friend from foe.
That makes life harder for neural net AI because there are no text or images to attack. But it surely won’t be long before researchers start trying to figure out how to simulate humans to beat these systems too.
CAPTCHA’s days might be numbered, but like the academics with their GANs, cybercriminals are unlikely to give up that easily.