Art Of Programming

musings by Dmytrii Nagirniak

Contributing Information via reCAPTCHA

Computers are very powerful these days, but still not powerful enough to solve some mathematical problems. One way to get more computing power is to distribute the work among users all around the world.

People can simply run a small application to help the scientific world. That is not a problem at all: the software does its work silently in the background.

People can also help science manually. But who is going to spend time on that? Would you? Well, maybe. There is a very good solution to this problem, though: let people contribute without even knowing about it.

This is exactly what reCAPTCHA does. It digitises old books using OCR and prompts users to decipher the words the OCR could not recognise.

What does the user see?

Users generally see a prompt to enter two words to prove they are humans, not robots. Nothing special. Nothing at all.

reCAPTCHA Sample

What happens behind the scenes?

This is the most interesting part. reCAPTCHA has its own API (an ASP.NET control is also available), so it is easy to integrate with any server.

And here’s how it works:

  1. The user enters TWO words.
  2. One of the words is known to reCAPTCHA; the other is not (the unknown word).
  3. If the known word is entered correctly, reCAPTCHA assumes that the user also entered the unknown word correctly. It also tells the user about the correct answer.
  4. To be more confident that the unknown word has been deciphered correctly, it is shown again in other challenge images.
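
The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not reCAPTCHA's actual code: the Challenge class, the check_answer function and the in-memory votes dictionary are all hypothetical names made up for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict


@dataclass
class Challenge:
    control_word: str      # the word whose text is already known
    unknown_image_id: str  # hypothetical id of the word OCR could not read


def normalize(word: str) -> str:
    """Ignore case and surrounding whitespace when comparing answers."""
    return word.strip().lower()


def check_answer(challenge: Challenge,
                 control_answer: str,
                 unknown_answer: str,
                 votes: Dict[str, Counter]) -> bool:
    """Pass the user if the known word matches; record the other word as a guess."""
    if normalize(control_answer) != normalize(challenge.control_word):
        return False  # failed the known word: treat as a robot, ignore the guess
    # The user looks human, so trust the reading of the unknown word for now.
    guesses = votes.setdefault(challenge.unknown_image_id, Counter())
    guesses[normalize(unknown_answer)] += 1
    return True


votes: Dict[str, Counter] = {}
challenge = Challenge(control_word="upon", unknown_image_id="img-42")
passed = check_answer(challenge, " Upon ", "morning", votes)
failed = check_answer(challenge, "up0n", "mornlng", votes)
```

Note that only the known word decides whether the user passes. The guess for the unknown word is simply stored so it can be compared with what other users type.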

As simple as that.

We can also try entering only one word correctly, and after several attempts we will pass the CAPTCHA! It does not look good for the user, but it still does the job: you have to be a human to recognise at least one word. And a wrong word should not corrupt the system, because the same word is sent to many people and it is very unlikely that they will all provide the same wrong answer.
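
To see why a single wrong answer does little harm, here is a minimal sketch of a majority vote over the guesses collected for one unknown word. The function name accepted_reading and the thresholds min_votes and min_share are assumptions chosen for this example; the real service may use a different rule.

```python
from collections import Counter
from typing import Iterable, Optional


def accepted_reading(guesses: Iterable[str],
                     min_votes: int = 3,
                     min_share: float = 0.75) -> Optional[str]:
    """Return the agreed reading of a word, or None if users do not agree yet.

    min_votes and min_share are made-up thresholds for illustration only.
    """
    counts = Counter(g.strip().lower() for g in guesses)
    if not counts:
        return None
    word, n = counts.most_common(1)[0]
    total = sum(counts.values())
    # Accept only when enough people gave the same answer and it clearly dominates.
    if n >= min_votes and n / total >= min_share:
        return word
    return None


agreed = accepted_reading(["morning", "Morning", "mornlng", "morning"])
undecided = accepted_reading(["morning", "mornlng"])
```

In the first call three of the four users agree, so the single typo is outvoted. In the second call there are too few answers, so the word would have to be shown to more people.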

There is one note about it. If you want to distribute your (web) application, you will force your clients to obtain a public key for their own domain in order to use reCAPTCHA.
