Surreptitious crowdsourced book digitizing… one word at a time.
Last night I attended the Internet Archive’s Open Content Alliance meeting. I was really amazed by how far their book scanning (over a million books now) and contextualizing projects have come. The two most impressive things were the blog embed tools for the scanned book interface (more on that soon) and the most amazing use of Captcha technology I have seen. One of the inventors of Captchas, those funny squiggly words used to prove you’re a human when you sign up for something, has now put this wasted time and human brain power to work.
ReCaptcha now gets its difficult-to-decipher words from scanning projects like the Internet Archive’s and uses that human effort to digitize the words the computer can’t recognize. More than half a million man-hours a year can now go to digitizing books instead of just wasting your time.
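
For the curious, here is a rough Python sketch of how a scheme like this could work. It is my own guess, not ReCaptcha’s actual code. It assumes each challenge pairs a control word whose answer is already known with one word the scanner could not read, and that an unknown word is accepted once a few people type the same thing. The names `submit`, `consensus`, and `min_votes` are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory tally: unknown word ID -> counts of typed guesses.
votes = defaultdict(Counter)

def submit(control_answer, control_truth, unknown_id, unknown_answer):
    """Check the known control word; if it matches, record the guess
    for the unknown scanned word. Returns True if the person passes."""
    if control_answer.strip().lower() != control_truth.strip().lower():
        return False  # failed the human test, so ignore their guess
    votes[unknown_id][unknown_answer.strip().lower()] += 1
    return True

def consensus(unknown_id, min_votes=3):
    """Return the agreed transcription once at least min_votes people
    typed the same thing (an assumed threshold), otherwise None."""
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= min_votes else None
```

In this sketch, three people who pass the control word and all type "upon" for the same unreadable scan would make `consensus` return "upon", at which point the word could be written back into the digitized book.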