Big Ideas For Those With Time

sudara · August 29, 2006, 8:56pm

On 8/29/06, Alastair M. wrote:

Larger sites typically offer an alternative captcha, for example an
audio captcha, for those who have poor sight.

Rob

–

sudara · August 30, 2006, 7:33pm

I’ve posted the hacked code at
http://null.in/2006/08/30/paptcha-a-captcha-killer/ , in case anyone
is interested.

Thanks,
Pratik

On 8/30/06, sudara wrote:

So, the pony express is easily buildable, bot-proof, not-porn proof,
and not accessible.

Thanks Pratik for the example - it’s really clear.

–
rm -rf / 2>/dev/null - http://null.in

Don’t judge those who try and fail, judge those who fail to try…

sudara · August 31, 2006, 5:18am

Sorry, this should have gone out a couple of days ago but I was
interrupted and forgot about it for a while.

On Mon, Aug 28, 2006 at 09:07:49PM -0000, sudara wrote:
} > > It’s a nice idea, but I’m skeptical. Show me an example image that
} > > won’t make the user feel like s/he is playing Where’s Waldo (i.e.
} > > doesn’t annoy the user) and has sufficient noise to make image
} > > recognition difficult.
}
} Click on the dude with the baseball hat to prove your human-ness:
} http://caboo.se/

Actually, that’s pretty compelling. I have a few concerns, though:

1. file size (which really means bandwidth usage)
2. automated generation such that the target image is dependably
   distinct from the noise images
3. human time required to name/index images
4. ease or difficulty of training an image recognizer

In the interest of clarity, I will define the terms I am using in the
discussion below. I’ll be talking about target images, noise images,
feature phrases, and collages. A feature phrase is simply a textual
phrase identifying a feature of an image. A collage is a generated image
that is presented to the user with a feature phrase. The user is
expected to click in the region of the collage that shows the target
image, i.e. the component image of the collage corresponding to the
given feature phrase. All other component images in the collage are
noise images. Note that the same generated image can be used in multiple
different collages by associating it with different feature phrases,
thus making a different component image the target image. Likewise,
target and noise images are only identified as such in the context of a
collage. Whew. Okay, on to the discussion.

A typical captcha image is 4-12 KB; the setup you show on caboo.se uses
17 images of 9-41 KB each. This isn’t entirely fair as a comparison,
however, since one would use a single generated image. A screengrab of
the combined image, saved as a JPEG compressed at very low quality,
still comes out to 18 KB. It may not look like much, but 18 KB vs. 4 KB
more than quadruples the bandwidth used for human identification. That
doesn’t necessarily mean it isn’t worth it, but it counts against the
idea.

Assume for the moment that you have a large set of images intended to be
target images in collages, each of which is identified with some
specific feature that you will be asking the human to find (e.g.
baseball cap). If you have a separate set of noise images, you will need
to be sure that none of them have a feature that has been identified in
a target image. If you do not have that information, your image
generator may have one particular target image in mind for a particular
collage, but another component image may satisfy the feature phrase.
Generating the information requires human work. This leads nicely into
the next concern…

Human work is a significant cost. You need a human to look at every
component image your generator will be using and either associate at
least one feature phrase with it or verify that it does not satisfy any
other feature phrases. That assumes two separate pools of images,
however: one for target images and one for noise images. If you have a
single pool then a human will still need to examine each image, but it
will be necessary to identify every plausible feature of each image to
avoid generating a collage in which more than one component image
satisfies the feature phrase. (Yes, you can allow multiple target
collage regions, but that doesn’t change the human work involved; the
generation process still needs to be aware of multiple target images.)
This also leads nicely into the concern that follows…
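
To make the generation constraint concrete, here is a minimal Python
sketch of picking components for one collage from a single tagged pool.
It is only an illustration: the pool contents, the file names, and the
build_collage helper are hypothetical, and it assumes a human has
already listed every plausible feature of every image.

```python
import random

# Hypothetical tagged pool: image name -> feature phrases a human has
# confirmed for it. Names and tags are made up for illustration.
POOL = {
    "img_001.jpg": {"baseball cap", "man"},
    "img_002.jpg": {"pony"},
    "img_003.jpg": {"young girl"},
    "img_004.jpg": {"man", "beard"},
    "img_005.jpg": {"purple wolf"},
}


def build_collage(pool, size, rng=random):
    """Pick a target, one of its feature phrases, and non-conflicting noise."""
    target = rng.choice(sorted(pool))
    phrase = rng.choice(sorted(pool[target]))
    # A noise image must not satisfy the chosen phrase, otherwise the
    # collage would have more than one correct region.
    candidates = [
        name for name in sorted(pool)
        if name != target and phrase not in pool[name]
    ]
    if len(candidates) < size - 1:
        raise ValueError("not enough non-conflicting noise images")
    components = [target] + rng.sample(candidates, size - 1)
    rng.shuffle(components)
    return {"phrase": phrase, "target": target, "components": components}


collage = build_collage(POOL, 3)
print(collage["phrase"], collage["components"])
```

The filter is only as good as the tags: an image with an untagged
feature can still slip in as noise, which is why the human tagging
cost cannot be avoided.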

Image recognition algorithms are pretty good these days. They are also
very trainable. If I, as a spammer, go to your site and see this kind of
identification system I can tell my spambot that this feature phrase
goes with the image in this region. A decent recognition algorithm
should be able to identify the same image in another collage with high
confidence, regardless of rotation and scaling (and humans start having
problems with identification if you apply other transformations).

So now let’s put all these concerns together:

Suppose you have 1000 component images associated one-to-one with
feature phrases. (It’s irrelevant whether you are using a split pool or
not. Multiple feature phrases per image and multiple images per feature
phrase will have an effect on the numbers, but it’s a linear
multiplier.) We’ll assume the collages are generated offline, so ignore
online CPU costs. You deliver up a collage to a spammer, who tells his
spambot that the component image in the appropriate region corresponds
to the given feature phrase. He does this 105 more times, which is the
expected number of times needed to see 10% of the target images and
feature phrases. Now the spambot can consistently get through the
captcha 10% of the time. With a few dozen zombified Windows boxes to
attack it, 10% is certainly enough to put a steady stream of spam on
your site.
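
The attempt count can be checked with a short Python sketch. It assumes
(this is not stated in the post) that each collage’s target is drawn
uniformly and independently from the 1000 images, so the number of
draws needed to see 100 distinct targets is a partial coupon-collector
sum. The function name is made up for illustration.

```python
def expected_draws(pool_size, distinct_wanted):
    """Expected uniform draws (with replacement) to see that many distinct items."""
    # After `seen` distinct items, each draw is new with probability
    # (pool_size - seen) / pool_size, so the wait is its reciprocal.
    return sum(pool_size / (pool_size - seen) for seen in range(distinct_wanted))


total = expected_draws(1000, 100)
print(round(total, 1))  # 105.3
```

Under that assumption the expectation comes out at a little over 105
collages, in line with the figure used above.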

Meanwhile, how much effort have you expended? You tagged 1000 component
images with feature phrases (no mean task), and possibly verified that
some larger number of noise images do not conflict with the target
images. Let’s say it takes five minutes to tag each target image with a
feature phrase. That comes out to 83 hours and 20 minutes, which is a
little over two full-time weeks.

How much effort did the spammer expend? Suppose it also took him five
minutes for each attempt, though two or three is probably more
realistic. That’s eight hours and 50 minutes. It’s more than a day’s
work, but it’s a lot less than two weeks. If it only takes him two
minutes each, that’s a measly 3:32. If 5% is a sufficient hit rate for
his purposes, that time gets even shorter.
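
For anyone who wants to replay the arithmetic, a small Python sketch
follows. The per-item times are the ones assumed in the text; the
helper name is made up, and the attempt count is the first collage plus
the 105 further ones.

```python
def as_hours_minutes(minutes):
    """Format a number of minutes as H:MM."""
    hours, mins = divmod(minutes, 60)
    return f"{hours}:{mins:02d}"


tagging_minutes = 1000 * 5   # site owner: 1000 images at 5 minutes each
attempts = 1 + 105           # spammer: first collage plus 105 more

print(as_hours_minutes(tagging_minutes))  # 83:20
print(as_hours_minutes(attempts * 5))     # 8:50
print(as_hours_minutes(attempts * 2))     # 3:32
```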

Ultimately, you’re serving larger files than captchas, you have to tag
all of your images by hand, which takes a tremendous amount of human
time, and with very little human time a spammer can train an image
recognizer to get a sufficient probability of success to accomplish his
goals. You lose worse than you would with a captcha, and you pay more
for it.

–Greg

sudara · August 30, 2006, 7:17pm

So, the pony express is easily buildable, bot-proof, not-porn proof,
and not accessible.

Thanks Pratik for the example - it’s really clear.

sudara · August 31, 2006, 6:50pm

@Pratik: +1 for posting the code!

@Greg. How about this:

CAPTCHA’s strength:
Bots can’t read the obscured text being presented in the image. Stop
there.

My primary concern:
Reducing the software’s demand on humans.

Everything else (such as bandwidth, accessibility) is important, but
secondary to the notion that a user must WORK to
participate/contribute/use your software. Maybe I am from Mars, but
CAPTCHA is just one of those many features that have snuck their way
into (especially web) apps that sacrifice real usability in a glaring
and obvious way. Excess form fields? Bad UI design? Unclear copy?
Terribly important, but these come second in my book to ‘don’t make the
user do work that software should do’.

Pony Authentication isn’t 100% better, but it’s a hell of a lot easier
to identify a pony and click it than it is to ask a user to read and
type what looks like a unix admin’s password.

A solution offered by someone who wrote to me privately follows,
blatantly plagiarized as it was well expressed:

1. Put up a bunch of pictures (5x5 grid?), and ask the user to click
one at random. “please click the picture of the young girl”.

2. Use captcha-style noise generation to create a graphic which
includes the instructions. people are good at reading words, even with
noise. they don’t have to get each letter right, they just need to
know that it said “young girl” rather than “purple wolf”. randomizing
the graphic with the instructions means that the computer
can’t automatically match up pairs.

Faisal, the author of that concept, combined the strength of CAPTCHA
with the strength of Pony Authentication. The bot can’t read the
instructions, and the user is asked to make just one click.

Greg, I don’t know if that sounds like I’m avoiding your
well-thought-out and practical line of questioning, but I’m a fan of
just pulling back to the main idea and going from there if it looks
like a specific implementation isn’t practical.

Right now, it looks like Pony Authentication as described by Faisal
would be a better solution than CAPTCHA given that the priority is to
minimize user demand. Most users would rather spend 1-5 seconds
downloading some extra images than 10 seconds acting like a chimp,
hunting and pecking on the keyboard.

chao,
sudara

sudara · September 1, 2006, 9:43pm

http://www.kittenauth.com/

That’s right.

sudara · September 1, 2006, 7:09pm

On Thu, Aug 31, 2006 at 04:41:48PM -0000, sudara wrote:
[…]
} CAPTCHA’s strength:
} Bots can’t read the obscured text being presented in the image. Stop
} there.
}
} My primary concern:
} Reducing the software’s demand on humans.
[…]
} Pony Authentication isn’t 100% better, but it’s a hell of a lot easier
} to identify a pony and click it than it is to ask a user to read and
} type what looks like a unix admin’s password.

It is, indeed, easier. It is easy enough for a spambot to do.

} A solution offered by someone who wrote to me privately follows,
} blatantly plagiarized as it was well expressed:
}
} 1. Put up a bunch of pictures (5x5 grid?), and ask the user to click
} one at random. “please click the picture of the young girl”.
}
} 2. Use captcha-style noise generation to create a graphic which
} includes the instructions. people are good at reading words, even
} with noise. they don’t have to get each letter right, they just need
} to know that it said “young girl” rather than “purple wolf”.
} randomizing the graphic with the instructions means that the computer
} can’t automatically match up pairs.
}
} Faisal, the author of that concept combined the strength of CAPTCHA
} with the strength of Pony Authentication. The Bot can’t read the
} instructions, asking a user to make one single click

This isn’t bad. On the other hand, see the other branch of this thread
about doing it all in text, which has several advantages over both
captchas and “Pony Authentication”:

1. it is accessible to the visually impaired
2. it requires less bandwidth (no images)
3. it requires minimal effort to develop lists of questions and answers

} Greg, I don’t know if that sounds like I’m avoiding your
} well-thought-out and practical line of questioning, but I’m a fan of
} just pulling back to the main idea and going from there if it looks
} like a specific implementation isn’t practical.

If you aren’t familiar with the state of the art in AI algorithms and
don’t do the analysis of your candidate solutions, you don’t have the
tools to determine whether they will achieve your goals. I gave you an
analysis of “Pony Authentication”. If you are willing to accept those
tradeoffs, it is a viable solution for you; if not, then it isn’t.

Also, you need to be clear about what you are optimizing, and about the
metrics for evaluating the dimensions you are optimizing. If you are
only minimizing user demand, you don’t use any verification system at
all and spam gets posted. If you are also minimizing spam that gets
posted, you must have a way of relating the value of minimizing one or
the other. Even so, the appropriate solution in that case is to skip the
verification system and just not publish anything until it’s been
reviewed by a human.

Realistically, you are optimizing on many variables including, but not
limited to:

1. ease of use
2. quantity of spam
3. delay between submission and publication
4. cost (which is really a combination of human time, bandwidth costs,
   etc.)

Before adopting a solution you need a decently solid idea of where the
proposed solution lies in this high-dimensional space.

So carry on your out-of-the-box thinking and visionary adventures. Just
remember that a proposed solution that doesn’t solve your problem isn’t
of much use. It’s a lot better to know that before implementing it and
putting it in production.

} Right now, it looks like Pony Authentication as described by Faisal
} would be a better solution than CAPTCHA given that the priority is to
} minimize user demand. Most users would rather spend 1-5 seconds
} downloading some extra images than 10 seconds acting like a chimp,
} hunting and pecking on the keyboard.

Your last sentence is presented as a statement of fact, yet I suspect it
is simply a statement of what you believe to be intuitively true. I know
of no evidence to support it; do you? The studies I’ve seen show that
users perceive a length of time spent waiting for something to happen as
a much longer time than the same length of time when they are actually
doing something.

I know I’m beating a dead horse, and that this isn’t likely to actually
change your thinking in these things, but I’m going to say it anyway.
There is plenty of computer science and cognitive science research that
covers the things you are thinking about. The literature is published,
and much of the information is available from a Google search. Thinking
outside the box is not the same as pontificating from a position of
ignorance. You can only see farther by standing on the shoulders of
those who have gone before you if you know what they learned.

} chao,
} sudara
–Greg