Classifier::Bayes - handling "none of the above" cases

jfoxny · February 5, 2007, 11:37pm

I’m using Classifier::Bayes and am trying to figure out how to handle
classifications that don’t fit any of my categories. It seems that it
will guess a category no matter how poor the match. Is there a good way
to use the hash values from #classify() to “figure out” how bad a match
is? I’d like to handle those cases as a “none of the above”. This isn’t
the greatest example but hopefully it’ll work well enough:

Imagine three categories (shopping, health, and technology), and that I want
to classify the text “cows dirt barn”. Obviously, those words aren’t a
good fit for any of the three. What I want is a way to determine how bad
an attempted classification is, and react to it. I was thinking maybe a
catch-all, “empty” category could handle this?
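A minimal sketch of the kind of check being asked about, assuming a small hand-rolled naive Bayes rather than the classifier gem itself. The class name `ScoringBayes`, the method `classify_or_none`, the 0.1 pseudo-count for unseen words, and the 0.5 `min_coverage` cutoff are all hypothetical choices for illustration. It exposes per-category log scores and, alongside them, a crude "how much of this text has the model ever seen?" ratio that can trigger a "none of the above" answer:

```ruby
# Hypothetical sketch (not the classifier gem's API): naive Bayes scores
# per category, plus a vocabulary-coverage check for "none of the above".
class ScoringBayes
  def initialize(*categories)
    # category => { word => count }
    @counts = categories.to_h { |c| [c, Hash.new(0)] }
  end

  def train(category, text)
    words = @counts.fetch(category) # raises KeyError for unknown categories
    tokens(text).each { |w| words[w] += 1 }
  end

  # Sum of log P(word | category). Unseen words get an arbitrary
  # pseudo-count of 0.1 so the logarithm stays finite.
  def scores(text)
    @counts.to_h do |category, words|
      total = words.values.sum.to_f
      score = tokens(text).sum { |w| Math.log((words.key?(w) ? words[w] : 0.1) / total) }
      [category, score]
    end
  end

  # Fraction of the input's tokens that appeared in ANY training text.
  def coverage(text)
    toks = tokens(text)
    return 0.0 if toks.empty?
    toks.count { |w| @counts.values.any? { |words| words.key?(w) } }.fdiv(toks.size)
  end

  # Best-scoring category, or :none_of_the_above when too little of the
  # input is familiar (min_coverage is a hypothetical tuning knob).
  def classify_or_none(text, min_coverage: 0.5)
    return :none_of_the_above if coverage(text) < min_coverage
    scores(text).max_by { |_category, score| score }.first
  end

  private

  def tokens(text)
    text.downcase.scan(/[a-z']+/)
  end
end

nb = ScoringBayes.new("shopping", "health", "technology")
nb.train("shopping",   "sale discount coupon checkout cart price")
nb.train("health",     "doctor diet exercise vitamin sleep clinic")
nb.train("technology", "ruby compiler laptop software network server")

nb.classify_or_none("cheap laptop software") # => "technology"
nb.classify_or_none("cows dirt barn")        # => :none_of_the_above
```

For “cows dirt barn” every word is unseen in every category, so all three scores come out identical and say nothing about fit; only the coverage ratio (0.0) flags it. This is a heuristic bolted on from outside, not something Bayesian classification provides, and it will not catch unrelated text that happens to reuse familiar words.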

I’m new to the world of classifiers in general, so this may be an easy
question. Any and all suggestions are greatly appreciated.

Thanks!
-Jason

jfoxny · February 6, 2007, 4:10pm

On Tue, 06 Feb 2007 07:38:00 +0900, Jason F. wrote:

> an attempted classification is, and react to it. I was thinking maybe a
> catch-all, “empty” category could handle this?

Wouldn’t it be nice? Unfortunately for you, a Bayesian classifier (like
most other classification algorithms) requires examples for every class
it could possibly assign. The classifier just chooses the most likely
class from among those it has been trained on. If you wanted a “none of the
above” class, then you’d need to provide examples from that “none of the
above” class. It’s not so easy to decide what’s representative of “none of
the above”, and even if you could do so, it would probably violate the
assumptions of the classifier and lead to reduced performance. Thus, we
have to come up with more creative, problem-specific solutions to handle
something resembling a “none-of-the-above” case, usually solutions that
change the definition of the problem quite dramatically.
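Ken’s point that the classifier simply picks the most likely of the classes it was trained on can be seen in a few lines. The sketch below is a hypothetical, minimal multinomial naive Bayes with add-one (Laplace) smoothing (the class name `TinyNaiveBayes` and the toy training texts are made up; this is not the classifier gem’s code). Because `classify` is an argmax over the trained labels, it cannot answer anything else:

```ruby
# Hypothetical minimal multinomial naive Bayes with add-one smoothing.
class TinyNaiveBayes
  def initialize
    @word_counts = Hash.new { |h, label| h[label] = Hash.new(0) } # label => { word => count }
    @doc_counts  = Hash.new(0)                                    # label => training documents
    @vocab       = {}                                             # every word seen in training
  end

  def train(label, text)
    @doc_counts[label] += 1
    tokens(text).each do |w|
      @word_counts[label][w] += 1
      @vocab[w] = true
    end
  end

  # log P(label) + sum of log P(word | label), add-one smoothed.
  def log_posterior(label, text)
    counts = @word_counts[label]
    total  = counts.values.sum
    prior  = Math.log(@doc_counts[label].fdiv(@doc_counts.values.sum))
    tokens(text).sum(prior) { |w| Math.log((counts[w] + 1.0) / (total + @vocab.size)) }
  end

  # Argmax over the trained labels: the answer is ALWAYS one of them.
  def classify(text)
    @doc_counts.keys.max_by { |label| log_posterior(label, text) }
  end

  private

  def tokens(text)
    text.downcase.scan(/[a-z']+/)
  end
end

nb = TinyNaiveBayes.new
nb.train("shopping",   "sale discount coupon checkout cart price sale coupon")
nb.train("health",     "doctor diet exercise vitamin")
nb.train("technology", "ruby compiler laptop software network server laptop ruby")

nb.classify("cheap laptop software") # => "technology"
nb.classify("cows dirt barn")        # => "health"
```

“cows dirt barn” lands in health not because it fits, but because with equal priors every unseen word gets probability 1/(total + |V|) in each category: 1/20 for health (4 training words) versus 1/24 for the other two (8 each). The category with the least training text wins by default.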

–Ken

jfoxny · February 6, 2007, 6:14pm

> above” class, then you’d need to provide examples from that none of the
> above class. It’s not so easy to decide what’s representative of “none of
> the above”, and even if you could do so, it would probably violate the
> assumptions of the classifier and lead to reduced performance. Thus, we
> have to come up with more creative problem-specific solutions to handle
> something resembling a “none-of-the-above” case, usually solutions that
> change the definition of the problem quite dramatically.

Really a “none of the above” filter is of limited usefulness.
Categories in Bayesian classifiers are all about compartmentalization.
The goal isn’t really categorization; it’s training the filter. You
really want the filter to separate on an unambiguous difference, like
“spam” vs. “not spam,” because this will teach the filter to
differentiate unambiguously. That’s what Bayesian filters are good at
doing.

Giving a Bayesian classifier a “none of the above” category will just
confuse it. It doesn’t work by checking category A, then category B,
then finally category C. It works by aggregating data and extracting
probabilistic similarity. The features shared by the “none of the
above” examples will be too varied and numerous for any similarity to be
extracted. Instead of all the stuff that doesn’t fit anywhere else
going into “none of the above,” everything will run a risk of
going into “none of the above,” because “none of the above” will be
too vaguely specified to be dissimilar from anything else.

Really you would either want to use the classifier differently, or use
a different technique altogether.

jfoxny · February 6, 2007, 6:43pm

Giles B. wrote:

> Really you would either want to use the classifier differently, or use
> a different technique altogether.

First of all, Giles and Ken, thanks for your answers. It sounds like a
Bayesian approach won’t work for what I want to do. This same gem has
another classifier inside it, called Classifier::LSI, which does latent
semantic indexing. I don’t know much about it yet, other than that it’s not
as fast or as small as a Bayesian classifier. However, would it be more
suited to supporting a “none of the above” feature?

Or would you recommend something entirely different?

Many thanks,
-Jason

jfoxny · February 6, 2007, 8:05pm

Giles B. wrote:

> There are very few problems for which latent semantic analysis
> isn’t overkill.

Well, within the not-too-distant future, we’ll be handling a sizable
dataset so LSI might make sense after all. This would be for a system
we’re building that’s doing something quite cool but I can’t shout all
the details from the rooftops just yet. Would it be all right for me
to give you specifics via email? I’d be happy to edit the Ruby-germane
portions of our offline conversation and post them back onto the forum.
My email is jason at seethroo dot us.

Again, many thanks!
-Jason

jfoxny · February 7, 2007, 12:10am

On Wed, 07 Feb 2007 04:05:35 +0900, Jason F. wrote:

> This would be for a system
> we’re building that’s doing something quite cool but I can’t shout all
> the details from the rooftops just yet.

I suggest learning about machine learning techniques in general before
you try to do anything quite cool that you can’t shout from the rooftops
just yet.

I recommend “Machine Learning” by Tom Mitchell[1].

–Ken
[1] Machine Learning textbook

jfoxny · February 7, 2007, 12:11am

On Wed, 07 Feb 2007 02:14:23 +0900, Giles B. wrote:

> Really a “none of the above” filter is of limited usefulness.

A “none of the above” filter can be quite useful. Suppose you have an
unknown text and samples of text from its possible authors, and you want
to know who wrote it. So you set up a classifier[1], train it on the
known authors, and stick your text in. The answer will be one of the
authors you trained the classifier on. Do you actually know that your
text was written by one of these guys? Maybe you don’t. Then you need a
different problem: authorship verification, which can be solved with
different techniques.

Authorship verification[2] is a completely different problem. You have
text by an unknown author, and text by a known author, and you ask “are
these texts written by the same author?” The technique for doing this
abuses machine learning classifiers a bit, and as you can see it altered
the problem definition quite dramatically, but this is a
“none-of-the-above” capable version of the first problem.

–Ken B.
[1] Note that authorship classifiers use much more interesting features
than just word frequencies.
[2] http://www.cs.biu.ac.il/~koppel/papers/authorship-icml-formatted-01.04.pdf
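Ken’s verification framing can be made concrete with a heavily simplified sketch in the spirit of “unmasking”: repeatedly train a classifier to tell chunks of the known text from chunks of the unknown text, remove the features that separate them best, and watch how fast accuracy falls. The rough intuition is that two texts by the same author differ in only a few features, so accuracy collapses quickly, while texts by different authors stay separable. Everything specific here is an assumption for illustration and is not claimed to be the method of [2]: the module name `Unmasking`, the nearest-centroid classifier, leave-one-out accuracy, the chunk size and other defaults, and dropping the features with the largest centroid gap each round. Per Ken’s footnote [1], real systems use richer features than word frequencies.

```ruby
# Simplified unmasking-style sketch. All specifics (nearest-centroid,
# leave-one-out, feature-dropping rule, defaults) are illustrative assumptions.
module Unmasking
  module_function

  # Split a text into equal-sized word chunks, discarding a short tail.
  def chunks(text, size)
    text.downcase.scan(/[a-z']+/).each_slice(size).select { |c| c.size == size }
  end

  # Relative frequency of each feature word within one chunk.
  def profile(chunk, features)
    counts = chunk.tally
    features.map { |f| counts.fetch(f, 0).fdiv(chunk.size) }
  end

  def centroid(vectors)
    vectors.transpose.map { |col| col.sum / vectors.size }
  end

  def sq_dist(a, b)
    a.zip(b).sum { |x, y| (x - y)**2 }
  end

  # Leave-one-out accuracy of a nearest-centroid classifier on two groups.
  def loo_accuracy(a_vecs, x_vecs)
    labeled = a_vecs.map { |v| [v, :a] } + x_vecs.map { |v| [v, :x] }
    correct = labeled.each_with_index.count do |(vec, label), i|
      rest = labeled[0...i] + labeled[(i + 1)..]
      ca = centroid(rest.select { |_, l| l == :a }.map(&:first))
      cx = centroid(rest.select { |_, l| l == :x }.map(&:first))
      guess = sq_dist(vec, ca) <= sq_dist(vec, cx) ? :a : :x
      guess == label
    end
    correct.fdiv(labeled.size)
  end

  # Returns the accuracy curve; a fast drop toward chance hints "same author".
  def unmask(known, unknown, chunk_size: 20, n_features: 30, drop_per_round: 2, rounds: 5)
    ka = chunks(known, chunk_size)
    ux = chunks(unknown, chunk_size)
    raise ArgumentError, "need at least 2 full chunks per text" if ka.size < 2 || ux.size < 2

    # Start from the most frequent words across both texts (ties broken alphabetically).
    features = (ka + ux).flatten.tally.sort_by { |w, n| [-n, w] }.first(n_features).map(&:first)
    curve = []
    rounds.times do
      break if features.empty?
      va = ka.map { |c| profile(c, features) }
      vx = ux.map { |c| profile(c, features) }
      curve << loo_accuracy(va, vx)
      # Remove the features that separate the two texts most strongly.
      gaps = centroid(va).zip(centroid(vx)).map { |p, q| (p - q).abs }
      drop = features.each_index.max_by(drop_per_round) { |i| gaps[i] }
      features = features.reject.with_index { |_, i| drop.include?(i) }
    end
    curve
  end
end
```

`Unmasking.unmask(known_text, unknown_text)` returns a short list of accuracies, one per round. A curve that stays near 1.0 suggests different authors; one that sinks toward chance suggests the same author. Turning that curve into a yes/no answer would still need labelled examples of both kinds, which is the “altered problem definition” Ken describes.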

jfoxny · February 7, 2007, 1:47am

> [2] http://www.cs.biu.ac.il/~koppel/papers/authorship-icml-formatted-01.04.pdf
True enough, but your goal there is moving the content out of the
“none of the above” filter.

jfoxny · February 6, 2007, 7:05pm

> However, would it be more
> suited to supporting a “none of the above” feature?
>
> Or would you recommend something entirely different?

Well, a latent semantic indexer is a whole different thing. I know of
a company that built a search engine with latent semantic analysis. If
you search it for naked pictures of Britney Spears – just as a stupid
example – it’ll also ask you if you want to hear her music or if
you’re interested in naked pictures of Lindsay Lohan as well. Latent
semantic indexers are a very smart technology but I think they require
extremely large data sets to be useful. They compare patterns of
linkage to identify things which must have some latent semantic
connection, that is to say, words that are different but mean similar
things. There are very few problems for which latent semantic analysis
isn’t overkill.
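Giles describes LSI as finding things that are connected even when the words differ. The property that matters for the “none of the above” question is that this kind of retrieval yields a similarity score per stored document, and a score can be thresholded. The sketch below keeps only that part: it is plain term-count cosine similarity with nearest-neighbour matching, not LSI, which would first project the vectors through a truncated SVD so that different-but-related words can match. The class name `CosineNearest` and the 0.2 threshold are hypothetical:

```ruby
# Nearest-neighbour matching by cosine similarity over raw term counts,
# with a similarity threshold for "none of the above".
# NOTE: plain vector-space matching, NOT latent semantic indexing;
# LSI would first project these vectors through a truncated SVD.
class CosineNearest
  Doc = Struct.new(:label, :vector)

  def initialize(threshold: 0.2)
    @threshold = threshold # hypothetical cutoff; would need tuning on held-out data
    @docs = []
  end

  def add(label, text)
    @docs << Doc.new(label, vectorize(text))
  end

  # [label, similarity] of the closest stored document, or
  # [:none_of_the_above, similarity] when even the best match is weak.
  def classify(text)
    query = vectorize(text)
    best = @docs.max_by { |d| cosine(query, d.vector) }
    return [:none_of_the_above, 0.0] if best.nil?
    sim = cosine(query, best.vector)
    sim < @threshold ? [:none_of_the_above, sim] : [best.label, sim]
  end

  private

  # Term-count vector as a Hash: { word => count }.
  def vectorize(text)
    text.downcase.scan(/[a-z']+/).tally
  end

  def cosine(a, b)
    dot = a.sum { |w, n| n * b.fetch(w, 0) }
    norm = ->(v) { Math.sqrt(v.values.sum { |n| n * n }) }
    denom = norm.(a) * norm.(b)
    denom.zero? ? 0.0 : dot / denom
  end
end

idx = CosineNearest.new(threshold: 0.2)
idx.add("shopping",   "sale discount coupon checkout cart price")
idx.add("health",     "doctor diet exercise vitamin sleep clinic")
idx.add("technology", "ruby compiler laptop software network server")

idx.classify("ruby software for my laptop") # => ["technology", 0.5477...] (3 / sqrt(30))
idx.classify("cows dirt barn")              # => [:none_of_the_above, 0.0]
```

With raw term counts, “cows dirt barn” shares no words with any stored document, so every similarity is 0.0 and the threshold rejects it. The catch is the same as in the Bayesian case: picking the threshold needs held-out examples, and a text can share words with a category without belonging to it.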

What is it that you’re trying to do?

jfoxny · February 8, 2007, 4:35am

Ken B. wrote:

> I recommend “Machine Learning” by Tom Mitchell[1].
>
> [1] Machine Learning textbook

Thanks for the link, it does look like a very worthwhile book.
Unfortunately, I haven’t got the time to read a complete textbook before
developing something workable (not feature-rich, just workable). If you
(or anyone else) are interested in doing an hour or two of consulting
about this, I’m able to pay for your time. Email me at jason at seethroo
dot us.

Thanks for all the excellent replies so far! This has turned into an
interesting thread.
-Jason