[QUIZ] Decision Tree Learning (#213)

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The three rules of Ruby Q.:

  1. Please do not post any solutions or spoiler discussion for this
    quiz until 48 hours have elapsed from the time this message was
    sent.

  2. Support Ruby Q. by submitting ideas and responses
    as often as you can!
    Visit: http://rubyquiz.strd6.com/suggestions

  3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem
helps everyone on Ruby T. follow the discussion. Please reply to
the original quiz message, if you can.

RSS Feed: http://rubyquiz.strd6.com/quizzes.rss

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Decision Tree Learning (#213)

How now Rubyists,

This week’s quiz is about decision tree learning[1]. Decision tree
learning uses a decision tree as a predictive model which maps
observations about an item to conclusions about the item’s target
value. In these tree structures, leaves represent classifications and
branches represent conjunctions of features that lead to those
classifications.
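
To make the structure concrete, here is a minimal sketch of one possible
in-memory representation. The fruit example, the TOY_TREE constant, the
:feature/:branches keys and the classify method are all hypothetical names
chosen for illustration; nothing in the quiz requires this layout.

```ruby
# Hypothetical toy tree: interior nodes are Hashes naming the feature to
# test, and leaves are plain classification values.
TOY_TREE = {
  feature: :color,
  branches: {
    red:    { feature: :size, branches: { small: :cherry, large: :apple } },
    yellow: :banana
  }
}

# Walk from the root to a leaf, following the branch that matches the
# item's value for each tested feature.
def classify(tree, item)
  return tree unless tree.is_a?(Hash) # reached a leaf
  subtree = tree[:branches].fetch(item[tree[:feature]])
  classify(subtree, item)
end

puts classify(TOY_TREE, { color: :red, size: :small })
```

Each path from the root to a leaf is one conjunction of feature tests, for
example "color is red AND size is small".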

For a practical example let’s examine the plight of our friend David.
David is the manager of a famous golf club. Sadly, he is having some
trouble with his customer attendance. There are days when everyone
wants to play golf and the staff are overworked. On other days, for no
apparent reason, no one plays golf and staff are idle. David’s
objective is to optimise staff time by predicting when people will
play golf. To accomplish that he needs to understand the reasons
people decide to play. He assumes that weather must be an important
underlying factor, so he decides to use the weather forecast for the
upcoming week. For the past two weeks he has been recording:

  • The outlook, whether it was sunny, overcast or raining.
  • The temperature (in degrees Fahrenheit).
  • The relative humidity in percent.
  • Whether it was windy or not.
  • Whether people attended the golf club on that day.

David compiled this data as shown:

# outlook, temperature, humidity, windy, play
data = [
  [:sunny,    85, 85, false, false],
  [:sunny,    80, 90, true,  false],
  [:overcast, 83, 78, false, true ],
  [:rain,     70, 96, false, true ],
  [:rain,     68, 80, false, true ],
  [:rain,     65, 70, true,  false],
  [:overcast, 64, 65, true,  true ],
  [:sunny,    72, 95, false, false],
  [:sunny,    69, 70, false, true ],
  [:rain,     75, 80, false, true ],
  [:sunny,    75, 70, true,  true ],
  [:overcast, 72, 90, true,  true ],
  [:overcast, 81, 75, false, true ],
  [:rain,     71, 80, true,  true ],
]

Our job is to write a Ruby program that will construct a decision tree
from this data. The algorithms that are used for constructing decision
trees work by choosing a variable at each step that is the next best
variable to use in splitting the set of items. “Best” is defined by
how well the variable splits the set into subsets that have the same
value of the target variable. Different algorithms use different
formulae for measuring “best”; the technique you use is up to you.
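
As one illustration of a “best” measure, the sketch below scores a split by
information gain, that is, the drop in Shannon entropy of the target value.
This is only one common choice, not a required approach. The ROWS constant
repeats just the outlook, windy and play columns of David's table; the
method names entropy and information_gain are made up for this example, the
code assumes Ruby 2.7 or later (for Enumerable#tally), and it handles only
categorical columns, so temperature and humidity would first need a
threshold or binning step.

```ruby
# Outlook, windy, play: a three-column subset of David's table.
ROWS = [
  [:sunny, false, false], [:sunny, true, false], [:overcast, false, true],
  [:rain, false, true],   [:rain, false, true],  [:rain, true, false],
  [:overcast, true, true], [:sunny, false, false], [:sunny, false, true],
  [:rain, false, true],   [:sunny, true, true],  [:overcast, true, true],
  [:overcast, false, true], [:rain, true, true]
]

# Shannon entropy, in bits, of a list of class labels.
def entropy(labels)
  total = labels.size.to_f
  labels.tally.values.sum do |count|
    prob = count / total
    -prob * Math.log2(prob)
  end
end

# Entropy of the target column minus the size-weighted entropy of the
# subsets produced by grouping rows on the column at index attr.
def information_gain(rows, attr, target)
  before = entropy(rows.map { |row| row[target] })
  after = rows.group_by { |row| row[attr] }.values.sum do |subset|
    (subset.size.to_f / rows.size) * entropy(subset.map { |row| row[target] })
  end
  before - after
end

puts "outlook gain: #{information_gain(ROWS, 0, 2)}"
puts "windy gain:   #{information_gain(ROWS, 1, 2)}"
```

A tree builder along these lines would pick the column with the highest
gain, split the rows on it, and repeat on each subset until the subsets are
pure or no columns remain.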

Have Fun!

[1] Courtesy of Wikipedia.

P.S. The number of this quiz is out of order from the last one (#214).
The next quiz will be #215, then it’s back on track.


-Daniel
http://rubyquiz.strd6.com