Note that because I am traveling tomorrow, I’ve posted this week’s
quiz a bit early.
The three rules of Ruby Q. 2:
- Please do not post any solutions or spoiler discussion for this
  quiz until 48 hours have passed from the time on this message.
- Support Ruby Q. 2 by submitting ideas as often as you can! (A
  permanent, new website is in the works for Ruby Q. 2. Until then,
  please visit the temporary website at
- Enjoy!
Suggestion: A [QUIZ] in the subject of emails about the problem
helps everyone on Ruby T. follow the discussion. Please reply to
the original quiz message, if you can.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Quiz #159
Food Database
There are numerous themes we have encountered across all of the past
Ruby Q. problems, but there are a few that come back time and time
again, albeit sometimes in disguise. I can recall a number of quizzes
that were best, or most easily, approached using pattern matching. Data
searching is also a common theme, most often accessing the large,
well-known databases of vocabulary and numbers.
This week we’re going to explore another large database that you might
not be familiar with: the USDA’s Nutrient Database. You can find out
about this database at:
http://www.ars.usda.gov/services/docs.htm?docid=8964
The current database (SR20) can be downloaded from:
http://www.ars.usda.gov/Services/docs.htm?docid=15867
I recommend getting the abbreviated, ASCII download (a flat-file
database), though those who want to experience the full brunt of the
relational database are welcome to download that. I will focus on the
abbreviated version, since it will serve our needs for this and future
quizzes.
Opening the archive for the abbreviated database, you’ll find two files:
- ABBREV.txt: this is the ASCII database
- SR20_doc.pdf: a document describing the format and content of the
  abbreviated database.
(Note that SR20 now also contains a patch to the database. For the
purposes of this quiz, I am not concerned whether you apply that patch
or not. If you don’t want to worry about the patch, feel free to ignore
it.)
The format of the database is fairly simple; the provided document
explains the abbreviated file format beginning on page 29. To
summarize, each record is a single line and contains more than a few
delimited fields. Fields are separated by carets (^), and text fields
are surrounded by tildes (~). The file is sorted by the first field,
the food’s Nutrient Databank Number (NDB). Each line provides nutrient
information for 100 grams of that food.
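
To make the record format concrete, here is a minimal sketch of
splitting one line into fields. The helper name parse_record and the
shortened sample line are my own inventions for illustration; real
records carry many more fields, and the sketch assumes no caret ever
appears inside a text field.

```ruby
# Parse one caret-delimited record into an array of Ruby values.
# Text fields (wrapped in tildes) become Strings, empty fields become
# nil, and everything else is converted to a Float.
# parse_record is a hypothetical helper, not part of the quiz.
def parse_record(line)
  # The -1 limit keeps trailing empty fields instead of dropping them.
  line.chomp.split("^", -1).map do |field|
    if field.length >= 2 && field.start_with?("~") && field.end_with?("~")
      field[1..-2]          # strip the surrounding tildes
    elsif field.empty?
      nil                   # missing value
    else
      Float(field)          # raises ArgumentError on malformed numbers
    end
  end
end

# A made-up, shortened record for illustration only.
sample = "~01001~^~BUTTER,WITH SALT~^15.87^717^0.85^^~1 tbsp~"
p parse_record(sample)
# => ["01001", "BUTTER,WITH SALT", 15.87, 717.0, 0.85, nil, "1 tbsp"]
```

Using Float() rather than to_f is a deliberate choice in this sketch:
a malformed number raises an error instead of silently becoming 0.0.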
Your task is to provide a function that will search this nutrient
database for a food and provide information about it.
    def nutrient_report(food, weight=100)
      # print report to stdout
    end
Parameter food will be a string that is the food to locate. Keep in
mind that there may be multiple entries that will simply match (a la
grep) the parameter provided. You should only report on one of these
foods at this time; which one to choose is up to you. You may want to
consider a metric such as the Levenshtein distance (see the Wikipedia
article of that name) while comparing food names against the search
string.
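
One possible way to pick a single food is sketched below: filter the
names grep-style, then rank the candidates by Levenshtein distance.
The names levenshtein and best_match, the fallback to all names when
nothing contains the query, and the sample food names are assumptions
of this sketch, not requirements of the quiz.

```ruby
# Classic dynamic-programming Levenshtein distance, keeping only one
# previous row of the table in memory at a time.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,            # deletion
               curr[j - 1] + 1,        # insertion
               prev[j - 1] + cost].min # substitution or match
    end
    prev = curr
  end
  prev.last
end

# Prefer names that contain the query (case-insensitive); if none do,
# fall back to ranking every name. Returns the closest name, or nil
# when names is empty.
def best_match(names, query)
  q = query.downcase
  candidates = names.select { |n| n.downcase.include?(q) }
  candidates = names if candidates.empty?
  candidates.min_by { |n| levenshtein(n.downcase, q) }
end

# Illustrative names only.
names = ["BUTTER,WITH SALT", "BUTTER OIL,ANHYDROUS", "PEANUT BUTTER,SMOOTH"]
puts best_match(names, "butter")
# prints: BUTTER,WITH SALT
```

Note that when the query is an exact substring of every candidate, the
distance is just the difference in length, so this ranking amounts to
choosing the shortest matching name. The metric only earns its keep in
the fallback case, for example with a misspelled query.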
Parameter weight is the weight to measure in grams, defaulting to 100.
(Recall that the nutrient information of each record of the database is
based upon 100 grams.) Your report should output numerical information
that corresponds to the weight requested. There is information in the
document provided that explains how to adjust for weight.
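
As a rough sketch of the weight adjustment, assuming simple
proportional scaling from the 100-gram basis (please check the provided
document for its exact wording), a value could be converted like this.
The helper name scale_for_weight is hypothetical.

```ruby
# Scale a per-100-gram nutrient value to the requested weight in grams.
# Returns nil when the database has no value for that nutrient.
def scale_for_weight(value_per_100g, weight)
  return nil if value_per_100g.nil?
  value_per_100g * weight / 100.0
end

# Made-up numbers: 20.0 g of some nutrient per 100 g of food, 250 g serving.
puts scale_for_weight(20.0, 250)
# prints: 50.0
```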
The output you provide is mostly up to you, but should include as a
minimum:
- Full food name (as found in the database, not the search string)
- Food weight (as provided to the function)
- Nutrient values for:
  - Water
  - Protein
  - Carbohydrates (the Carbohydrt field)
  - Fats (sum of the fields FA_Sat, FA_Mono and FA_Poly)
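
The minimum report above might be assembled along these lines. This is
only a sketch: it assumes the record has already been turned into a
Hash keyed by field name, the keys Shrt_Desc, Water and Protein are my
guesses at the names to use (Carbohydrt and the FA_ fields come from
the list above), and the sample numbers are invented.

```ruby
# Build the report text for one food record at the given weight in grams.
# format_report is a hypothetical helper; nutrient_report could simply
# print its return value.
def format_report(record, weight = 100)
  factor = weight / 100.0
  # Treat a missing fatty-acid value as 0 when summing (an assumption).
  fats = %w[FA_Sat FA_Mono FA_Poly].sum { |key| record[key] || 0.0 }
  rows = {
    "Water"         => record["Water"],
    "Protein"       => record["Protein"],
    "Carbohydrates" => record["Carbohydrt"],
    "Fats"          => fats
  }
  lines = ["Food:   #{record['Shrt_Desc']}", "Weight: #{weight} g"]
  rows.each do |label, per_100g|
    value = per_100g ? format("%.2f g", per_100g * factor) : "n/a"
    lines << format("%-14s %s", label + ":", value)
  end
  lines.join("\n")
end

# Invented values for illustration; not taken from the database.
example = {
  "Shrt_Desc" => "EXAMPLE FOOD", "Water" => 10.0, "Protein" => 2.0,
  "Carbohydrt" => nil, "FA_Sat" => 1.0, "FA_Mono" => 2.0, "FA_Poly" => nil
}
puts format_report(example, 200)
```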
A few more things to consider. First, the database contains information
for over 7,500 food items. That may be a lot to search and do string
comparisons on. If you find your searches going very slowly, consider
caching the data to a more search-efficient format.
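
One simple form of caching, sketched here under my own assumptions, is
to parse the text file once and store the result with Ruby's Marshal,
reloading it only while the cache is at least as new as the source.
The helper name load_with_cache and the file names in the demo are
hypothetical; the demo uses a temporary directory so it can run
anywhere.

```ruby
require "tmpdir"

# Load parsed data from a Marshal cache if it is at least as new as the
# source file; otherwise call the block to parse the source and rewrite
# the cache.
def load_with_cache(source_path, cache_path)
  if File.exist?(cache_path) && File.mtime(cache_path) >= File.mtime(source_path)
    File.open(cache_path, "rb") { |f| Marshal.load(f) }
  else
    data = yield(source_path)
    File.open(cache_path, "wb") { |f| Marshal.dump(data, f) }
    data
  end
end

# Demo with a throwaway file standing in for the real database.
Dir.mktmpdir do |dir|
  source = File.join(dir, "sample.txt")
  cache  = File.join(dir, "sample.cache")
  File.write(source, "~01001~^~BUTTER,WITH SALT~\n")

  2.times do
    records = load_with_cache(source, cache) do |path|
      puts "parsing #{File.basename(path)}"   # only happens on the first pass
      File.readlines(path).map { |line| line.chomp.split("^", -1) }
    end
    puts records.size
  end
end
```

Marshal is fast but tied to Ruby, and loading it is only safe for files
you wrote yourself; an index keyed by words in the food name would be
another direction to explore.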
Second, consider writing some tests with database integrity in mind. For
example, at a quick glance, it appears that all the food names are
presented in the database in full-caps. But if you base your search on
this assumption, you may miss at least one food (or perhaps more) in
your search, as at least one food was entered into ABBREV.txt in
mixed-case. There may be other errors in the file, so consider doing a
few sanity checks on the data file before diving into the heart of the
quiz. (Feel free to post integrity test code to the mailing list before
the waiting period is up.)
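
As a starting point for such sanity checks, here is a small sketch that
flags names that are not all upper-case, records that are out of order
by their first field, and lines with a different number of fields. The
helper name integrity_problems and the sample lines are made up, and
comparing the first field as a string assumes NDB numbers are
fixed-width; none of this reflects what the real file actually
contains.

```ruby
# Return a list of human-readable problems found in the given lines.
def integrity_problems(lines)
  problems = []
  field_counts = Hash.new(0)
  previous_id = nil
  lines.each_with_index do |line, idx|
    fields = line.chomp.split("^", -1)
    field_counts[fields.size] += 1
    id = fields[0].to_s
    name = fields[1].to_s
    if name != name.upcase
      problems << "line #{idx + 1}: name not all upper-case: #{name}"
    end
    # String comparison; assumes fixed-width, zero-padded numbers.
    if previous_id && id < previous_id
      problems << "line #{idx + 1}: first field out of order: #{id}"
    end
    previous_id = id
  end
  if field_counts.size > 1
    problems << "inconsistent field counts: #{field_counts.inspect}"
  end
  problems
end

# Made-up lines with three deliberate problems.
sample = [
  "~01001~^~BUTTER,WITH SALT~^15.87",
  "~01003~^~Butter oil,anhydrous~^0.24",
  "~01002~^~BUTTER,WHIPPED~^15.87^717"
]
puts integrity_problems(sample)
```

Against the real file this would be called with something like
File.readlines("ABBREV.txt"), adjusting the path as needed.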
Third, and finally, part of the goal here is to make available another
large, interesting database for future Ruby Q. problems. There are
plenty of opportunities available here… meal planning is just one
example. Keep this in mind while designing your solution: we want a
firm foundation for searching this nutrient database so that future
problems can focus on examining the results of the search.