Forgive me, as I’m still a beginner. I have a pretty big amount of
data related to food (from USDA.gov) that I would like to add to a
database. A sample of the content looks like this:
~05088~^~CHICKEN,BROILERS OR FRYERS,NECK,MEAT
ONLY,RAW~^71.17^154^17.55^8.78^0.69^0.00^0.0^^27^2.06^17^113^175^81^2.68
^0.109^0.035^13.5^0.0^0.054^0.228^4.119^1.090^0.290^8^0^8^8^0.32^146^44^
44^^^^^^^^2.250^2.730^2.180^83^6.00^~1 unit~^20.00^~1 neck, bone and
skin removed~^75
I am wondering if someone count point me in the right direction as
far as how to sort this data out and add it to a database, assuming I
know which columns I need (such as name, nutrition info (calories,
fat, carbs, protein), portion size, etc. Do I use migrations or
something? How do I sort out all of the information from each line?
I am still new to this so I am a little confused.
take a look at kettle: http://www.kettle.be
this is a great (open source) data manipulation tool!
Robert G. wrote:
Hello,
Forgive me, as I’m still a beginner. I have a pretty big amount of
data related to food (from USDA.gov) that I would like to add to a
database. A sample of the content looks like this:
~05088~^~CHICKEN,BROILERS OR FRYERS,NECK,MEAT
ONLY,RAW~^71.17^154^17.55^8.78^0.69^0.00^0.0^^27^2.06^17^113^175^81^2.68
^0.109^0.035^13.5^0.0^0.054^0.228^4.119^1.090^0.290^8^0^8^8^0.32^146^44^
44^^^^^^^^2.250^2.730^2.180^83^6.00^~1 unit~^20.00^~1 neck, bone and
skin removed~^75
I am wondering if someone count point me in the right direction as
far as how to sort this data out and add it to a database, assuming I
know which columns I need (such as name, nutrition info (calories,
fat, carbs, protein), portion size, etc. Do I use migrations or
something? How do I sort out all of the information from each line?
I am still new to this so I am a little confused.
First you need to know the format of the data before you can do
anything. Then design a schema for the data if you haven’t already.
Probably the easiest way to do it is to just parse the data using
something like String.split combined with whatever cleanup you need to
do on it before it goes into the database. Then insert it one row at
a time using a ruby script with something like ruby-dbi. If it’s a
very large amount of data you could also write out a file in a format
that your database can accept as an import file.
I am wondering if someone count point me in the right direction as
far as how to sort this data out and add it to a database, assuming I
know which columns I need (such as name, nutrition info (calories,
fat, carbs, protein), portion size, etc. Do I use migrations or
something? How do I sort out all of the information from each line?
I am still new to this so I am a little confused.
I would use shell script that uses gawk/sed and try to filter that
out…