Help with refactor and db/ar advice

ball · March 12, 2009, 1:15am

Hi all,
I have a number of delimited data files that I would like to import
into MySql. I am using ActiveRecord to create the schema and do the
import of the data. Code snippets follows for how I am doing one of
the many files. However, the code bloats when I do it for all of them.
Now, of course, I can’t get around creating a schema for each table,
but it seems that the function that loads the code is a bit redundant.
So, my end goal in this post is just to learn a few things and create
some cleaner code, so I would love your comments.

My questions are:

Is there a way to take advantage of the schema so that I don’t have
to re-write it in the load function?
Is there a way I can write a generic load function (taking
advantage of #1 if available, or not if not).
What is the “better” way (than ][1…-2]) to strip a string of a
starting/ending ~
Should I just use an AR generated primary key, rather than the data
sets ASCII primary key? (many of the tables in the datasets have
composite keys, in which I will not implement, so I am curious if I
just create all of the primary keys through AR insted).
Any other advice?

Thanks in advance!

===========> CURRENT CODE <=============

class FoodDescription < ActiveRecord::Base
end

def create_food_descriptions
ActiveRecord::Schema.define do
create_table(:food_descriptions, :primary_key =>
‘NDB_No’, :options => ‘DEFAULT CHARSET=utf8’) do |table|
table.column :NDB_No, :string, :limit => 5, :null => false
table.column :FdGrp_Cd, :string, :limit => 4, :null => false
table.column :Long_Desc, :string, :limit => 200, :null =>
false
table.column :Shrt_Desc, :string, :limit => 60, :null =>
false
table.column :CommName, :string, :limit => 100, :null =>
true
table.column :ManufacName, :string, :limit => 65, :null =>
true
table.column :Survey, :string, :limit => 1, :null => true
table.column :Ref_desc, :string, :limit => 135, :null =>
true
table.column :Refuse, :string, :limit => 2, :null => true
table.column :SciName, :string, :limit => 65, :null => true
table.column :N_Factor, :float, :null => true
table.column :Pro_Factor, :float, :null => true
table.column :Fat_Factor, :float, :null => true
table.column :CHO_Factor, :float, :null => true
end
puts " … food_descriptions created"
end
end

def load_food_descriptions( food_desc_file_name )
# Loop through the file, assuming the following format:
# ~01001~^~0100~^~Butter, salted~^~BUTTER,WITH
SALT~^^^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
# ndbnum, fdgrp, long_desc, shrt_desc, common_name,
manufacturer_name, survey,
# refuse_desc, percent_refuse, science_name, n_fac, pro_fac,
fat_fac, CHO_fac

food_desc_file = File.open(food_desc_file_name)
food_desc_file.each do |line|
  # caret (^) is the delimeter
  # squiglies are the string delimeters: ~string~
  #  hence below, we split with 1..-2 to remove them if a string
  record_split = line.split('^')

  FoodDescription.create(
    :NDB_No => record_split[0][1..-2],
    :FdGrp_Cd => record_split[1][1..-2],
    :Long_Desc => record_split[2][1..-2],
    :Shrt_Desc => record_split[3][1..-2],
    :CommName => record_split[4][1..-2],
    :ManufacName => record_split[5][1..-2],
    :Survey => record_split[6][1..-2],
    :Ref_desc => record_split[7][1..-2],
    :Refuse => record_split[8][1..-2],
    :SciName => record_split[9][1..-2],
    :N_Factor => record_split[10],
    :Pro_Factor => record_split[11],
    :Fat_Factor => record_split[12],
    :CHO_Factor => record_split[13]
    );
end

end

ball · March 12, 2009, 9:52am

ball wrote:

Is there a way to take advantage of the schema so that I don’t have
to re-write it in the load function?

puts FoodDescription.columns.inspect
puts FoodDescription.map { |c| c.name }.inspect

Is there a way I can write a generic load function (taking
advantage of #1 if available, or not if not).

As long as the columns in the text source appear in the same order as
the columns in the database definition you should be fine. I think this
ordering is guaranteed by the database, otherwise you couldn’t
meaningfully do “INSERT INTO foo VALUES (x,y,z)” (without naming the
columns) or “SELECT * FROM foo”

What is the “better” way (than ][1…-2]) to strip a string of a
starting/ending ~

Do it all in one go. A couple of options:

records = line.split(/^/).map { |rec| rec[1…-2] }
records = line.scan(/~([^~^]*)~/)

but they won’t work unless all your fields are ~ delimited. So this
might work better for you:

records = line.split(/^/).map { |rec| rec.sub(/~(.*)~) { $1 } }

Should I just use an AR generated primary key, rather than the data
sets ASCII primary key? (many of the tables in the datasets have
composite keys, in which I will not implement, so I am curious if I
just create all of the primary keys through AR insted).

ActiveRecord has some hard-coded assumptions about primary keys. If you
override its assumption that the key is called “id” then it won’t
generate keys automatically for new records. If you have non-integer
primary key then it may or may not work. It certainly doesn’t handle
composite primary keys; I saw a plugin which claimed to do this some
time ago, but I don’t know if it’s been kept up to date.

So my advice with AR is: if you are working with an existing
database/schema that you don’t control then try using non-standard
primary keys, but if you control the schema yourself, follow AR’s
assumption of an integer primary key.