Help with refactor and db/ar advice

Hi all,
I have a number of delimited data files that I would like to import
into MySQL. I am using ActiveRecord to create the schema and do the
import of the data. A code snippet follows showing how I do this for one
of the many files; however, the code bloats when I repeat it for all of
them. Now, of course, I can’t get around creating a schema for each
table, but the function that loads the data seems a bit redundant.
So, my end goal in this post is just to learn a few things and write
some cleaner code, so I would love your comments.

My questions are:

  1. Is there a way to take advantage of the schema so that I don’t have
    to re-write it in the load function?
  2. Is there a way I can write a generic load function (taking
    advantage of #1 if available, or not if not).
  3. What is the “better” way (than [1..-2]) to strip a string of a
    starting/ending ~?
  4. Should I just use an AR-generated primary key, rather than the data
    set’s ASCII primary key? (Many of the tables in the datasets have
    composite keys, which I will not implement, so I am curious if I
    should just create all of the primary keys through AR instead.)
  5. Any other advice?

Thanks in advance!

===========> CURRENT CODE <=============

class FoodDescription < ActiveRecord::Base
end

def create_food_descriptions
  ActiveRecord::Schema.define do
    create_table(:food_descriptions, :primary_key => 'NDB_No',
                 :options => 'DEFAULT CHARSET=utf8') do |table|
      table.column :NDB_No,      :string, :limit => 5,   :null => false
      table.column :FdGrp_Cd,    :string, :limit => 4,   :null => false
      table.column :Long_Desc,   :string, :limit => 200, :null => false
      table.column :Shrt_Desc,   :string, :limit => 60,  :null => false
      table.column :CommName,    :string, :limit => 100, :null => true
      table.column :ManufacName, :string, :limit => 65,  :null => true
      table.column :Survey,      :string, :limit => 1,   :null => true
      table.column :Ref_desc,    :string, :limit => 135, :null => true
      table.column :Refuse,      :string, :limit => 2,   :null => true
      table.column :SciName,     :string, :limit => 65,  :null => true
      table.column :N_Factor,    :float, :null => true
      table.column :Pro_Factor,  :float, :null => true
      table.column :Fat_Factor,  :float, :null => true
      table.column :CHO_Factor,  :float, :null => true
    end
    puts " ... food_descriptions created"
  end
end

def load_food_descriptions(food_desc_file_name)
  # Loop through the file, assuming the following format:
  # ~01001~^~0100~^~Butter, salted~^~BUTTER,WITH SALT~^^^~Y~^^0^^6.38^4.27^8.79^3.87
  # ndbnum, fdgrp, long_desc, shrt_desc, common_name, manufacturer_name, survey,
  # refuse_desc, percent_refuse, science_name, n_fac, pro_fac, fat_fac, CHO_fac

  # File.foreach closes the file when the block finishes
  File.foreach(food_desc_file_name) do |line|
    # caret (^) is the delimiter
    # squigglies are the string delimiters: ~string~
    # hence below, we slice with [1..-2] to remove them from a string
    record_split = line.split('^')

    FoodDescription.create(
      :NDB_No      => record_split[0][1..-2],
      :FdGrp_Cd    => record_split[1][1..-2],
      :Long_Desc   => record_split[2][1..-2],
      :Shrt_Desc   => record_split[3][1..-2],
      :CommName    => record_split[4][1..-2],
      :ManufacName => record_split[5][1..-2],
      :Survey      => record_split[6][1..-2],
      :Ref_desc    => record_split[7][1..-2],
      :Refuse      => record_split[8][1..-2],
      :SciName     => record_split[9][1..-2],
      :N_Factor    => record_split[10],
      :Pro_Factor  => record_split[11],
      :Fat_Factor  => record_split[12],
      :CHO_Factor  => record_split[13]
    )
  end
end

ball wrote:

  1. Is there a way to take advantage of the schema so that I don’t have
    to re-write it in the load function?

puts FoodDescription.columns.inspect
puts FoodDescription.columns.map { |c| c.name }.inspect

  2. Is there a way I can write a generic load function (taking
    advantage of #1 if available, or not if not)?

As long as the columns in the text source appear in the same order as
the columns in the database definition you should be fine. I think this
ordering is guaranteed by the database, otherwise you couldn’t
meaningfully do “INSERT INTO foo VALUES (x,y,z)” (without naming the
columns) or “SELECT * FROM foo”.
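Putting #1 and #2 together, here is a minimal sketch of a generic loader. The names `parse_record` and `load_table` are mine, not from the original code, and it assumes every file uses the same caret/tilde format with columns in table order:

```ruby
# Build the attributes hash for one caret-delimited line, given the
# column names (e.g. from Model.columns.map { |c| c.name }).
def parse_record(column_names, line)
  # strip the ~...~ wrapper from string fields; bare fields pass through
  fields = line.chomp.split('^').map { |f| f.sub(/\A~(.*)~\z/, '\1') }
  Hash[column_names.zip(fields)]
end

# A generic load function would then be roughly:
#
#   def load_table(model, file_name)
#     names = model.columns.map { |c| c.name }
#     File.foreach(file_name) { |line| model.create(parse_record(names, line)) }
#   end
```

Numeric fields stay strings here; ActiveRecord typecasts them against the column definitions when you call create.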

  3. What is the “better” way (than [1..-2]) to strip a string of a
    starting/ending ~?

Do it all in one go. A couple of options:

records = line.split('^').map { |rec| rec[1..-2] }
records = line.scan(/~([^~^]*)~/).flatten

but they won’t work unless all your fields are ~ delimited. So this
might work better for you:

records = line.split('^').map { |rec| rec.sub(/~(.*)~/) { $1 } }
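As a quick check against the sample line from the original post, the sub-based version strips the tildes where they are present and leaves the bare numeric fields alone:

```ruby
line = '~01001~^~0100~^~Butter, salted~^~BUTTER,WITH SALT~^^^~Y~^^0^^6.38^4.27^8.79^3.87'
records = line.split('^').map { |rec| rec.sub(/~(.*)~/) { $1 } }

records[0]   # => "01001"
records[2]   # => "Butter, salted"
records[12]  # => "8.79"  (untouched: no tildes)
records[4]   # => ""      (empty field stays empty)
```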

  4. Should I just use an AR-generated primary key, rather than the data
    set’s ASCII primary key? (Many of the tables in the datasets have
    composite keys, which I will not implement, so I am curious if I
    should just create all of the primary keys through AR instead.)

ActiveRecord has some hard-coded assumptions about primary keys. If you
override its assumption that the key is called “id” then it won’t
generate keys automatically for new records. If you have non-integer
primary key then it may or may not work. It certainly doesn’t handle
composite primary keys; I saw a plugin which claimed to do this some
time ago, but I don’t know if it’s been kept up to date.

So my advice with AR is: if you are working with an existing
database/schema that you don’t control, then you can try non-standard
primary keys; but if you control the schema yourself, follow AR’s
convention of an auto-generated integer primary key called “id”.
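If you do decide to keep the data set’s own key, the model has to be told about it. A sketch using the class-level `set_primary_key` from that era’s ActiveRecord (a configuration fragment, not a runnable script):

```ruby
# Option A: keep the data set's key. AR will NOT generate values for a
# non-"id" key, so every create must supply NDB_No explicitly.
class FoodDescription < ActiveRecord::Base
  set_primary_key 'NDB_No'
end

# Option B (AR convention): drop :primary_key from create_table so it
# adds the default auto-incrementing integer "id", and store NDB_No as
# an ordinary column with an index for lookups.
```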
