Compare two tables and find the difference?

I need to compare a column in TableA to a column in TableB and then
select 3 columns from TableA for all the records in TableA that don’t
exist in TableB.

TableA and TableB are different but do have one field in common that i
am comparing. The tables are not related in anyway.

right now I have a find_by_sql query located in a model that looks like
this:
def self.absent_charges
find_by_sql <<-QUERY
SELECT sources.charge_id_number, sources.charge_name,
sources.offense_level
FROM sources
LEFT JOIN charges ON sources.charge_id_number =
charges.charge_id_number
WHERE charges.id IS NULL
QUERY
end

I also need to dedupe what ends up in my final collection after the
find_by_sql query, but I am not sure how to go about this. Is there a
better way to do this? Any suggestions?

My view takes the 3 columns from the TableA records and presents them
for editing before they are added to TableB.

The update_charges.rhtml view:

Add missing records to charges table

<%= start_form_tag %> <%= render :partial => 'charge', :collection => @new_charges %>
Charge id number Charge name Classification level
<%= submit_tag 'Save' %> <%= end_form_tag %>

the partial _charge.rhtml:

<%= hidden_field_tag "charge[#{charge_counter}][charge_id_number]", charge.charge_id_number %><%= charge.charge_id_number %> <%= text_field_tag "charge[#{charge_counter}][name]", charge.charge_name, :size => 50 %> <%= collection_select 'charge', 'charge_type_id', @charge_types, 'id', 'full_name' %>

the relevant part of the controller:
class ImportController < ApplicationController

def update_charges

@new_charges = Source.absent_charges
@charge_types = ChargeType.find(:all)

if request.get?
  redirect_to :action => 'update_tables' if @new_charges.empty?
else
  begin
    params[:charge].each do |key, row|
      Charge.create!(row)
    end
  rescue Exception => e
    flash[:notice] = "Can't update charges table: " + e
    redirect_to :action => 'update_tables'
  else
    flash[:notice] = 'Charges table was succesfully updated.'
    redirect_to :action => 'update_tables'
  end
end

end


end

it is late, i am tired, and i am having a hard time even explaining what
needs to be done. i hope someone can help me get this on track. right
now i can get it to select the correct items, but there are duplicates
which i don’t want, and if i perform the update_charges twice in a row
the find_by_sql appends the same data and presents it twice in my view.
i guess i need to initialize the absent_charges somehow? any help
greatly appreciated.

You can get the SQL query to remove duplicates by using DISTINCT:

SELECT DISTINCT sources.charge_id_number, sources.charge_name…

Why don’t you try that and then report back what bugs you’re still
seeing.

Separately, you don’t seem to be using Rails idiomatic column names,
which are very helpful for setting up associations. Essentially, you
would replace charge_id_number with charge_id (if you’re not already
using charge_id). I would expect to see “LEFT JOIN charges ON
sources.charge_id = charges.id”.

        - dan


Dan K. mailto:[email protected]
http://www.dankohn.com/ tel:+1-415-233-1000

Note that if you use idiomatic column names, I believe you may be
able to do the search without even needing find_by_sql (not that you
should be ashamed to drop into SQL now and then for challenging
queries). I think you can lose the model method and just use
something like this in the controller:

@new_charges = Source.find( :all, :include => :charges, :select =>
“DISTINCT id, charge_name, offense_level”, :conditions =>
[“charges.id = ?”, nil])

However, this may not work because eager loading supports conditions
on the first table but not joined ones. See eager loading here:
http://api.rubyonrails.com/classes/ActiveRecord/Associations/
ClassMethods.html

        - dan


Dan K. mailto:[email protected]
http://www.dankohn.com/ tel:+1-415-233-1000

The use of DISTINCT seemed to fix my problem. Thanks!

Dan K. wrote:

You can get the SQL query to remove duplicates by using DISTINCT:

SELECT DISTINCT sources.charge_id_number, sources.charge_name…

Why don’t you try that and then report back what bugs you’re still
seeing.

Separately, you don’t seem to be using Rails idiomatic column names,
which are very helpful for setting up associations. Essentially, you
would replace charge_id_number with charge_id (if you’re not already
using charge_id). I would expect to see “LEFT JOIN charges ON
sources.charge_id = charges.id”.

        - dan


Dan K. mailto:[email protected]
http://www.dankohn.com/ tel:+1-415-233-1000