On Saturday 09 May 2009, Jian L. wrote:
else
recordWord.count += 1
end
recordWord.save
and so this part loops for 6000 times…
Simply put, this is an inappropriate use of a relational database. As
the other respondents said, you’re executing 12000 transactions each
with a single database access. If you wrap the code above into a
transaction block
Word.transaction do
…
end
you’re down to one transaction, but still have 12000 accesses. IIUC, you
have a word list of 6000 words (or something) in memory already, why not
do the frequency counting where it’s easy?
frequencies = Hash.new(0)
wordlist.each { |word| frequencies[word] += 1 }
Word.transaction do
wordlist.each do |word, freq|
unless w = Word.find_by_s(word)
w = Word.new(:s => word, :count => 0)
end
w.freq += 1
w.save!
end
end
This way still incurs two database accesses per unique word in your
list. To cut this down without custom SQL ([1], [2]) you have to
know/decide whether there are mostly known or unknown words in every new
document.
Assuming most words are already in the database, you could do the update
like this
Word.transaction do
wordlist.each do |word, freq|
update_count = Word.update_all(
[“freq = freq + ?”, freq],
[“s = ?”, word])
if update_count == 0
Word.create!(:s => word, :count => freq)
end
end
end
Michael
[1] http://dev.mysql.com/doc/refman/5.1/en/insert-on-duplicate.html
[2] http://www.postgresql.org/docs/current/static/plpgsql-control-
structures.html#PLPGSQL-ERROR-TRAPPING
–
Michael S.
mailto:[email protected]
http://www.schuerig.de/michael/