Sync list of files in DB vs. list of files in folder

This may be the wrong question to ask, but I’ll start here anyway
because this could probably use some optimization.
The code underneath gets a list of all the files in all the folders
specified, then deletes the database entries that aren’t in that list,
updates the entries in both the db and the list, and creates the entries
that aren’t in the db.

Is there a better way to do this?

all_reports is of the form:
{ "a_path" => ["report1", "report2", "report3"], "another_path" => […] … }

list = []
all_reports.each_value { |reps| list.concat reps }
Report.all.each do |rep|
  rep.destroy if !list.include? rep.filename
end
all_reports.each do |path, reports|
  Dir.chdir path
  reports.each do |rep|
    report = Report.find_by_filename rep
    if report.nil? # If it doesn't exist, create it
      generate(rep, path)
    else # If it does exist, update the timestamps.
      report.absolute_age = Time.now - File.ctime(rep)
      report.age = get_age report.absolute_age # Magic.
      aging = modalities.find_by_name(report.modality).aging rescue DefaultAging
      report.old = (report.absolute_age.to_hours > aging) # Yes, I did create a to_hours
      report.save
    end
  end
end

On Aug 19, 3:52 pm, Aldric G. <rails-mailing-l…@andreas-s.net> wrote:

This may be the wrong question to ask, but I’ll start here anyway
because this could probably use some optimization.
The code underneath gets a list of all the files in all the folders
specified, then deletes the database entries that aren’t in that list,
updates the entries in both the db and the list, and creates the entries
that aren’t in the db.

Question: why do you need to duplicate the filesystem’s directory
information in the DB?

Is there a better way to do this?
[…]

Almost certainly. Notice that you’re using find_by_filename each time
through the loop. That’s a pretty good indication that something’s
wrong; in general, database queries do not belong inside loops. I’ll
try to come up with something better, but first I’d like to know what
you’re trying to achieve here.
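
One way to read the point about queries inside loops, as a minimal sketch: decide what to delete, update, and create with plain collections, so the stored filenames are read once rather than once per file. The helper name plan_sync and the sample data are hypothetical, and the sketch assumes filenames are unique across folders, which the thread does not confirm.

```ruby
require 'set'

# Hypothetical helper: work out the three sets of filenames up front,
# so the database is read once instead of once per file.
#
# db_filenames - filenames currently stored in the reports table
# all_reports  - { "a_path" => ["report1", ...], ... } as in the question
def plan_sync(db_filenames, all_reports)
  # Map each filename on disk to the folder it lives in.
  # Assumption: a filename appears in only one folder.
  on_disk = {}
  all_reports.each do |path, reports|
    reports.each { |name| on_disk[name] = path }
  end

  in_db = db_filenames.to_set
  disk_names = on_disk.keys.to_set

  {
    delete: (in_db - disk_names).to_a.sort,   # in the DB, gone from disk
    update: (in_db & disk_names).to_a.sort,   # present in both
    create: (disk_names - in_db).map { |name| [name, on_disk[name]] }.sort
  }
end

plan = plan_sync(
  ["report1", "report2", "stale"],
  { "a_path" => ["report1", "report2"], "another_path" => ["report3"] }
)
puts plan.inspect
```

In the Rails code, db_filenames could plausibly come from a single Report.all, and the update branch could look records up in a hash built once (for example with ActiveSupport's index_by) instead of calling find_by_filename per file. Each changed record would still need its own save.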

Best,

Marnen Laibow-Koser
http://www.marnen.org

Marnen Laibow-Koser wrote:

Question: why do you need to duplicate the filesystem’s directory
information in the DB?

The directories contain medical reports in Word document format.
The naming convention is:
“last name, first name{, middle initial, title}, date of study,
procedure”
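
To make the convention concrete, here is a rough sketch of splitting such a name into fields. Everything specific in it is an assumption: the ", " separator, the ".doc" extension, the sample filename, and the helper name parse_report_name are all hypothetical, and the date is left as a string because its format is not stated.

```ruby
# Hypothetical parser for the naming convention described above.
# Assumptions: fields are separated by ", ", the extension is ".doc",
# and the last two fields are always the study date and the procedure.
def parse_report_name(filename)
  fields = File.basename(filename, ".doc").split(", ")
  return nil if fields.size < 4 # needs at least last, first, date, procedure

  {
    last_name:  fields[0],
    first_name: fields[1],
    extras:     fields[2...-2], # optional middle initial and/or title
    study_date: fields[-2],     # kept as a string; format is not specified
    procedure:  fields[-1]
  }
end

info = parse_report_name("Doe, Jane, Q, MD, 2009-08-19, Chest CT.doc")
puts info.inspect
```

A name with no optional fields simply yields an empty extras array, and a name with fewer than four fields returns nil rather than guessing.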

I use ‘antiword’ to parse the document and get the information I want
(win32OLE is way, way, WAY too slow).

The proofreaders need to know the date of study and how long the exam’s
been in the directory - which can’t easily be determined from just
looking at the directory (it’s awkward).

My first version of the page literally just built the table with the
information to be displayed on-the-fly, every time it was refreshed. I
figured this may be an issue if too many people attempted to do it at
once, so I stuck it in the database instead - now the background job
does the checking of the directory and the parsing of the files.

Additionally, I'm in fact checking a couple of directories and offering
them as options in a menu in my view, again to make their lives easier.


Of course, and this is the issue that brings me here in the first
place, when I click on one of those options after a couple of sweeps
have happened, the Rails log fills up with lines like these:

Report Load (0.0ms)   SELECT * FROM "reports" WHERE ("reports"."filename" = 'SomeReportName.doc') LIMIT 1
Report Update (0.0ms)   UPDATE "reports" SET "updated_at" = '2009-08-19 13:51:22', "absolute_age" = 31393 WHERE "id" = 927

Which is what bothered me and what bothers you :)

Okay, I finally got a clear look at the log, and it is indeed as it
seems: all the database transactions are queued until I actually
look at the database.

The DB I am using is SQLite3. How do I force ActiveRecord / SQLite /
Rails to commit / process the transaction so that it happens closer to
real time?