Directory searching against a text file


#1

I am in the middle of writing a quick program that will recursively scan
the files under a given path for a list of keywords stored in a file. My
code so far is below, but before moving ahead I have two questions.

First: I am passing in a text file called “terms.txt”. To search for each
keyword in the file, I assume the best way to do so is as follows:

terms.each do |term|
  # term is a String, so use include? (String =~ String raises TypeError)
  if line.include?(term)
    puts ""
  end
end
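
One way to check every keyword against a line is to load the terms once and then test each line against the whole list. The following is only a minimal sketch, assuming the terms file holds one term per line; `load_terms`, `matching_terms` and `terms_pattern` are hypothetical helper names, not part of the posted code.

```ruby
# Read one search term per line, dropping blank lines and surrounding whitespace.
def load_terms(terms_path)
  File.readlines(terms_path).map(&:strip).reject(&:empty?)
end

# Return the terms that occur somewhere in the given line (plain substring match).
def matching_terms(line, terms)
  terms.select { |term| line.include?(term) }
end

# Combine all terms into one pattern. Regexp.union escapes each string,
# so a term such as "a.b" matches literally rather than as a regex.
def terms_pattern(terms)
  Regexp.union(terms)
end
```

With these helpers, `line =~ terms_pattern(terms)` answers "does any term match?" in one test, while `matching_terms(line, terms)` reports which terms matched.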

My second question is: this program works well for searching text files,
but what about Word documents and spreadsheets? Do I need some Windows
API in there?

Many thanks

require 'find'

class ESearch

  # method which is passed the file path from the command line
  def scanFiles(path)
    # single quotes keep the backslashes in the Windows path literal
    terms_file = 'C:\Documents and Settings\user\Desktop\terms.txt'
    # load the keywords, one per line, skipping blank lines
    terms = File.readlines(terms_file).map { |t| t.strip }.reject { |t| t.empty? }
    # process each file under the passed file path
    Find.find(path) do |curPath|
      next unless File.file?(curPath)
      # process the contents of each file line by line, counting line numbers
      File.open(curPath) do |file|
        file.each do |line|
          # if the line contains any term, output the path and line number
          if terms.any? { |term| line.include?(term) }
            puts "#{curPath}:#{file.lineno}"
          end
        end
      end
    end
  end
end

# run from the command line, passing in a file path; this prints a usage
# message if one is not passed
if __FILE__ == $0
  if ARGV.size != 1
    puts "Use: #{$0} [path]"
    exit
  end

  esearch = ESearch.new()
  esearch.scanFiles(ARGV[0])
end


#2

Stuart C. wrote:

if line.include?(term)
  puts ""
end

My second question is: this program works well for searching text
files, but what about Word documents and spreadsheets? Do I need some
Windows API in there?

You can read these files if you open them in binary mode.
However, they will contain so much extra binary crap that
it may not be easy to search in them.
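
A rough sketch of the binary-mode idea: read the raw bytes and keep only the runs of printable ASCII before searching them. `printable_runs` and `terms_in_binary` are hypothetical helper names, and the minimum run length of 4 is an arbitrary assumption. This will miss text stored as UTF-16, and newer .docx/.xlsx files are ZIP-compressed, so their text will not show up as plain runs at all.

```ruby
# Pull runs of printable ASCII (at least min_len characters) out of a binary file.
def printable_runs(path, min_len = 4)
  data = File.binread(path)   # raw bytes, no newline or encoding conversion
  data.scan(/[\x20-\x7E]{#{min_len},}/n)
end

# Return the terms that appear inside any printable run of the file.
def terms_in_binary(path, terms)
  runs = printable_runs(path)
  terms.select { |term| runs.any? { |run| run.include?(term) } }
end
```

Calling `terms_in_binary(path, terms)` for each file gives the list of keywords found in it, so an empty result means no term was seen in the readable parts.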


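# Load the search terms, one per line, trimming surrounding whitespace.
terms = IO.read("terms.txt").strip.split(/\s*\n\s*/)

# ARGF reads every file named on the command line, line by line.
# Note: this prints only lines that equal a term exactly, and
# ARGF.lineno keeps counting across files rather than restarting.
ARGF.each do |line|
  line.strip!
  if terms.include? line
    puts "#{ARGF.filename}:#{ARGF.lineno}: #{line}"
  end
end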

Running it:

ruby scanner.rb *.dat