Forum: Ruby Directory searching against a text file

Stuart Clarke (sclarke)
on 2008-12-06 01:55
I am in the middle of writing a quick program which will recursively
scan the files under a given path for a list of keywords stored in a
file. My code so far is below, but before moving ahead I have two
questions.

First: I am passing in a text file called "terms.txt". To search for each
keyword in the file, I assume the best way to do so is as follows:

terms.each do |term|
  if line =~ term
    puts ""
  end
end

My second question is: This program works well for searching text files,
but what about Word docs and spreadsheets? Do I need some Windows API in
there?


Many thanks


require 'find'

class ESearch

  #method which is passed file path from cmd line
  def scanFiles(path)
    #single quotes keep the backslashes in the Windows path literal
    terms = 'C:\Documents and Settings\user\Desktop\terms.txt'
    #process each file under the passed file path
    Find.find(path) do |curPath|
      next unless File.file?(curPath)
      #process the contents of each file line by line counting line numbers
      File.open(curPath) do |file|
        file.each do |line|
          #check if a line in the file matches term and output the path and line number
          if line =~ terms
            puts "#{curPath}"
          end
        end
      end
    end
  end
end

#run from cmd line, pass in file path; this will ask for a file path if one is not passed
if __FILE__ == $0
  if ARGV.size != 1
    puts "Use: #{$0} [path]"
    exit
  end

  esearch = ESearch.new()
  esearch.scanFiles(ARGV[0])
end
William James (Guest)
on 2008-12-06 21:41
(Received via mailing list)
Stuart Clarke wrote:

> if line =~ term
>   puts ""
> end
>
> My second question is: This program works well for searching text
> files but what about word docs and spreadsheets? Do i need some
> Windows API in there??

You can read these files if you open them in binary mode.
However, they will contain so much extra binary crap that
it may not be easy to search in them.
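
A minimal sketch of the binary-mode idea, under some assumptions: the
helper name binary_match? is hypothetical, and it guesses that a legacy
binary document may hold its text either as single-byte characters or as
UTF-16LE, so it looks for both byte patterns. Newer .docx and .xlsx files
are ZIP archives, so their text is compressed and a raw byte search like
this will not find it.

```ruby
# Hypothetical helper: true if any term occurs in the raw bytes of the file.
def binary_match?(path, terms)
  data = File.binread(path)               # raw bytes, no newline or encoding translation
  terms.any? do |term|
    single_byte = term.b                      # term as plain bytes
    utf16       = term.encode("UTF-16LE").b   # term as UTF-16LE bytes (assumed document encoding)
    data.include?(single_byte) || data.include?(utf16)
  end
end
```

It could be called as binary_match?("report.doc", ["budget", "invoice"]),
where the file name and terms are placeholders. A hit only says that the
bytes occur somewhere in the file, not on which line or page.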

>   def scanFiles(path)
>           if line =~ terms
> if __FILE__ == $0
>   if ARGV.size != 1
>     puts "Use: #{$0} [path]"
>     exit
>   end
>
>   esearch = ESearch.new()
>   esearch.scanFiles(ARGV[0])
> end


terms = IO.read("terms.txt").strip.split(/\s*\n\s*/)

ARGF.each{|line| line.strip!
  if terms.include? line
    # ARGF.lineno counts across all input files; ARGF.file.lineno is per file
    puts "#{ARGF.filename}:#{ARGF.file.lineno}: #{line}"
  end
}

Running it:

ruby scanner.rb *.dat
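
The script above reports a line only when the whole stripped line equals
one of the terms, and it relies on the shell to expand the file list. A
rough sketch of a variant that walks a directory recursively and matches
a term anywhere in the line is shown here. The helper names load_terms
and scan_tree are hypothetical, and it assumes terms.txt holds one
keyword per line.

```ruby
require 'find'

# Hypothetical helper: read one keyword per line, skipping blank lines.
def load_terms(terms_file)
  File.readlines(terms_file).map(&:strip).reject(&:empty?)
end

# Hypothetical helper: collect [path, line number, line] for every line
# under root that contains at least one of the terms.
def scan_tree(root, terms)
  pattern = Regexp.union(terms)   # escapes each term and matches any of them
  hits = []
  Find.find(root) do |path|
    next unless File.file?(path)
    File.foreach(path).with_index(1) do |line, lineno|
      clean = line.scrub          # replace invalid bytes so the match cannot raise
      hits << [path, lineno, clean.chomp] if clean =~ pattern
    end
  end
  hits
end
```

Usage might look like this, with the path taken from the command line as
in the original program:

  scan_tree(ARGV[0], load_terms("terms.txt")).each do |path, lineno, line|
    puts "#{path}:#{lineno}: #{line}"
  end

Regexp.union is used so that characters such as "." in a keyword are
matched literally rather than as regular expression syntax.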