Finding files with regular expressions

remcoh · October 2, 2007, 2:17pm

Hi, I am having trouble figuring this out:

I want to search in a directory for files, matching a certain regular
expression. The script should not return true or false, but should give
me a list (array) of filenames which are found.

Who can help me a little?
Thanks in advance.

remco

remcoh · October 2, 2007, 2:58pm

On 10/2/07, Remco Hh [email protected] wrote:

Hi, i am having troubles figuring this out:

I want to search in a directory for files, matching a certain regular
expression. The script should not return true or false, but should give
me a list (array) of filenames which are found.

who can help me a little?
thanks in advance

Look at Ruby’s Find library. I am not sure if it can take regexp
arguments (haven’t tried, but it would be hella cool).
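
For reference, Find.find does not accept a regexp itself; it yields every path under a starting directory, so the filtering has to happen in the block. A minimal sketch of that idea, where find_files is a hypothetical helper name and the pattern is only an example:

```ruby
require 'find'

# Hypothetical helper (not part of the Find library): walk `dir` recursively
# and collect every regular file whose base name matches `pattern`.
def find_files(dir, pattern)
  matches = []
  Find.find(dir) do |path|
    next unless File.file?(path)          # skip directories and other non-files
    matches << path if File.basename(path) =~ pattern
  end
  matches
end

# Example call: every file under the current directory ending in ".rb".
p find_files('.', /\.rb\z/)
```

Matching against File.basename keeps directory names out of the test; match against the full path instead if the directory part should count.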

remcoh · October 2, 2007, 3:41pm

Remco Hh wrote:

I want to search in a directory for files, matching a certain regular
expression. The script should not return true or false, but should give
me a list (array) of filenames which are found.

Try something like this:

results = []

Dir.foreach("./programs_ruby") do |filename|
  if filename.index("mod")
    results << filename
  end
end

p results

remcoh · October 2, 2007, 3:52pm

On Oct 2, 6:17 am, Remco Hh [email protected] wrote:

I want to search in a directory for files, matching a certain regular
expression. The script should not return true or false, but should give
me a list (array) of filenames which are found.

Sorry, I just re-read your request and saw your desire for an array of
filenames. How about this:

Slim2:/usr/local/bin phrogz$ irb
irb(main):001:0> Dir[ '*' ]
=> ["erb", "fastri-server", "findfile", "fri", "gem", "gem_mirror",
"gem_server", "gemlock", "gemri", "gemwhich", "gpgen",
"index_gem_repository.rb", "irb", "lua", "luac", "mate",
"mongrel_rails", "p4", "p4d", "qri", "rails", "rake", "rdoc",
"rdoc-osa", "ri", "ri-emacs", "rot13", "ruby", "sql", "sqlite3", "svn",
"svnadmin", "svndumpfilter", "svnlook", "svnserve", "svnsync",
"svnversion", "swig", "testrb", "update_rubygems"]

irb(main):002:0> Dir[ '*' ].grep /\d$/
=> ["p4", "rot13", "sqlite3"]

You could use Dir.chdir to pick a working directory if you like.
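
A small sketch of the Dir.chdir suggestion, assuming the block form so that the working directory is restored afterwards. The helper name grep_dir is made up for illustration:

```ruby
# Hypothetical helper: list the non-hidden entries of `dir` whose names match
# `pattern`. Dir.chdir with a block changes directory only while the block runs.
def grep_dir(dir, pattern)
  Dir.chdir(dir) { Dir['*'].grep(pattern) }
end

# Example call: names in the current directory that end in a digit.
p grep_dir('.', /\d$/)
```

Because the glob runs inside the target directory, the results are bare names rather than paths prefixed with the directory.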

remcoh · October 2, 2007, 3:57pm

Hi –

On Tue, 2 Oct 2007, 7stud – wrote:

Dir.foreach("./programs_ruby") do |filename|
  if filename.index("mod")
    results << filename
  end
end

p results

A little more concise:

results = Dir.entries("./programs_ruby").grep(/mod/)

Or you could do:

results = Dir["*mod*"]

to automatically exclude hidden files, if that’s desired.

David
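
To illustrate the hidden-file difference between the two approaches, here is a sketch that builds a throwaway directory. The file names and the '*mod*' glob are assumptions chosen for the demonstration:

```ruby
require 'tmpdir'
require 'fileutils'

# Build a scratch directory with one hidden and two visible files, then
# compare a regexp over Dir.entries with a glob.
by_regexp, by_glob = Dir.mktmpdir do |dir|
  %w[.mod_hidden module.rb other.rb].each do |name|
    FileUtils.touch(File.join(dir, name))
  end
  Dir.chdir(dir) do
    [Dir.entries('.').grep(/mod/).sort, Dir['*mod*'].sort]
  end
end

p by_regexp  # the regexp sees the dotfile as well
p by_glob    # the glob skips names that start with a dot
```

Note that Dir.entries also returns "." and "..", which a looser pattern could match.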

remcoh · October 2, 2007, 3:59pm


Dir.glob("**/*").grep(/filename/)
HTH
Robert

remcoh · October 2, 2007, 3:51pm

On Oct 2, 6:17 am, Remco Hh [email protected] wrote:

I want to search in a directory for files, matching a certain regular
expression. The script should not return true or false, but should give
me a list (array) of filenames which are found.

Here’s my ‘findfile’ script that I use daily. It lets you use a regexp
for the filename, file content, specify depth of search, whether or
not to show all matches inside a file, and so on.

(You may need to unwrap some of the longer lines after copy/paste.)

See additional notes at the end.

Slim2:/usr/local/bin phrogz$ cat findfile
#!/usr/bin/env ruby

USAGE = <<ENDUSAGE
Usage:
   findfile [-d max_depth] [-a] [-c] [-i] name_regexp [content_regexp]
   -d,--depth          the maximum depth to recurse to (defaults to no limit)
   -a,--showall        with content_regexp, show every match per file
                       (defaults to only show the first-match per file)
   -c,--usecase        with content_regexp, use case-sensitive matching
                       (defaults to case-insensitive)
   -i,--includedirs    also find directories matching name_regexp
                       (defaults to files only; incompatible with content_regexp)
   -h,--help           show some help examples
ENDUSAGE

EXAMPLES = <<ENDEXAMPLES

Examples:
   findfile foo
   # Print the path to all files with 'foo' in the name

   findfile -i foo
   # Print the path to all files and directories with 'foo' in the name

   findfile js$
   # Print the path to all files whose name ends in "js"

   findfile js$ vector
   # Print the path to all files ending in "js" with "Vector" or "vector"
   # (or "vEcTOr", "VECTOR", etc.) in the contents, and print some of the
   # first line that has that content.

   findfile js$ -c Vector
   # Like above, but must match exactly "Vector" (not 'vector' or 'VECTOR').

   findfile . vector -a
   # Print the path to every file with "Vector" (any case) in it somewhere
   # printing every line in those files (with line numbers) with that content.

   findfile -d 0 .
   # Print the path to every file that is in the current directory.

   findfile -d 1 .
   # Print the path to every file that is in the current directory or any
   # of its child directories (but no subdirectories of the children).

ENDEXAMPLES

ARGS = {}
UNFLAGGED_ARGS = [ :name_regexp, :content_regexp ]
next_arg = UNFLAGGED_ARGS.first
ARGV.each{ |arg|
  case arg
    when '-d','--depth'
      next_arg = :max_depth
    when '-a','--showall'
      ARGS[:showall] = true
    when '-c','--usecase'
      ARGS[:usecase] = true
    when '-i','--includedirs'
      ARGS[:includedirs] = true
    when '-h','--help'
      ARGS[:help] = true
    else
      if next_arg
        if next_arg==:max_depth
          arg = arg.to_i + 1
        end
        ARGS[next_arg] = arg
        UNFLAGGED_ARGS.delete( next_arg )
      end
      next_arg = UNFLAGGED_ARGS.first
  end
}

if ARGS[:help] or !ARGS[:name_regexp]
  puts USAGE
  puts EXAMPLES if ARGS[:help]
  exit
end

class Dir
  def self.crawl( path, max_depth=nil, include_directories=false, depth=0, &block )
    return if max_depth && depth > max_depth
    begin
      if File.directory?( path )
        yield( path, depth ) if include_directories
        files = Dir.entries( path ).select{ |f| true unless f=~/^\.{1,2}$/ }
        unless files.empty?
          files.collect!{ |file_path|
            Dir.crawl( path+'/'+file_path, max_depth, include_directories, depth+1, &block )
          }.flatten!
        end
        return files
      else
        yield( path, depth )
      end
    rescue SystemCallError => the_error
      warn "ERROR: #{the_error}"
    end
  end

end

start_time = Time.new
name_match = Regexp.new( ARGS[:name_regexp], true )
content_match = ARGS[:content_regexp] && Regexp.new( ".{0,20}#{ARGS[:content_regexp]}.{0,20}", !ARGS[:usecase] )

file_count = 0
matching_count = 0
Dir.crawl( '.', ARGS[:max_depth], ARGS[:includedirs] && !content_match ){ |file_path, depth|
  if File.split( file_path )[ 1 ] =~ name_match
    if content_match
      if ARGS[:showall]
        shown_file = false
        IO.readlines( file_path ).each_with_index{ |line_text,line_number|
          if match = line_text[content_match]
            unless shown_file
              puts file_path
              matching_count += 1
              shown_file = true
            end
            puts( ( "%5d: " % line_number ) + match )
          end
        }
        puts " " if shown_file
      elsif IO.read( file_path ) =~ content_match
        puts file_path, "  #{$~}", " "
        matching_count += 1
      end
    else
      puts file_path
      matching_count += 1
    end
  end
  file_count += 1
}
elapsed = Time.new - start_time
puts "Found #{matching_count} file#{matching_count==1 ? '' : 's'} (out of #{file_count}) in #{elapsed} seconds"

You do have to watch for shell escaping of the regexp, either escaping
chars as needed or quoting your regexp:

Slim2:/usr/local/bin phrogz$ findfile \d
./findfile
./index_gem_repository.rb
./p4d
./rdoc
./rdoc-osa
./svnadmin
./svndumpfilter
./update_rubygems
Found 8 files (out of 40) in 0.001228 seconds

Slim2:/usr/local/bin phrogz$ findfile \\d
./p4
./p4d
./rot13
./sqlite3
Found 4 files (out of 40) in 0.001088 seconds

Slim2:/usr/local/bin phrogz$ findfile \\d$
./p4
./rot13
./sqlite3
Found 3 files (out of 40) in 0.001118 seconds

Slim2:/usr/local/bin phrogz$ findfile "\d$"
./p4
./rot13
./sqlite3
Found 3 files (out of 40) in 0.001298 seconds
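
The escaping behaviour in the transcript above can be approximated from Ruby with the standard Shellwords module, which splits a string using Bourne-shell-like rules. This is only a sketch: Shellwords stands in for a real shell here, and the three command lines are examples.

```ruby
require 'shellwords'

# Command lines exactly as typed at the prompt. The quoted heredoc tag means
# Ruby does no backslash processing of its own on these lines.
typed = <<'LINES'.lines.map(&:chomp)
findfile \d
findfile \\d
findfile "\d$"
LINES

typed.each do |line|
  _script, pattern = Shellwords.split(line)
  # `pattern` is what the script would receive as its first argument.
  puts "#{line.ljust(16)} -> #{pattern}"
end
```

An unquoted single backslash is consumed by the splitting, so the script would see a plain "d". The doubled or double-quoted forms deliver the backslash intact.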

remcoh · October 2, 2007, 6:37pm

Everybody, thanks for the good advice.
This is most helpful.

remco


remcoh · October 2, 2007, 4:45pm

David A. Black wrote:

A little more concise:

results = Dir.entries("./programs_ruby").grep(/mod/)

Or you could do:

results = Dir["*mod*"]

to automatically exclude hidden files, if that’s desired.

Thanks. I have some questions though. I notice that a lot of people
who post to this forum don’t use iterators to process input as they
read it. Instead, they tend to slam everything into memory first, and
then iterate over the data, often with no care at all if they happen
to create a copy or two of the data along the way. I always
try to ask myself, “What if the input is 2-3GB?” I realize that’s
probably not going to be the case with filenames, but who knows? There
are multi-terabyte hard drives now. As a result, I always try to
iterate over input as I go rather than read it into memory in one chunk.
Is there something I am missing about Ruby in that regard?

I assume that Ruby iterators buffer file I/O. Is that not the case? Is
Ruby so inefficient that you need to read everything into memory in the
biggest chunks possible to get reasonable performance while iterating
over data? Also, on a side note, it seems like it’s standard operating
procedure to shuttle as much code as you can into shell commands. Is
that because people want to avoid using the Ruby interpreter?
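
For comparison with the one-liners earlier in the thread, here is a sketch of the iterate-as-you-go style applied to directory listings. It uses Dir.foreach, which yields one entry at a time, where Dir.entries returns the whole array. The helper name each_matching is an assumption made for the example:

```ruby
# Hypothetical helper: yield each entry of `dir` that matches `pattern`, one
# at a time, without first collecting the whole listing into an array.
def each_matching(dir, pattern)
  return enum_for(:each_matching, dir, pattern) unless block_given?
  Dir.foreach(dir) do |name|
    next if name == '.' || name == '..'   # skip the self and parent entries
    yield name if name =~ pattern
  end
end

# Stream the matches; nothing is accumulated unless the caller asks for it.
each_matching('.', /mod/) { |name| puts name }
```

Calling each_matching without a block returns an Enumerator, so to_a still gives an array when one is wanted.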