Quick script to search for


#1

I need to write a Ruby script that will recursively search a directory for
all files of particular types - say *.jpg, *.gif and *.png files. Then
I need to re-search the same directory structure identifying which of
these image files are not referenced anywhere in *.htm, *.html or
*.php files.

In a nutshell I need to find all files of a particular type that are
not referenced in the source of a second set of files. This has to
work recursively and be general purpose enough to be easily modified
for different but similar problems. Anyone know how to approach this
task? I would like to have some arguments like the following:

look_for=%w(jpg gif png)
look_in=%w(htm html php)

I am cleaning up a big pile of unstructured files for a web site and I
anticipate having to perform dozens of small tasks like this to
identify unreferenced files, backup copies, dead code. Maybe there is
a gem for something like this. Thanks in advance for any help.


#2

I need to write a Ruby script that will recursively search a directory for
all files of particular types - say *.jpg, *.gif and *.png files. Then

Use Dir['path/**/*.gif'].each to deeply search folders.
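
To cover several extensions in one pass, the glob can use a brace group built from the look_for array in the original question. This is only a rough sketch: find_files is a made-up helper name, and it assumes the root path itself contains no glob metacharacters.

```ruby
# Hypothetical helper: list every file under `root` (at any depth) whose
# extension is one of `exts`, e.g. %w(jpg gif png).
def find_files(root, exts)
  # Builds a pattern such as "site/**/*.{jpg,gif,png}".
  # "**/" matches zero or more directory levels, so top-level files count too.
  pattern = File.join(root, '**', "*.{#{exts.join(',')}}")
  Dir.glob(pattern).sort
end
```

Matching here is as case-sensitive as the platform's glob, so a file named PHOTO.JPG may be missed unless "JPG" is added to the list.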

I need to re-search the same directory structure identifying which of
these image files are not referenced anywhere in *.htm, *.html or
*.php files.

Use this (off the top of my head) to get started:

require 'nokogiri'

# Print the src attribute of every <img> element in one file.
doc = Nokogiri::HTML( File.read(some_file_name) )
doc.xpath('//img').each do |node|
  p node[:src]
end
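
For the whole task, a cruder alternative to parsing each page is a plain substring check, which needs only the standard library. The sketch below swaps Nokogiri for that simpler technique: an image counts as referenced if its base name appears anywhere in the text of any look_in file. The name unreferenced and its parameters are made up for illustration, and the check will treat a name mentioned in a comment as a reference, so read the output as a list of candidates rather than a definitive answer.

```ruby
# Hypothetical helper: files with a look_for extension whose base name never
# appears in the contents of any file with a look_in extension.
def unreferenced(root, look_for, look_in)
  glob = ->(exts) { Dir.glob(File.join(root, '**', "*.{#{exts.join(',')}}")) }

  # Read each source file once, as raw bytes, to sidestep encoding errors.
  sources = glob.call(look_in).map { |path| File.binread(path) }

  glob.call(look_for).reject do |asset|
    name = File.basename(asset).b # binary copy so it compares against raw bytes
    sources.any? { |text| text.include?(name) }
  end.sort
end
```

With the arrays from the original question, puts unreferenced('.', look_for, look_in) would print the candidates for deletion. Changing the two arrays adapts it to similar cleanups, such as finding scripts that no page includes.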

In a nutshell I need to find all files of a particular type that are
not referenced in the source of a second set of files. This has to
work recursively and be general purpose enough to be easily modified
for different but similar problems. Anyone know how to approach this
task?

This is very well-trodden territory in website testing. You can use
Mechanize to hit the site from its server, or you can use Watir to hit it
through an IE or Firefox web browser.

Put another way, if the developers had used unit testing to start with,
cleaning up the website would have been easier all along. Clean sites are
easier to test.


#3

I need to re-search the same directory structure identifying which of
these image files are not referenced anywhere in *.htm, *.html or
*.php files.

Put another way, if the developers had used unit testing to start with,
cleaning up the website would have been easier all along. Clean sites are
easier to test.

It occurs to me that, in theory, you could also try PHPUnit…