Forum: Ruby quick script to search for ...

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
worcestershire77 (Guest)
on 2009-04-16 18:40
(Received via mailing list)
I need to write a Ruby script that will recursively search a directory for
all files of particular types - say *.jpg, *.gif and *.png files. Then
I need to re-search the same directory structure identifying which of
these image files are not referenced anywhere in *.htm, *.html or
*.php files.

In a nutshell I need to find all files of a particular type that are
not referenced in the source of a second set of files. This has to
work recursively and be general-purpose enough to be easily modified
for different but similar problems. Does anyone know how to approach this
task? I would like to have some arguments like the following:

look_for=%w(jpg gif png)
look_in=%w(htm html php)

I am cleaning up a big pile of unstructured files for a web site and I
anticipate having to perform dozens of small tasks like this to
identify unreferenced files, backup copies, and dead code. Maybe there is
a gem for something like this. Thanks in advance for any help.
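The backup-copy part of this cleanup can often be handled by file name alone. A minimal sketch, assuming backups follow common naming habits (a trailing `~`, a `.bak`, `.old` or `.orig` suffix, or a "Copy of" prefix); the helper name `backup_copies` and the pattern list are illustrative, not from any gem:

```ruby
# Sketch: list files that look like backup copies, judged only by their names.
# These patterns are assumptions about common habits; adjust them per site.
BACKUP_PATTERNS = [
  /~\z/,                 # editor backups such as index.html~
  /\.(bak|old|orig)\z/i, # page.php.bak, logo.gif.old, style.css.orig
  /\Acopy of /i          # "Copy of index.html" style duplicates
].freeze

def backup_copies(root)
  # "**/*" walks every subdirectory; keep regular files whose base name
  # matches at least one backup pattern.
  Dir.glob(File.join(root, '**', '*'))
     .select { |path| File.file?(path) }
     .select { |path| BACKUP_PATTERNS.any? { |re| re.match?(File.basename(path)) } }
     .sort
end
```

Because it only looks at names, it will miss backups saved under unrelated names (for example `index2.html`), so anything it reports should be reviewed before deletion.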
Phlip (Guest)
on 2009-04-16 19:30
(Received via mailing list)
> I need to write a Ruby script that will recursively search a directory for
> all files of particular types - say *.jpg, *.gif and *.png files. Then

Use Dir['path/**/*.gif'].each to search folders recursively; the ** part matches subdirectories at any depth.
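To cover several extensions in one pass, Dir.glob also accepts brace patterns such as `*.{jpg,gif,png}`. A minimal sketch along those lines; `files_with_extensions` is a hypothetical helper name:

```ruby
# Sketch: collect every regular file under `root` whose extension is in `exts`.
# `files_with_extensions` is a hypothetical helper, not part of any library.
def files_with_extensions(root, exts)
  # "**/" matches any depth of subdirectories (including none), and the
  # brace group expands to one alternative per extension: *.{jpg,gif,png}
  pattern = File.join(root, '**', "*.{#{exts.join(',')}}")
  # Drop directories that happen to have a matching name.
  Dir.glob(pattern).select { |path| File.file?(path) }.sort
end
```

For example, `files_with_extensions('.', %w(jpg gif png))` returns the sorted paths of all matching images below the current directory. Depending on the platform, matching may be case-sensitive, so a file named `LOGO.JPG` may need `JPG` added to the list.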

> I need to re-search the same directory structure identifying which of
> these image files are not referenced anywhere in *.htm, *.html or
> *.php files.

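Use this (off the top of my head) to get started; it needs the nokogiri gem:

  require 'nokogiri'

  # Print the src attribute of every <img> tag in one HTML file.
  doc = Nokogiri::HTML(File.read(some_file_name))
  doc.xpath('//img').each do |node|
    p node[:src]
  end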
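Because PHP files are not always well-formed HTML, and images can also be referenced from CSS `url(...)` values or JavaScript strings, a cruder plain-text search is an alternative to parsing each page with Nokogiri. A minimal sketch, assuming a file counts as referenced when its base name appears anywhere in the text of a page file; `unreferenced_files` is a hypothetical helper, and `look_for` / `look_in` are the arrays proposed in the original question:

```ruby
# Sketch: find files of one set of types that no file of a second set mentions.
# Assumption: "referenced" means the file's base name occurs somewhere in the
# raw text of a page file (src/href attributes, CSS url(), PHP strings, ...).
def unreferenced_files(root, look_for, look_in)
  # Recursive, multi-extension search: "**/" descends into subdirectories.
  files_of = lambda do |exts|
    Dir.glob(File.join(root, '**', "*.{#{exts.join(',')}}")).select { |f| File.file?(f) }
  end

  candidates = files_of.call(look_for)
  # Read every page once, as raw bytes, so pages in any encoding can be searched.
  pages = files_of.call(look_in).map { |f| File.binread(f) }

  unused = candidates.reject do |path|
    name = File.basename(path).b # binary copy, comparable with the raw page bytes
    pages.any? { |text| text.include?(name) }
  end
  unused.sort
end
```

Called as `puts unreferenced_files('.', %w(jpg gif png), %w(htm html php))`, it lists the candidates no page mentions. The base-name test is deliberately loose: an image whose name only appears inside a longer name (`logo.png` inside `old_logo.png`), or a same-named file in another directory, is treated as referenced, and names built at runtime by PHP are not seen. Switching to a stricter rule (full relative paths, or Nokogiri `src` attributes) only means changing the block passed to `reject`.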

> In a nutshell I need to find all files of a particular type that are
> not referenced in the source of a second set of files. This has to
> work recursively and be general purpose enough to be easily modified
> for different but similar problems. Anyone know how to approach this
> task?

This is very well-trodden territory in website testing. You can use
Mechanize to hit the site from its server, or use Watir to hit it
through an IE or Firefox web browser.

Put another way, if the developers had used unit testing from the start,
cleaning up the website would have been easier all along. Clean sites are
easier to test.
Phlip (Guest)
on 2009-04-16 19:36
(Received via mailing list)
>> I need to re-search the same directory structure identifying which of
>> these image files are not referenced anywhere in *.htm, *.html or
>> *.php files.

> Put another way, if the developers had used unit testing to start with,
> cleaning up the website would have been easier all along. Clean sites are
> easier to test.

It occurs to me that, in theory, you could also try PHPUnit...