Regexp with Ruby

ajay7 · November 15, 2006, 6:56pm

Hello all,

I have to replace the image tags in a file with other ones!

File Data:

I want to scan for this image tag:

I already tested this with the following code:

…scan(/<img.*>/m)
and with
…scan(/<img.*?>/m)

But the result was always:

I hope someone can help me! Thanks a lot!

Kind Regards
Ajay

ajay7 · November 15, 2006, 7:05pm

On Thu, 16 Nov 2006, Ajay Vijey wrote:

Hello all,

I have to replace the image tags in a file with other ones!

File Data:

    [trimmed]

and with

I hope someone can help me! Thanks a lot!

I’d agree with your choice of regexp. I think we need to see more of
the surrounding code to fix this.

Kind Regards
Ajay

    Hugh

ajay7 · November 15, 2006, 7:20pm

Hugh S. wrote:

I’d agree with your choice of regexp. I think we need to see more of
the surrounding code to fix this.

rubyscript

datei_new = IO.read("index.htm")
datei_regexp = datei_new.scan(/(<img.*>)/m)

puts datei_regexp

index.htm

test

ajay7 · November 15, 2006, 8:09pm

On 11/16/06, Paul L. wrote:

data = File.read("sample.html")

–
Paul L.
http://www.arachnoid.com

If I were to do this, I would use Hpricot.

ajay7 · November 15, 2006, 9:35pm

Ajay Vijey wrote:

puts datei_regexp

Works for me with datei_new.scan(/(<img.*?>)/m). The .*? performs a
non-greedy match, so it stops at the smallest match it can make
rather than the longest.

The parentheses you have around the text of the regexp are unnecessary;
they cause the results to be nested one level deeper in arrays. You
should use /<img.*?>/m

–Ken B.
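
To illustrate the greedy versus non-greedy point above, here is a minimal sketch. The sample string is hypothetical, since the original index.htm is not shown in the thread; it only assumes two IMG tags inside one paragraph.

```ruby
# Hypothetical sample input (not the poster's actual file).
html = '<p><img src="a.gif" alt="A"> text <img src="b.gif" alt="B"></p>'

# Greedy: .* runs on to the LAST ">" in the string, so everything
# from the first <img onward comes back as a single match.
greedy = html.scan(/<img.*>/m)

# Non-greedy: .*? stops at the FIRST ">" after each <img.
non_greedy = html.scan(/<img.*?>/m)

# With a capture group, scan returns one array per match,
# so the results are nested one level deeper.
grouped = html.scan(/(<img.*?>)/m)

p greedy
p non_greedy
p grouped
```

With this input the greedy pattern yields one long match, while the non-greedy pattern yields the two tags separately. Note that a regexp of this kind will still be confused by a ">" inside an attribute value.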

ajay7 · November 15, 2006, 7:41pm

Ajay Vijey wrote:

Hello all,

I have to replace the image tags in a file with other ones!

As another poster has pointed out, you aren’t showing enough code for an
analysis. And while you are replacing tags, please reformat your IMG
tags thus:

Note the self-closing form. This won’t bother older browsers, and it
will allow you to meet the newer (X)HTML standards as well.
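
As a rough sketch of how such a rewrite might be done with gsub: the sample string and the exact pattern below are assumptions for illustration, not code from this thread, and the pattern would break on a ">" inside an attribute value.

```ruby
# Hypothetical sample input with one old-style and one self-closing tag.
html = '<p><img src="a.gif" alt="A"> and <img src="b.gif" alt="B" /></p>'

# Match each IMG tag non-greedily and capture its attributes,
# leaving out any trailing whitespace and "/" before the ">".
# Then re-emit every tag in the self-closing form.
converted = html.gsub(/<img\b(.*?)\s*\/?>/mi) { "<img#{$1} />" }

puts converted
```

Tags that are already self-closing come out unchanged, so running this twice should be harmless.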

Here is a sample program that extracts all the IMG tags from a Web page
(of both the old and new varieties):

#!/usr/bin/ruby -w

data = File.read("sample.html")

# <img.*?> stops at the first ">", so it matches both <img ...> and <img ... />
extract = data.scan(%r{<img.*?>}m)

puts extract.join("\n")

This outputs from my sample page: