Ruby regex on html file

eggie5 · September 26, 2007, 1:40am

I’m trying to write a rake task to extract all the script tags out of
my html file and save them to an array. How can I do this?

Below is a snippet from my file management.rhtml. I would like to get the
paths to the script files from all the script tags inside the HTML comment tags.

Expected results are:

/javascripts/prototype.js,
management/javascripts/management.js,
/javascripts/scriptaculous.js,
/javascripts/effects.js,
/javascripts/controls.js

Snippet:

      <!--scripts-->
      <script type="text/javascript" src="/javascripts/prototype.js"></script>
      <script type="text/javascript" src="management/javascripts/management.js"></script>
      <script src="/javascripts/scriptaculous.js" type="text/javascript"></script>
      <script src="/javascripts/effects.js" type="text/javascript"></script>
      <script src="/javascripts/controls.js" type="text/javascript"></script>
      <!--endscripts-->

eggie5 · September 26, 2007, 2:33am

I have no shame....

For something as large and (maybe) complex as this, you might want to
try generating your regexp through TextualRegexp.

gem install TextualRegexp

Good luck
ari

On Sep 25, 2007, at 7:40 PM, eggie5 wrote:

I’m trying to write a rake task to extract all the script tags out of
my html file and save them to an array. How can I do this?

Below is a snippet from my file management.rhtml. I would like to get the
paths to the script files from all the script tags inside the HTML comment tags.
---------------------------------------------------------------|
~Ari
“I don’t suffer from insanity. I enjoy every minute of it” --1337est
man alive

eggie5 · September 26, 2007, 3:41am

eggie5 [email protected] wrote:

I’m trying to write a rake task to extract all the script tags out of
my html file and save them to an array. How can I do this?

Is this a solution for you?

#! /usr/bin/env ruby

html = '
                <script type="text/javascript" src="/javascripts/prototype.js"></script>
                <script type="text/javascript" src="management/javascripts/management.js"></script>
                <script src="/javascripts/scriptaculous.js" type="text/javascript"></script>
                <script src="/javascripts/effects.js" type="text/javascript"></script>
                <script src="/javascripts/controls.js" type="text/javascript"></script>
                <!--endscripts-->
'
js = []
# each_line instead of String#each, which is gone in Ruby 1.9 and later
html.each_line {|l|
  # first gsub keeps what follows src=", second trims a trailing " type=...
  js << l.chomp.gsub(/.* src="(.*[^ ])"[ >].*/, '\1').gsub(/(.*)" type=.*/, '\1') if /<script / === l
}
p js

gives :
RubyMate r6354 running Ruby r1.8.6 (/opt/local/bin/ruby)

extract_js.rb

["/javascripts/prototype.js", "management/javascripts/management.js",
"/javascripts/scriptaculous.js", "/javascripts/effects.js",
"/javascripts/controls.js"]

on Mac OS X 10.4.10

I didn't find a solution with only one gsub...
surely one exists :[

eggie5 · September 26, 2007, 4:12am

On Sep 25, 6:35 pm, [email protected] (Une
Bévue) wrote:

Thank you so much for your effort. This is much more succinct than
what I came up with!

File.open("app/views/layouts/management.rhtml", "r") do |infile|
  file_text = ""
  while (line = infile.gets)
    file_text << line
  end

  # grab everything between the two marker comments (non-greedy)
  script_block = file_text.match("<!--scripts-->[\\S\\s]*?<!--endscripts-->")

  script_block = script_block.to_s
  # each run of non-quote characters ending in .js is a script path
  script_refs = script_block.scan(/[^"]+\.js/)

  script_refs.length

  script_refs.each do |ref|
    base_path = "public/"
    puts "#{base_path}#{ref}"
  end
end
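
For comparison, here is a minimal sketch of the same two-step idea (isolate the block between the marker comments, then pull out the src values) written as a method that takes the file contents as a string. The method name script_paths and the sample HTML are made up for illustration; only the <!--scripts--> and <!--endscripts--> markers come from the code above.

```ruby
# Hypothetical helper: returns the src value of every <script> tag found
# between the <!--scripts--> and <!--endscripts--> markers.
def script_paths(html)
  # Non-greedy, so the block ends at the first <!--endscripts--> marker.
  block = html[/<!--scripts-->(.*?)<!--endscripts-->/m, 1]
  return [] if block.nil?

  # [^>]* keeps the match inside one tag; [^"]* stops at the closing quote.
  block.scan(/<script\b[^>]*\bsrc="([^"]*)"/).flatten
end

# Made-up sample: one tag outside the markers, two inside (one split
# across two lines).
SAMPLE_HTML = <<'HTML'
<script src="/javascripts/ignored.js" type="text/javascript"></script>
<!--scripts-->
<script type="text/javascript" src="/javascripts/prototype.js"></script>
<script src="/javascripts/effects.js"
        type="text/javascript"></script>
<!--endscripts-->
HTML

p script_paths(SAMPLE_HTML)
```

In a rake task this could presumably be called as script_paths(File.read("app/views/layouts/management.rhtml")), which also replaces the manual gets loop.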

eggie5 · September 26, 2007, 4:40am

On Sep 25, 6:36 pm, eggie5 [email protected] wrote:

# the HTML snippet goes after __END__ and is read back through DATA
puts DATA.read.scan( /<script\s+[^>]*src="(.*?)"/m ).flatten

__END__

eggie5 · September 26, 2007, 6:03am

William J. [email protected] wrote:

puts DATA.read.scan( /<script\s+[^>]*src="(.*?)"/m ).flatten
I don't understand your "?" here ------------^

What does it mean after the * ?

eggie5 · September 26, 2007, 6:16am

Quoth Une Bévue:

William J. [email protected] wrote:

puts DATA.read.scan( /<script\s+[^>]*src="(.*?)"/m ).flatten
I don't understand your "?" here ------------^

What does it mean after the * ?

Une Bévue

Non-greedy match. It finds as few characters as possible to match, which in
this case means the capture stops at the first closing quote instead of
running on past it.
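
A small sketch of the difference, using one tag from the original snippet. The constant names are only for illustration:

```ruby
TAG = '<script src="/javascripts/effects.js" type="text/javascript"></script>'

# Greedy: .* runs on to the LAST double quote in the string.
GREEDY = TAG[/src="(.*)"/, 1]

# Non-greedy: .*? stops at the FIRST double quote after src=".
LAZY = TAG[/src="(.*?)"/, 1]

puts GREEDY  # /javascripts/effects.js" type="text/javascript
puts LAZY    # /javascripts/effects.js
```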

HTH,

eggie5 · September 26, 2007, 6:42am

I’m trying to write a rake task to extract all the script tags out of
my html file and save them to an array. How can I do this?

Your subject says regex, but your request says Hpricot:

require 'hpricot'
doc = Hpricot(input)
scripts = (doc/'script').map {|x| x['src']}.compact

eggie5 · September 26, 2007, 7:22am

On Sep 25, 9:41 pm, “Daniel S.” [email protected] wrote:

I’m trying to write a rake task to extract all the script tags out of
my html file and save them to an array. How can I do this?

Your subject says regex, but your request says Hpricot:

require 'hpricot'
doc = Hpricot(input)
scripts = (doc/'script').map {|x| x['src']}.compact

Ahh, that looks beautiful right there! But will hpricot work on
a .rhtml file?

eggie5 · September 26, 2007, 6:01am

eggie5 [email protected] wrote:

Thank you so much for your effort. This is much more succinct than
what I came up with!

I found it with only one gsub :

#! /usr/bin/env ruby

html = '
                <script type="text/javascript" src="/javascripts/prototype.js"></script>
                <script type="text/javascript" src="management/javascripts/management.js"></script>
                <script src="/javascripts/scriptaculous.js" type="text/javascript"></script>
                <script src="/javascripts/effects.js" type="text/javascript"></script>
                <script src="/javascripts/controls.js" type="text/javascript"></script>
                <!--endscripts-->
'
js = []
# each_line instead of String#each, which is gone in Ruby 1.9 and later
html.each_line {|l|
  # one gsub: replace the whole line with the captured src value
  js << l.chomp.gsub(/^\s+<script\s+[^>]*src="([^ "]*).*/, '\1') if /<script / === l
}
p js

gives :

["/javascripts/prototype.js", "management/javascripts/management.js",
"/javascripts/scriptaculous.js", "/javascripts/effects.js",
"/javascripts/controls.js"]
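
Since only the captured path is wanted, the gsub could arguably be dropped altogether: String#[] accepts a regexp plus a capture-group index and returns just that group, or nil when the line does not match. A rough sketch on made-up sample lines; this is a variation on the code above, not the code from the post:

```ruby
# Made-up sample lines standing in for the lines of the .rhtml file.
LINES = [
  '  <script type="text/javascript" src="/javascripts/prototype.js"></script>',
  '  <script src="/javascripts/controls.js" type="text/javascript"></script>',
  '  <!--endscripts-->'
]

# line[regexp, 1] returns capture group 1, or nil if there is no match;
# compact drops the nils from lines without a script tag.
JS = LINES.map { |line| line[/<script\s+[^>]*src="([^ "]*)/, 1] }.compact

p JS
```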

best,

eggie5 · September 26, 2007, 7:37am

Your subject says regex, but your request says Hpricot:

require 'hpricot'
doc = Hpricot(input)
scripts = (doc/'script').map {|x| x['src']}.compact

Ahh, that looks beautiful right there! But will hpricot work on
a .rhtml file?

Probably - Hpricot should treat all the rhtml guff as if you’re just
really really bad at writing html and treat the rhtml bits as just raw.

Hpricot('<%= %><script src="monkey"></script>').at('script')['src']
=> "monkey"

The rhtml will get in the way of Hpricot seeing your tree correctly, so
finding script tags only within the head section or something like that
might not work, but for simple finds it should be fine.

Dan.

eggie5 · September 26, 2007, 7:55am

Konrad M. [email protected] wrote:

Non-greedy match. Find as few characters as possible to match, which in this
case means don’t match quote characters.

OK, fine, thanks a lot for reminding me…
