Forum: Ruby using regular expressions...

soldier.coder (Guest)
on 2008-11-11 15:25
(Received via mailing list)
I have the following code:

require 'open-uri'
def scrape_table(html)
  %r{</thead.*?>(.*?)</table>}m =~ html
  $1
end

def scrape_case(a_line)
  %r{(<a\s.*?\d{6}-\d{2}'>\d{6}-\d{2}<\/a>)}m =~ a_line
  $1
end

if $0 == __FILE__
  url = 'http://localhost:8080/tests/raw.html'
  page = open(url)  # open the url like a file
  text = page.read  # read it into one string
  my_table = scrape_table(text)  # grab or "scrape" the table
  # grab an HTML link that includes a 6-2 digit number (ex: 080910-15)
  my_link = scrape_case(my_table)
  # prints out my_table, which contains the table information
  puts(my_table)
  puts("\n")
  puts(my_link)
end

The code grabs the one table contained in my URL, then looks for an
HTML link that includes a number that is 6 digits, followed by a dash,
followed by 2 digits.  I'm fairly certain the regex in scrape_case()
grabs more than one HTML link, if more than one is in the table.  Is
there any way I can grab all those links into an array?
Peter S. (Guest)
on 2008-11-11 15:53
(Received via mailing list)
On 2008.11.11., at 14:22, soldier.coder wrote:

>  $1
> end

>  Is there any way I can grab all those links into an array?

Sure - String#scan is your friend:

def scrape_case(a_line)
   a_line.scan(/<a\s.*?\d{6}-\d{2}'>\d{6}-\d{2}<\/a>/)
end

ex:

 >> "<a href='123456-78'>123456-78</a> here is another: <a href='111111-99'>111111-99</a>".scan(/<a\s.*?\d{6}-\d{2}'>\d{6}-\d{2}<\/a>/)
=> ["<a href='123456-78'>123456-78</a>", "<a href='111111-99'>111111-99</a>"]
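
If only the case numbers are wanted rather than the whole links, a capture group might do it. This is just a sketch: the sample string and the variable names (html, links, case_numbers) are made up for illustration, and the second regex is a simplified variant of the one above, not something taken from the original page. When the pattern has a capture group, scan returns an array of arrays of the captured text, so flatten turns it back into a plain list.

```ruby
# Hypothetical sample input; the real table markup is not shown in this thread.
html = "<a href='123456-78'>123456-78</a> here is another: " \
       "<a href='111111-99'>111111-99</a>"

# No capture groups: scan returns every full match as a string.
links = html.scan(/<a\s.*?\d{6}-\d{2}'>\d{6}-\d{2}<\/a>/)

# One capture group: scan returns an array of one-element arrays,
# so flatten yields just the captured case numbers.
case_numbers = html.scan(/<a\s.*?'>(\d{6}-\d{2})<\/a>/).flatten

puts links
puts case_numbers
```

Note that =~ with $1, as in the original scrape_case, only ever gives the first match, which is why scan is the better fit here.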


HTH,
Peter