Find block in a long string

okkezSS · October 19, 2010, 2:27pm

Hi,

I am parsing an RSS feed and I am trying to extract a specific element from
an attribute.

The attribute looks like this, and I would like to extract the location
from it.
"\n Location: Columbus, Ohio\n\n\n<"

Here is the horrible code I’ve written. Could you help me make it
cleaner?

  description.split("\n").each do |ugly|
    next unless ugly.include?("Location")
    location = ugly.split("\n")[1].split("gt;").last
  end

Greg

gregm · October 19, 2010, 3:08pm

Ruby has an RSS parser, doesn’t it? Why not use it?

http://ruby-doc.org/core/classes/RSS.html
http://www.cozmixng.org/~rwiki/?cmd=view;name=RSS+Parser%3A%3ATutorial.en

gregm · October 19, 2010, 4:02pm

Steel S. wrote in post #955407:

Ruby has an RSS parser, doesn’t it? Why not use it?

http://ruby-doc.org/core/classes/RSS.html
http://www.cozmixng.org/~rwiki/?cmd=view;name=RSS+Parser%3A%3ATutorial.en

I do use an RSS parser. I only mentioned it to explain what I am
doing; it is not related to my issue.

gregm · October 19, 2010, 5:45pm

Does this snippet solve your need?

Given the string:

str = "\n Location: Columbus, Ohio\n\n\n<"

remove unwanted chars with:

str1 = str.gsub(/[&\/;\n]+|lt|gt|strong|p/im, ' ').strip.squeeze(' ')

and obtain:

puts "str1: #{str1}" #=> Location: Columbus, Ohio

if you wish, remove Location: and obtain:

str2 = str1.gsub(/Location:\s+/i, '')
puts "str2: #{str2}" #=> Columbus, Ohio

HTH gfb
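
One caveat with the character-stripping regex above: the bare lt, gt and p alternatives (with /i) also match inside the data itself, so a city such as Pittsburgh or Baltimore would come out damaged. A minimal sketch of an alternative is to capture the text after the label instead of deleting everything around it. The raw input below is an assumption (the escaped markup is guessed from the lt, gt, strong and p alternatives in the regex), and extract_location is a hypothetical helper name, not something from this thread.

```ruby
# Assumed raw input: the entity-escaped markup is a guess, since the
# string as displayed in this thread shows no entities.
description = "\n&lt;p&gt;&lt;strong&gt;Location:&lt;/strong&gt; Columbus, Ohio&lt;/p&gt;\n\n\n&lt;"

# Hypothetical helper: returns the text after "Location:", or nil when
# the label is absent.
def extract_location(text)
  # Skip any escaped tags (e.g. "&lt;/strong&gt;") that directly follow
  # the label, then capture everything up to the next "&" or newline.
  match = text.match(/Location:\s*(?:&lt;.*?&gt;\s*)*([^&\n]+)/i)
  match && match[1].strip
end

puts extract_location(description) # prints "Columbus, Ohio"
```

Because nothing is deleted from the captured text, letters such as p or the pair lt inside a city name are left alone.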

gregm · October 19, 2010, 10:15pm

On 19.10.2010 19:51, Greg Ma wrote:

Thanks, this is what I was looking for.
I really need to start learning regular expressions…

Here’s a bit more to play with:

https://gist.github.com/rklemme/635002

html-extract.rb

#!/bin/env ruby19

require 'nokogiri'

# Nokogiri should have this as well
REPL =  {
  '&lt;' => '<',
  '&le;' => '<=',
  '&gt;' => '>',
  '&ge;' => '>=',

This file has been truncated.

Kind regards

robert

gregm · October 19, 2010, 11:43pm

Robert K. wrote in post #955554:

Here’s a bit more to play with:

Extract piece of an HTML fragment with Nokogiri · GitHub

Kind regards

robert

require 'cgi' and use CGI.unescapeHTML: that is more to the point, but
otherwise pretty much what Robert posted. You’re dealing with an escaped
fragment of HTML; treat it as such.
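
To make that suggestion concrete, here is a minimal sketch using only the standard library. The escaped input is an assumption (the thread does not show the raw string with its entities intact), location_from_escaped_html is a hypothetical helper name, and the tag-stripping regex is a shortcut that is only reasonable for a tiny fragment like this one; a real HTML parser such as Nokogiri is the safer choice for anything larger.

```ruby
require 'cgi'

# Hypothetical helper: unescape the fragment, strip the tags, then pull
# out whatever follows "Location:". Returns nil when there is no label.
def location_from_escaped_html(escaped)
  # "&lt;p&gt;" becomes "<p>", and so on.
  html = CGI.unescapeHTML(escaped)
  # Crude tag stripping, then collapse all whitespace runs to one space.
  text = html.gsub(/<[^>]*>/, ' ').split.join(' ')
  # Capture up to a stray "<" (the assumed input ends with one).
  found = text[/Location:\s*([^<]+)/i, 1]
  found && found.strip
end

# Assumed raw attribute value, guessed from the regex earlier in the thread.
escaped = "\n&lt;p&gt;&lt;strong&gt;Location:&lt;/strong&gt; Columbus, Ohio&lt;/p&gt;\n\n\n&lt;"
puts location_from_escaped_html(escaped) # prints "Columbus, Ohio"
```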

gregm · October 19, 2010, 7:51pm

Gianfranco Bozzetti wrote in post #955473:

if you wish, remove Location: and obtain:

str2 = str1.gsub(/Location:\s+/i, '')
puts "str2: #{str2}" #=> Columbus, Ohio

HTH gfb

Thanks, this is what I was looking for.
I really need to start learning regular expressions…