For the life of me, I can’t figure out a Ruby equivalent to Perl’s /g.
Basically, I want to do the following:
while htmlSource=~m/
On Tue, Nov 18, 2008 at 4:06 PM, knohr [email protected]
wrote:
while tableSource=~m/(.*?)</tr>/g do
document (0-20) and I will need to loop the inner an unknown amount of
times (0-29).
Thread safe would be a plus.
any suggestions?
I think this does what you want, although I don’t think gsub was really
made for this purpose.
def doSomethingWith(s)
  print s, "\n"
end
htmlSource = ‘
1,11,2htmlSource.gsub(/
(.?)</table>/) do |t|

On 2008.11.19., at 1:06, knohr wrote:
while tableSource=~m/(.*?)</tr>/g do
document (0-20) and I will need to loop the inner an unknown amount of
times (0-29).
Thread safe would be a plus.
any suggestions?
While I can’t answer your original question, I could possibly help you
with the scraping if you are willing to reveal the page you are trying
to scrape and the data bits on it which should be scraped.
Cheers,
Peter
On Nov 18, 7:08 pm, knohr [email protected] wrote:
tableRowSource=$1
Thread safe would be a plus.
Would fast be a plus? No nested loop?
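require 'nokogiri'

doc = Nokogiri::HTML(htmlSource)
doc.search('//tr').each do |row|
  # Find the element in the enclosing table whose text contains "Index"
  # (XPath contains() takes the haystack first, then the needle).
  index = row.xpath('ancestor::table/*[contains(., "Index")]')
  doSomethingWith(row.text, index.text[/\d+/])
end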
The location of the element containing the index may have to be
modified.
– Mark.
On 19.11.2008, at 00:37 , Alan Johnson wrote:
tableSource=~m/Index (\d+)/
I will actually need to pull multiple vars, not just a single one,
any suggestions?
htmlSource = ‘’
1,1 1,2
Alan
That is pretty much how, except globals are hardly thread safe I
think. Use scan instead of gsub:
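A minimal sketch of the scan approach, assuming the outer loop walks tables and the inner loop walks rows as in the original question. The sample markup, the "Index N" caption and the variable names are made up for illustration; real pages will need different patterns.

```ruby
# Hypothetical input: the markup and the "Index N" captions are invented.
html_source = <<~HTML
  <table><caption>Index 1</caption>
    <tr><td>1,1</td><td>1,2</td></tr>
    <tr><td>2,1</td></tr>
  </table>
  <table><caption>Index 2</caption>
    <tr><td>3,1</td></tr>
  </table>
HTML

results = []

# Outer loop: one iteration per <table>...</table>; /m lets . match newlines.
# scan yields the capture groups as an array, so |(x)| unpacks the single group.
html_source.scan(%r{<table>(.*?)</table>}m) do |(table_source)|
  # String#[] with a regexp and a group number returns nil when nothing matches.
  index = table_source[/Index (\d+)/, 1]

  # Inner loop: one iteration per row of this table.
  table_source.scan(%r{<tr>(.*?)</tr>}m) do |(row_source)|
    cells = row_source.scan(%r{<td>(.*?)</td>}m).flatten
    results << [index, cells]
  end
end

results.each { |index, cells| puts "table #{index}: #{cells.inspect}" }
```

Nothing here touches $1 or modifies the string, and each scan call keeps its own position, so the nested loops do not interfere with each other.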
Here’s something I wrote to extract information from data structured
like this:
tablename
table2name
+field1 : string
+field2
Table = Struct.new(:name, :fields)
Field = Struct.new(:name, :type)

def extract_db_spec(file)
  tables = []
  doc = File.read(file)
  table_name = /- (\w*)\s*?\n/
  # A field line: leading whitespace, a literal "+", the name, an optional ": type"
  field_name = /(\s+\+ (\w+)\s*(:\s*(\w*))?\n)/
  doc.scan(/#{table_name}(#{field_name}+)/) do |tablename, fields|
    t = Table.new(tablename, [])
    fields.scan(field_name) do |_junk, fieldname, _junk2, type|
      if type.nil? || type == ""
        # Default the type: *_id fields are ints, everything else is a string
        type = /\w+_id/ === fieldname ? "int" : "string"
      end
      t.fields << Field.new(fieldname, type)
    end
    tables << t
  end
  tables
end
einarmagnus
On 19.11.2008 07:08, Einar Magnús Boson wrote:
That is pretty much how, except globals are hardly thread safe I
think.
$1 and the like are thread-local:
robert@fussel ~
$ ruby -e '2.times{|i|Thread.new(i){|ii|4.times{/(\d+)/=~ii.to_s;puts $1;sleep 1}}};sleep 5'
0
1
1
0
1
0
1
0
robert@fussel ~
$
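The same check can be written so that it collects results instead of printing them. This is only a sketch: the words and the sleep interval are arbitrary choices. Each thread matches its own string, pauses so the other threads get to run a match, and then reads $1.

```ruby
# If $1 were shared between threads, a thread could read another thread's capture
# after the sleep. Each thread records what it actually sees.
threads = %w[alpha beta gamma].map do |word|
  Thread.new do
    seen = []
    3.times do
      /(\w+)/ =~ word
      sleep 0.01 # let the other threads run their own matches
      seen << $1
    end
    seen
  end
end

# Thread#value joins the thread and returns the block's result.
results = threads.map(&:value)
p results
```

Every thread should report only its own word, which is what the one-liner above shows with 0 and 1.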
Use scan instead of gsub:
Right, as far as I can see no replacements should be done. Just read
only access.
html_source.scan %r{
table_source.scan %r{
But a proper HTML parser is probably much better.
Kind regards
robert
I use this as an equivalent to global match:
class Regexp
  def global_match(str, &proc)
    retval = nil
    loop do
      res = str.sub(self) do |m|
        proc.call($~) # pass MatchData obj
        ''
      end
      break retval if res == str
      str = res
      retval ||= true
    end
  end
end
re = /…/
re.global_match(…) do |m|
…
end
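For comparison, here is a sketch of a variant that leaves the string untouched and walks it by position, assuming Ruby 1.9 or later, where Regexp#match accepts a start offset. The helper name each_match is hypothetical and not part of Ruby's core API.

```ruby
# Hypothetical helper: yields a MatchData for every match of re in str,
# similar in spirit to a Perl while (... =~ /.../g) loop.
def each_match(re, str)
  return enum_for(:each_match, re, str) unless block_given?

  pos = 0
  while pos <= str.length && (m = re.match(str, pos))
    yield m
    # Continue after this match; step one extra character after an empty
    # match so the loop cannot stall on the same position.
    pos = m.end(0) == m.begin(0) ? m.end(0) + 1 : m.end(0)
  end
end

rows = each_match(/(\w+)=(\d+)/, "a=1, b=22, c=333").map { |m| [m[1], m[2].to_i] }
p rows

empties = each_match(/x*/, "ab").map { |m| m[0] }
p empties
```

Since the block receives a MatchData object, several captures can be read from one match without relying on $1. Calling Regexp.last_match inside a String#scan block is another way to get the same object.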