Ruby noob

Hi,

I would like to parse a very simple HTML file (index_msg.htm), described
below:

WM_ACTIVATE 0x0006 0x0000 WM_NULL WM_ACTIVATEAPP 0x001C 0x0001 WM_CREATE ...

I would like to parse this file and extract information like this:

enum foo
{
eWM_ACTIVATE = 0x0006,
eWM_ACTIVATEAPP = 0x0001,

};

I am starting with this:

fileIn = File.open("C:/WIKI_CE/index_msg.htm", "r")
fileOut = File.new("C:/WIKI_CE/enumWmMsg.h", "w")

begin
  while (line = fileIn.readline)
    line.chomp
    $stdout.print line
  end
rescue EOFError
  fileIn.close
  fileOut.close
end

but now I am stuck. Should I use a regex, or how can I compare two strings?

mosfet wrote:

> WM_NULL
>
> fileIn = File.open("C:/WIKI_CE/index_msg.htm", "r")
> end

This should get you started:

=====================================================================
require 'rubygems'
require 'scrubyt'

data = Scrubyt::Extractor.define do
  fetch('input.html')

  record do
    var_name 'WM_ACTIVATE'
    code '0x0006'
  end
end

result = data.to_xml.to_s
names = result.scan(/var_name>(.+?)<\/var_name/).flatten
values = result.scan(/code>(.+?)<\/code/).flatten
pairs = names.zip(values)

pairs.each do |name, value|
  puts "e#{name} = #{value}"
end

The XML-to-array code kind of sucks. In the next version of scRUBYt! you
will be able to output the result directly to a hash (or CSV or YAML or
some other, more friendly format for such a task).
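
In the meantime, the name/value pairs can be folded into a hash by hand and written out in the enum layout from the question. The following is only a sketch in plain Ruby: the sample names and values are copied from the question and stand in for the scraped result, and enum_source is a hypothetical helper, not part of scRUBYt!.

```ruby
# Build the text of a C enum from a name => value hash.
# enum_source is a hypothetical helper name used only for this sketch.
def enum_source(enum_name, table)
  body = table.map { |name, value| "  e#{name} = #{value}," }
  ["enum #{enum_name}", "{", *body, "};"].join("\n")
end

# Sample data copied from the question, standing in for the
# names and values arrays produced by the scraping step.
names  = ["WM_ACTIVATE", "WM_ACTIVATEAPP"]
values = ["0x0006", "0x0001"]

# Pair each name with its value and turn the pairs into a hash.
table = Hash[names.zip(values)]

header = enum_source("foo", table)
puts header
```

Writing the header is then a matter of passing the same string to a file opened for writing instead of to puts.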

Cheers,
Peter
__
http://www.rubyrailways.com :: Ruby and Web2.0 blog
http://scrubyt.org :: Ruby web scraping framework
http://rubykitchensink.ca/ :: The indexed archive of all things Ruby

On 3/5/07, mosfet wrote:

> WM_NULL
>
> fileIn = File.open("C:/WIKI_CE/index_msg.htm", "r")
> end
>
> but now I am stuck. Should I use a regex, or how can I compare two strings?

  1. have a look at hpricot
  2. if it’s too big for you, use regexen with the /m flag, and use
    String#scan():

REGEX = /<tr>\s*
  <td>(.*?)<\/td>\s*
  <td>(.*?)<\/td>\s*
  <td>(.*?)<\/td>\s*
  <td>(.*?)<\/td>\s*
  <td>(.*?)<\/td>\s*
<\/tr>/xm

file_in = File.read("C:/WIKI_CE/index_msg.htm")
File.open("C:/WIKI_CE/enumWmMsg.h", "w") do |file_out|
  file_in.scan(REGEX) do
    file_out.puts $1, $2, $3, $4, $5
  end
end

Notes:

  1. we_use_snake_case_for_variable_names
  2. Use File.open with block to automatically close the file
  3. You’ll have the values in $1…$5
  4. It seems you are inconsistent - in the first example you chose the
    second line, in the other the fourth one.

In any case, Peter’s approach will be easier, and more stable.
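
For reference, the notes above can be tried end to end without the real file. This is a minimal sketch under stated assumptions: SAMPLE_HTML is made-up markup with two cells per row (the real index_msg.htm may be laid out differently), and ROW and enum_lines are hypothetical names chosen for the example. It writes to a temporary directory rather than C:/WIKI_CE.

```ruby
require "tmpdir"

# Made-up sample markup for this sketch. The real index_msg.htm may
# use a different layout or more cells per row.
SAMPLE_HTML = <<~HTML
  <table>
  <tr>
    <td>WM_ACTIVATE</td>
    <td>0x0006</td>
  </tr>
  <tr>
    <td>WM_ACTIVATEAPP</td>
    <td>0x001C</td>
  </tr>
  </table>
HTML

# One table row with two cells. The /m flag lets "." match newlines,
# so a cell may span several lines.
ROW = /<tr>\s*<td>(.*?)<\/td>\s*<td>(.*?)<\/td>\s*<\/tr>/m

# String#scan returns one [name, value] pair per matching row.
def enum_lines(html)
  html.scan(ROW).map { |name, value| "e#{name} = #{value}," }
end

lines = enum_lines(SAMPLE_HTML)

Dir.mktmpdir do |dir|
  path = File.join(dir, "enumWmMsg.h")
  # The block form of File.open closes the file when the block exits.
  File.open(path, "w") do |file_out|
    file_out.puts "enum foo", "{", *lines, "};"
  end
  puts File.read(path)
end
```

If the rows in the real file carry attributes or extra cells, the pattern has to be adjusted to match, which is where an HTML parser becomes the more robust choice.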
