New guy... Introduction and first question on some direction


#1

Hi everyone. I’m new to these forums. I am a sysadmin in California and
I’m learning Ruby. I’ve been working on automated web application
testing using WATIR and I really like this language. It’s the first one
I’ve actually continued learning after the “hello world” example.

It even made me want to write something on my own outside of work, and
here are the basics of the project… I just don’t know where to
start. I don’t seem to be searching for the right terms, so I can’t
find modules that would help me out.

– sorry for the length of the post –

I want to merge 2 data txt files together.

  • Each file has sections and subsections.
  • Each section and subsection has data that may or may not be on both
    files.
  • The data that is in both files may be slightly different and in this
    case I need it to be “magically merged together” if it’s within a
    certain arbitrary range.
  • The data that is in both files but outside of the range I mention
    above needs to be considered “new” data for the resulting file.
  • I’m not good with regular expressions but I can learn if that is
    part of the solution.

I was going to post 2 samples of the files I want to merge but the post
would have been over 4 pages long! Do you guys think the “needs” I’ve
posted are enough to point me in the right direction?


#2

Oscar G. wrote:

Hi everyone. I’m new to these forums. I am a sysadmin in California and
I’m learning Ruby. I’ve been working on automated web application
testing using WATIR and I really like this language. It’s the first one
I’ve actually continued learning after the “hello world” example.

That’s great news! Welcome aboard.

I was going to post 2 samples of the files I want to merge but the post
would have been over 4 pages long! Do you guys think the “needs” I’ve
posted are enough to point me in the right direction?

It’s difficult to give hints as we don’t know much yet. It would
probably help to post enough of those files that we can see how
sections and subsections are delimited.

From what I know so far: you might want to have classes Section and
SubSection with obvious meaning. I don’t know whether there’s some kind
of optimization possible with your data but in the worst case you’ll
have O(n*m) effort to compare all possible pairs of SubSections. Also the
algorithm to decide whether they are close or not might be tricky.

Kind regards

robert

#3

hello,

  1. what should happen if the files have different structure (different
    set of sections/subsections)?

  2. please define “data”, “magically merged together”, “range”.

konstantin


#4

i would write a parser for these files. represent the contents as a set
of hashes/arrays. if you have control over the format of the files, you
might want to make them simpler so that you won’t have to write a real
parser.


#5

akonsu wrote:

hello,

  1. what should happen if the files have different structure (different
    set of sections/subsections)?

  2. please define “data”, “magically merged together”, “range”.

Thanks for the responses guys… Here’s a little more info based on
them.

The sections and subsections and data are defined by {} and [] and
values… for example.

DataGroup = {
    [1] = {
        [1] = {
            ["dataidentifier"] = {
                [1] = {
                    ["type"] = 1,
                    ["x"] = 45.5,
                    ["count"] = 1,
                    ["image"] = 4,
                    ["y"] = 18.8,
                },
                [1] = {
                    ["type"] = 1,
                    ["x"] = 21.5,
                    ["count"] = 5,
                    ["image"] = 4,
                    ["y"] = 31.8,
                },
            },
            [2] = {
                ["dataidentifier2"] = {
                    [1] = {
                        ["type"] = 1,
                        ["x"] = 74.5,
                        ["count"] = 1,
                        ["image"] = 3,
                        ["y"] = 11.8,
                    },
                    [1] = {
                        ["type"] = 1,
                        ["x"] = 27.5,
                        ["count"] = 5,
                        ["image"] = 3,
                        ["y"] = 36.8,
                    },
                },

Disregard the part of my first post where I said I wanted to merge data
within a range. After reviewing what I want to do, this is no longer the
case. I only want to merge data if the values are the same.

Take for example the “x” and “y” values above… For the first section,
where there is “dataidentifier”, both of my files have that section. If
the “x” and “y” values are the same, I just want to add the values under
“count”. If the “x” and “y” values are different, then the resulting file
needs to show the section with a separate dataidentifier entry for each
x and y pair. Obviously these “x” and “y” values are coordinates.

Maybe if I explain where I’m coming from it will make more sense. Say
I’m looking for widgets at x and y, and I need to record how many I find
and where.

Scenario 1.
I find 2 of the widgets at 15,18. That gets recorded into file 1.
I find 1 of the widgets at the same location, 15,18. This gets recorded
into file 2.

Scenario 2.
I find 2 widgets at location 21,30. This goes into File 1
I find 2 contraptions at location 10,23. This goes into File 1
I find 3 widgets at location 13,40. This goes into File 2

In scenario 1, I want the resulting file to show that I found a total of
3 widgets at location 15,18.

In scenario 2, I want the resulting file to show that I found 2 widgets
at location 21,30, 3 widgets at location 13,40 and 2 contraptions at
location 10,23.
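
The two scenarios above boil down to summing counts keyed by item and
location. Here is a minimal sketch, assuming each file has already been
parsed into a list of records. The record layout (:item, :x, :y, :count)
and the name merge_counts are made up for illustration and are not taken
from the real files.

```ruby
# Hypothetical record layout: one hash per find, with :item, :x, :y, :count.
def merge_counts(*files)
  totals = Hash.new(0)
  files.flatten.each do |rec|
    # Same item at the same coordinates gives the same key, so counts add up.
    totals[[rec[:item], rec[:x], rec[:y]]] += rec[:count]
  end
  totals.map { |(item, x, y), count| { item: item, x: x, y: y, count: count } }
end

file1 = [
  { item: "widget",      x: 15, y: 18, count: 2 },
  { item: "widget",      x: 21, y: 30, count: 2 },
  { item: "contraption", x: 10, y: 23, count: 2 },
]
file2 = [
  { item: "widget", x: 15, y: 18, count: 1 },
  { item: "widget", x: 13, y: 40, count: 3 },
]

merged = merge_counts(file1, file2)
merged.each { |r| puts "#{r[:count]} x #{r[:item]} at #{r[:x]},#{r[:y]}" }
```

Using the [item, x, y] triple as a hash key means each record is looked
up once, rather than being compared against every record in the other
file.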

Is what I want to do better explained now? There is obviously the
complication (I think) of the coordinates having decimal points. I
can’t avoid this. I also can’t avoid writing the 2 files, which is why I
am working on this. I know it would be ideal to have just a single data
file where everything gets written, but this is beyond my control for
this project…
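
On the decimal-point worry: two values parsed from identical text such
as 45.5 produce the same Float and compare equal, so exact matching
works as long as both files print coordinates the same way. If they
might not, one hedge is to normalise each coordinate before using it as
a lookup key. This is a minimal sketch; the helper name coord_key and
the one-decimal precision are assumptions.

```ruby
# Hypothetical helper: turn a coordinate pair into a stable lookup key
# by rounding to a fixed number of decimal places (assumed here: 1).
def coord_key(x, y, digits = 1)
  [Float(x).round(digits), Float(y).round(digits)]
end

a = coord_key("45.5", "18.8")     # as read from one file
b = coord_key("45.50", "18.80")   # same point, printed differently
c = coord_key(0.1 + 0.2, 18.8)    # value carrying float arithmetic noise

puts a == b
puts c == coord_key(0.3, 18.8)
```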

I really appreciate how fast you guys replied and I hope I’m helping you
help me. :)


#6

Well, I don’t have control over the format of the files…

If it helps at all… I believe the syntax of the files I’m parsing is
Lua based.


#7

akonsu wrote:

i would write a parser for these files. represent the contents as a set
of hashes/arrays. if you have control over the format of the files, you
might want to make them simpler so that you won’t have to write a real
parser.

Well, I don’t have control over the format of the files…

I’ll try to look into the parser thing… the thing is, this is the first
time I’ve done any real coding, so I don’t even know what to look for in
the Ruby libraries to help me with this. What modules are out there that
can help me, or what are some keywords I should use to search for this?
And are there any Ruby parsers out there that I can look at? And this
seems like a complex project… am I taking on too big of a project for
a beginner?


#8

a parser is a big project for a beginner. parsing is the process of
translating a text stream into an in-memory representation of the
contents of the stream. to do that, you have to split the stream
into chunks called tokens, and then check whether the sequence of these
tokens is valid, that is, whether it conforms to the so-called
grammar of your language. there is a theory behind all that. if
your file were simpler and, for example, had each line precisely
identifying a data item like this:

/1/1/dataidentifier/1/type = 1

then you could just scan the file line by line and get all you need.
there are tools used to generate parsers. the original ones are called
lex and yacc. lex splits your stream into tokens, and yacc
checks whether the resulting sequence of tokens satisfies the grammar. i am
not sure if there are parser generators for ruby, although it is
comparatively easy to write them because they are based on sound
theory.
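
To illustrate the line-by-line idea: if every line carried its full
path, as in the /1/1/dataidentifier/1/type = 1 example, a few lines of
Ruby would be enough to load it. This is a minimal sketch that assumes
that hypothetical flattened format (the real files are not in this
form); the method name load_flat is made up.

```ruby
# Reads lines of the (hypothetical) form "/a/b/c = value" into nested hashes.
def load_flat(text)
  root = {}
  text.each_line do |line|
    path, value = line.strip.split(/\s*=\s*/, 2)
    next if path.nil? || value.nil?            # skip blank or malformed lines
    *parents, leaf = path.split("/").reject(&:empty?)
    # Walk down the path, creating intermediate hashes as needed.
    node = parents.inject(root) { |h, key| h[key] ||= {} }
    # Store numeric-looking values as Floats, everything else as Strings.
    node[leaf] = (value =~ /\A-?\d+(\.\d+)?\z/) ? value.to_f : value
  end
  root
end

sample = <<~DATA
  /1/1/dataidentifier/1/type = 1
  /1/1/dataidentifier/1/x = 45.5
  /1/1/dataidentifier/1/count = 1
DATA

data = load_flat(sample)
p data["1"]["1"]["dataidentifier"]["1"]
```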

hope this helps.
konstantin


#9

On Thursday 08 December 2005 02:13 pm, Oscar G. wrote:

                          ["count"] = 1,
         },
                          ["type"] = 1,
                          ["x"] = 27.5,
                          ["count"] = 5,
                          ["image"] = 3,
                          ["y"] = 36.8,
                      },
         },

If you can count on indentation like you have above, the easy way might
be to run it through the OutlineParser object of Node.rb
(http://www.troubleshooters.com/projects/Node.rb/index.htm). Once the
data is in a Node tree instead of a file, you can use Walker objects and
simple callbacks to massage the data and then output it in any form
you’d like, including XML or SQL.

If you cannot count on the indentation, you could remove all indentation
with a simple sed script, then run a Ruby program to convert every
opening brace to a new level of indentation and every closing brace to a
previous level of indentation, and then run the result through Node.rb’s
parser.
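
The brace-to-indentation step could look roughly like this. It is a
minimal sketch that assumes one item per line, with braces appearing
only as a trailing { or a leading } (as in the sample). The method name
reindent is made up, and the braces are left in place for a later pass
to strip.

```ruby
# Re-indent brace-delimited text with one tab per nesting level.
# Existing indentation is discarded; only the braces decide the depth.
def reindent(text)
  depth = 0
  text.each_line.map { |raw|
    line = raw.strip
    next if line.empty?
    depth -= 1 if line.start_with?("}") && depth > 0   # closing brace: back up
    indented = ("\t" * depth) + line
    depth += 1 if line.end_with?("{")                  # opening brace: go deeper
    indented
  }.compact.join("\n")
end

sample = <<~LUA
  DataGroup = {
  [1] = {
  ["x"] = 45.5,
  ["count"] = 1,
  },
  }
LUA

puts reindent(sample)
```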

SteveT

Steve L.
http://www.troubleshooters.com


#10

On 12/8/05, Oscar G. wrote:

akonsu wrote:

How accurate is this example? Just wondering if the mockup has
copy/paste errors. More below…

                      },
                    [1] = {

Does the [1] really repeat here, or should this be [2] (or some other
number)?

                          ["type"] = 1,
                          ["x"] = 21.5,
                          ["count"] = 5,
                          ["image"] = 4,
                          ["y"] = 31.8,
                      },

Should there be a ‘}’ here to close off the ‘dataidentifier’?

                    [1] = {
                          ["type"] = 1,
                          ["x"] = 27.5,
                          ["count"] = 5,
                          ["image"] = 3,
                          ["y"] = 36.8,
                      },
         },

I’m assuming any missing ‘}’ here would be at the end of the file.

If my guesses are right, it wouldn’t be too tough to convert this
quickly with something along the lines of:

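require 'pp'

# Only eval files you trust: eval runs the converted text as Ruby code.
text = File.read('some.log')
text.gsub!(/DataGroup = /, '')              # drop the leading assignment
text.gsub!(/\["?(.*?)"?\] =/, '"\1" =>')    # [1] = and ["x"] = become "1" => and "x" =>

datagroup = eval(text)

pp datagroup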
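
For comparison, the same subset can be read without eval. This is a
minimal recursive-descent sketch built on StringScanner from Ruby’s
standard library. It only covers what appears in this thread (a name, =,
and nested tables with bracketed numeric or double-quoted keys), not
Lua in general, and the class name TinyTableParser is made up. Repeated
keys such as the doubled [1] in the mockup would overwrite each other,
and unbalanced braces raise an error.

```ruby
require "strscan"

# Minimal reader for: NAME = { [key] = value, ... }
# where keys and values are numbers, "strings", or nested tables.
class TinyTableParser
  def initialize(text)
    @s = StringScanner.new(text)
  end

  # Parses "Name = { ... }" and returns [name, hash].
  def parse
    name = expect(/[A-Za-z_]\w*/)
    expect(/=/)
    [name, value]
  end

  private

  def skip_ws
    @s.skip(/\s+/)
  end

  # Consume a required token or fail with the current offset.
  def expect(pattern)
    skip_ws
    @s.scan(pattern) or raise "parse error at offset #{@s.pos}"
  end

  def value
    skip_ws
    if @s.scan(/\{/)
      table
    elsif (str = @s.scan(/"[^"]*"/))
      str[1..-2]                                # strip the quotes
    elsif (num = @s.scan(/-?\d+(\.\d+)?/))
      num.include?(".") ? num.to_f : num.to_i
    else
      raise "unexpected input at offset #{@s.pos}"
    end
  end

  # Called just after "{"; reads entries until the matching "}".
  def table
    result = {}
    loop do
      skip_ws
      break if @s.scan(/\}/)
      expect(/\[/)
      key = value
      expect(/\]/)
      expect(/=/)
      result[key] = value
      skip_ws
      @s.scan(/,/)                              # trailing comma is optional
    end
    result
  end
end

sample = <<~LUA
  DataGroup = {
    [1] = {
      ["dataidentifier"] = {
        [1] = { ["type"] = 1, ["x"] = 45.5, ["count"] = 1, ["y"] = 18.8, },
        [2] = { ["type"] = 1, ["x"] = 21.5, ["count"] = 5, ["y"] = 31.8, },
      },
    },
  }
LUA

name, data = TinyTableParser.new(sample).parse
puts name
puts data[1]["dataidentifier"][2]["count"]
```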


#11

On Dec 8, 2005, at 4:34 PM, Oscar G. wrote:

Well, I don’t have control over the format of the files…

If it helps at all… I believe the syntax of the files I’m parsing is
Lua based.



Perhaps this is a matter of the right tool for the job. Maybe you
want to consider using Lua for this project if they really are Lua
files.


#12

Well, that’s a lot of info so I have to digest it. I’ll post back as
soon as I have a better grasp of your responses… However, I do not want
to use Lua for this because I want to learn Ruby. I don’t think it’s a
problem that the data is in Lua syntax; from what I can see, it doesn’t
matter what format the data is in. It seems to be a matter of finding a
pattern and being able to merge the data from two files into one.
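
Once both files are in memory as nested hashes, the merge itself can be
sketched with Hash#merge and a block. This is only an illustration under
assumed conditions: it supposes each identifier maps to an Array of
record hashes with "x", "y" and "count" keys, which is a guess at a
convenient layout and not the parsed shape of the real file. The names
deep_merge and merge_records are made up.

```ruby
# Records with identical "x" and "y" have their "count" added;
# records found at other coordinates are kept as they are.
def merge_records(left, right)
  by_pos = {}
  (left + right).each do |rec|
    key = [rec["x"], rec["y"]]
    if by_pos.key?(key)
      total = by_pos[key]["count"] + rec["count"]
      by_pos[key] = by_pos[key].merge("count" => total)
    else
      by_pos[key] = rec
    end
  end
  by_pos.values
end

# Recursively merge two nested hashes. Sections present in only one
# file are copied over; record lists present in both are combined.
def deep_merge(a, b)
  a.merge(b) do |_key, left, right|
    if left.is_a?(Hash) && right.is_a?(Hash)
      deep_merge(left, right)
    elsif left.is_a?(Array) && right.is_a?(Array)
      merge_records(left, right)
    else
      right
    end
  end
end

file1 = { 1 => { "dataidentifier" => [{ "x" => 45.5, "y" => 18.8, "count" => 1 }] } }
file2 = { 1 => { "dataidentifier" => [{ "x" => 45.5, "y" => 18.8, "count" => 2 },
                                      { "x" => 21.5, "y" => 31.8, "count" => 5 }] },
          2 => { "dataidentifier2" => [{ "x" => 74.5, "y" => 11.8, "count" => 1 }] } }

merged = deep_merge(file1, file2)
p merged
```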

For the purpose of being accurate about the sample, I’ve posted the file
on my site, so maybe if you see the actual file I’m working with you’ll
get a better idea of what I want.

http://www.muychingon.com/gatherer.txt


#13

On 12/8/05, Oscar G. wrote:

http://www.muychingon.com/gatherer.txt

Ok, seems I was right about the format. Don’t peek if you want to
solve it on your own ;)

http://www.mvgo.com/anarchy/lua.rb.txt

a fun little ruby quiz (I think I got it right).


#14

On Thursday 08 December 2005 06:39 pm, Oscar G. wrote:

http://www.muychingon.com/gatherer.txt

Below my sig is a 45 line program using Node.rb that converts the file
into Node objects, each with a name and value. You can see how Walker
objects and callback routines work. In order to output your chosen format
(which I didn’t completely understand), you’d probably need to create a
couple more Walkers and a couple more callback routines.

This program assumes consistent indentation. If that cannot be assumed,
you need to either do something else (maybe what Bill G. suggested), or
create a tiny brace-to-indent converter and then run the result through
my program.

HTH

SteveT

Steve L.
http://www.troubleshooters.com

#!/usr/bin/ruby
require "Node.rb"

class Callbacks
	def cb_look_data(checker, level)
		print "\t" * level
		print "Name = ", checker.name
		print ", Value = ", checker.value unless checker.firstchild
		print "\n"
	end

	def cb_get_fields(checker, level)
		if checker.value =~ /\s*}/
			checker.deleteSelf()
		end
		checker.value.gsub!(/,\s*$/, "")
		checker.value.strip!
		checker.value =~ /\[([^\]]*)\]/
		checker.name = $1 if $1
		if level == 1
			checker.value =~ /(.*)\s*=/
			checker.name = $1 if $1
		end

		checker.value =~ /=\s*(.*)/
		checker.value = $1 if $1
		checker.value = "" if checker.value == "{"
	end
end

cb = Callbacks.new() # INSTANTIATE CALLBACKS OBJECT

# PARSE THE FILE
parser = OutlineParser.new()
head = parser.parse("/home/slitt/gatherer.txt")

# PARSE THE NODE TREE NODES INTO NAME AND VALUE FIELDS
walker = Walker.new(head, cb.method(:cb_get_fields), nil)
walker.walk()

# PRINT THE NAME FIELDS FOR CONTAINERS,
# AND NAME AND VALUE FIELDS FOR LEAF LEVELS
walker = Walker.new(head, cb.method(:cb_look_data), nil)
walker.walk()


#15

Steve L. wrote:

If you can count on indentation like you have above, the easy way might be to
run it through the OutlineParser object of Node.rb
(http://www.troubleshooters.com/projects/Node.rb/index.htm). Once the data is

Very nice piece of software indeed.

martin


#16

On Friday 09 December 2005 09:47 am, Martin DeMello wrote:

Steve L. wrote:

If you can count on indentation like you have above, the easy way might
be to run it through the OutlineParser object of Node.rb
(http://www.troubleshooters.com/projects/Node.rb/index.htm). Once the
data is

Very nice piece of software indeed.

martin

Thanks Martin,

It should be nice. I’ve written it in three different languages so far :)

I use VimOutliner (http://www.vimoutliner.org) to create tab indented
outlines, and find that Node.[pm py rb] makes processing outlines
trivial for substantial jobs, and doable for arduous ones (like
converting an outline into a menu system).

Thanks for the compliment.

SteveT

Steve L.
http://www.troubleshooters.com