Making a simple parser

Hi all,

To automate some of the tests I have to run, I decided to use Ruby to
generate script files in a particular (very simple) language from
several possible input files. My first approach was to use the
inherited hook on a parent class (which I called InputFormat) to collect
all of its child classes in an array. Each input format then becomes a
child class of InputFormat and returns data in a known format (I decided
to use an array of hashes, because the output is really simple) to
the output generator code.

So the idea is something like:

class InputFormat
  @children = []

  def initialize(input)
    @input = input
  end

  def parse
    @children.each { |child|
      child.parse(@input) if child.supported?(@input)
    }
  end

  def self.inherited(child)
    @children << child
  end
end

class AInputFormat < InputFormat
  def supported?
    # check if we can parse this type of file
  end

  def parse
    # parse and generate array of hashes in known format
  end
end

Then on the core file I would have something like:

input = InputFormat.new(ARGV[0])
input.parse

As it turns out, this isn’t working because AInputFormat will only
inherit from InputFormat at the time I actually use it, am I right? Any
tips you guys could give me to achieve what I want (generating one
output format from several possible input formats)?

Felipe B. wrote in post #993252:

As it turns out, this isn’t working because AInputFormat will only
inherit from InputFormat at the time I actually use it, am I right ?

You can see when inherited() is called in these examples:

class A
  def self.inherited(child)
    puts 'inherited called'
  end
end

class B < A
end

--output:--
inherited called

class A
  def self.inherited(child)
    puts 'inherited called'
  end
end

B = Class.new(A)

--output:--
inherited called
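
To make the timing explicit, here is a minimal sketch (the names Base, Derived and EVENTS are made up for illustration, they are not from the original post). It records when the hook runs relative to the subclass body, and suggests that the hook fires as soon as the class line is evaluated, not when the subclass is first used:

```ruby
# Record the order of events to see when the hook runs.
EVENTS = []

class Base
  def self.inherited(child)
    super
    EVENTS << "inherited hook ran"
  end
end

EVENTS << "before subclass definition"

class Derived < Base
  # This line runs while the class body is being evaluated,
  # which happens after the inherited hook has already fired.
  EVENTS << "inside subclass body"
end

EVENTS << "after subclass definition"

puts EVENTS
```

So the subclass only needs to be defined (for example, its file loaded) for it to be registered; nothing has to instantiate it.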

hi felipe,

well, i don’t know how many possible input types you have, but if
they’re not too many, you could try a very different approach:

for this test i created three files, “input.eng,” “input.esp,” and
“input.fr,” each with a few lines of random text…

class Parser
  attr_reader :output

  def initialize(inputfile)
    @output = [] ## this can of course be changed to what best suits your purposes
    @oktypes = %w[.eng .esp .fr]
    checkType(inputfile)
  end

  def checkType(file)
    if File.exist?(file) # File.exists? was removed in Ruby 3.2
      if @oktypes.include?(File.extname(file).downcase)
        parseInput(file)
      else
        puts "Unrecognized File Type"
      end
    else
      puts "File Not Found"
    end
  end

  # dispatch on the file extension
  def parseInput(file)
    case File.extname(file).downcase
    when ".eng" then engParse(file)
    when ".esp" then espParse(file)
    when ".fr"  then frParse(file)
    end
  end

  # read the file into @data, one chomped line per element
  def loadData(inputfile)
    @data = File.readlines(inputfile).map(&:chomp)
  end

  # here's where you do whatever parsing you need to, my examples are
  # dumb… but the important thing is that you end up with @output

  def engParse(file)
    loadData(file)
    @data.each { |line| @output << line.reverse }
  end

  def espParse(file)
    loadData(file)
    @data.each { |line| @output << line.upcase }
  end

  def frParse(file)
    loadData(file)
    @data.each { |line| @output << line.upcase.reverse }
  end

end #class

test = Parser.new("input.esp")
puts test.output

this may be WAY too simple for what you’re trying to do, but hey,
maybe not! ;)

- j
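
A possible variation on the same idea, sketched under assumptions: the HANDLERS table and the parse_lines helper below are hypothetical names, and the helper takes the lines directly instead of reading a file so the example stays self-contained. Keeping the extension-to-parser mapping in one hash means a new format is a single new entry rather than a new branch in two case statements:

```ruby
# Hypothetical handlers: each one maps an array of lines to the parsed output.
# The transformations mirror the toy examples above.
HANDLERS = {
  ".eng" => ->(lines) { lines.map(&:reverse) },
  ".esp" => ->(lines) { lines.map(&:upcase) },
  ".fr"  => ->(lines) { lines.map { |line| line.upcase.reverse } }
}

# Pick the handler from the file extension and apply it to the lines.
def parse_lines(filename, lines)
  handler = HANDLERS[File.extname(filename).downcase]
  raise ArgumentError, "Unrecognized file type: #{filename}" unless handler
  handler.call(lines)
end

p parse_lines("input.esp", ["hola", "mundo"])
```

With a real file, the lines could come from File.readlines(path, chomp: true) on Ruby 2.4 or later. This still keeps every format in one place, though, which is the limitation raised in the reply that follows.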

Hi Jake,

jake kaiden wrote in post #993355:

Initially I thought about taking this approach, but frankly I don’t
know how many input formats I will have, so I wanted an approach
where I don’t need to touch the core classes or any core file. I
wanted changes to be local to the place where they are necessary:
when I want to add another input format, all I should have to do is
create a new class, and the code would just work.

With this approach, I would have to keep adding more and more
methods for the actual parsing of each format. What I wanted was
to offload that to another class without touching the caller code.

Oh well, I’ll keep on trying. I’m sure there’s some pattern for doing
just that; maybe I just didn’t implement it correctly :-p

Felipe B. wrote in post #993383:

Hi Jake,
I want to add another input format
all I would have to do would be to create a new class and the code
would just work.

With this approach, I would have to keep on adding more and more
methods for doing the actual parsing of different formats and what
I wanted was to offload that to another class without touching the
caller code.

a very good point - and really what inheritance is for. good luck
with a solution, i (and i imagine those who read this post) will keep
playing with the idea…

hasta otro…

-j

On Sat, Apr 16, 2011 at 9:52 PM, Felipe B. [email protected] wrote:

input = InputFormat.new(ARGV[0])
input.parse

As it turns out, this isn’t working because AInputFormat will only
inherit from InputFormat at the time I actually use it, am I right ? Any
tips you guys could give me to achieve what I want ? (from several
possible input formats generate one output format)

The above doesn’t work because the @children inside the instance
method “parse” is not the same variable as the @children inside the
class method “self.inherited”: the first belongs to each InputFormat
instance, the second to the InputFormat class object itself. You have
to expose the class-level instance variable and use that one from the
parse method (then you will see the next problem):

class InputFormat
  class << self
    attr_accessor :children
  end

  def initialize(input)
    @input = input
  end

  def parse
    self.class.children.each { |child| child.parse(@input) if child.supported?(@input) }
  end

  def self.inherited(child)
    (@children ||= []) << child
  end
end

class AInputFormat < InputFormat
  def supported?
    # check if we can parse this type of file
  end

  def parse
    # parse and generate array of hashes in known format
  end
end

ruby-1.8.7-p334 :028 > input = InputFormat.new("test")
 => #<InputFormat:0xb738bf50 @input="test">
ruby-1.8.7-p334 :029 > input.parse
NoMethodError: undefined method `supported?' for AInputFormat:Class
        from (irb):11:in `parse'
        from (irb):11:in `each'
        from (irb):11:in `parse'
        from (irb):29

The next problem, as you see, is that you are defining instance
methods in the subclasses, but are calling them on the class. Maybe
the methods parse and supported? in the children could be class
methods, or maybe what you store in @children could be an instance of
the class.
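
To illustrate the first point (the two @children are different variables), here is a small sketch with a made-up Registry class, not taken from the original code. An instance variable assigned in the class body belongs to the class object, so an instance method reading the same name sees its own, unassigned variable:

```ruby
class Registry
  @items = [:set_on_the_class]   # belongs to the Registry class object

  def self.items
    @items                       # self is the class here
  end

  def items
    @items                       # self is an instance here; never assigned
  end
end

p Registry.items       # the array assigned in the class body
p Registry.new.items   # nil, because the instance has no @items of its own
```

This is why calling each on @children inside the original instance-level parse fails: there it is nil rather than the array filled by self.inherited.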

Jesus.

7stud – wrote in post #993588:

Note that when inherited() is called, the methods of the subclass are
not defined yet, so if you create objects of the subclass inside
inherited(), the initialize() method in the parent is called.

And you can get around that problem by letting InputFormat#parse create
the objects:

class InputFormat
  @children = []

  def self.children
    @children
  end

  def initialize(input)
    @input = input
  end

  def parse
    self.class.children.each { |child|
      instance = child.new
      instance.parse(@input) if instance.supported?
    }
  end

  def self.inherited(sub_class)
    @children << sub_class
  end
end

class InputFormatA < InputFormat
  def initialize
    puts "Initializing instance of #{self.class}"
  end

  def supported?
    true
  end

  def parse(str)
    puts "InputFormatA is parsing #{str}"
  end
end

class InputFormatB < InputFormat
  def initialize
    puts "Initializing instance of #{self.class}"
  end

  def supported?
    true
  end

  def parse(str)
    puts "InputFormatB is parsing #{str}"
  end
end

input = InputFormat.new('hello world')
input.parse

--output:--
Initializing instance of InputFormatA
InputFormatA is parsing hello world
Initializing instance of InputFormatB
InputFormatB is parsing hello world

I’m pretty unclear about what you are trying to do, but maybe this will
help:

class InputFormat
  @children = []

  def self.children
    @children
  end

  def initialize(input)
    @input = input
  end

  def parse
    InputFormat.children.each { |child|
      child.parse(@input) if child.supported?
    }
  end

  def self.inherited(sub_class)
    @children << sub_class.new('dummy')
  end
end

class InputFormatA < InputFormat
  def supported?
    true
  end

  def parse(str)
    puts "InputFormatA is parsing #{str}"
  end
end

class InputFormatB < InputFormat
  def supported?
    true
  end

  def parse(str)
    puts "InputFormatB is parsing #{str}"
  end
end

input = InputFormat.new('hello world')
input.parse

--output:--
InputFormatA is parsing hello world
InputFormatB is parsing hello world

Note that when inherited() is called, the methods of the subclass are
not defined yet, so if you create objects of the subclass inside
inherited(), the initialize() method in the parent is called.
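
A quick way to check that note, as a sketch with hypothetical names (Format, CsvFormat, and the seen accessor are not from the thread): take a snapshot of the subclass's own instance methods inside the hook and compare it with what the class reports once its body has run.

```ruby
class Format
  @seen = {}

  class << self
    attr_reader :seen
  end

  def self.inherited(sub_class)
    super
    # Snapshot of the instance methods the subclass defines itself,
    # taken at the moment the hook runs.
    Format.seen[sub_class] = sub_class.instance_methods(false)
  end
end

class CsvFormat < Format
  def parse(str); end
end

p Format.seen[CsvFormat]              # methods known when the hook ran
p CsvFormat.instance_methods(false)   # methods known after the class body ran
```

The first list is empty and the second contains :parse, which is consistent with the subclass body not having been evaluated yet when inherited() is called.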

Hi,

“Jesús Gabriel y Galán” [email protected] wrote in post
#993452:

The above doesn’t work, because the @children inside the instance
method “parse” is not the same as the @children inside the class
method “self.inherited”. You have to give access to the class instance
variable, and then use that one from the parse method (then you will

Ah, you’re right :) Good point.

see the next problem):

The next problem, as you see, is that you are defining instance
methods in the subclasses, but are calling them on the class. Maybe
the methods parse and supported? in the children could be class
methods, or maybe what you store in @children could be an instance of
the class.

I’m not instantiating AInputFormat in any part of the code, so making
those class methods is the way to go for me :) Thanks for the tip :)


balbi
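
For reference, the class-method variant discussed in this thread might look roughly like the sketch below. It rests on several assumptions that are not in the original posts: the dispatcher is called dispatch, the two example formats (CsvInputFormat, KeyValueInputFormat) and their detection rules are invented, and the input is passed as a string rather than a file name so the example runs on its own.

```ruby
class InputFormat
  @children = []

  class << self
    attr_reader :children

    # Register every subclass as soon as it is defined.
    def inherited(child)
      super
      InputFormat.children << child
    end

    # Hypothetical dispatcher: return the parsed result from the first
    # registered format that accepts the input, or nil when none does.
    def dispatch(input)
      format = children.find { |child| child.supported?(input) }
      format && format.parse(input)
    end
  end
end

# Adding a format means adding a class; InputFormat itself is not edited.
class CsvInputFormat < InputFormat
  def self.supported?(input)
    input.include?(",")
  end

  # Produce an array of hashes, one per line.
  def self.parse(input)
    input.lines.map { |line| { fields: line.chomp.split(",") } }
  end
end

class KeyValueInputFormat < InputFormat
  def self.supported?(input)
    input.include?("=")
  end

  def self.parse(input)
    input.lines.map do |line|
      key, value = line.chomp.split("=", 2)
      { key => value }
    end
  end
end

p InputFormat.dispatch("a,b\nc,d")
p InputFormat.dispatch("name=value")
```

Formats are tried in the order their classes were defined, so with overlapping detection rules the earlier class wins; whether that is acceptable depends on the real input formats.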