Parsing files

One thing I find myself doing over and over is parsing some type of
text file and making something sensible out of it.
As yet, I haven’t found a good solution (I’m probably missing some
gems or modules someone has already created). I think that, though
BNF is fine for writing compilers, it’s a little more complex than
need be for parsing files. I’ve recently created a solution (with
thanks to the author of OptionParser for the modus operandi). The
following is part of a routine for parsing some really horrid SAP R/3
files:

horrid file

VERSION
100
DEBUG
0
SYSTEM
PRD
HOSTNAME
198.203.4.202
SYSNUM
01

CONTAINER_ELEMENT_INFO
BASEUNITOFMEASURE 000002003C
CONTAINER_ELEMENT_VALUE
EA
CONTAINER_ELEMENT_INFO
CAUSECODEGROUP 000000008C
CONTAINER_ELEMENT_VALUE
CES

Parsing routine using my parser module

 # One capture group, so the block receives the whole address
 # rather than just the first octet.
 DOTTED_QUAD_RE = '(\d+\.\d+\.\d+\.\d+)'

 @wrk = RegexpParser.new

 # Each rule pairs a pattern with a block that receives its captures.
 @wrk.on( /VERSION\n(\d+)\n/m ) { |version| @version = version }
 @wrk.on( /DEBUG\n(\d+)\n/m ) { |debug| @debug = debug }
 @wrk.on( /SYSTEM\n([^\n]+)\n/m ) { |system| @system = system }
 @wrk.on( /HOSTNAME\n#{DOTTED_QUAD_RE}\n/m ) { |hostname|
   @hostname = hostname
 }


 @wrk.on(
   /CONTAINER_ELEMENT_INFO\n
    (\S+)        (?# element name )
    \s+
    (\d{4})      (?# unknown digits )
    (\d{2})      (?# index number )
    (\d{3})      (?# field width )
    C\n          (?# trailing literal C )
    CONTAINER_ELEMENT_VALUE\n
    ([^\n]+)\n   (?# value of the element )
   /mx
 ) { |name, unknown, index, width, value|
   # Date fields arrive as YYYYMMDD strings; convert them first.
   case name
   when /REQUIREDSTARTDATE/, /REQUIREDEND/, /DATESENT/,
        /CONTRACTSTARTDATE/, /CONTRACTENDDATE/
     value = yyyymmdd_to_datetime(value)
   end

   # Captures are strings, so convert the index before using it
   # as an array subscript.
   @elements[name] ||= []
   @elements[name][index.to_i] = value
 }
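
The parser module itself is not shown in this post, so the following is only a hypothetical minimal sketch of how an `on`-style regexp dispatcher could work. It is not the actual RegexpParser. The `parse` method, the rule ordering (first registered rule wins), and the skip-a-line behaviour for unmatched input are all assumptions made for illustration.

```ruby
# Hypothetical sketch of an OptionParser-style regexp dispatcher.
# Assumptions: rules are tried in registration order, and input
# that no rule matches is skipped one line at a time.
class RegexpParser
  def initialize
    @rules = [] # [anchored regexp, handler] pairs
  end

  # Register a pattern and the block to call with its captures.
  def on(regexp, &handler)
    # \G pins the match to the position where the search starts.
    @rules << [/\G#{regexp}/, handler]
    self
  end

  # Walk the text, firing the first rule that matches at each position.
  def parse(text)
    pos = 0
    while pos < text.length
      match = nil
      _, handler = @rules.find { |anchored, _| match = anchored.match(text, pos) }
      if match && match.end(0) > pos
        handler.call(*match.captures)
        pos = match.end(0)
      else
        # Nothing matched here: move on to the start of the next line.
        newline = text.index("\n", pos)
        pos = newline ? newline + 1 : text.length
      end
    end
    self
  end
end

# Usage on a fragment of the file format shown earlier.
sample = "VERSION\n100\nDEBUG\n0\nSYSTEM\nPRD\n"
seen = {}

parser = RegexpParser.new
parser.on(/VERSION\n(\d+)\n/) { |version| seen[:version] = version }
parser.on(/SYSTEM\n([^\n]+)\n/) { |system| seen[:system] = system }
parser.parse(sample)
```

Because the handlers are ordinary blocks, they close over the caller's variables, which is why the rules above can assign to `seen` (or to instance variables such as `@version`) directly. Anchoring each pattern with `\G` keeps a rule from matching further down the file and silently skipping records in between.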

Is there a better way to do this?
Should I share my parser with others (via RubyForge or the like)?

Thanks,
JJ


Help everyone. If you can’t do that, then at least be nice.

On Sat, 2006-06-24 at 04:09 +0900, John J. wrote:

> Should I share my parser with others (via RubyForge or the like)?

Looks nifty to me!

Re the EBNF route, you’ve probably seen RACC already?

http://i.loveruby.net/en/projects/racc/

Yours,

Tom