Parsing data using dhaka (http://dhaka.rubyforge.org/)

emmanuel · April 4, 2007, 7:25pm

Hello! I requested help from Mushfeq, the author of dhaka, but he seems
to be busy. Is anyone familiar with dhaka? (http://dhaka.rubyforge.org/)

My problem is:

I successfully built my lexer, my grammar and a parser for the grammar:
“NaturalParser < Dhaka::CompiledParser”.

Now I can do:

lexed_program = NaturalLexer.lex(source_code) # Natural is the language name
parse_result = NaturalParser.parse(lexed_program)

And all is OK. Now I want to pretty-print the source code. My objective
is to beautify the source code, colorize keywords, and extract some info
(à la doxygen, but for my language).

How do you suggest I use the parse_result object?

I think a Dhaka::Evaluator would be very messy in this case (I’m trying,
unsuccessfully, to build a NaturalPrettyPrinter < Dhaka::Evaluator
object). What I have in mind is a tree walker for the parse_result
object that prints nodes only if they are leaves, but I don’t get how to
do it!
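To make the "print only the leaves" idea concrete, here is a minimal sketch of such a walker. It does not use Dhaka's classes: Leaf and Composite are hypothetical stand-ins for whatever node types the parse result exposes, and the keyword list and the <kw> marker are assumptions for illustration only.

```ruby
# Hypothetical stand-ins for parse tree nodes; these are NOT Dhaka's classes.
Leaf = Struct.new(:text)
Composite = Struct.new(:name, :children)

# Assumed keyword list, just enough for the example below.
KEYWORDS = %w[IF END-IF MOVE TO].freeze

# Depth-first walk that yields only the leaf nodes, left to right.
def each_leaf(node, &block)
  if node.is_a?(Leaf)
    block.call(node)
  else
    node.children.each { |child| each_leaf(child, &block) }
  end
end

# Rebuild the source text from the leaves, wrapping keywords in a marker.
def pretty_print(tree)
  words = []
  each_leaf(tree) do |leaf|
    words << (KEYWORDS.include?(leaf.text) ? "<kw>#{leaf.text}</kw>" : leaf.text)
  end
  words.join(' ')
end

tree = Composite.new('move_statement', [
  Leaf.new('MOVE'),
  Composite.new('operand', [Leaf.new("'FACTURA'")]),
  Leaf.new('TO'),
  Composite.new('operand', [Leaf.new('UPD_PROGRAM')])
])

puts pretty_print(tree)
```

With a real parse tree, the same walk would presumably test for the library's own leaf node type and read the token from it instead of a plain text field.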

Thank you very much! Here’s the original message I sent to the author of
dhaka:

Mushfeq:

Hello, it’s Emmanuel again. I have successfully written a grammar for the
actual “code part” of the language I’m parsing (my language is divided
into a define-data part and a code part). For the define-data part I
followed your advice and used groups of regular expressions. For the code
part, I have built a Dhaka::CompiledParser.

My goal is to build a doxygen-like documentation generator for my ugly,
uglier-than-COBOL “enterprisey” language.

Now I have to choose how to continue. Since I want to pretty-print the
source code and extract some information while doing it, what would be
your advice?

Build a tree walker for the Dhaka::CompiledParser#parse result
or
use a Dhaka::Evaluator?

If I choose a Dhaka::Evaluator, how would you suggest I use it for my
task?
Thank you very much!!!

Emmanuel O.

emmanueloga.blogspot.com

----- Original Message -----
From: Mushfeq K.
To: Emmanuel O.
Sent: Friday, March 23, 2007 2:10 AM
Subject: Re: Dhaka Help

Emmanuel,

I don’t envy your situation. This doesn’t look like a pleasant language
to have to deal with.

It seems that Natural uses integer prefixes to specify levels of
nesting. This has serious problems. Basically you can’t detect nesting
at the parsing stage, which means that you have to figure it out at the
evaluation stage.

This is only a sketch - I haven’t tried running this.

DEFINE, DATA, LOCAL and END-DEFINE are keywords.

for_symbol('data_definition') do
  definition_block %w| DEFINE DATA LOCAL definitions END-DEFINE |
end

for_symbol('definitions') do
  single_definition    %w| definition |
  multiple_definitions %w| definitions opt_newline definition |
end

for_symbol('definition') do
  definition %w| int_literal word_literal opt_type |
end

for_symbol('opt_type') do
  no_type %w||                     # empty expansion: the type is optional
  type    %w| ( word_literal ) |
end

You can certainly pursue this, but since you’re detecting nesting by
looking at the value of the int_literal in the expansion for definition
(the parser doesn’t know that ‘2’ is a deeper level of nesting than
‘1’), the evaluator does most of the work. At that point it seems that
you might even be better off processing the thing line by line and using
regexes and capture groups on each line (Ruby’s regexes are much faster
than Dhaka’s regexes - the difference between a C implementation and a
Ruby one) to extract the information.
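A rough sketch of that line-by-line approach, for illustration only: the regular expression is an assumption based on the sample definitions in this thread, and Field and parse_definitions are hypothetical names. Nesting is rebuilt from the integer prefix with a stack of currently open fields.

```ruby
# Matches lines such as "2 MY_VAR_1 (A2)" or "1 CUSTOMER VIEW OF PRO_CUSTOMER".
# Captures: level number, field name, optional type in parentheses.
# The pattern is an assumption based on the samples in this thread.
DEFINITION = /\A\s*(\d+)\s+([A-Z][A-Z0-9_-]*)(?:\s+\((\w+)\))?/

Field = Struct.new(:level, :name, :type, :children)

# Returns the top-level fields, with nested fields attached as children.
def parse_definitions(source)
  roots = []
  stack = [] # currently open fields, shallowest first
  source.each_line do |line|
    match = DEFINITION.match(line)
    next unless match # skips DEFINE DATA, LOCAL, END-DEFINE, blank lines

    field = Field.new(match[1].to_i, match[2], match[3], [])
    # Close every open field that is not shallower than the new one.
    stack.pop while stack.any? && stack.last.level >= field.level
    (stack.empty? ? roots : stack.last.children) << field
    stack.push(field)
  end
  roots
end

sample = <<~NATURAL
  DEFINE DATA
  LOCAL
  1 MY_VAR (A5)
  2 MY_VAR_1 (A2)
  2 MY_VAR_2 (A3)
  END-DEFINE
NATURAL

parse_definitions(sample).each do |field|
  puts "#{field.name} (#{field.type})"
  field.children.each { |child| puts "  #{child.name} (#{child.type})" }
end
```

The level comparison is the part the parser cannot do on its own: a field becomes a child of the nearest open field with a smaller level number.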

Hope this helps.

Mushfeq.

On 3/22/07, Emmanuel O. [email protected] wrote:
Hello Mushfeq, I want to thank you for publishing such a useful library
(dhaka)!

I have a difficult parser to implement, and I hope you can help me with
some issues. Thank you in advance if you decide to read the whole thing.
I wanted to ask you if you have more example grammars, especially one
that can parse a typed language.

I’m dealing with a horrible so-called 4th generation language that I
want to parse
(softwareag’s Natural, www.softwareag.com if you are curious; some tell
me that it is similar to COBOL).

My objective is to program a doxygen-like code analyzer to help me
navigate my code base of one and a half million lines of Natural.

This is what a typical chunk of Natural looks like:
DEFINE DATA
LOCAL
1 CUSTOMER VIEW OF PRO_CUSTOMER
2 CUSTOMER_NO
2 METH_OF_DELIVERY
2 NO_COPIES
2 UPD_PROGRAM
END-DEFINE

RC. READ CUSTOMER BY CUST_STATUS_NO
IF UPD_PROGRAM EQ 'INVOICE'
ESCAPE TOP
END-IF

GC. GET CUSTOMER *ISN (RC.)
MOVE 'FACTURA' TO UPD_PROGRAM
SUBTRACT 1 FROM NO_COPIES
UPDATE (GC.)
END TRANSACTION

END
I decided to split my grammar in two:
one grammar for the DEFINE DATA … END-DEFINE part
another grammar for the body (the actual code).
Do you think dhaka would be suitable for such a language?

How do you suggest I implement the grammar for the structured
variable definition? i.e. which grammar would parse this?:
1 MY_VAR (A5)
2 MY_VAR_1 (A2)
2 MY_VAR_2 (A3)
In Natural, this means:
if MY_VAR, an alphanumeric variable of length 5, has the string “HELLO”,
MY_VAR_1 is == “HE” and
MY_VAR_2 is == “LLO”
A can be anything from A1 to A255
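The slicing described above can be sketched in Ruby as follows. This only illustrates the "HELLO" example; split_fields is a hypothetical helper, and the widths are taken from the A2 and A3 types in the sample.

```ruby
# Splits a parent value into consecutive sub-fields by their declared widths.
# Hypothetical helper: names and widths come from the example above.
def split_fields(value, widths)
  offset = 0
  widths.each_with_object({}) do |(name, width), parts|
    parts[name] = value[offset, width] # substring of `width` characters
    offset += width
  end
end

parts = split_fields('HELLO', { 'MY_VAR_1' => 2, 'MY_VAR_2' => 3 })
puts parts['MY_VAR_1'] # the first two characters
puts parts['MY_VAR_2'] # the remaining three characters
```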

etc…

THANK YOU VERY MUCH IN ADVANCE

Emmanuel O.,
http://emmanueloga.blogspot.com

emmanuel · April 5, 2007, 2:08am

Emmanuel,

Apologies for not replying sooner. I’m going to answer this question off
the list.

Mushfeq.
