Hello! I asked Mushfeq, the author of Dhaka, for help, but he seems
to be busy. Is anyone familiar with Dhaka? (http://dhaka.rubyforge.org/)
My problem is this:
I successfully built my lexer, my grammar, and a parser for the grammar:
“NaturalParser < Dhaka::CompiledParser”.
Now I can do:
lexed_program = NaturalLexer.lex(source_code) #Natural is the
parse_result = NaturalParser.parse(lexed_program)
Everything works. Now I want to pretty-print the source code: my objective
is to beautify it, colorize keywords, and extract some information
(à la Doxygen, but for my language).
How would you suggest I use the parse_result object?
I think a Dhaka::Evaluator would be very messy in this case (I’m trying,
unsuccessfully, to build a NaturalPrettyPrinter < Dhaka::Evaluator
object). What I’m thinking of is a tree walker for the
parse_result object that prints only the leaf nodes, but I
don’t see how to do it!
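The tree-walker idea can be sketched in plain Ruby. Dhaka’s real parse-tree node classes and accessors are not shown here; the `Leaf` and `Branch` structs, the `KEYWORDS` list, and both method names are hypothetical stand-ins. The point is only the traversal: a depth-first, left-to-right walk visits leaves in source order, so concatenating them rebuilds the token stream, and each leaf can be colorized on the way out.

```ruby
# Hypothetical stand-ins for parse-tree nodes; these are NOT Dhaka's classes.
Leaf   = Struct.new(:symbol, :text)          # a token: grammar symbol + source text
Branch = Struct.new(:production, :children)  # an interior node built by a production

# Hypothetical subset of Natural keywords to highlight.
KEYWORDS = %w[READ BY IF GET MOVE TO SUBTRACT FROM UPDATE END TRANSACTION].freeze

# Depth-first, left-to-right walk that collects only the leaves, in source order.
def collect_leaves(node, acc = [])
  if node.is_a?(Leaf)
    acc << node
  else
    node.children.each { |child| collect_leaves(child, acc) }
  end
  acc
end

# Rebuild the text from the leaves, wrapping keywords in ANSI bold-blue codes.
def pretty_print_tree(root)
  collect_leaves(root).map { |leaf|
    KEYWORDS.include?(leaf.text) ? "\e[1;34m#{leaf.text}\e[0m" : leaf.text
  }.join(" ")
end

# A hand-built tree for: IF UPD_PROGRAM EQ 'INVOICE'
tree = Branch.new(:if_statement, [
  Leaf.new(:keyword, "IF"),
  Branch.new(:condition, [
    Leaf.new(:word_literal, "UPD_PROGRAM"),
    Leaf.new(:operator, "EQ"),
    Leaf.new(:string_literal, "'INVOICE'")
  ])
])
puts pretty_print_tree(tree)
```

Joining with single spaces throws away the original layout; a real beautifier would either recompute indentation from the interior nodes (e.g. indent the children of an IF) or keep the whitespace the lexer skipped.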
Thank you very much! Here’s the original message I sent to the author of Dhaka:
Hello, it’s Emmanuel again. I have successfully written a grammar for the
actual “code part” of the language I’m parsing (my language is divided
into a data-definition part and a code part). For the data-definition part I
followed your advice and used groups of regular expressions. For the code
part, I have built a Dhaka::CompiledParser.
My goal is to build a Doxygen-like documentation generator for my ugly,
uglier-than-COBOL “enterprisey” language.
Now I have to choose how to continue. Since I want to pretty-print the
source code and extract some information while doing it, which would be better:
- build a tree walker for the result of Dhaka::CompiledParser#parse, or
- use a Dhaka::Evaluator?
If I choose a Dhaka::Evaluator, how would you suggest I use it for this?
Thank you very much!!!
----- Original Message -----
From: Mushfeq K.
To: Emmanuel O.
Sent: Friday, March 23, 2007 2:10 AM
Subject: Re: Dhaka Help
I don’t envy your situation. This doesn’t look like a pleasant language
to have to deal with.
It seems that Natural uses integer prefixes to specify levels of
nesting. This has serious problems. Basically, you can’t detect nesting
at the parsing stage, which means that you have to figure it out at the
evaluation stage.
This is only a sketch; I haven’t tried running it.
DEFINE, DATA, LOCAL and END-DEFINE are keywords.
definition_block %w| DEFINE DATA LOCAL definitions END-DEFINE|
single_definition %w| definition |
multiple_definitions %w| definitions opt_newline definition |
definition %w| int_literal word_literal opt_type |
type %w| ( word_literal ) |
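To make “the evaluator does most of the work” concrete, here is an evaluator-style pass over the productions in the grammar sketch above, written in plain Ruby rather than with Dhaka::Evaluator (whose exact DSL is not reproduced here). The node structs, the rule table, and the names `some_type`/`no_type` for the two expansions of `opt_type` are all hypothetical. Each rule receives the already-evaluated values of its children, which is the bottom-up shape an evaluator has.

```ruby
# Hypothetical node types mirroring a parse tree; NOT Dhaka's classes or API.
Leaf   = Struct.new(:symbol, :text)
Branch = Struct.new(:production, :children)

# Evaluator style: one rule per production name. Each rule receives the
# already-evaluated values of the node's children and returns the node's value.
RULES = {
  definition: ->(vals) { { level: vals[0].to_i, name: vals[1], format: vals[2] } },
  some_type:  ->(vals) { vals[0] },   # hypothetical: opt_type -> type
  no_type:    ->(_vals) { nil },      # hypothetical: opt_type -> (empty)
  type:       ->(vals) { vals[1] }    # children are "(", format, ")": keep the format
}.freeze

# Bottom-up evaluation: children first, then the rule for this node's production.
def evaluate(node)
  return node.text if node.is_a?(Leaf)   # a token evaluates to its source text
  values = node.children.map { |child| evaluate(child) }
  RULES.fetch(node.production).call(values)
end

# A hand-built tree for the line: 1 MY_VAR (A5)
tree = Branch.new(:definition, [
  Leaf.new(:int_literal, "1"),
  Leaf.new(:word_literal, "MY_VAR"),
  Branch.new(:some_type, [
    Branch.new(:type, [Leaf.new(:lparen, "("), Leaf.new(:word_literal, "A5"), Leaf.new(:rparen, ")")])
  ])
])
p evaluate(tree)
```

Note that each `definition` still comes out flat, with only its level number; deciding that a level-2 entry belongs under the preceding level-1 entry is extra work on top of this pass, which is exactly the burden described in the next paragraph.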
You can certainly pursue this, but since you’re detecting nesting by
looking at the value of the int_literal in the expansion for definition
(the parser doesn’t know that ‘2’ is a deeper level of nesting than
‘1’), the evaluator does most of the work. At that point it seems that
you might even be better off processing the thing line by line and using
regexes and capture groups on each line (Ruby’s regexes are much faster
than Dhaka’s regexes - the difference between a C implementation and a
Ruby one) to extract the information.
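The line-by-line alternative can be sketched with one regular expression per definition line plus a stack that turns the level numbers into nesting. The `Field` struct, the `DEFINITION_LINE` pattern (including which characters a Natural name may contain), and `parse_definitions` are assumptions for illustration, not part of Dhaka or of Natural’s specification.

```ruby
# Hypothetical record for one DEFINE DATA entry.
Field = Struct.new(:level, :name, :format, :children)

# Level number, variable name, optional parenthesized format such as (A5).
# The allowed name characters are an assumption, not Natural's full rules.
DEFINITION_LINE = /\A\s*(\d+)\s+([A-Z0-9_#-]+)(?:\s*\(([^)]*)\))?/i

# Parse definition lines one by one and nest them by level number.
def parse_definitions(lines)
  roots = []
  stack = [] # currently open fields, outermost first
  lines.each do |line|
    m = DEFINITION_LINE.match(line)
    next unless m # skip lines that are not definitions
    field = Field.new(m[1].to_i, m[2], m[3], [])
    # A new field closes every open field at the same or a deeper level.
    stack.pop while !stack.empty? && stack.last.level >= field.level
    (stack.empty? ? roots : stack.last.children) << field
    stack.push(field)
  end
  roots
end

defs = parse_definitions(["1 MY_VAR (A5)", "2 MY_VAR_1 (A2)", "2 MY_VAR_2 (A3)"])
defs.each { |f| puts "#{f.name} (#{f.format}): #{f.children.map(&:name).join(', ')}" }
```

The stack holds the chain of open ancestors, so each line costs only a few comparisons and the whole block is parsed in one pass with Ruby’s built-in regex engine.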
Hope this helps.
On 3/22/07, Emmanuel O. email@example.com wrote:
Hello Mushfeq, I want to thank you for publishing such a useful library.
I have a difficult parser to implement, and I hope you can help me with
some issues. Thank you in advance if you decide to read the whole thing.
I wanted to ask you if you have more example grammars, especially one
that can parse a typed language.
I’m dealing with a horrible so-called 4th-generation language that I
want to parse
(Software AG’s Natural, www.softwareag.com, if you are curious; I’m told
it is similar to COBOL).
My objective is to write a Doxygen-like code analyzer to help me
navigate my code base of one and a half million lines of Natural.
This is what a typical chunk of Natural looks like:
1 CUSTOMER VIEW OF PRO_CUSTOMER
RC. READ CUSTOMER BY CUST_STATUS_NO
IF UPD_PROGRAM EQ 'INVOICE'
GC. GET CUSTOMER *ISN (RC.)
MOVE 'FACTURA' TO UPD_PROGRAM
SUBTRACT 1 FROM NO_COPIES
UPDATE (GC.)
END TRANSACTION
I decided to split my grammar in two:
- one grammar for the DEFINE DATA … END-DEFINE part
- another grammar for the body (the actual code).
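Splitting the grammar in two implies splitting each source file first. A minimal sketch, assuming a file has at most one DEFINE DATA … END-DEFINE block and that END-DEFINE ends its own line; `split_program` is a hypothetical helper, and everything outside the block (before or after it) is treated as body.

```ruby
# Split a Natural source into its DEFINE DATA ... END-DEFINE block and the
# remaining body, so each part can go to its own grammar.
# Assumes at most one such block and that END-DEFINE ends its line.
def split_program(source)
  block = source.match(/^\s*DEFINE\s+DATA\b.*?^\s*END-DEFINE\b[^\n]*\n?/m)
  return ["", source] unless block # no data block: everything is body
  [block[0], block.pre_match + block.post_match]
end

source = <<~NATURAL
  DEFINE DATA LOCAL
  1 MY_VAR (A5)
  END-DEFINE
  MOVE 'HELLO' TO MY_VAR
  END
NATURAL
data_part, body = split_program(source)
puts data_part
puts body
```

The `/m` flag lets `.*?` cross line breaks, and the lazy quantifier stops at the first END-DEFINE that starts a line.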
Do you think dhaka would be suitable for such a language?
How would you suggest I implement the grammar for the structured
variable definitions? I.e., which grammar would parse this?
1 MY_VAR (A5)
2 MY_VAR_1 (A2)
2 MY_VAR_2 (A3)
In natural, this means:
if MY_VAR, an alphanumeric variable of length 5, holds the string “HELLO”,
then MY_VAR_1 == “HE” and
MY_VAR_2 == “LLO”.
The format can be anything from A1 to A255.
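The overlay semantics just described amount to slicing the parent’s value at running offsets. A small sketch, assuming only An formats with lengths from 1 to 255 and subfields listed in source order; `subfield_values` is a hypothetical helper name.

```ruby
# Hypothetical helper: given the parent's value and the ordered subfields with
# their alphanumeric formats, return the slice each subfield overlays.
def subfield_values(parent_value, subfields)
  offset = 0
  subfields.each_with_object({}) do |(name, format), result|
    digits = format[/\AA(\d+)\z/, 1] # only An formats are handled in this sketch
    raise ArgumentError, "unsupported format #{format}" unless digits
    length = digits.to_i
    raise ArgumentError, "length out of range: #{format}" unless (1..255).cover?(length)
    result[name] = parent_value[offset, length]
    offset += length # the next subfield starts where this one ends
  end
end

p subfield_values("HELLO", [["MY_VAR_1", "A2"], ["MY_VAR_2", "A3"]])
```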
THANK YOU VERY MUCH IN ADVANCE