Hello!

I requested help from Mushfeq, the author of dhaka, but he seems to be busy. Is anyone familiar with dhaka? (http://dhaka.rubyforge.org/)

My problem is: I successfully built my lexer, my grammar and a parser for the grammar: "NaturalParser < Dhaka::CompiledParser". Now I can do:

    lexed_program = NaturalLexer.lex(source_code) # Natural is the language name
    parse_result = NaturalParser.parse(lexed_program)

And all is OK. Now I want to pretty-print the source code. My objective is to beautify the source code, colorize keywords, and extract some info (à la doxygen, but for my language).

>>>> How do you suggest I use the parse_result object? <<<<

I think in this case a Dhaka::Evaluator would be very messy (I'm trying, unsuccessfully, to build one: NaturalPrettyPrinter < Dhaka::Evaluator). What I have in mind is a tree walker for the parse_result object that prints the nodes only if they are leaves, but I don't get how to do it!

-------------------------------------------------------------------------------

Thank you very much!

Here's the original message I sent to the author of dhaka:

-------------------------------------------------------------------------------

Mushfeq:

Hello, it's Emmanuel again. I have successfully written a grammar for the actual "code part" of the language I'm parsing (my language is divided into a define-data part and a code part). For the define-data part I followed your advice and used groups of regular expressions. For the code part, I have built a Dhaka::CompiledParser.

My goal is to build a doxygen-like documentor for my ugly, uglier-than-COBOL "enterprisey" language ;-) .

Now I have to choose how to continue. Since I want to pretty-print the source code and extract some information while doing it, what would be your advice: build a tree walker for the Dhaka::CompiledParser#parse result, or use a Dhaka::Evaluator? If I choose a Dhaka::Evaluator, how would you suggest I use it for my task?

Thank you very much!!!

Emmanuel O.
emmanueloga.blogspot.com

----- Original Message -----
From: Mushfeq K.
To: Emmanuel O.
Sent: Friday, March 23, 2007 2:10 AM
Subject: Re: Dhaka Help

Emmanuel,

I don't envy your situation. This doesn't look like a pleasant language to have to deal with. :)

It seems that Natural uses integer prefixes to specify levels of nesting. This has serious problems. Basically you can't detect nesting at the parsing stage, which means that you have to figure it out at the evaluation stage.

This is only a sketch - I haven't tried running this. DEFINE, DATA, LOCAL and END-DEFINE are keywords.

    for_symbol('data_definition') do
      definition_block %w| DEFINE DATA LOCAL definitions END-DEFINE |
    end

    for_symbol('definitions') do
      single_definition    %w| definition |
      multiple_definitions %w| definitions opt_newline definition |
    end

    for_symbol('definition') do
      definition %w| int_literal word_literal opt_type |
    end

    for_symbol('opt_type') do
      no_type %w||
      type    %w| ( word_literal ) |
    end

You can certainly pursue this, but since you're detecting nesting by looking at the value of the int_literal in the expansion for definition (the parser doesn't know that '2' is a deeper level of nesting than '1'), the evaluator does most of the work. At that point it seems that you might even be better off processing the thing line by line and using regexes and capture groups on each line to extract the information. (Ruby's regexes are much faster than Dhaka's regexes - the difference between a C implementation and a Ruby one.)

Hope this helps.

Mushfeq.

On 3/22/07, Emmanuel O. <firstname.lastname@example.org> wrote:

Hello Mushfeq,

I want to thank you for publishing such a useful library (dhaka)! I have a difficult parser to implement, and I hope you can help me with some issues. Thank you in advance if you decide to read the whole thing :)

I wanted to ask you if you have more example grammars, especially one that can parse a typed language.
I'm dealing with a horrible so-called 4th-generation language that I want to parse (Software AG's Natural, www.softwareag.com if you are curious; some tell me it is similar to COBOL). My objective is to program a doxygen-like code analyzer to help me navigate my one-and-a-half-million-line Natural code base.

This is what a typical chunk of Natural looks like:

    DEFINE DATA LOCAL
    1 CUSTOMER VIEW OF PRO_CUSTOMER
      2 CUSTOMER_NO
      2 METH_OF_DELIVERY
      2 NO_COPIES
      2 UPD_PROGRAM
    END-DEFINE
    RC. READ CUSTOMER BY CUST_STATUS_NO
      IF UPD_PROGRAM EQ 'INVOICE'
        ESCAPE TOP
      END-IF
      GC. GET CUSTOMER *ISN (RC.)
      MOVE 'FACTURA' TO UPD_PROGRAM
      SUBTRACT 1 FROM NO_COPIES
      UPDATE (GC.)
      END TRANSACTION
    END

I decided to split my grammar in two: one grammar for the DEFINE DATA ... END-DEFINE part, and another grammar for the body (the actual code).

Do you think dhaka would be suitable for such a language? How do you suggest I implement the grammar for the structured variable definition? I.e., which grammar would parse this?

    1 MY_VAR (A5)
      2 MY_VAR_1 (A2)
      2 MY_VAR_2 (A3)

In Natural, this means: if MY_VAR, an alphanumeric variable of length 5, holds the string "HELLO", then MY_VAR_1 == "HE" and MY_VAR_2 == "LLO". The A type can be anything from A1 to A255, etc.

THANK YOU VERY MUCH IN ADVANCE :)

Emmanuel O., http://emmanueloga.blogspot.com
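
Mushfeq's line-by-line suggestion (a regex with capture groups per line, with nesting worked out from the integer prefix) could be sketched roughly as below for the structured definition above. This is only a sketch under assumptions: `Definition`, `DEFINITION_LINE` and `parse_definitions` are hypothetical names, nothing here uses Dhaka, and the regex only covers the simple "level name (type)" shape shown in this thread, not the full Natural syntax.

```ruby
# Hypothetical helper, not part of Dhaka: turns the lines between
# DEFINE DATA and END-DEFINE into a nested structure.
Definition = Struct.new(:level, :name, :type, :children)

# Captures: level number, variable name, optional type in parentheses.
DEFINITION_LINE = /\A\s*(\d+)\s+([A-Z][A-Z0-9_-]*)(?:\s*\(([^)]*)\))?/

def parse_definitions(source)
  roots = []
  stack = [] # chain of currently open definitions, shallowest first
  source.each_line do |line|
    match = DEFINITION_LINE.match(line)
    next unless match # skips DEFINE DATA, END-DEFINE, blank lines

    node = Definition.new(match[1].to_i, match[2], match[3], [])
    # Close every open definition that is not shallower than this one.
    stack.pop while stack.any? && stack.last.level >= node.level
    (stack.empty? ? roots : stack.last.children) << node
    stack.push(node)
  end
  roots
end

source = <<~NATURAL
  DEFINE DATA LOCAL
  1 MY_VAR (A5)
    2 MY_VAR_1 (A2)
    2 MY_VAR_2 (A3)
  END-DEFINE
NATURAL

tree = parse_definitions(source)
tree.each do |top|
  puts "#{top.name} (#{top.type})"
  top.children.each { |child| puts "  #{child.name} (#{child.type})" }
end
```

The stack is what replaces the grammar here: a line with level N becomes a child of the nearest preceding line with a smaller level, which is exactly the information the parser cannot see from the int_literal alone.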
on 2007-04-04 21:25
on 2007-04-05 04:08
Emmanuel, Apologies for not replying sooner. I'm going to answer this question off the list. Mushfeq.
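
The tree-walker idea from the first message (visit every node, emit only the leaves) might look roughly like this. It is a sketch only: `Leaf` and `Composite` are hypothetical stand-ins, not Dhaka's parse tree classes, so the walk would need adapting to whatever node objects the parse result really exposes. The `<b>` markup is likewise just an illustrative way to colorize keywords.

```ruby
# Hypothetical stand-ins for parse tree nodes; these are NOT Dhaka's classes.
Leaf      = Struct.new(:kind, :text)            # one token from the lexer
Composite = Struct.new(:production, :children)  # one expanded grammar rule

# Depth-first, left-to-right walk that yields only the leaves,
# i.e. the tokens in their original source order.
def each_leaf(node, &block)
  if node.is_a?(Leaf)
    block.call(node)
  else
    node.children.each { |child| each_leaf(child, &block) }
  end
end

# Rebuilds the source text from the leaves, wrapping keywords in <b> tags.
def pretty_print(tree)
  words = []
  each_leaf(tree) do |leaf|
    words << (leaf.kind == :keyword ? "<b>#{leaf.text}</b>" : leaf.text)
  end
  words.join(' ')
end

# Hand-built tree for: MOVE 'FACTURA' TO UPD_PROGRAM
tree = Composite.new('move_statement', [
  Leaf.new(:keyword, 'MOVE'),
  Composite.new('operand', [Leaf.new(:string, "'FACTURA'")]),
  Leaf.new(:keyword, 'TO'),
  Leaf.new(:word, 'UPD_PROGRAM')
])

puts pretty_print(tree)
```

Keeping the walk (`each_leaf`) separate from the formatting (`pretty_print`) means the same traversal could also feed the doxygen-style information extraction.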