C DSL anyone?

bpauly · May 2, 2007, 6:15pm

Just curious,

Is anybody working on a C-language DSL that could generate C code? I’m
kind of interested in whether RSpec could be used to unit test C
code in a nice way.

My first attempt at a C DSL allows the following example.

require 'cexpression.rb'

code = CTools::CCode.new

code = CTools::CCode.open do

_decl :a, :ubyte2
_decl :b, :ubyte2
_decl :c, :ubyte2

_typedef :Point do
   _struct do
      _decl :x, :ubyte4
      _decl :y, :ubyte2
      _decl :z, :ubyte1
   end
end

_decl :point, :Point

_for(a.assign(0), a < 1, a.inc(1)) do
   _while(b < 1) do
      _for(c.assign(0), c < 1, c.inc(1)) do
         _let b, b + 1
         _printf "%d %d %d", point.x, point.y, point.z
      end
   end
end

end

puts code

generates

ubyte2 a;
ubyte2 b;
ubyte2 c;
typedef struct {
ubyte4 x;
ubyte2 y;
ubyte1 z;
}Point;
Point point;
for (a = 0; a < 1; a += 1){
while (b < 1){
for (c = 0; c < 1; c += 1){
b = b + 1;
printf("%d %d %d", point.x, point.y, point.z );
}
}
}

===========================
library cexpression.rb is quite simple

module CTools

# Indent a block of C code

def CTools.c_indent(text)
   # Pool of spaces; indent_text[1..indent] yields `indent` spaces
   indent_text = " " * 80
   indent = 0;
   cont = false;
   out_text = []
   text.each_line do |line|
      line.gsub!(/^\s*/,"")
      if line =~ /\{.*\}/
         line.gsub!(/^/,indent_text[1..indent])
      else
         if line =~ /\}/
            indent = indent - 3
         end
         line.gsub!(/^/,indent_text[1..indent])
         if line =~ /\{/
            indent = indent + 3
         end
         # Convert "/**/" into blank lines
         line.gsub!(/^\s*\/\*\*\//,'')
      end
      # Indent on unmatched round brackets
      indent = indent + (  line.count("(") - line.count(")") ) * 3
      # Indent on backslash continuation
      if cont
         if line !~ /\\$/
            indent = indent - 6
            cont = false
         end
      else
         if line =~ /\\$/
            indent = indent + 6
            cont = true
         end
      end
      out_text << line
   end
   out_text.join
end

class CExpr

   # Operator
   attr_reader :op

   # Arguments ( Sub expressions )
   attr_reader :args

   def initialize(*args)
      @args = args
   end

   def to_s
     "#{@op}(" + args.collect { |a| a.to_s }.join(', ') + ")"
   end

   ##### Operators and Calls ##########


   def assign(val)
      method_missing "=", val
   end

   def inc( val )
      method_missing "+=", val
   end

   def decr( val )
      method_missing "-=", val
   end

   def method_missing(meth_id, *args)
      case meth_id.to_s
      when '=', '+=', '-=', '>=', '<=', '<', '>', '+', '-', '*', '/'
         BinOp.new(meth_id, self, *args)
      when '[]'
         ArrayOp.new(self, *args)
      else
         BinOp.new(".", self, CExpr.new(meth_id, *args))
      end
   end

end

class BinOp < CExpr
   def initialize(op, *args)
      @op = op
      @args = args
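      if args.length != 2
         # raise needs an exception class or instance, not a Symbol
         raise ArgumentError, "BinOp takes exactly 2 operands, got #{args.length}"
      end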

   end

   def to_s
      case @op
      when '.'
        "(#{args[0]}.#{CTools.debracket(args[1].to_s)})"
      else
        "(#{args[0]} #{@op} #{args[1]})"
      end
   end
end

class ArrayOp < CExpr
   def initialize(op, *args)
      @op = op
      @args = args
   end

   def to_s
     "#{op}[" + args.join(', ') + "]"
   end
end


class CVar < CExpr
   attr :name
   attr :type

   def initialize(name, type)
      @name = name;
      @type = type;
   end

   def decl
      "#{type} #{name};"
   end

   def to_s
      name.to_s
   end

end

def self.debracket(str)
      (str.gsub(/^\(/,'')).gsub(/\)$/,'')
end

class BlankSlate
    instance_methods.each { |m|
       case m
       when /^__/
       when /instance_eval/
       else
          undef_method m
       end
    }
end

class CCode
   private

   def initialize
      @buffer = ""
   end

   def new
   end

   public

   def self.open &block
      code = CCode.new
      code.instance_eval &block
      code.to_s
   end

   def method_missing(meth, *args)
      @buffer << meth.to_s.gsub(/^_/,'') << "(" << args.collect{ |a|
         case a
         when String
            # Literal strings are output quoted
            '"' + a + '"'
         else
            # Other objects are considered names
            CTools.debracket(a.to_s)
         end
      }.join(', ') << " );\n"
   end


   def scope( lf=true)
      @buffer << "{\n"
      yield
      @buffer << "}"
      @buffer << "\n" if lf
   end

   def <<( e )

      s = CTools::debracket(e.to_s)

      @buffer << s << ";\n"
   end

   def _if(cond, &block)
      @buffer << "if (#{cond})"
      scope &block
   end

   def _else(&block)
      @buffer << "else"
      scope &block
   end

   def _let(a, b)
      # Terminate the assignment with a semicolon so the emitted C is valid
      @buffer << "#{a} = #{CTools.debracket(b.to_s)};\n"
   end

   def _for(init, cond, inc, &block)
      init = CTools.debracket(init.to_s)
      cond = CTools.debracket(cond.to_s)
      inc  = CTools.debracket(inc.to_s)

      @buffer << "for (#{init}; #{cond}; #{inc})"
      scope &block
   end

   def _while(cond, &block)
      cond = CTools.debracket(cond.to_s)
      @buffer << "while (#{cond})"
      scope &block
   end

   def _typedef name, &block
      @buffer << "typedef "
      yield
      @buffer << name.to_s << ";\n"
   end

   def _struct (name="", &block)
      @buffer << "struct #{name}"
      # Evaluate the struct declarations in a new
      # scope
      @buffer << CTools::CCode.open do
         scope false do
            instance_eval &block
         end
      end
      @buffer << ";\n" if name != ""
   end

   # Declare a variable in scope and
   # add an instance method to retrieve
   # the symbol
   def _decl name, type="void *"
      var = CVar.new(name, type)
      @buffer << var.decl << "\n"
      self.class.send(:define_method, name){ var }
   end

   def to_s
      CTools.c_indent @buffer
   end
end

end

bpauly · May 2, 2007, 7:00pm

On Thu, 3 May 2007, Brad P. wrote:

Just curious,

is anybody working on a C language DSL that could generate C code. I’m kinda
interested in how it may be possible to use RSpec to unit test C
code in a nice way.

check out RubyInline - it is exactly this

-a

bpauly · May 2, 2007, 8:26pm

Brad P. wrote:

Just curious,

is anybody working on a C language DSL that could generate C code. I’m
kinda interested in how it may be possible to use RSpec to unit test C
code in a nice way.

Here’s one:

http://raa.ruby-lang.org/project/cgenerator/

There’s an example at:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/170530

IMO, cgen’s shadow class mechanism is what differentiates cgen from
RubyInline. (Shadow classes give you an easy way of defining and
inheriting T_DATA structs as if they were just part of your ruby
classes, using the shadow_attr_accessor methods.) RubyInline is probably
more sophisticated in many ways (compiler options, storing intermediate
code securely, availability as a gem).

I’ve been using cgen since 2001 for boosting the performance of
numerical integration and simulations. I’ve also used it to wrap some
libraries that I didn’t want to bother with swig for.

bpauly · May 3, 2007, 10:07am

On 5/3/07, Brad P. wrote:

Just curious,

is anybody working on a C language DSL that could generate C code. I’m
kinda interested in how it may be possible to use RSpec to unit test C
code in a nice way.

I just read an interview that contained some of your keywords there:
“… I’m actually working on a tool right now I call cuby. It’s a
dialect of ruby that generates C code directly.”

Here’s the link:

It appears to be all jumbled up in a re-implementation of Ruby, but
might be interesting for you to check out.

hth,
-Harold

bpauly · May 3, 2007, 10:18am

On May 3, 2007, at 4:18 PM, SonOfLilit wrote:

neighbours. With C, I don’t see such gains overcoming the loss.

Aur

Indeed, the OP is trying to generate C rather than write C. Very
understandable. Certainly doable. It just means a lot less of the
flexibility that C usually has (for good and bad).
But actually it is very possible to generate corresponding looping
structures, structs, functions and more.
You will need to generate declarations of many variables.
To get started, find a C file that follows the conventions you want
to have generated. Rome was not built in a day and no generator will
be either.
In theory all languages end up the same in ASM anyway.
The main problem is that it must be a DSL because some Ruby
mechanisms wouldn’t translate so smoothly to C.
They’d be more like Objective-C.
That said, there’s a lot of typing that could be saved.
Perhaps the most painful things would be dealing with pointers and
memory management.
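
As a rough illustration of the "declarations of many variables" point, here is a minimal sketch. The helper name c_declarations and the variable table are made up for this example; they are not taken from any library mentioned in the thread.

```ruby
# Hypothetical helper: turn a name => C type table into declaration lines.
def c_declarations(vars)
  vars.map { |name, type| "#{type} #{name};" }.join("\n")
end

# Example table (assumed names and types, purely illustrative).
puts c_declarations(a: "uint16_t", b: "uint16_t", count: "uint32_t")
```

The table could just as well be read from a file or built in a loop, which is where the saved typing would come from.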

bpauly · May 3, 2007, 9:20am

Am I the only one that thinks OP is looking for a library to assist in
generating ascii C code, like Markaby does for HTML, and not for
executing C code that you wrote as a string?

Brad, I don’t think you’ll find one, and in fact, I don’t think you’ll
need one. Why? C has so much syntax that you’re better off generating
it with string manipulation than with a DSL.

HTML is so simple that there was more gain (succinctness,
automatability) than loss (new language to learn) in Markaby and its
neighbours. With C, I don’t see such gains overcoming the loss.

Aur
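
For comparison, the plain string-manipulation approach might look something like the sketch below. The method name c_for_loop and its parameters are assumptions chosen for illustration only.

```ruby
# Hypothetical helper: build a C counting loop with a heredoc, no DSL involved.
def c_for_loop(var, limit, body)
  # Each statement gets indentation and a terminating semicolon.
  statements = body.map { |stmt| "    #{stmt};" }.join("\n")
  <<~C
    for (#{var} = 0; #{var} < #{limit}; #{var}++) {
    #{statements}
    }
  C
end

puts c_for_loop("i", 10, ["x += 1", "y = x"])
```

The trade-off is the one described above: nothing new to learn, but the C syntax lives inside Ruby strings, so nothing checks it until the C compiler runs.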

bpauly · May 3, 2007, 6:03pm

Paul B. wrote:

I wrote a tool a while back I called rubypp (pp as in preprocessor),
which lets you do something like this:
…
#ruby def foo(x) ;
x.to_s.chop ;
end

How is this def being used in the output?

bpauly · May 3, 2007, 4:23pm

I wrote a tool a while back I called rubypp (pp as in preprocessor),
which lets you do something like this:

#include

#ruby <<END
puts '#define FOO BAR'
'#define BAR BAZ'
END

#ruby def foo(x) ;
x.to_s.chop ;
end

extern "C" {
int foo(int a) {
std::cout << "1" << std::endl;
}

int foo(double a) {
std::cout << "2" << std::endl;
}

main() {
foo(1);
foo(1.0);
std::cout << "#{foo(1.0)}" << std::endl;
}

which produces this output:

#include

#define FOO BAR
#define BAR BAZ

extern "C" {
int foo(int a) {
std::cout << "1" << std::endl;
}

int foo(double a) {
std::cout << "2" << std::endl;
}

main() {
foo(1);
foo(1.0);
std::cout << "1." << std::endl;
}

The syntax is a little odd, but it’s surprisingly powerful. I use it to
generate code for nodewrap. You can find it at:

http://rubystuff.org/rubypp/rubypp.rb

Paul

bpauly · May 3, 2007, 9:20pm

Paul B. wrote:

I wish I had a more realistic example; I think then it would be clearer.

std::cout << "#{foo(1.0)}" << std::endl;

std::cout << "1." << std::endl;

My bad. I just missed that. So the #ruby stuff is for defining utility
functions that can be used inside the C code templates?

bpauly · May 3, 2007, 6:37pm

On Fri, May 04, 2007 at 01:03:20AM +0900, Joel VanderWerf wrote:

Paul B. wrote:

I wrote a tool a while back I called rubypp (pp as in preprocessor),
which lets you do something like this:
…
#ruby def foo(x) ;
x.to_s.chop ;
end

How is this def being used in the output?

I wish I had a more realistic example; I think then it would be clearer.

std::cout << "#{foo(1.0)}" << std::endl;

std::cout << "1." << std::endl;

Paul

bpauly · May 4, 2007, 12:13am

On 5/3/07, SonOfLilit wrote:

automatability) than loss (new language to learn) in Markaby and it’s
neighbours. With C, I don’t see such gains overcoming the loss.

I’ve written compilers for high-level languages before that targeted C
rather than ASM. It’s not a bad approach, if you understand how C is
optimized. It’s possible to generate C that will compile to something
pretty fast. Although these days, memory-bus bandwidth is a much more
constrained resource than it ever was in the past (mostly because
everything else has gotten so much faster), and that adds a level of
complexity.

Is the point of this to get better performance? Not if you keep the
essential Ruby features (open classes and all the rest). Is the point
to save typing? Maybe, but I’ve always found that the vast majority of
the time spent in writing C goes not into typing but into either
planning or debugging. (The more planning you do, the less debugging,
and vice versa.)

bpauly · May 5, 2007, 5:25pm

Related to what the OP was asking for, I’ve been thinking of
implementing a DSL in Ruby for defining message types for a C-based
distributed framework. We’re building a distributed system that
consists of tasks sending each other messages. The messages have types
and payloads.

As things are now, we have to define message types by hand. This
includes defining a payload type, which might be as simple as a single
int, a struct consisting of primitive types, or a hierarchical struct
containing dynamic sized variables. In addition to the payload type we
have to implement functions for marshalling/unmarshalling the payload
and to print the contents out as text.

Writing all this for simple payload types is relatively painless, but
boring. Writing all this for complex hierarchical payloads with
dynamic sized variables is painful, boring, and extremely error prone,
because it usually involves a lot of copy-pasting.

What I have at the moment is something like this (this is from memory,
because I don’t have the code at hand):

messages "c-file-basename" do |msgs|
   msgs.define_message "message_type_name" do |m|
      m.add_member :uint32_t, "uint32_t_variable_name"
      m.add_pointer :uint8_t, "uint8_t_pointer_name"
   end

   # define other message types that will be included in the same C-file.

   # …

end

This would result in something like the C-code at the end of the email.
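
To make the shape of such a DSL concrete, here is a minimal sketch that only emits the struct typedef. The class names MessageDef and MessageFile, and the choice to return the text instead of writing a file, are assumptions for illustration; this is not the code described above.

```ruby
# Hypothetical sketch of the message DSL; all class names are assumptions.
class MessageDef
  def initialize(name)
    @name = name
    @fields = []
  end

  # A plain member becomes one struct field.
  def add_member(type, name)
    @fields << "#{type} #{name};"
  end

  # A pointer member gets a companion length field, as in the C output.
  def add_pointer(type, name)
    @fields << "uint32_t #{name}_len;"
    @fields << "#{type}* #{name};"
  end

  def to_c
    tag = @name.upcase
    body = @fields.map { |f| "    #{f}" }.join("\n")
    "typedef struct #{tag}_ {\n#{body}\n} #{tag};\n"
  end
end

class MessageFile
  def initialize
    @messages = []
  end

  def define_message(name)
    msg = MessageDef.new(name)
    yield msg
    @messages << msg
  end

  def to_c
    @messages.map(&:to_c).join("\n")
  end
end

# Returns the generated C text; a real tool would write "#{basename}.c".
def messages(_basename)
  file = MessageFile.new
  yield file
  file.to_c
end
```

Marshalling and to_text generators would hang off the same field list, which is the part that gets harder once nested structs appear.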

The difficulties come with more complicated messages. What if I have a
struct used elsewhere in the system that should be a part of many
different message types? For example, we describe task ids with
structs and these are passed from task to task in message payloads
quite often. One solution is to add it as a basic type to the DSL.
That is, in the same way that the DSL understands what an uint32_t is
and how to marshall and print an uint32_t, I enhance it to understand
what the struct is. The definition of the struct would be in the
common header files of the system. The down side is that whenever
someone defines a new complex type, we must implement that type in the
DSL. Me being the only Rubyist in the project would mean that I’ll end
up doing the enhancing.
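
The "add it as a basic type" option could be sketched as a small type registry like the one below. TYPE_TABLE, register_type, marshal_copy and the TASK_ID type are hypothetical names, and copying a struct with memcpy assumes both ends share the same layout.

```ruby
# Hypothetical registry: each known type maps to the C expression for its size.
TYPE_TABLE = {
  uint32_t: { size: "sizeof(uint32_t)" },
  uint8_t:  { size: "sizeof(uint8_t)" }
}

# Teach the generator about a new type, e.g. a struct from a common header.
def register_type(name, size:)
  TYPE_TABLE[name.to_sym] = { size: size }
end

# Emit the C statements that copy one field into the marshalling buffer.
def marshal_copy(type, field)
  info = TYPE_TABLE.fetch(type.to_sym) do
    raise ArgumentError, "unknown type #{type}"
  end
  "memcpy(ptr, &my_msg->#{field}, #{info[:size]}); ptr += #{info[:size]};"
end

register_type(:TASK_ID, size: "sizeof(TASK_ID)")
puts marshal_copy(:TASK_ID, "sender")
```

Every new complex type still needs one register_type call, so the maintenance burden described above does not go away; it just shrinks to a line per type.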

Another approach would be to enhance the DSL so that you can import
other files. I.e. we could define common structs in files that are
then imported to the message definitions. This to me seems overly
complicated in our case. I think it would make the DSL a lot more
complicated to implement and I’m not sure our use cases require the
functionality. We can always implement the more complex messages by
hand if the DSL is not able to handle them.

I’d appreciate any kind of input on this that the list members might
have. I’ve read as much about DSLs in Ruby as I could find on the web,
but none of them really covered what I’m trying to do.

– Lauri

– generated C-code –

/* I wrote this on the fly in this email, so it will most likely
contain errors */

typedef struct MESSAGE_TYPE_NAME_ {
    uint32_t uint32_t_variable_name;
    uint32_t uint8_t_pointer_name_len;
    uint8_t* uint8_t_pointer_name;
} MESSAGE_TYPE_NAME;

uint32_t message_type_name_marshall(void* msg, void* buf, uint32_t buf_len) {
    MESSAGE_TYPE_NAME* my_msg = (MESSAGE_TYPE_NAME*) msg;
    uint8_t* ptr = buf;

    if (buf_len < (sizeof(uint32_t) + sizeof(uint8_t) *
                   my_msg->uint8_t_pointer_name_len)) {
        return 0;
    }

    memcpy(ptr, &my_msg->uint32_t_variable_name, sizeof(uint32_t));
    ptr += sizeof(uint32_t);

    /* Copy to ptr, not buf, so the first field is not overwritten */
    memcpy(ptr, my_msg->uint8_t_pointer_name,
           my_msg->uint8_t_pointer_name_len);
    ptr += my_msg->uint8_t_pointer_name_len;

    /* Number of bytes written */
    return (uint32_t) (ptr - (uint8_t*) buf);
}

/* And similar unmarshall and to_text functions */

MSG_TYPE_DEF message_type_name_type = {
message_type_name_marshall,
message_type_name_unmarshall,
message_type_name_to_text
};

bpauly · May 4, 2007, 4:16am

On Fri, May 04, 2007 at 04:18:48AM +0900, Joel VanderWerf wrote:

My bad. I just missed that. So the #ruby stuff is for defining utility
functions that can be used inside the C code templates?

You can use them that way, or if the code returns non-nil, the result
will be converted to a string and inserted in the output stream, or
anything sent to stdout will also be included in the preprocessed
output.

Paul
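
The three behaviours described here (helper definitions, non-nil results inserted, stdout captured) can be imitated in a few lines. The sketch below is NOT rubypp's implementation; the method name preprocess is made up, it handles single-line #ruby directives only, and it skips Symbol results because modern Ruby returns the method name from def.

```ruby
require "stringio"

# Hypothetical mini-preprocessor; an illustration, not rubypp itself.
def preprocess(source)
  ctx = Object.new  # shared context: helpers defined here stay visible later
  out = +""
  source.each_line do |line|
    if (m = line.match(/\A#ruby\s+(.*)/))
      captured = StringIO.new
      saved = $stdout
      $stdout = captured            # anything printed goes to the output
      begin
        result = ctx.instance_eval(m[1])
      ensure
        $stdout = saved
      end
      out << captured.string
      # Non-nil results are inserted; Symbols (returned by def) are skipped.
      out << result.to_s << "\n" unless result.nil? || result.is_a?(Symbol)
    else
      # Plain lines: expand #{...} using the same context.
      out << line.gsub(/#\{(.*?)\}/) { ctx.instance_eval($1).to_s }
    end
  end
  out
end

puts preprocess(<<~'SRC')
  #ruby def foo(x); x.to_s.chop; end
  #ruby puts '#define FOO BAR'
  #ruby '#define BAR BAZ'
  std::cout << "#{foo(1.0)}" << std::endl;
SRC
```

Evaluating arbitrary Ruby from the input file is the whole point here, so a tool like this should only be run on trusted sources.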

bpauly · May 6, 2007, 4:18pm

On Sun, May 06, 2007 at 12:23:38AM +0900, Lauri P. wrote:

have to implement functions for marshalling/unmarshalling the payload
messages “c-file-basename” do |msgs|

The difficulties come with more complicated messages. What if I have a
struct used elsewhere in the system that should be a part of many
different message types?

The thought which struck me when reading this was: “ASN.1”

OK, it’s horrible, but it does pretty much exactly what you ask, and has
its own (standard) DSL for describing the message formats. So if you
could find a good C library which reads ASN.1 and outputs code to parse
messages in BER/DER format, maybe that would be an alternative solution.

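The standards documentation is comprehensive, if not easy to read:
http://www.itu.int/ITU-T/studygroups/com17/languages/X.680-0207.pdf
http://www.itu.int/ITU-T/studygroups/com10/languages/X.690_1297.pdf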

And of course there are probably books and other resources.

There might be Ruby libraries for handling ASN.1 directly. The only one
I know of is the one built into openssl, which is low-level but
functional. I used it in ruby-ldapserver, which you can find on
rubyforge.org.

Regards,

Brian.
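
For a feel of the openssl ASN.1 support mentioned above, here is a small round-trip sketch using Ruby's standard OpenSSL::ASN1 module. The message layout (an integer id followed by an opaque payload) and the method names encode_message and decode_message are assumptions for illustration.

```ruby
require "openssl"

# Hypothetical message: a SEQUENCE of an INTEGER id and an OCTET STRING payload.
def encode_message(id, payload)
  OpenSSL::ASN1::Sequence.new([
    OpenSSL::ASN1::Integer.new(id),
    OpenSSL::ASN1::OctetString.new(payload)
  ]).to_der
end

# Decode the DER bytes back into [id, payload].
def decode_message(der)
  id, payload = OpenSSL::ASN1.decode(der).value
  [id.value.to_i, payload.value]  # the integer comes back as an OpenSSL::BN
end

der = encode_message(42, "abc")
p decode_message(der)
```

This only covers the Ruby side; the C tasks would still need their own BER/DER parser.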

bpauly · May 7, 2007, 2:24pm

On 06/05/07, Brian C. wrote:

The difficulties come with more complicated messages. What if I have a
struct used elsewhere in the system that should be a part of many
different message types?

The thought which struck me when reading this was: “ASN.1”

OK, it’s horrible, but it does pretty much exactly what you ask, and has its
own (standard) DSL for describing the message formats. So if you could find
a good C library which reads ASN.1 and outputs code to parse messages in
BER/DER format, maybe that would be an alternative solution.

You do have a point. I agree that ASN.1 would be able to handle all
possible message payloads.

I’ve never used ASN.1 personally and my impressions of it are that it
is, as you say, horrible. I also feel that it is overkill in this
case. I mean that I don’t need a general solution to my problem that
is capable of describing all possible message types. I’m trying to
make it less painful for the other developers in the team to create
new message types and as far as I can tell most of the message types
we’ll have will be very simple, flat structures. We’ll occasionally
come across more complicated messages, but those can be implemented
by hand if need be.

In many cases we do not need the marshalling and to_text functions:
to_text is used only for logging, and marshalling is only necessary if
the message crosses process boundaries. Nevertheless it would be
nice to have these functions for all message types, because i) it is nice
to get human readable log messages, and ii) having marshalling
functions available allows us to move components from one process to
another almost transparently without having to worry about breaking
the messaging.

The standards documentation is comprehensive, if not easy to read:
http://www.itu.int/ITU-T/studygroups/com17/languages/X.680-0207.pdf
http://www.itu.int/ITU-T/studygroups/com10/languages/X.690_1297.pdf

Thanks for all the links. I’m hoping that I can avoid using ASN.1. On
the other hand we’re doing SNMP as well, so I’ll probably have to get
my hands dirty at some point.

I’ll chug along with my approach and I’ll report back if I can make it
work.
