Ruby-based data language

Has anyone ever endeavored to create a data/configuration file format
based on Ruby’s syntax? In other words, a format like YAML but with a
Ruby-based syntax.

I tried to do this the easy way using $SAFE=4, but #method_missing
doesn’t appear to work at that safe level.

Thomas S. wrote:

Has anyone ever endeavored to create a data/configuration file format
based on Ruby’s syntax? In other words, a format like YAML but with a
Ruby-based syntax.

Which bit of Ruby syntax are you thinking of?

If you wanted key/value pairs in a Hash, then you might as well use
JSON. It’s not pretty for config files, but it’s OK.

If you wanted

foo = 123

then that’s hard to make work, although maybe you could frig it with
binding and local_variables. You could instead use the Rails way:

config.foo = 123

or use constants:

module Config
  Foo = 123
end

or globals:

$foo = 123

The Sinatra way would be:

set :foo, 123
set(:bar) { delayed_expr }
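
For the plain `foo = 123` form, the binding and local_variables idea mentioned above might look roughly like the sketch below. It assumes a Ruby recent enough to have Binding#local_variable_get (2.1 or later), and load_assignments and empty_scope are names made up for illustration, not part of any library. The input is still eval'd, so this is no safer than running the file directly.

```ruby
# Hypothetical helpers, not from any library.

# Returns a binding that has no local variables of its own.
def empty_scope
  binding
end

# Evaluates `source` (trusted input only!) and returns the local
# variables it assigned as a Hash of name => value.
def load_assignments(source)
  scope = empty_scope
  scope.eval(source)
  scope.local_variables.each_with_object({}) do |name, result|
    result[name] = scope.local_variable_get(name)
  end
end

config = load_assignments("foo = 123\nbar = \"baz\"")
p config
```

Using a dedicated method for the binding keeps the caller's own locals out of the result.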

Maybe what you’re trying to do is

foo 123

I don’t see a particular problem with that, even with $SAFE=4 and
method_missing:

$h = {}
=> {}

def method_missing(k,v); $h[k] = v; end
=> nil

t = Thread.new { $SAFE=4; eval "foo 123" }
=> #<Thread:0x7f4eb7749a48 dead>

t.join
SecurityError: (irb):2:in `[]=': Insecure: can't modify hash
from (irb):3
from (irb):4:in `join'
from (irb):4
from :0

$h.taint
=> {}

t = Thread.new { $SAFE=4; eval "foo 123" }
=> #<Thread:0x7f4eb7726930 dead>

t.join
=> #<Thread:0x7f4eb7726930 dead>

$h
=> {:foo=>123}

This is using ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]

On Sep 19, 5:38 am, Brian C. [email protected] wrote:

Maybe what you’re trying to do is

foo 123

Yes, I should have been more specific. This is what I mean. A longer
example:

name "Joe Foo"
age 33
contact do
  email "[email protected]"
  phone "555-555-1234"
end

=> #<Thread:0x7f4eb7726930 dead>

$h
=> {:foo=>123}

This is using ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]

Ah. You are right. I made a mistake in my experiments. And thank you
for this example, #taint made all the difference.

So now the question: is $SAFE=4 safe enough? My gut says no. Even
though the above works, I think ultimately a true data format is
desirable for most needs, such as configuration files. I think Ruby
has a very nice syntax for it. Such a format would be to Ruby as JSON
is to Javascript. And Ruby has a nice advantage with blocks.

Thomas S. wrote:

name "Joe Foo"
age 33
contact do
  email "[email protected]"
  phone "555-555-1234"
end

You could make a parser for that fairly easily, or transform it to JSON
or YAML and parse that (assuming that contact do … end constructs a
nested Hash).

Such a format would be to Ruby as JSON is to Javascript. And Ruby has a nice advantage with blocks.

That’s where I disagree with you. You are using blocks in a Builder or
Markaby way, but that’s not the fundamental purpose or interpretation of
blocks in the language. And how would you handle Arrays?

The Ruby equivalent of JSON would be a literal Hash:

{
  "name"=>"Joe Foo",
  "age"=>33,
  "contact"=>{
    "email"=>"[email protected]",
    "phone"=>"555-555-1234",
  },
}

Or if you buy into 1.9 syntax, then you could have

{
  name: "Joe Foo",
  age: 33,
  contact: {
    email: "[email protected]",
    phone: "555-555-1234",
  },
}

Both are similar enough to JSON that I’d use that instead, and gain the
portability benefit. It’s a shame that JSON doesn’t allow trailing
commas.

On Sep 21, 3:25 pm, Brian C. [email protected] wrote:

Both are similar enough to JSON that I’d use that instead, and gain the
portability benefit. It’s a shame that JSON doesn’t allow trailing
commas.

That’s a good point. But if you look at Ruby-based configuration files
they always use block-based DSL notations, not hashes. So the analogy
isn’t over the data structure, but rather the use of the underlying
language as a syntax model.

On Tue, Sep 21, 2010 at 9:59 AM, Intransition [email protected]
wrote:

So now the question: is $SAFE=4 safe enough? My gut says no. Even
though the above works, I think ultimately a true data format is
desirable for most needs, such as configuration files. I think Ruby
has a very nice syntax for it. Such a format would be to Ruby as JSON
is to Javascript. And Ruby has a nice advantage with blocks.

Enough for what? To trust arbitrary configurations that are executable
Ruby?

Trans, if you go way way back in the archives, you’ll find a longish
running series of posts between Guy Decoux and a few other people
about $SAFE=4 and various clever hacks to try to, with Ruby, create an
environment that was truly safe. Guy showed in that typically terse,
let-my-code-speak-for-me way that he had, that every very clever Ruby
approach that was conceived of could also be circumvented.

The only way to have a “Ruby” data format that can contain executable
Ruby is to do something like what _Why did with his sandbox (which,
IIRC, came about in part because of this thread involving Guy) – it
has to be supported at a very low level in the language.

You could create a Ruby-like format that is itself simply interpreted,
and if that’s what you are talking about, I’d say go for it. But the
moment you allow anything in that format to be executed as Ruby you
can no longer trust arbitrary configurations.
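
A "simply interpreted" reader for the `key value` / `key do … end` notation could be sketched along these lines. This is only an illustration under assumptions: the DataFormat module, its regexes, and the set of accepted scalars (double-quoted strings, numbers, true, false, nil) are invented here, and the email address in the sample is a placeholder. Nothing is eval'd, so a line that is not plain data is rejected instead of being run.

```ruby
# Hypothetical line-based reader; not an existing library.
module DataFormat
  KEY    = /[a-z_]\w*/
  SCALAR = /"(?:[^"\\]|\\.)*"|-?\d+(?:\.\d+)?|true|false|nil/
  OPEN   = /\A(#{KEY})\s+do\z/          # e.g. `contact do`
  PAIR   = /\A(#{KEY})\s+(#{SCALAR})\z/ # e.g. `age 33`

  def self.parse(source)
    root  = {}
    stack = [root] # the innermost open Hash is always last
    source.each_line.with_index(1) do |raw, lineno|
      line = raw.strip
      next if line.empty?
      if line == "end"
        raise ArgumentError, "line #{lineno}: unexpected end" if stack.size == 1
        stack.pop
      elsif (m = OPEN.match(line))
        child = {}
        stack.last[m[1].to_sym] = child
        stack.push(child)
      elsif (m = PAIR.match(line))
        stack.last[m[1].to_sym] = scalar(m[2])
      else
        raise ArgumentError, "line #{lineno}: cannot parse #{line.inspect}"
      end
    end
    raise ArgumentError, "missing end" unless stack.size == 1
    root
  end

  # Converts the text of one scalar into a Ruby value.
  def self.scalar(text)
    case text
    when /\A"/   then text[1..-2].gsub(/\\(.)/) { $1 }
    when "true"  then true
    when "false" then false
    when "nil"   then nil
    when /\./    then Float(text)
    else              Integer(text, 10)
    end
  end
end

person = DataFormat.parse(<<~'EOS')
  name "Joe Foo"
  age 33
  contact do
    email "joe@example.com"
    phone "555-555-1234"
  end
EOS
p person
```

A line such as `system("ls")` matches none of the patterns and raises ArgumentError, which is the property an eval-based loader cannot give you.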

Kirk H.

Thomas S. wrote:

But if you look at Ruby-based configuration files
they always use block-based DSL notations, not hashes.

I’d have said most packages use YAML. If you can provide examples which
use method_missing and block-based structure just for conveying static
configuration information, I’d be interested to see them. But then
that’s what you asked for in the first place :)

So the analogy
isn’t over the data structure, but rather the use of the underlying
language as a syntax model.

The difference is that JSON is a direct subset of Javascript, and can
be eval’d directly to give a value.

Let me put it another way. If you were parsing your example:

name "Joe Foo"
age 33
contact do
  email "[email protected]"
  phone "555-555-1234"
end

would you expect a Hash as the result?

If yes: then it looks like you’re proposing an alternative syntax for
Hash literals - one which isn’t widely used. It’s Ruby-inspired but not
really Ruby, since a simple eval of the above will fail without
additional supporting code. It would be similar to Builder, but (a)
creating a Hash as its output instead of XML, and (b) intended for use
with untrusted inputs, so parsed rather than eval’d.

If no: then what do you expect to get instead? I suppose you could have
a stream parser API, and trigger start/end actions as you go, but that’s
not a very convenient API for reading a config file.

If yes: then it looks like you’re proposing an alternative syntax for
Hash literals - one which isn’t widely used. It’s Ruby-inspired but not
really Ruby, since a simple eval of the above will fail without
additional supporting code.

That is, to parse that record using eval I think you need something
along these lines:

class Parser # < BasicObject ??
  # Evaluate a string or a block against a fresh Parser and return
  # the Hash it built.
  def self.parse(*args, &blk)
    o = new
    o.instance_eval(*args, &blk)
    o.instance_variable_get(:@value)
  end

  def initialize
    @value = {}
  end

  # Every unknown call becomes a key: `label value` stores the value,
  # `label do ... end` stores a nested Hash.
  def method_missing(label, value = :MISSING, &blk)
    if value != :MISSING
      raise "Cannot provide both value and block for #{label}" if block_given?
      @value[label] = value
    else
      raise "Must provide value or block for #{label}" unless block_given?
      @value[label] = self.class.parse(&blk)
    end
  end
end

person = Parser.parse <<'EOS'
name "Joe Foo"
age 33
contact do
  email "[email protected]"
  phone "555-555-1234"
end
EOS
p person

Now, to get the same capabilities as JSON, you also need syntax for
Arrays. You could do something like the following, although note that
the parser above doesn’t handle this properly:

people([
  person do
    name "Joe"
    age 33
  end,
  person do
    name "Fred"
    age 64
  end,
])

You do need both parentheses and square brackets, unless you replace
do/end with braces:

people [
  person {
    name "Joe"
    age 33
  },
  person {
    name "Fred"
    age 64
  },
]

Add a few colons and commas and you’re back to JSON. Or drop the closing
braces and brackets and keep the indentation, and you’re back to YAML.

You could treat multiple arguments as an array:

people
  person {
    name "Joe"
    age 33
  },
  person {
    name "Fred"
    age 64
  }

but then if you wanted a one-element array it would have to be a special
case. Or you could perhaps have a flag to indicate that you want an
Array:

people [],
  person {
    name "Joe"
    age 33
  },
  person {
    name "Fred"
    age 64
  }

JSON and YAML may be ugly, but IMO so is this.

Thomas S. wrote:

On Sep 22, 7:20 am, Brian C. [email protected] wrote:

I’d have said most packages use YAML. If you can provide examples which
use method_missing and block-based structure just for conveying static
configuration information, I’d be interested to see them. But then
that’s what you asked for in the first place :)

I think a Gemfile is a pretty good example

Do you mean a Gemfile from Bundler? e.g.

source "http://rubygems.org"

gem "rails", "3.0.0.rc"
gem "rack-cache"
gem "nokogiri", "~> 1.4.2"

Some attributes have one value, some attributes have more than one
value, some attributes can be repeated. I guess a generic parser API for
that would be a call to callback(key, *args), just like method_missing,
or it could build a linear data structure like

[[:source, "http://rubygems.org"],
 [:gem, "rack-cache"],
 [:gem, "nokogiri", "~> 1.4.2"]]
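
One way such a generic recorder might be sketched is shown here. CallRecorder and its __entries__ accessor are hypothetical names chosen for illustration. It inherits from BasicObject so that names like gem are not swallowed by Kernel methods before method_missing sees them. The source is still eval'd, so this only suits trusted files.

```ruby
# Hypothetical recorder: every bare call in the source becomes one
# [name, *args] entry, in order of appearance.
class CallRecorder < BasicObject
  def self.record(source)
    recorder = new
    recorder.instance_eval(source) # runs the source: trusted input only
    recorder.__entries__
  end

  def initialize
    @entries = []
  end

  # Odd name on purpose, so it is unlikely to clash with a config key.
  def __entries__
    @entries
  end

  def method_missing(name, *args)
    @entries << [name, *args]
    nil
  end
end

calls = CallRecorder.record(<<~'EOS')
  source "http://rubygems.org"
  gem "rack-cache"
  gem "nokogiri", "~> 1.4.2"
EOS
p calls
```

An application-specific loader could then walk the list and, for example, build one object per :gem entry.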

I’m not sure what output you’d want from nested blocks.

I would expect to get an object that gave me access to the data. What
kind of object depends on the parser.

If the parser is generic (not application-specific) then I guess you’d
just get the structure I’ve shown above. If you want to build that into
application-specific objects, then you can do so. For example, for each
“gem” line you might want to build a Gem object and append it to an
Array of gems.

I get what you are saying. But even JSON goes through a parser in
Javascript

It can go through a parser for security reasons, but the result from
parsing it is exactly the same as if you read it in as a Javascript
literal. And there are cases where you intentionally interpret it as
Javascript code, e.g. JSONP.

when someone thinks “Ruby-
syntax data/config format” they are thinking builder-style, not hash
literals.

Perhaps. I think more common would be a Rails-style configuration or a
gemspec:

Gem::Specification.new do |s|
  s.name = %q{snailgun}
  s.version = "1.0.6"
  … etc
end

Of course, the advantage of having it as real executable code is that
you can use Ruby to assemble data constructs, and conditional inclusion.

s.files = [ …lots of items… ]
s.files.concat [ …more items… ]

if s.respond_to? :specification_version then
  s.specification_version = 2
end

Hi Tom,

maybe the Doodle gem does what you’re after:

require 'doodle'

class Contact < Doodle
  has :email, :kind => String
  has :phone, :kind => String do
    must "be of form xxx-xxx-xxxx" do |s|
      s =~ /\d{3}-\d{3}-\d{4}/
    end
  end
end

class Person < Doodle
  has :name, :kind => String
  has :age, :kind => Integer
  has Contact
end

class People < Doodle
  has :people, :collect => Person
end

people = People do
  person do
    name "Joe Foo"
    age 33
    contact do
      email "[email protected]"
      phone "555-555-1234"
    end
  end
end

p people
__END__
#<People:0x8c22ec0 @people=[#<Person:0x8c31100 @name="Joe Foo",
@age=33, @contact=#<Contact:0x8c3ccbc @email="[email protected]",
@phone="555-555-1234">>]>

Regards,
Sean

On Sep 22, 3:17 pm, Brian C. [email protected] wrote:

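If the parser is generic (not application-specific) then I guess you’d
just get the structure I’ve shown above.

That’s one way. The parser could just offer a couple of options to
vary how the generic structure is formed.

when someone thinks “Ruby-syntax data/config format” they are thinking
builder-style, not hash literals.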

Perhaps. I think more common would be a Rails-style configuration or a
gemspec:

Gem::Specification.new do |s|
  s.name = %q{snailgun}
  s.version = "1.0.6"
  … etc
end

That’s another way to do it, yes. But can it be made a pure data
format?

Of course, the advantage of having it as real executable code is that
you can use Ruby to assemble data constructs, and conditional inclusion.

s.files = [ …lots of items… ]
s.files.concat [ …more items… ]

if s.respond_to? :specification_version then
  s.specification_version = 2
end

But there’s a disadvantage here too: it’s no longer just data.
It depends on what you’re trying to achieve.

On Sep 22, 7:20 am, Brian C. [email protected] wrote:

I’d have said most packages use YAML. If you can provide examples which
use method_missing and block-based structure just for conveying static
configuration information, I’d be interested to see them. But then
that’s what you asked for in the first place :)

I think a Gemfile is a pretty good example. Yes, you can use
conditional code in these files, but IMO it’s bad design.

Also, the configuration gem on RubyGems.org looks fairly popular and
it is pretty close to what I’m talking about.

It would be similar to Builder, but (a)
creating a Hash as its output instead of XML, and (b) intended for use
with untrusted inputs, so parsed rather than eval’d.

If no: then what do you expect to get instead? I suppose you could have
a stream parser API, and trigger start/end actions as you go, but that’s
not a very convenient API for reading a config file.

I would expect to get an object that gave me access to the data. What
kind of object depends on the parser. If only #eval were the parser
and nothing more, I’d expect a method missing error.

I get what you are saying. But even JSON goes through a parser in
Javascript and is not simply evaled, for the same reasons I would like
to see a Ruby-syntax data/config format. And when someone thinks “Ruby-
syntax data/config format” they are thinking builder-style, not hash
literals.

On Sep 22, 11:37 am, Brian C. [email protected] wrote:

Now, to get the same capabilities as JSON, you also need syntax for
Arrays.

You make a good point about arrays. I’m thinking about configuration
files as the common use case, so complex arrays aren’t as common. But
maybe there are better solutions in any case:

people do
  person :name=>"Joe", :age=>33
  person :name=>"Fred", :age=>64
end

The format is Ruby syntax so it can handle hash arguments. Or,
alternately:

people ["Joe", 33], ["Fred", 64]

The underlying app would know how to turn this into persons. We might
even do:

people ['name', 'age'],
       ['Joe', 33],
       ['Fred', 64]
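
On the application side, turning that header-plus-rows form into records could be as small as the sketch below. rows_to_records is a hypothetical helper name, and treating the first array as the header is an assumption taken from the example above.

```ruby
# Hypothetical helper: the first array is the header, the rest are rows.
def rows_to_records(header, *rows)
  rows.map do |row|
    unless row.size == header.size
      raise ArgumentError, "expected #{header.size} fields, got #{row.size}"
    end
    header.zip(row).to_h # pair each field name with its value
  end
end

people = rows_to_records(['name', 'age'], ['Joe', 33], ['Fred', 64])
p people
```

The size check is there because zip would otherwise pad short rows with nil and silently drop extra fields.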

On Sep 22, 11:22 pm, “Sean O’Halpin” [email protected] wrote:

maybe the Doodle gem does what you’re after:

Close. The only thing with Doodle is that you have to pre-define the
data structure. Hmm… Doodle would be something akin to a schema
language for the format I am proposing.