Creating Ruby Classes from XSD?

vtphysguy · February 23, 2006, 6:15pm

.NET ships with a tool that will generate classes directly from an XML
schema (XSD) file. Does the Ruby standard library have anything similar?
It’s a great tool on the .NET side, and I bet Ruby could do it even
better, if not.

vtphysguy · February 23, 2006, 6:34pm

On Feb 23, 2006, at 12:12 PM, Justin B. wrote:

.NET ships with a tool that will generate classes directly from an XML
schema (XSD) file. Does the Ruby standard library have anything similar?
It’s a great tool on the .NET side, and I bet Ruby could do it even
better, if not.

Ok, I’m going to be honest and say first I’ve never had occasion to
use this tool. But I’m wondering what exactly is the point? Ruby’s
syntax is much less verbose than XML.
Why would you want to type this:

<?xml version="1.0" encoding="ISO-8859-1" ?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="shiporder">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="orderperson" type="xs:string"/>
        <xs:element name="shipto">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="address" type="xs:string"/>
              <xs:element name="city" type="xs:string"/>
              <xs:element name="country" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="note" type="xs:string" minOccurs="0"/>
              <xs:element name="quantity" type="xs:positiveInteger"/>
              <xs:element name="price" type="xs:decimal"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="orderid" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

When you can do one of these:
class Shipto < Struct.new(:name, :address, :city, :country)
end

class Item < Struct.new(:orderid, :title, :note, :quantity, :price)
end

class Shipment < Struct.new(:shipto, :item)
end

shipments = []

I may have mangled some of the XSD, since I’m admittedly not familiar
with it, but generating classes from a schema seems rife with problems.

vtphysguy · February 23, 2006, 7:10pm

Sometimes you don’t control the source of the information. If someone
is sending you XML documents and provides you with a schema,
hard-coding your own class to match the schema is just unnecessary
repetition.

In the spirit of DRY, it’s better to generate your class on the fly.
This allows you to instantly adapt to future schema changes, too. I
have written such an application (but not in Ruby), and it’s much
nicer than what it replaced.

In Ruby it would be really easy to do. REXML + metaprogramming and
you’re done.
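A minimal sketch of the "REXML + metaprogramming" idea, assuming the schema only uses inline `xs:complexType`/`xs:sequence` definitions like the shiporder example earlier in the thread. `classes_from_xsd` and `XS_NS` are hypothetical names (not part of REXML or SOAP4R), and element types, `minOccurs`/`maxOccurs`, named types and `xs:choice` are all ignored.

```ruby
require 'rexml/document'

# Prefix map used for every XPath query below.
XS_NS = { 'xs' => 'http://www.w3.org/2001/XMLSchema' }

# Hypothetical helper: builds one Struct class per <xs:element> that has an
# inline <xs:complexType>. Attributes and child elements of the sequence
# become Struct members. Types and occurrence constraints are ignored.
def classes_from_xsd(xsd_source, namespace = Module.new)
  doc = REXML::Document.new(xsd_source)
  REXML::XPath.each(doc, '//xs:element', XS_NS) do |element|
    complex = REXML::XPath.first(element, 'xs:complexType', XS_NS)
    next unless complex # simple-typed leaf elements get no class

    attrs    = REXML::XPath.match(complex, 'xs:attribute', XS_NS)
    children = REXML::XPath.match(complex, 'xs:sequence/xs:element', XS_NS)
    members  = (attrs + children).map { |node| node.attributes['name'].to_sym }
    next if members.empty?

    # "shiporder" -> "Shiporder", "ship_to" -> "ShipTo"
    class_name = element.attributes['name'].split('_').map(&:capitalize).join
    namespace.const_set(class_name, Struct.new(*members))
  end
  namespace
end
```

Fed the shiporder schema, this should yield `Shiporder`, `Shipto` and `Item` classes inside the returned module. Returning an anonymous module instead of defining top-level constants keeps a schema change from silently redefining classes elsewhere in the program.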

-Ed

vtphysguy · February 23, 2006, 6:49pm

Well, what it really makes easy is importing a given XML document into
code. In .NET land, the XSD describes the format for a set of XML
documents. The generated class is then used to import those XML
documents into a structure that’s easier to work with in code than the
raw DOM would be.
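To illustrate that import step, here is a hand-written sketch of what such an importer might look like for the shiporder schema, using only REXML from the standard library. The Struct names, `child_text` and `parse_shiporder` are illustrative assumptions, not what any particular generator emits. Prices are read with `Float()` for brevity; `BigDecimal` would be a closer match for `xs:decimal`.

```ruby
require 'rexml/document'

# Hand-written stand-ins for classes a generator might emit from the
# shiporder schema (names are illustrative, not real generator output).
Shipto    = Struct.new(:name, :address, :city, :country)
Item      = Struct.new(:title, :note, :quantity, :price)
Shiporder = Struct.new(:orderid, :orderperson, :shipto, :items)

# Text of the named child element, or nil if it is absent
# (e.g. the optional <note>).
def child_text(element, name)
  child = element.elements[name]
  child && child.text
end

# Imports one <shiporder> document into the Structs above.
def parse_shiporder(xml)
  root = REXML::Document.new(xml).root
  ship = root.elements['shipto']
  shipto = Shipto.new(*%w[name address city country].map { |n| child_text(ship, n) })

  # maxOccurs="unbounded" maps naturally onto an Array
  items = root.get_elements('item').map do |item|
    Item.new(child_text(item, 'title'),
             child_text(item, 'note'),
             Integer(child_text(item, 'quantity')), # xs:positiveInteger
             Float(child_text(item, 'price')))      # xs:decimal (Float for brevity)
  end

  Shiporder.new(root.attributes['orderid'],
                child_text(root, 'orderperson'), shipto, items)
end
```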

vtphysguy · February 23, 2006, 7:58pm

.NET ships with a tool that will generate classes directly from an XML
schema (XSD) file. Does the Ruby standard library have anything similar?
It’s a great tool on the .NET side, and I bet Ruby could do it even
better, if not.

This may be interesting to you: http://rubyforge.org/projects/pippin/

I haven’t used it, but the author discussed it at a recent RUG meeting
and it interested me at the time.

-Brian

vtphysguy · February 24, 2006, 3:10am

Check out SOAP4R.
It includes xsd2ruby and wsdl2ruby tools. I don’t think they come with
the standard distribution, though.

As for everyone wanting to know why XSD:

XSD allows you to define language-independent schemas that can be
consumed by multiple languages. If you define a schema up front when
you build SOAP document literal services, you can get full validation
before your service is called.

vtphysguy · February 24, 2006, 5:45am

Henrik M. wrote:

Unfortunately, the idea is flawed. You are quite right. There isn’t much
of a point. Building classes directly from a data format specification
makes the application using the classes tightly coupled to the data
format.

ActiveRecord?

I’ve never used Rails and have been in limbo for 2 weeks learning Ruby
due to work swampage, but it seems from everything I read that
ActiveRecord does exactly what you are coming out against, and does it
very well.

-dave

vtphysguy · February 23, 2006, 11:59pm

On Thu, 2006-02-23 at 18:32, Logan C. wrote:

Ok, I’m going to be honest and say first I’ve never had occasion to
use this tool. But I’m wondering what exactly is the point? Ruby’s
syntax is much less verbose than XML.
Why would you want to type this:

When you can do one of these:
<snipped Ruby class definitions>

I may have mangled some of the XSD, since I’m admittedly not familiar
with it, but generating classes from a schema seems rife with problems.

The basic idea is that if you have a schema specification, you can
generate classes for whatever language you want, and the applications
that use them will all understand the same data formats.

Unfortunately, the idea is flawed. You are quite right. There isn’t much
of a point. Building classes directly from a data format specification
makes the application using the classes tightly coupled to the data
format.

In most cases it is better to work with a generic XML parsing API,
whether it is REXML, DOM, SAX, or something else. It makes it a lot
easier to write robust applications.

I have worked with SGML/XML for a long time. I like XML, but not for
everything. One of the things that attracted me to Ruby is that it has
largely escaped the “XML for everything” craze that permeates Java,
and to some extent .NET.

/Henrik

–
http://kallokain.blogspot.com/ - Blogging from the trenches of software
development
http://www.henrikmartensson.org/ - Reflections on software development
http://testunitxml.rubyforge.org/ - The Test::Unit::XML Home Page
http://declan.rubyforge.org/ - The Declan Home Page

vtphysguy · February 24, 2006, 2:30pm

On Feb 23, 2006, at 4:59 PM, Henrik M. wrote:

I may have mangled some of the XSD, since I’m admittedly not familiar
with it, but generating classes from a schema seems rife with problems.

The basic idea is that if you have a schema specification, you can
generate classes for whatever language you want, and the applications
that use them will all understand the same data formats.

Building classes directly from a data format specification
makes the application using the classes tightly coupled to the data
format.

Yep, that’s the exact idea. You don’t only want to be tightly-coupled
to the data specification, you want it to be EXACTLY the same. The
most common place this is used is in SOAP Web Service design. It’s
often called contract first or schema first design. It’s not a bad
thing to be tightly coupled to your data at that point because the
whole idea is to be able to interoperate with an existing service.
This is all about sharing, and it’s nice to make things easy and
language-neutral when you’re going to share. (Cue up Mr. Rogers theme
music.)

XSD schema (for all of its imperfections) has the ability to specify
constraints on the data that help validate it. Up front your service
can define things like:

(pseudo-XSD)
US Address:
Street1 = required
Street2 = optional
City = required
State = [A-Z]{2}
USZip = [0-9]{5}(-[0-9]{4})?

Those kinds of constraints can be done in code, but no language I
know of can make an interface to a method that explicit.

vtphysguy · February 24, 2006, 5:30pm

On Fri, 2006-02-24 at 05:42, Dave C. wrote:

ActiveRecord does exactly what you are coming out against, and does it
very well.

No, I have nothing against ORM tools. ActiveRecord (which, I must
confess, I haven’t used, just poked at while playing with Rails a while
ago) provides an interface between a data source layer and a business
layer, and that is fine. I couldn’t manage relational databases without
ORM. (Still, there are XML databases that you can query with XPath or
XQuery, and that reply with an XML message. This approach does have
practical problems (like abysmal performance), but when you are dealing
with hierarchical information it is a pretty neat approach.)

An ORM tool provides a way to make objects persistent and bridges the
data source layer and the business layer in an application. This is
good.

A schema-to-code generator creates a tight coupling between a document
(or ‘message’) format and business logic. It is often used in client
applications to generate code that couples the application tightly to a
data format owned by somebody else. This is bad.

If I build a SOAP service, and you build a client and generate classes
from the WSDL I provide, then your application will break every time I
change the data format.

On the other hand, if I offer a REST service, and you build an
application that uses LibXML to mine the messages for the information
you need, then I can add new stuff to the data format without breaking
your application.

Consider an XML editor. An XML editor uses an internal representation of
XML documents that is generic. For example, XMetaL uses DOM, and you can
use the editor to edit XHTML, DocBook, XSEIF, TIM, TEDD, TEI, or
whatever you like.

If the editor had been written using a code generator to map from a
schema to an internal representation that is specific to that one
format, then the editor would only have been able to handle one format.
For example, you would get an XHTML editor that could not handle DocBook
or TEI. If you wanted to use a new format, you would need a new editor.
(Plenty of editors are built that way. It is still not a good idea.)

/Henrik

vtphysguy · February 24, 2006, 6:00pm

On Fri, 2006-02-24 at 14:27, Geoffrey Lane wrote:

Yep, that’s the exact idea. You don’t only want to be tightly-coupled
to the data specification, you want it to be EXACTLY the same. The
most common place this is used is in SOAP Web Service design. It’s
often called contract first or schema first design. It’s not a bad
thing to be tightly coupled to your data at that point because the
whole idea is to be able to interoperate with an existing service.

My point is that you are tightly coupled to somebody else’s data. This
means your application is very vulnerable.

SOAP does have a lot of problems because of this. REST offers a more
flexible alternative.

Tight coupling isn’t necessarily bad: a lot of the time nothing
changes, and then everything works. On the other hand, why take the risk
if there is no great benefit? Sometimes things do change, and then things
get expensive.

This is all about sharing, and it’s nice to make things easy and
language-neutral when you’re going to share. (Cue up Mr. Rogers theme
music.)

Also, when sharing, it is important to be flexible. In this case, the
flexibility doesn’t really cost you anything, because using a generic
data model is just as easy as using a customized one. Sometimes it is
even easier.

State = [A-Z]{2}
USZip = [0-9]{5}(-[0-9]{4})?

Those kinds of constraints can be done in code, but no language I
know of can make an interface to a method that explicit.

You can do it in W3C XML Schema and in RELAX NG. In principle, you can
do it with NOTATIONs in a DTD, if only two people could ever agree on
what a NOTATION declaration means.

I do like your example. Suppose the addresses are in a customer
database, and the company that owns it expands its operations to Europe.
They will then need to extend their data format so that they can handle
the addresses of European customers, so they add a country field, and a
zip code field that isn’t U.S.-specific. While they’re at it, they add
fields for phone number, fax, mobile, email, and web site. Seeing that
the Street2 field isn’t very useful, they remove it.

If you have built a client in the U.S. that queries the database, and
maps the result to an object model generated from a schema, your
application will now go KABLOINK! If your application uses REXML, SAX or
DOM, it can just ignore the extra information and keep working smoothly.
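A small sketch of that difference, assuming the address format gains `country`, `zip` and `phone` elements and loses `street2`. `strict_address` stands in for generated code that rejects unknown elements (whether a given generator behaves this way is an assumption); `tolerant_address` uses REXML to pick out only the fields this client needs. Both function names and `KNOWN_FIELDS` are hypothetical.

```ruby
require 'rexml/document'

# Strict, generated-style mapping (hypothetical): every child element must
# be a known field, so any addition to the format is an error.
KNOWN_FIELDS = %w[street1 street2 city state uszip]

def strict_address(xml)
  root = REXML::Document.new(xml).root
  fields = {}
  root.elements.each do |el|
    raise ArgumentError, "unknown element <#{el.name}>" unless KNOWN_FIELDS.include?(el.name)
    fields[el.name] = el.text
  end
  fields
end

# Tolerant reader: extracts only what this client uses and ignores the rest.
def tolerant_address(xml)
  root = REXML::Document.new(xml).root
  fields = {}
  %w[street1 city].each do |name|
    node = root.elements[name]
    fields[name] = node.text if node # missing fields are simply left out
  end
  fields
end
```

Against the old format both readers work; against the extended one, only the tolerant reader keeps returning the street and city.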

/Henrik

vtphysguy · February 24, 2006, 6:21pm

I’ve actually found that you don’t usually want to use the XML
directly. Most of the time, the data I get in XML was designed by a
committee that didn’t have anyone who knew either XSDs or how to
structure information in XML well.

Roughly, I have a base class that has this method in it:

def self.xml_attr(name, path, type, options={})
  class_eval do
    define_method(name) do
      # Get the node or attribute the XPath points at
      node = REXML::XPath.first(@root, path)
      # Convert it to a typed value (nil if the node is missing)
      self.class.parse_node(node, type, options) if node
    end
  end
  # `once` (not shown) memoizes the reader unless caching is turned off
  once(name) unless options[:cache] == false
end

parse_node handles the known node types I define elsewhere, like
strings, floats, dates, arrays, and classes. Like I said, that’s only
roughly what I use, because that version is a little old, but this way
you can build model objects similar to ActiveRecord or Og, and map them
to the appropriate XML bits with XPath. I’ve found that it’s pretty
flexible and comes in at only around 70 lines of code.
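For anyone who wants something runnable, here is a self-contained sketch along the lines described above. `XmlModel`, the type symbols and the `[Type]` list convention are assumptions, not the poster’s actual code; caching uses a plain hash instead of the `once` helper, and only strings, integers, floats, dates, lists and nested model classes are handled.

```ruby
require 'rexml/document'
require 'date'

# Hypothetical base class: readers are declared with an XPath and a type,
# and values are converted and cached on first access.
class XmlModel
  def initialize(source)
    @root = source.is_a?(REXML::Element) ? source : REXML::Document.new(source).root
    @cache = {}
  end

  # type may be a symbol, a model class, or [type] for "all matches as a list"
  def self.xml_attr(name, path, type = :string, options = {})
    define_method(name) do
      return @cache[name] if options[:cache] != false && @cache.key?(name)
      nodes = REXML::XPath.match(@root, path)
      value =
        if type.is_a?(Array)
          nodes.map { |n| self.class.parse_node(n, type.first) }
        elsif nodes.first
          self.class.parse_node(nodes.first, type)
        end
      @cache[name] = value
    end
  end

  # Converts an element or attribute node to a Ruby value.
  def self.parse_node(node, type)
    text = node.is_a?(REXML::Attribute) ? node.value : node.text.to_s
    case type
    when :string  then text
    when :integer then Integer(text)
    when :float   then Float(text)
    when :date    then Date.parse(text)
    when Class    then type.new(node) # nested model object
    else raise ArgumentError, "unknown type #{type.inspect}"
    end
  end
end
```

A model then reads like a declaration, e.g. `xml_attr :orderid, '@orderid'` or `xml_attr :items, 'item', [LineItem]`, where `LineItem` is another `XmlModel` subclass.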

Someone posted a nearly identical solution on RubyGarden, but I can’t
remember where now. I can pull it up if you’re interested.
.adam

vtphysguy · February 24, 2006, 6:18pm

On Fri, Feb 24, 2006 at 10:27:52PM +0900, Geoffrey Lane wrote:
[…]
} XSD schema (for all of its imperfections) has the ability to specify
} constraints on the data that help validate it. Up front your service
} can define things like:
}
} (pseudo-XSD)
} US Address:
} Street1 = required
} Street2 = optional
} City = required
} State = [A-Z]{2}
} USZip = [0-9]{5}(-[0-9]{4})?
}
} Those kinds of constraints can be done in code, but no language I
} know of can make an interface to a method that explicit.

Converting those constraints to Ruby is a challenge, but far from
impossible. First off, I’d store the fields in a hash and override
method_missing to implement accessors and mutators as requested (the
same way ActiveRecord implements find_by_*).

The regex constraints are relatively simple. They just require minor
adjustments to account for the differences in regex flavors between
XSD and Ruby. I’d store a hash of symbol to regex-and-optional for use
by a generalized setter method.
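One concrete flavor difference: W3C XML Schema patterns are implicitly anchored and must match the entire value, whereas a Ruby regexp matches anywhere in the string, so `/[A-Z]{2}/` also accepts "VAX". A minimal sketch of that single adjustment follows; `xsd_pattern_to_regexp` is a hypothetical helper, and the other differences (`\i`, `\c`, character-class subtraction, `^` and `$` being ordinary characters in XSD) are not handled.

```ruby
# Hypothetical helper: XSD patterns must match the whole value, so wrap
# the pattern in \A(?:...)\z. Only anchoring is handled here.
def xsd_pattern_to_regexp(pattern)
  Regexp.new("\\A(?:#{pattern})\\z")
end

STATE = xsd_pattern_to_regexp('[A-Z]{2}')
USZIP = xsd_pattern_to_regexp('[0-9]{5}(-[0-9]{4})?')

p('VAX' =~ /[A-Z]{2}/) # unanchored Ruby regexp: prints 0
p('VAX' =~ STATE)      # anchored: prints nil
```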

The required/optional issue is dealt with in the constructor. The
constructor takes a (possibly nested) hash, raises an exception if a
required field is missing or an unknown field is present, and uses
mutators to set the fields. You might also have a constructor that
converts an XML (sub-)tree into a (nested) hash before passing it to
the other constructor.

Implementing your example:

class XSDclass

  def set_field(symbol, value)
    constraint = constraints[symbol]
    fail "No such field" unless constraint
    valregex = constraint[:valregex]
    if valregex && valregex !~ value
      fail "Invalid value '#{value}' for field #{symbol}"
    end
    @fields[symbol] = value
  end

  # Getters: obj.city; setters: obj.city = "..."
  def method_missing(method_id, *arguments)
    if constraints[method_id] && arguments.length == 0
      return @fields[method_id]
    elsif method_id.to_s.end_with?('=')
      setter_for = method_id.to_s[0...-1].to_sym
      if constraints[setter_for] && arguments.length == 1
        return set_field(setter_for, arguments[0])
      end
    end
    super
  end

  # Keeps respond_to? consistent with method_missing
  def respond_to_missing?(method_id, include_private = false)
    constraints.key?(method_id.to_s.chomp('=').to_sym) || super
  end

  private

  # Subclasses override this with their field => constraint table
  def constraints
    {}
  end

  def initialize(hashed_fields)
    # Hash#reject returns a Hash on every Ruby version, so .keys is safe
    required_fields = constraints.reject { |_name, c| c[:optional] }.keys
    missing = (required_fields - hashed_fields.keys).join(', ')
    excess  = (hashed_fields.keys - constraints.keys).join(', ')
    unless missing.empty? && excess.empty?
      errors = "Validation error constructing #{self.class}:\n\n"
      errors << "\tMissing fields: #{missing}\n" unless missing.empty?
      errors << "\tUnknown fields: #{excess}\n" unless excess.empty?
      fail errors
    end
    @fields = {}
    hashed_fields.each { |k, v| set_field(k, v) }
  end

end

class US_Address < XSDclass

  # XSD patterns must match the whole value, so the regexps are anchored
  Constraints = { :street1 => { :optional => false },
                  :street2 => { :optional => true },
                  :city    => { :optional => false },
                  :state   => { :valregex => /\A[A-Z]{2}\z/,
                                :optional => false },
                  :uszip   => { :valregex => /\A[0-9]{5}(-[0-9]{4})?\z/,
                                :optional => false }
                }

  def constraints
    Constraints
  end

end

The conversion of XSD into a Ruby class definition, conversion of
regular expressions, a better implementation of the method_missing
override (which creates accessors/mutators for the next use), and the
XML subtree constructor are left as exercises for the reader. Note that
I have actually tested the code above; it works.
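A minimal sketch of the XML subtree constructor left as an exercise, assuming REXML and a format without repeated elements or attributes. `element_to_hash` is a hypothetical helper: leaf elements become stripped strings and elements with children become nested hashes, keyed by symbols so the result can be handed to a hash-based constructor such as `US_Address.new`.

```ruby
require 'rexml/document'

# Hypothetical helper: converts an element's children into a symbol-keyed
# hash. Leaves become stripped strings, elements with children become
# nested hashes. Attributes are ignored, and a repeated element name
# simply overwrites the earlier value.
def element_to_hash(element)
  hash = {}
  element.elements.each do |child|
    hash[child.name.to_sym] =
      child.has_elements? ? element_to_hash(child) : child.text.to_s.strip
  end
  hash
end
```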

} Geoff L.
–Greg

vtphysguy · February 24, 2006, 7:20pm

Henrik M. wrote:

If you have built a client in the U.S. that queries the database, and
maps the result to an object model generated from a schema, your
application will now go KABLOINK! If your application uses REXML, SAX or
DOM, it can just ignore the extra information and keep working smoothly.

/Henrik

In general, I’m not worried about this because things like Schema have
the ability to support optional entities so you can grow your schema
over time. Schema is flexible so that growing to another Address format
would not be difficult and people could still use the US format. You can
provide xsd:choice elements to provide multiple valid options for a
single entity. (OK, I’ll quit now) People also generally do things like
versioning of Schemas to handle breaking changes.

Also, don’t get me wrong. I’m not saying it’s a good idea to query a
database directly into an interchange format like SOAP. You are totally
correct that it is often a good idea to decouple a service layer from a
data layer, from a business layer. But using generated code for a
Service layer can be very helpful.

There’s nothing wrong with REST, but you’re implying that you can change
a REST service at any time and all the clients will still work? That’s
just not true. If you make enough changes, you’re going to break calling
clients. The service has some basic expectations. That’s true of REST
and of SOAP.

Don’t mean to beat a dead horse with this one. SOAP is a tool, just like
many others. If I’m going to go through the trouble of doing SOAP, I do
Document Literal services because I can get full Schema validation from
it.

vtphysguy · February 24, 2006, 11:55pm

On Fri, 2006-02-24 at 19:18, Geoff L. wrote:

In general, I’m not worried about this because things like Schema have
the ability to support optional entities so you can grow your schema
over time. Schema is flexible so that growing to another Address format
would not be difficult and people could still use the US format. You can
provide xsd:choice elements to provide multiple valid options for a
single entity. (OK, I’ll quit now) People also generally do things like
versioning of Schemas to handle breaking changes.

I agree that changing the schema is easy. That is part of the problem.
You can do a schema change in a few minutes, and spend a year fixing
broken clients.

I spent some time last year working with a company that was just in the
process of painting itself into that kind of a corner. That was with one
service and two clients.

There's nothing wrong with REST, but you're implying that you can change
a REST service at any time and all the clients will still work? That's
just not true. If you make enough changes, you're going to break calling
clients. The service has some basic expectations. That's true of REST
and of SOAP.

Agreed. Any change to the service that is not backwards compatible can
break the clients. REST is no safeguard against poor schema design, but
it does keep the door open to building flexible data formats.

Don’t mean to beat a dead horse with this one. SOAP is a tool, just like
many others. If I’m going to go through the trouble of doing SOAP, I do
Document Literal services because I can get full Schema validation from it.

But then the question becomes: what does schema validation buy you?
Sometimes it brings a lot of benefits; I would not want to use an XML
editor without validation. But when just transporting transient messages
between a service and its client, I have yet to see it pay off.

One problem with Document Literal is namespace pollution. When you work
with content management systems, this can get quite painful. REST does
not have that problem, because there is no special wrapper around the
message.

/Henrik
