XML - converting from one feed to another (beginner)

casper_the_ghost · December 29, 2006, 4:36pm

I’m trying to read an XML feed of products and convert them to a
different XML feed to upload to Froogle (Google Base).

How do I read the lines of XML and then rewrite them in the new XML?
I’ve started with rexml, but I’m not sure if I’m on the right track.

require 'rexml/document'

def convert_to_base_xml(xml)
  # Takes a product XML feed and converts it into one that
  # is formatted for uploading to Google Base
  feed = REXML::Document.new(xml)

  # Create the export XML document
  doc = REXML::Document.new
  rss = doc.add_element 'rss'
  # <channel> must go inside <rss>: a document can have only one root element
  channel = rss.add_element 'channel'

  # Loop through old feed and create new feed based
  # on that data
  feed.each_child do |child|
    # add items to new rss feed, see below

    # I'm not sure what I should do here
  end

  # Save rss feed to a file
end

Original feed XML:

<artist_url></artist_url>
<title_url></title_url>
<cat_url></cat_url>

Convert to an RSS 2.0 feed in this format:

<g:artist></g:artist>
<g:brand></g:brand>
<g:label></g:label>
<g:image_link></g:image_link>
<g:product_type></g:product_type>
<g:price></g:price>

casper_the_ghost · December 29, 2006, 6:43pm

rb wrote:

> feed = REXML::Document.new(xml)
>
> # Create the export XML document
> doc = REXML::Document.new

You may be better off building the new XML using either direct string
concatenation or a lib such as Jim W.'s Builder (which will help
ensure the result is proper XML), or one of the Ruby RSS libraries.
(I’d go with populating templates and just make sure the new content is
correctly escaped.)
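A minimal sketch of the template approach, assuming each source item has already been pulled into a hash. The `ITEM_TEMPLATE` constant, the `render_item`/`render_feed` helpers and the field names are hypothetical; `ERB::Util.html_escape` (Ruby standard library) escapes `&`, `<`, `>` and quotes, and `http://base.google.com/ns/1.0` is the namespace URI Google Base feeds declare for the `g:` prefix.

```ruby
require 'erb'

# Hypothetical template for one target <item>. Kernel#format fills each
# %{name} placeholder from a hash with the matching symbol key.
ITEM_TEMPLATE = <<~XML
  <item>
    <title>%{title}</title>
    <g:price>%{price}</g:price>
    <g:product_type>%{product_type}</g:product_type>
  </item>
XML

# Escape every value (&, <, >, quotes) before it goes into the template.
def render_item(fields)
  escaped = fields.transform_values { |value| ERB::Util.html_escape(value.to_s) }
  format(ITEM_TEMPLATE, escaped)
end

# Wrap the rendered items in an RSS 2.0 channel that declares the g: prefix.
def render_feed(items)
  body = items.map { |fields| render_item(fields) }.join
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
    <channel>
    #{body}</channel>
    </rss>
  XML
end

puts render_feed([{ title: 'Salt & Pepper', price: '9.99', product_type: 'CD' }])
```

The escaping step is what keeps a product title like "Salt & Pepper" from producing malformed XML.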

The general idea is to loop over the item elements in the source DOM
(using either REXML or Hpricot), extract the relevant data, and populate
a new item element in the target XML. If you have a template for the
target item element you can stuff in the new content on each pass of the
loop and append it to the resulting XML.
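A minimal REXML sketch of the "populate a new item element and append it" step, assuming the fields have already been extracted into hashes whose keys are the target element names. The `build_base_feed` helper and the sample data are hypothetical. Building elements with REXML rather than strings means REXML does the escaping when the document is serialized.

```ruby
require 'rexml/document'

GOOGLE_BASE_NS = 'http://base.google.com/ns/1.0'

# Hypothetical helper: builds an RSS 2.0 document with one <item> per
# product hash. Hash keys become child element names, e.g. 'g:price'.
def build_base_feed(products)
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new('1.0', 'UTF-8')
  rss = doc.add_element('rss', 'version' => '2.0')
  rss.add_namespace('g', GOOGLE_BASE_NS)
  channel = rss.add_element('channel')

  products.each do |product|
    item = channel.add_element('item')          # one new <item> per pass
    product.each do |name, value|
      next if value.nil?                        # skip fields the source lacked
      item.add_element(name).text = value.to_s  # escaped when serialized
    end
  end

  doc.to_s
end

# Hypothetical product data, e.g. extracted from the source feed.
products = [
  { 'title' => 'Tom & Jerry', 'link' => 'http://example.com/1', 'g:price' => '9.99' },
  { 'title' => 'Second', 'g:product_type' => 'CD' }
]
puts build_base_feed(products)
```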

For example, with Hpricot (which you can install as a gem):

require 'hpricot'

…

src_dom = Hpricot(source_rss_xml)

(src_dom/'//item').each do |el|
  title = (item/'title').text
  title_url = (item/'title_url').text
  # …
  # Now build the new item element for the target XML
  # and add it to the accumulating content
end

(For REXML it’s basically the same, but the XPath invocation is
different.)
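For comparison, a minimal REXML version of the Hpricot loop above, assuming Ruby's bundled `rexml` library; the sample feed and the hash keys are made up for illustration. `REXML::XPath.each` plays the role of Hpricot's `/` search, and `el.text('title')` looks up a child relative to each item.

```ruby
require 'rexml/document'

# Made-up source feed; the real one's structure may differ.
source_rss_xml = <<~XML
  <rss><channel>
    <item><title>First</title><title_url>http://example.com/1</title_url></item>
    <item><title>Second</title></item>
  </channel></rss>
XML

src_dom = REXML::Document.new(source_rss_xml)

products = []
# REXML's counterpart to (src_dom/'//item').each
REXML::XPath.each(src_dom, '//item') do |el|
  # Element#text(path) returns the text of the first child matching
  # the XPath, or nil when there is no such child.
  products << {
    'title' => el.text('title'),
    'link'  => el.text('title_url')
  }
end

p products
```

Missing children come back as `nil`, so the second item gets `'link' => nil` rather than raising an error.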

Time and memory needs may also be a consideration; if you are dealing
with large documents, a pull or stream parser would be better, but it
can be a bit harder to work with if you are new to it. But see my
article in Dr. Dobb's: http://www.ddj.com/184406385
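As an illustration of the stream option, a rough sketch using REXML's stream (SAX-style) listener API. It assumes each `<item>` has only flat children containing plain text, and the `ItemCollector` class is hypothetical. Each finished item is handed to a block as a hash, so the source is never held in memory as a full tree.

```ruby
require 'rexml/document'
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'

# Hypothetical listener: collects the text of each child of <item>
# and passes one hash per finished item to the block.
class ItemCollector
  include REXML::StreamListener

  def initialize(&on_item)
    @on_item = on_item
    @current = nil # hash for the <item> being read, nil outside items
    @field = nil   # name of the <item> child being read
  end

  def tag_start(name, _attrs)
    if name == 'item'
      @current = {}
    elsif @current
      @field = name
      @current[name] = +''
    end
  end

  def text(data)
    @current[@field] << data if @current && @field
  end

  def tag_end(name)
    if name == 'item'
      @on_item.call(@current)
      @current = nil
    end
    @field = nil
  end
end

xml = '<rss><channel>' \
      '<item><title>A &amp; B</title><price>9.99</price></item>' \
      '<item><title>C</title></item>' \
      '</channel></rss>'

items = []
listener = ItemCollector.new { |item| items << item }
REXML::Parsers::StreamParser.new(xml, listener).parse
p items
```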

Try the simplest approach first and see if it works, and whether it works
well enough.

–
James B.

http://www.rubyaz.org - Hacking in the Desert
http://www.jamesbritt.com - Playing with Better Toys

casper_the_ghost · December 29, 2006, 8:01pm

On Sat, 30 Dec 2006 02:40:18 +0900, James B. wrote:

> (For REXML it’s basically the same, but the XPath invocation is different.)
>
> Time and memory needs may also be a consideration; if you are dealing
> with large documents, a pull or stream parser would be better, but it
> can be a bit harder to work with if you are new to it. But see my
> article in Dr. Dobb's: http://www.ddj.com/184406385
>
> Try the simplest approach first and see if it works, and whether it works
> well enough.

Thanks for those tips. I’m going to study what you wrote and see if I
can make it work.

The XML file is about 2.5 MB, and the computers running the script
have 1 GB of RAM.

casper_the_ghost · December 29, 2006, 8:15pm

Hi rb,

rb wrote:

> I’m trying to read an XML feed of products and convert them to a
> different XML feed to upload to Froogle (Google Base).
>
> How do I read the lines of XML and then rewrite them in the new XML?
> I’ve started with rexml, but I’m not sure if I’m on the right track.

Depending on whether or not you want to make further use of the content
of your XML feed within your Ruby / Rails app (i.e., you actually
want the info stored in a database) you might be better off just
doing the transform using XSLT.

Best regards,
Bill

casper_the_ghost · December 29, 2006, 8:51pm

On 29 Dec 2006 11:12:41 -0800, “bill walton” wrote:

> of your XML feed within your Ruby / Rails app (i.e., you actually
> want the info stored in a database) you might be better off just
> doing the transform using XSLT.

Thanks… I didn’t think of that. I will have to learn XSLT. For
now, I have to finish this XML for Google Base by tomorrow so I’m
going to try to finish it with Ruby. Might try XSLT next.

casper_the_ghost · January 1, 2007, 7:05pm

On Sat, 30 Dec 2006 02:40:18 +0900, James B. wrote:

> rb wrote:
>
>> I’m trying to read an XML feed of products and convert them to a
>> different XML feed to upload to Froogle (Google Base).
>
> […]
>
>   # Now build the new item element for the target XML
>   # and add it to the accumulating content
> end

Thanks… it’s working so far, but I had to use this syntax:

title = (el/'title').text

casper_the_ghost · December 29, 2006, 10:37pm

rb wrote:

> Thanks… I didn’t think of that. I will have to learn XSLT. For
> now, I have to finish this XML for Google Base by tomorrow so I’m
> going to try to finish it with Ruby. Might try XSLT next.

XSLT would be painful overkill for this sort of transformation.

–
James B.

casper_the_ghost · January 1, 2007, 7:43pm

> title = (el/'title').text

Ah, good catch. Bug in my example code.

–
James B.

casper_the_ghost · January 2, 2007, 12:51pm

On 12/29/06, James B. wrote:

> XSLT would be painful overkill for this sort of transformation.

I’m going to have to disagree here - I’ve done a reasonable amount of
XSLT work, and this is exactly the sort of thing it’s designed to do.
In terms of effort and length of the resulting code, what you lose in
verbose syntax (although it’s nowhere near as bad as some make out),
you gain in not having to handle input/output/parsing/busy-work that
would need to be done “manually” in Ruby.