REXML help - Insert newlines into large XML file

Hello, I have a large XML file that does not have any newlines in it. Can
someone please provide some code that uses REXML to simply read in an XML file,
insert newlines after the XML sections or elements, then spit it out to
stdout? This way I’d at least be able to open the XML file in an editor
so I can read what kind of format it has. I don’t know anything about
REXML, so it needs to be a somewhat complete script. Thank you.

On Dec 10, 6:49 pm, Sean N. wrote:

Hello, I have a large xml file that does not have any newlines in it. Can
someone please provide some code to use REXML to simply read in an xml,
insert newlines after the xml sections or elements, then spit it out to
stdout. This way I’d at least be able to open the xml file in an editor
so I can read what kind of format it has. I don’t know anything about
REXML so it needs to be a somewhat complete script. thank you.

irb(main):001:0> require 'rexml/document'

irb(main):002:0> doc = REXML::Document.new( "…</child>" )

irb(main):003:0> doc.write $stdout
=> [<?xml ... ?>, … ]

irb(main):004:0> doc.write $stdout, 0

You can use IO.read("somefile.xml") to read the contents into a single
string.
You can pass a file to the REXML::Document#write method instead of
$stdout, e.g.

File.open( "with_newlines.xml", "w" ){ |file|
  doc.write( file, 0 )
}
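
Since a somewhat complete script was asked for, the pieces above could be put together roughly as follows. This is only a sketch: the helper name write_with_newlines and the script name add_newlines.rb are made up for illustration, and it assumes the whole file fits in memory.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of REXML): parse the XML text and write it
# back out with each element on its own line.
def write_with_newlines(xml_text, out = $stdout)
  doc = REXML::Document.new(xml_text)
  # An indent of 0 asks REXML to pretty-print with zero spaces of
  # indentation, so you get the newlines without any extra nesting.
  doc.write(out, 0)
  out
end

# Assumed usage: ruby add_newlines.rb somefile.xml > with_newlines.xml
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  write_with_newlines(File.read(ARGV[0]))
  puts # finish the output with a trailing newline
end
```

Passing a larger number instead of 0 should indent nested elements by that many spaces.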

For more information on REXML, see the official tutorial. It covers
this question directly and plainly, as well as a whole host of others.

http://www.germane-software.com/software/rexml/docs/tutorial.html

You could use HTML Tidy for this, I think.
Mikel

Step 1) Install tidy
Step 2) tidy -i yourfile.xml
Step 3) tidy --help

REXML is very bad for handling such things.

^ manveru

Michael F. wrote:

Step 3) tidy --help

REXML is very bad for handling such things.

Not at all. If you’re using an older version of the standard library,
you prettify the XML using doc.write(output, 0), as in Phrogz’ example.

For newer versions of REXML, use the REXML::Formatters classes (such as
REXML::Formatters::Pretty) instead, which give you much more control over
the prettifier.

Best regards,

Jari W.

On Dec 10, 8:35 pm, Michael F. wrote:

Step 2) tidy -i yourfile.xml
Step 3) tidy --help

REXML is very bad for handling such things.

For the record, would you care to clarify and justify that statement?

On Dec 11, 2007 1:50 PM, Phrogz wrote:

Step 1) Install tidy
Step 2) tidy -i yourfile.xml
Step 3) tidy --help

REXML is very bad for handling such things.

For the record, would you care to clarify and justify that statement?

Sure. I’ve tried for quite some time to get REXML to a point where it
really pretty-prints any document, but apart from implementing a whole
stream listener that keeps track of indentation and width, there doesn’t
seem to be any way. The new REXML works a bit better but inserts lots of
whitespace in the wrong places.
Unfortunately tidy has a memory leak, so I cannot recommend the
bindings if your process runs over a longer period. Of course
you could start it in another process, but then the CLI tool is good
enough already.
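
For what it’s worth, the "start it in another process" idea could look roughly like this from Ruby. It is a sketch under assumptions: the tidy command line tool must be installed and on the PATH, and the constant and method names are made up for illustration. The -xml, -i and -q options tell tidy to treat the input as XML, indent it, and stay quiet.

```ruby
require 'open3'

# Assumed command line; requires the tidy CLI to be installed.
TIDY_COMMAND = %w[tidy -xml -i -q].freeze

# Hypothetical helper: pipe the XML text through an external command in a
# separate process and return whatever it prints on stdout.
def indent_with_external_tool(xml_text, command = TIDY_COMMAND)
  output, status = Open3.capture2(*command, stdin_data: xml_text)
  code = status.exitstatus
  # tidy exits with 1 for warnings and 2 for errors, so only 2 and above
  # (or a process killed by a signal) is treated as a failure here.
  raise "#{command.first} failed (exit #{code.inspect})" if code.nil? || code >= 2
  output
end
```

Because the child process exits after every call, a leak in tidy cannot accumulate inside a long-running Ruby process.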

REXML::VERSION

“3.1.6”

http://pastie.caboo.se/126905

^ manveru

On Dec 11, 2007 10:50 PM, Jari W. wrote:

“3.1.6”

Don’t understand what you mean by “new REXML”, since you seem to be using
an old one. I’m on 3.1.7.1, and there you can do, for example:

By new I mean 3.1.7, which has formatters. But the one that ships
with Ruby is still 3.1.6. If I have to require a dependency, then I can just
use tidy instead, no?

Michael F. wrote:

Sure. I’ve tried for quite some time to get REXML to a point where it
really pretty-prints any document, but apart from implementing a whole
stream listener that keeps track of indentation and width, there doesn’t
seem to be any way. The new REXML works a bit better but inserts lots of
whitespace in the wrong places.
[...]

REXML::VERSION

“3.1.6”

Don’t understand what you mean by “new REXML”, since you seem to be using
an old one. I’m on 3.1.7.1, and there you can do, for example:

formatter = REXML::Formatters::Pretty.new( 3 )
formatter.compact = true
formatter.write( doc, $stdout)
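
Wrapped up as a small script, those three lines might look something like this. It is a sketch that assumes a REXML version with formatters (3.1.7 or later); the helper name pretty_xml and the script name pretty.rb are only for illustration.

```ruby
require 'rexml/document'
require 'rexml/formatters/pretty'

# Hypothetical helper: return the XML text re-indented by the Pretty formatter.
def pretty_xml(xml_text, indent = 3)
  doc = REXML::Document.new(xml_text)
  formatter = REXML::Formatters::Pretty.new(indent)
  # compact keeps short text-only elements on a single line
  formatter.compact = true
  out = String.new
  formatter.write(doc, out) # the formatter appends to anything that supports <<
  out
end

# Assumed usage: ruby pretty.rb somefile.xml
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  puts pretty_xml(File.read(ARGV[0]))
end
```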

Best regards,

Jari W.
