Trouble with REXML


#1

I read an XML file using REXML.
The XML file contains Korean characters.

If I print the file using REXML::Document.write $stdout, everything is
okay (the Korean is printed normally).
But if I try again using REXML::Element.text, the Korean characters are
broken (some strange characters appear).

Is there a solution for this?
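A minimal sketch of the two calls being compared, assuming a current Ruby (1.9 or later, where every String carries an encoding) and a hypothetical UTF-8 sample document containing 루비 ("Ruby"). In this setup both calls hand back the same characters, and the String returned by Element#text is tagged UTF-8; whether it then looks right depends on the encoding of the console it is printed to.

```ruby
require "rexml/document"

# Hypothetical sample: a small UTF-8 XML document with Korean text.
xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <lang>루비</lang>
XML

doc = REXML::Document.new(xml)

# Document#write serializes the whole tree; here into a String buffer
# instead of $stdout so the result can be examined afterwards.
written = +""
doc.write(written)

# Element#text returns the element's text content as a Ruby String.
text = doc.root.text

puts written
puts text
puts text.encoding # the encoding REXML tagged the returned String with
```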


#2

On 2/22/06, removed_email_address@domain.invalid wrote:

I read an XML file using REXML.
The XML file contains Korean characters.

If I print the file using REXML::Document.write $stdout, everything is
okay (the Korean is printed normally).
But if I try again using REXML::Element.text, the Korean characters are
broken (some strange characters appear).

Is there a solution for this?

What encoding scheme did you use?
Show me the content of Element.text.
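One concrete way to answer both questions is to print the encoding and the raw bytes of the String that element.text returns. A small sketch, assuming Ruby 1.9+; describe_string is a hypothetical helper, and the sample word 루비 stands in for the real element text.

```ruby
# Hypothetical helper: summarize a String's encoding tag and raw bytes in hex.
def describe_string(str)
  hex = str.bytes.map { |b| format("%02X", b) }.join(" ")
  "#{str.encoding.name}: #{hex}"
end

utf8   = "루비"                 # UTF-8 source literal
euc_kr = utf8.encode("EUC-KR")  # the same characters re-encoded as EUC-KR

puts describe_string(utf8)
puts describe_string(euc_kr)
```

The same helper could be applied to the real value, e.g. puts describe_string(element.text); the same two characters produce different byte sequences depending on the encoding.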


#3

Gyoung-Yoon N. wrote:

Dear Yoon,

Noh might have already helped you out, and my suggestion here may not be
relevant, but I wonder which method (e.g. puts, print or p) you used to
print the return value of element.text. The element.text method only
returns the text data; it doesn’t print anything. (In an interactive
session such as irb you will see something on the console, but it may not
look like the string you expected.)

Have you tried lines like these?
$stdout.puts element.text
and
$stdout.puts element.to_s
The $stdout.puts method shows the character string obtained from the
object (the return value of element.text or element.to_s in the cases
above).
Others, such as
p element.text
may not show the string you expected: p prints a representation of the
object (its inspect output), which is not necessarily the character string
itself.
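The difference between puts and p can be seen without REXML at all. A sketch assuming Ruby 1.9+, with a plain string standing in for the return value of element.text and a StringIO buffer capturing what puts writes. Calling String#b before inspect is an illustrative choice, not something from the thread; it makes the inspect output show byte escapes rather than characters.

```ruby
require "stringio"

text = "루비" # stands in for the return value of element.text

# puts writes the string's characters followed by a newline.
out = StringIO.new(+"")
out.puts text

# p prints obj.inspect: a quoted, escaped representation of the object.
via_p = text.inspect

# Inspecting the raw bytes (String#b) yields escapes instead of characters.
via_p_bytes = text.b.inspect

print out.string
puts via_p
puts via_p_bytes
```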

cheers,
nori


#4

On 2/23/06, removed_email_address@domain.invalid wrote:

I think it may be a problem with REXML.
If I print [element].to_s, the string is already broken.

Is there another XML library for Ruby?

What did they look like? Show the result of “p element”, for example:

irb(main):011:0> puts Iconv.iconv('utf-8', 'euc-kr', "\267\347\272\361").first
���
=> nil
irb(main):012:0> p Iconv.iconv('utf-8', 'euc-kr', "\267\347\272\361").first
"\353\243\250\353\271\204"
=> nil

Anyone willing to answer your question needs to see the internal
representation of the broken strings, as in the latter form.
Actually, that’s not broken. Don’t lose hope. :)
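Iconv was removed from Ruby's standard library in Ruby 2.0. A sketch of the same conversion with String#encode, assuming Ruby 1.9+: the four EUC-KR bytes from the irb session are tagged as EUC-KR and transcoded to UTF-8, which reproduces the six bytes that p printed. The reverse call (encoding to EUC-KR) is what one might try before printing UTF-8 text such as element.text on an EUC-KR console; that console setup is an assumption, since the thread never states the terminal encoding.

```ruby
# The EUC-KR bytes from the irb session, tagged with their real encoding
# (String.new copies the bytes and sets the encoding without converting).
euc_kr = String.new("\xB7\xE7\xBA\xF1", encoding: Encoding::EUC_KR)

# Transcode to UTF-8: the modern counterpart of Iconv.iconv('utf-8', 'euc-kr', ...).
utf8 = euc_kr.encode(Encoding::UTF_8)

# Show the UTF-8 bytes in octal, matching the escapes p printed in the session.
octal = utf8.bytes.map { |b| format("%03o", b) }
p octal

# The reverse direction, e.g. before printing UTF-8 text on an EUC-KR console.
back = utf8.encode(Encoding::EUC_KR)
p back == euc_kr
```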

