How can I search for a value in XML?


#1

How can I search for a value in an XML file? For example, I want to search by
pubdate and return the matching biblioentry.
Please give me some source code for further study.

<?xml version="1.0" encoding="ISO-8859-15"?>
Godfrey Vesey Personal Identity: A Philosophical Analysis Cornell University Press 1977
Geoffrey Madell The Identity of the Self Edinburgh University Press 1981
Sydney Shoemaker Richard Swinburne Personal Identity Basil Blackwell 1984
Jonathan Glover The Philosophy and Psychology of Personal Identity Penguin 1988
Harold W. Noonan Personal Identity Routledge 1989
René Marres Persoonlijke identiteit na het verval van de ziel Coutinho 1991
James Baillie Problems in Personal Identity Paragon House 1993
Brian Garrett Personal Identity and Self-Consciousness Routledge 1998
John Perry Identity, Personal Identity, and the Self Hackett 2002

Thank you

Artit S.

http://www.rubybox.net (Thai Language)


#2

Artit S. wrote:

How can I search for a value in an XML file? For example, I want to search by
pubdate and return the matching biblioentry.

http://www.germane-software.com/software/rexml/

robert

#3

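You’ll probably want to use REXML and XPath:

require 'rexml/document'
require 'rexml/xpath'

include REXML

# Document.new expects XML source (a String or IO), not a file name,
# so read the file named on the command line first.
bibliography = Document.new( File.read( ARGV[0] ) )

# '//' matches biblioentry elements at any depth, not only at the root.
XPath.each( bibliography, '//biblioentry[pubdate > 1993]' ) do |biblioentry|
  # do something with biblioentry here
end

I’m not entirely sure that works, as the PC I’m on doesn’t have Ruby
installed :(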

Scott
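
For anyone who wants to try the XPath approach above end to end, here is a minimal self-contained sketch. The inline sample document, its bibliography root element, the demo ids and the helper name entries_after are assumptions made for illustration; only the element names (biblioentry, title, pubdate) come from the thread.

```ruby
require 'rexml/document'

# Hypothetical sample data: the root element and the ids are made up.
XML_SOURCE = <<~XML
  <bibliography>
    <biblioentry id="demo-1">
      <title>Personal Identity: A Philosophical Analysis</title>
      <pubdate>1977</pubdate>
    </biblioentry>
    <biblioentry id="demo-2">
      <title>Personal Identity and Self-Consciousness</title>
      <pubdate>1998</pubdate>
    </biblioentry>
  </bibliography>
XML

# Hypothetical helper: return the biblioentry elements whose pubdate
# is greater than the given year.
def entries_after(xml, year)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//biblioentry[pubdate > #{Integer(year)}]")
end

entries_after(XML_SOURCE, 1993).each do |entry|
  puts "#{entry.attributes['id']}: #{entry.elements['title'].text}"
end
```

Each match is a REXML::Element, so the whole biblioentry (attributes and children) is available once it has been found by its pubdate.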


#4

Artit S. wrote:

    <firstname>Godfrey</firstname>
    <firstname>Geoffrey</firstname>
    <firstname>Sydney</firstname>
  <pubdate>1984</pubdate>
  <pubdate>1988</pubdate>
</biblioentry>

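class String
  # Return [opening_tag, inner_text] pairs for every <s>...</s> element.
  def xtag(s)
    scan( %r! ( < #{s} [^>]* > ) ( .*? ) </ #{s} > !mx )
  end
end

# Read all of standard input and print the entries published after 1984.
gets(nil).xtag("biblioentry").each { |tag,data|
  if data.xtag("pubdate")[0][1] > "1984"
    print tag, data, "\n"
  end
}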


#5

Christian N. wrote:

print tag, data, "\n"

end
}

I hope you are joking…

I hope you’re joking.


#6

“William J.” removed_email_address@domain.invalid writes:

}
I hope you are joking…


#7

Artit S. wrote:

    <firstname>Godfrey</firstname>
    <surname>Vesey</surname>
  </author>
  <title>Personal Identity: A Philosophical Analysis</title>
  <publisher>
    <publishername>Cornell University Press</publishername>
  </publisher>
  <pubdate>1977</pubdate>

class String
  # Return [attributes_hash, inner_text] pairs for every <s ...>...</s> element.
  def xtag(s)
    scan( %r! < #{s} (?: \s+ ( [^>]* ) )? >
              ( .*? ) </ #{s} > !mx ).
      map{ |attr, data| h = { }
        if attr
          attr.scan( %r! ( \S+ ) = " ( [^"]* ) " !x ){ |k,v|
            h[k] = v }
        end
        [ h, data ]
      }
  end
end

gets(nil).xtag("biblioentry").each { |attr,data|
  if data.xtag("pubdate")[0][1] > "1984"
    print attr["id"], data, "\n"
  end
}


#8

William J. wrote:

class String
def xtag(s)

end

gets(nil).xtag("biblioentry").each { |attr,data|

}

Please stop. To the OP, use REXML.


#9

Artit S. wrote:

    <firstname>Godfrey</firstname>
    <surname>Vesey</surname>
  </author>
  <title>Personal Identity: A Philosophical Analysis</title>
  <publisher>
    <publishername>Cornell University Press</publishername>
  </publisher>
  <pubdate>1977</pubdate>

class String
  # Return [attributes_hash, inner_text] for each <s .../> or <s ...>...</s>.
  def xtag(s)
    scan( %r!
        < #{s} (?: \s+ ( [^>]* ) )? / >
      |
        < #{s} (?: \s+ ( [^>]* ) )? >
        ( .*? ) </ #{s} >
      !mx ).
      map{ |unpaired, attr, data| h = { }
        attr = ( unpaired || attr )
        if attr
          attr.scan( %r! ( \S+ ) = " ( [^"]* ) " !x ){ |k,v|
            h[k] = v }
        end
        [ h, data ]
      }
  end
  # Print the element tree as an indented outline, starting at the given depth.
  def xshow( depth=0 )
    text = ""
    split( /<([^>]*)>/ ).each_with_index{ |s,i|
      if 0 == i % 2
        text = s.strip
      else
        indent = " " * ( depth * 2 )
        case
        when s[0,1] == "/"
          depth -= 1
          # each_line replaces String#map, which only existed in Ruby 1.8
          puts text.each_line.map{|x| indent + x.strip } if text != ""
        when s[-1,1] == "/"
          puts indent + s
        else
          puts indent + s
          depth += 1
        end
      end
    }
  end
end

gets(nil).xtag("biblioentry").each { |attr,data|
  if data.xtag("pubdate")[0][1] > "1997"
    puts attr["id"]
    data.xshow( 1 )
  end
}

Output:

FHIW13C-1298-4
  author
    firstname
      Brian
    surname
      Garrett
  title
    Personal Identity and Self-Consciousness
  publisher
    publishername
      Routledge
  pubdate
    1998
FHIW13CX-1202-1
  author
    firstname
      John
    surname
      Perry
  title
    Identity, Personal Identity, and the Self
  publisher
    publishername
      Hackett
  pubdate
    2002


#10

Artit S. wrote:

    <firstname>Godfrey</firstname>
    <surname>Vesey</surname>
  </author>
  <title>Personal Identity: A Philosophical Analysis</title>
  <publisher>
    <publishername>Cornell University Press</publishername>
  </publisher>
  <pubdate>1977</pubdate>

class String
  # Return [attributes_hash, inner_text] pairs for every <s ...>...</s> element.
  def xtag(s)
    scan( %r! < #{s} (?: \s+ ( [^>]* ) )? >
              ( .*? ) </ #{s} > !mx ).
      map{ |attr, data| h = { }
        if attr
          attr.scan( %r! ( \S+ ) = " ( [^"]* ) " !x ){ |k,v|
            h[k] = v }
        end
        [ h, data ]
      }
  end
end

gets(nil).xtag("biblioentry").each { |attr,data|
  if data.xtag("pubdate")[0][1] > "1984"
    print attr["id"], data, "\n"
  end
}


#11

I would like to use REXML or any other library, please.
Thanks.

On 2/20/06, Artit S. removed_email_address@domain.invalid wrote:

<biblioentry id="FHIW13C-1234">


Artit S.

http://www.rubybox.net (Thai Language)


#12

On 2/21/06, Ross B. removed_email_address@domain.invalid wrote:

(Also, REXML does support XPath, so you should be able to modify the
above to work with that. Just to be sure, I tried it 100 times over:

XPath

                 user     system      total        real
rexml        9.840000   0.080000   9.920000 ( 10.046963)
libxml2      0.090000   0.000000   0.090000 (  0.139592)

Every time I’ve tried to use REXML for something I’ve found it to be
incredibly slow and painful on large files. Usually I start with
REXML, get annoyed, and then install QuiXML
(http://quixml.rubyforge.org/). Though it doesn’t have bells and
whistles, it’s a heck of a lot faster. Anyhow, I’m certainly looking
forward to your libxml2 bindings!

-Pawel


#13

On Tue, 2006-02-21 at 16:17 +0900, Artit S. wrote:

I want to use rexml or any library ,please
thankz

As others have said, REXML is probably the way to go for now, but very soon
you’ll also be able to use Libxml2, if things keep going to plan over
here.

require 'xml/libxml'

d = XML::Parser.file('test.xml').parse
p d.find('//biblioentry[pubdate = 1977]').to_a

If you want to try it before we get to release go to CVS:
http://rubyforge.org/scm/?group_id=494

(Also, REXML does support XPath, so you should be able to modify the
above to work with that. Just to be sure, I tried it 100 times over:

XPath

                 user     system      total        real
rexml        9.840000   0.080000   9.920000 ( 10.046963)
libxml2      0.090000   0.000000   0.090000 (  0.139592)

;)
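
A timing run like the one above could be set up along these lines with Ruby's standard Benchmark module. This is only a sketch under assumptions: the generated document stands in for test.xml, the helper name rexml_lookup is made up, and it is a guess that each of the 100 iterations included parsing. Only the REXML side is shown, since the libxml2 bindings are not assumed to be installed.

```ruby
require 'benchmark'
require 'rexml/document'

# Hypothetical in-memory document; the thread's test.xml is not reproduced here.
BENCH_XML = '<bibliography>' +
  (1970..2005).map { |y|
    "<biblioentry id=\"e#{y}\"><pubdate>#{y}</pubdate></biblioentry>"
  }.join +
  '</bibliography>'

# Parse the document and run the same XPath query used in the post.
def rexml_lookup(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//biblioentry[pubdate = 1977]')
end

# The argument to bm is the label column width.
Benchmark.bm(10) do |bm|
  bm.report('rexml') { 100.times { rexml_lookup(BENCH_XML) } }
end
```

A second bm.report block wrapping another library's parse-and-query call would produce the second row of the table.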


#14

Christian N. wrote:

print tag, data, "\n"

end
}

I hope you are joking…

Actually, in real-world usage, Mark Pilgrim’s Python Feed Parser[0]
falls back to regular expressions to get the data required if the XML is
not well-formed.

Admittedly this is a real problem for RSS hackers, less so with other
XML messages, but the approach does have merit if (a) you can’t
guarantee well-formedness and (b) you absolutely have to have the data.

-dave

[0] http://feedparser.org/
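
The fallback idea can be sketched in Ruby roughly as follows: try a real parser first and only scrape with a regular expression when parsing fails. This is an illustration of the general approach, not a description of how Feed Parser itself works; the helper name pubdates_and_titles and the sample strings are assumptions.

```ruby
require 'rexml/document'

# Hypothetical helper: return [pubdate, title] pairs, preferring a real parser.
def pubdates_and_titles(source)
  doc = REXML::Document.new(source)
  REXML::XPath.match(doc, '//biblioentry').map do |entry|
    [entry.elements['pubdate'].text, entry.elements['title'].text]
  end
rescue REXML::ParseException
  # Fallback for input that is not well-formed: a crude regex scrape that
  # assumes title comes before pubdate, as in the thread's data.
  source.scan(%r{<title>(.*?)</title>.*?<pubdate>(.*?)</pubdate>}m).map do |title, date|
    [date, title]
  end
end

# Made-up inputs: the second one has a mismatched closing tag.
WELL_FORMED = '<bibliography><biblioentry><title>A</title>' \
              '<pubdate>1998</pubdate></biblioentry></bibliography>'
BROKEN = '<bibliography><biblioentry><author><firstname>John</author>' \
         '<title>B</title><pubdate>2002</pubdate></biblioentry></bibliography>'

p pubdates_and_titles(WELL_FORMED)
p pubdates_and_titles(BROKEN)
```

The regex branch inherits all the limitations discussed in this thread, so it makes sense only as a last resort.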


#15

“William J.” removed_email_address@domain.invalid writes:

  <author>

class String
}
Still doesn’t support namespaces, entities and CDATA… ;)
(Or nested tags like

.)
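
To make the entity, CDATA and nesting points concrete, here is a small sketch comparing a regex scrape with REXML on the same input. The snippets, the publisher name and the section element are made up for illustration, and the helper names are hypothetical.

```ruby
require 'rexml/document'

# Hypothetical input containing a CDATA section and an entity reference.
SNIPPET = '<biblioentry><title><![CDATA[Self & Identity]]></title>' \
          '<publisher>Smith &amp; Sons</publisher></biblioentry>'

# Hypothetical input with an element nested inside one of the same name.
NESTED = '<section><section>inner</section> tail</section>'

# Regex scrape: returns whatever sits between the first matching tags.
def regex_text(source, tag)
  source[%r{<#{tag}>(.*?)</#{tag}>}m, 1]
end

# Parser lookup: returns the decoded text of the named child element.
def parsed_text(source, tag)
  REXML::Document.new(source).root.elements[tag].text
end

puts regex_text(SNIPPET, 'title')       # raw CDATA markup
puts parsed_text(SNIPPET, 'title')      # decoded text
puts regex_text(SNIPPET, 'publisher')   # entity left undecoded
puts parsed_text(SNIPPET, 'publisher')  # entity decoded
puts regex_text(NESTED, 'section')      # stops at the first closing tag
```

The non-greedy match ends at the first closing tag it sees, so the outer section is cut short, while the parser keeps track of nesting.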