Forum: Ruby How can I search value from xml

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Artit Satanakulpanich (Guest)
on 2006-02-20 13:10
(Received via mailing list)
How can I search for a value in an XML file? For example, I want to search by
pubdate and return the matching biblioentry.
Please give me some source code for further study.

<?xml version="1.0" encoding="ISO-8859-15"?>
<!DOCTYPE bibliography PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.2/docbookx...
<bibliography id="personal_identity">
    <biblioentry id="FHIW13C-1234">
      <author>
        <firstname>Godfrey</firstname>
        <surname>Vesey</surname>
      </author>
      <title>Personal Identity: A Philosophical Analysis</title>
      <publisher>
        <publishername>Cornell University Press</publishername>
      </publisher>
      <pubdate>1977</pubdate>
   </biblioentry>
   <biblioentry id="FHIW13C-125">
      <author>
        <firstname>Geoffrey</firstname>
        <surname>Madell</surname>
      </author>
      <title>The Identity of the Self</title>
      <publisher>
        <publishername>Edinburgh University Press</publishername>
      </publisher>
      <pubdate>1981</pubdate>
   </biblioentry>
   <biblioentry id="FHIW13C-1260">
      <author>
        <firstname>Sydney</firstname>
        <surname>Shoemaker</surname>
      </author>
      <author>
         <firstname>Richard</firstname>
         <surname>Swinburne</surname>
      </author>
      <title>Personal Identity</title>
      <publisher>
        <publishername>Basil Blackwell</publishername>
      </publisher>
      <pubdate>1984</pubdate>
    </biblioentry>
    <biblioentry id="FHIW13C-1288-3">
      <author>
        <firstname>Jonathan</firstname>
        <surname>Glover</surname>
      </author>
      <title>The Philosophy and Psychology of Personal Identity</title>
      <publisher>
        <publishername>Penguin</publishername>
      </publisher>
      <pubdate>1988</pubdate>
    </biblioentry>
    <biblioentry id="FHIW13C-1289-1">
      <author>
        <firstname>Harold</firstname>
        <othername>W.</othername>
        <surname>Noonan</surname>
      </author>
      <title>Personal Identity</title>
      <publisher>
        <publishername>Routledge</publishername>
      </publisher>
      <pubdate>1989</pubdate>
    </biblioentry>
    <biblioentry id="FHIW13C-1291-2">
      <author>
        <firstname>Ren</firstname>
        <surname>Marres</surname>
      </author>
      <title>Persoonlijke identiteit na het verval van de ziel</title>
      <publisher>
        <publishername>Coutinho</publishername>
      </publisher>
      <pubdate>1991</pubdate>
    </biblioentry>
    <biblioentry id="FHIW13C-1293-1">
      <author>
        <firstname>James</firstname>
        <surname>Baillie</surname>
      </author>
      <title>Problems in Personal Identity</title>
      <publisher>
        <publishername>Paragon House</publishername>
      </publisher>
      <pubdate>1993</pubdate>
    </biblioentry>
    <biblioentry id="FHIW13C-1298-4">
      <author>
        <firstname>Brian</firstname>
        <surname>Garrett</surname>
      </author>
      <title>Personal Identity and Self-Consciousness</title>
      <publisher>
        <publishername>Routledge</publishername>
      </publisher>
      <pubdate>1998</pubdate>
    </biblioentry>
    <biblioentry id="FHIW13CX-1202-1">
      <author>
        <firstname>John</firstname>
        <surname>Perry</surname>
      </author>
      <title>Identity, Personal Identity, and the Self</title>
      <publisher>
        <publishername>Hackett</publishername>
      </publisher>
      <pubdate>2002</pubdate>
    </biblioentry>
</bibliography>

Thank you
--
Artit Satanakulpanich

http://www.rubybox.net (Thai Language)
Robert Klemme (Guest)
on 2006-02-20 13:19
(Received via mailing list)
Artit Satanakulpanich wrote:
> How can I search for a value in an XML file? For example, I want to search by
> pubdate and return the matching biblioentry.

http://www.germane-software.com/software/rexml/

    robert
Scott (Guest)
on 2006-02-20 15:14
(Received via mailing list)
You'll probably want to use REXML and XPath:

require 'rexml/document'
require 'rexml/xpath'

include REXML

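# Read the file named on the command line; Document.new( ARGV[0] ) would
# try to parse the file name itself as XML.
bibliography = Document.new( File.read( ARGV[0] ) )

# "//" matches biblioentry at any depth; a single leading "/" would only
# match a root element named biblioentry.
XPath.each( bibliography, "//biblioentry[pubdate > 1993]" ) do |biblioentry|
  # do something with biblioentry here
end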

Not entirely sure if that works, as the PC I'm on doesn't have Ruby
installed :(

Scott
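
For trying this without a file on disk, here is a minimal sketch that keeps a trimmed copy of the bibliography in a string and does the year comparison in Ruby rather than inside the XPath predicate. The names XML_SOURCE and entries_after are illustrative only and do not come from the thread or from REXML.

```ruby
require 'rexml/document'

# Trimmed copy of the bibliography from the question (DOCTYPE omitted).
XML_SOURCE = <<~XML
  <bibliography id="personal_identity">
    <biblioentry id="FHIW13C-1234">
      <title>Personal Identity: A Philosophical Analysis</title>
      <pubdate>1977</pubdate>
    </biblioentry>
    <biblioentry id="FHIW13C-1298-4">
      <title>Personal Identity and Self-Consciousness</title>
      <pubdate>1998</pubdate>
    </biblioentry>
    <biblioentry id="FHIW13CX-1202-1">
      <title>Identity, Personal Identity, and the Self</title>
      <pubdate>2002</pubdate>
    </biblioentry>
  </bibliography>
XML

# Hypothetical helper: the biblioentry elements published after +year+.
def entries_after(xml, year)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//biblioentry').select do |entry|
    pubdate = entry.elements['pubdate']
    # Compare as integers so the result does not depend on string ordering.
    pubdate && pubdate.text.to_i > year
  end
end

entries_after(XML_SOURCE, 1993).each do |entry|
  puts "#{entry.attributes['id']}: #{entry.elements['title'].text}"
end
```

With the sample above this should list the 1998 and 2002 entries. Each returned item is a full REXML::Element, so the authors, publisher and so on can be read from it in the same way as the title.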
William James (Guest)
on 2006-02-20 17:19
(Received via mailing list)
Artit Satanakulpanich wrote:
>         <firstname>Godfrey</firstname>
>         <firstname>Geoffrey</firstname>
>         <firstname>Sydney</firstname>
>       <pubdate>1984</pubdate>
>       <pubdate>1988</pubdate>
>     </biblioentry>

class String
  def xtag(s)
    scan( %r! ( < #{s} [^>]* > ) ( .*? )  </ #{s} > !mx )
  end
end

gets(nil).xtag("biblioentry").each { |tag,data|
  if data.xtag("pubdate")[0][1] > "1984"
    print tag, data, "\n"
  end
}
Christian Neukirchen (Guest)
on 2006-02-20 17:53
(Received via mailing list)
"William James" <w_a_x_man@yahoo.com> writes:

> }
I hope you are joking...
William James (Guest)
on 2006-02-20 18:14
(Received via mailing list)
Christian Neukirchen wrote:
> >     print tag, data, "\n"
> >   end
> > }
>
> I hope you are joking...

I hope you're joking.
William James (Guest)
on 2006-02-21 00:06
(Received via mailing list)
Artit Satanakulpanich wrote:
>         <firstname>Godfrey</firstname>
>         <surname>Vesey</surname>
>       </author>
>       <title>Personal Identity: A Philosophical Analysis</title>
>       <publisher>
>         <publishername>Cornell University Press</publishername>
>       </publisher>
>       <pubdate>1977</pubdate>
>    </biblioentry>

class String
  def xtag(s)
    scan( %r!  < #{s}  (?: \s+ (  [^>]*  )  )? >
               ( .*? )  </ #{s} >  !mx ).
      map{ |attr, data|   h = { }
        if attr
          attr.scan( %r!  ( \S+ ) = " ( [^"]* ) "  !x ){ |k,v|
            h[k] = v }
        end
        [ h, data ]
      }
  end
end

gets(nil).xtag("biblioentry").each { |attr,data|
  if data.xtag("pubdate")[0][1] > "1984"
    print attr["id"], data, "\n"
  end
}
Guest (Guest)
on 2006-02-21 00:41
William James wrote:
> class String
>   def xtag(s)
>     <snip>
>   end
>
> gets(nil).xtag("biblioentry").each { |attr,data|
>   <snip>
> }

Please stop. To the OP, use rexml.
William James (Guest)
on 2006-02-21 08:10
(Received via mailing list)
Artit Satanakulpanich wrote:
>         <firstname>Godfrey</firstname>
>         <surname>Vesey</surname>
>       </author>
>       <title>Personal Identity: A Philosophical Analysis</title>
>       <publisher>
>         <publishername>Cornell University Press</publishername>
>       </publisher>
>       <pubdate>1977</pubdate>
>    </biblioentry>

class String
  def xtag(s)
    scan( %r!
              < #{s}  (?: \s+ (  [^>]*  )  )? / >
              |
              < #{s}  (?: \s+ (  [^>]*  )  )? >
              ( .*? )  </ #{s} >
          !mx ).
      map{ |unpaired, attr, data|   h = { }
        attr = ( unpaired || attr )
        if attr
          attr.scan( %r!  ( \S+ ) = " ( [^"]* ) "  !x ){ |k,v|
            h[k] = v }
        end
        [ h, data ]
      }
  end
  def xshow( depth=0 )
    text = ""
    split( /<([^>]*)>/ ).each_with_index{ |s,i|
      if 0 == i % 2
        text = s.strip
      else
        indent = " " * ( depth * 2 )
        case
          when s[0,1] == "/"
            depth -= 1
            puts text.lines.map{|x| indent + x.strip }  if text != ""
          when s[-1,1] == "/"
            puts indent + s
          else
            puts indent + s
            depth += 1
        end
      end
    }
  end
end

gets(nil).xtag("biblioentry").each { |attr,data|
  if data.xtag("pubdate")[0][1] > "1997"
    puts attr["id"]
    data.xshow( 1 )
  end
}

Output:

FHIW13C-1298-4
  author
    firstname
      Brian
    surname
      Garrett
  title
    Personal Identity and Self-Consciousness
  publisher
    publishername
      Routledge
  pubdate
    1998
FHIW13CX-1202-1
  author
    firstname
      John
    surname
      Perry
  title
    Identity, Personal Identity, and the Self
  publisher
    publishername
      Hackett
  pubdate
    2002
Artit Satanakulpanich (Guest)
on 2006-02-21 08:19
(Received via mailing list)
I would like to use REXML or any other library, please.
Thanks.


--
Artit Satanakulpanich

http://www.rubybox.net (Thai Language)
Ross Bamford (Guest)
on 2006-02-21 11:39
(Received via mailing list)
On Tue, 2006-02-21 at 16:17 +0900, Artit Satanakulpanich wrote:
> I want to use rexml or any library ,please
> thankz

As others say, for now REXML is probably the way to go, but *very* soon
now you'll be able to use Libxml2 also if things keep going to plan over
here.

	require 'xml/libxml'

	d = XML::Parser.file('test.xml').parse
	p d.find('//biblioentry[pubdate = 1977]').to_a

If you want to try it before we get to release go to CVS:
http://rubyforge.org/scm/?group_id=494

(Also, REXML does support XPath, so you should be able to modify the
above to work with that. Just to be sure, I tried it 100 times over:

### XPath ###
                          user     system      total        real
rexml                 9.840000   0.080000   9.920000 ( 10.046963)
libxml2               0.090000   0.000000   0.090000 (  0.139592)

;)
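
A rough REXML counterpart of the libxml query above, with a timing loop built on the standard Benchmark module. This is only a sketch: the inline SAMPLE document and the ids_for_year helper are made-up names for illustration, and the timings it prints will not match the figures quoted above.

```ruby
require 'rexml/document'
require 'benchmark'

# Two entries from the question, reduced to what the query needs.
SAMPLE = <<~XML
  <bibliography>
    <biblioentry id="FHIW13C-1234"><pubdate>1977</pubdate></biblioentry>
    <biblioentry id="FHIW13C-125"><pubdate>1981</pubdate></biblioentry>
  </bibliography>
XML

# Hypothetical helper: ids of the entries whose pubdate equals +year+.
def ids_for_year(xml, year)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//biblioentry[pubdate='#{Integer(year)}']")
              .map { |entry| entry.attributes['id'] }
end

p ids_for_year(SAMPLE, 1977)

# Parse and query 100 times, mirroring the "100 times over" run.
Benchmark.bm(8) do |bm|
  bm.report('rexml') { 100.times { ids_for_year(SAMPLE, 1977) } }
end
```

Parsing is repeated inside the loop here, so the number covers parse plus query; moving Document.new out of the helper would time the XPath lookup alone.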
Pawel Szymczykowski (makenai)
on 2006-02-21 12:06
(Received via mailing list)
On 2/21/06, Ross Bamford <rossrt@roscopeco.co.uk> wrote:
> (Also, REXML does support XPath, so you should be able to modify the
> above to work with that. Just to be sure, I tried it 100 times over:
>
> ### XPath ###
>                           user     system      total        real
> rexml                 9.840000   0.080000   9.920000 ( 10.046963)
> libxml2               0.090000   0.000000   0.090000 (  0.139592)

Every time I've tried to use REXML for something I've found it to be
incredibly slow and painful on large files. Usually I start with
REXML, get annoyed, and then install QuiXML
(http://quixml.rubyforge.org/). Though it doesn't have bells and
whistles, it's a heck of a lot faster. Anyhow, I'm certainly looking
forward to your libxml2 bindings!

-Pawel
Christian Neukirchen (Guest)
on 2006-02-21 16:42
(Received via mailing list)
"William James" <w_a_x_man@yahoo.com> writes:

>>       <author>
> class String
> }
Still doesn't support namespaces, entities and CDATA... ;-)
(Or nested tags like <div><div></div></div>.)
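
A small sketch of two of these cases, comparing a simplified non-greedy pattern (not the exact xtag code from the thread) with REXML. The helper names regex_body and parsed_text are hypothetical, and namespaces are not covered.

```ruby
require 'rexml/document'

# Simplified form of the non-greedy tag pattern: body of the first <tag>.
def regex_body(xml, tag)
  xml[%r{<#{tag}[^>]*>(.*?)</#{tag}>}m, 1]
end

# Text of the first element named +tag+, as seen by a real parser.
def parsed_text(xml, tag)
  element = REXML::XPath.first(REXML::Document.new(xml), "//#{tag}")
  element && element.text
end

NESTED = '<div id="outer"><div id="inner">text</div></div>'

# The regex stops at the first </div>, so the outer body is cut short.
puts regex_body(NESTED, 'div')          # <div id="inner">text

# The parser keeps the nesting intact.
outer = REXML::Document.new(NESTED).root
puts outer.attributes['id']             # outer
puts outer.elements['div'].text         # text

# Entities: the regex returns the raw markup, the parser decodes it.
ENTITY = '<title>Identity &amp; Self</title>'
puts regex_body(ENTITY, 'title')        # Identity &amp; Self
puts parsed_text(ENTITY, 'title')       # Identity & Self

# CDATA sections are likewise handed back as plain text by the parser.
puts parsed_text('<title><![CDATA[Mind < Body]]></title>', 'title')
```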
Dave Cantrell (Guest)
on 2006-02-23 04:30
(Received via mailing list)
Christian Neukirchen wrote:
>>     print tag, data, "\n"
>>   end
>> }
>
> I hope you are joking...
>

Actually, in real-world usage, Mark Pilgrim's Python Feed Parser[0]
falls back to regular expressions to get the data required if the XML is
not well-formed.

Admittedly this is a real problem for RSS hackers, less so with other
XML messages, but the approach does have merit if (a) you can't
guarantee well-formedness and (b) you absolutely have to have the data.

-dave

[0] http://feedparser.org/
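
The fallback idea can be sketched in Ruby as follows: try a real parser first and only scan with a regular expression when parsing fails. This is an illustration under assumptions, not how Feed Parser is implemented; the pubdates helper and the sample strings are invented for the example.

```ruby
require 'rexml/document'

# Hypothetical helper: pubdate values via REXML, or via a regex scan when
# the input is not well-formed XML.
def pubdates(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//pubdate').map(&:text)
rescue REXML::ParseException
  # Last resort: no nesting, entity or CDATA handling here.
  xml.scan(%r{<pubdate>\s*(.*?)\s*</pubdate>}m).flatten
end

WELL_FORMED = '<bibliography><biblioentry><pubdate>1977</pubdate></biblioentry></bibliography>'
# The closing </biblioentry> tag is missing, so REXML rejects this input.
BROKEN = '<bibliography><biblioentry><pubdate>1977</pubdate></bibliography>'

p pubdates(WELL_FORMED)
p pubdates(BROKEN)
```

Both calls should yield the same single year, but only the first goes through the parser; the regex path is there purely for conditions (a) and (b) above.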