Parsing query parameters from hyperlink

casper_the_ghost · September 1, 2007, 7:35pm

I am trying to parse strings like this:

<a href='showmono.asp?cpnum=555&monotype=full' target='main'>

I need to get the cpnum value (555).

I am using the following function:

def get_drugId(link)
  arrParts = link.html.split('?')
  cpnum = arrParts[1].split('&amp')
  cpnumparts = cpnum[0].split("=")
  drugId = cpnumparts[1]
end

but I imagine there is a simpler way to do this. Also, I would like
something more flexible that would return all the query parameters (if
there is more than one) in an array or a hash.

Any ideas?

thanks,

Luis

casper_the_ghost · September 1, 2007, 9:03pm

The std lib:

require 'uri'

irb(main):006:0> u=URI.parse("http://foo/bar?dodo=1&dada=2")
=> #<URI::HTTP:0x3ff9814a URL:http://foo/bar?dodo=1&dada=2>
irb(main):007:0> u.query
=> "dodo=1&dada=2"
irb(main):008:0> u.query.split('&')
=> ["dodo=1", "dada=2"]
…
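
As a rough sketch of where the elided steps might lead, each pair can be split once more on "=" and collected into a hash. The helper name query_to_hash is hypothetical (it is not from this thread), and this naive version keeps only the last value for a repeated key and does no percent-decoding.

```ruby
require 'uri'

# Hypothetical helper (not from the thread): turn a URL's query string
# into a plain Hash of key => value strings.
def query_to_hash(url)
  query = URI.parse(url).query
  return {} if query.nil?          # the URL has no "?..." part

  # "dodo=1&dada=2" -> [["dodo", "1"], ["dada", "2"]]
  pairs = query.split('&').map { |pair| pair.split('=', 2) }
  pairs.each_with_object({}) { |(key, value), hash| hash[key] = value }
end

p query_to_hash("http://foo/bar?dodo=1&dada=2")
```

The limit of 2 in split('=', 2) keeps any "=" inside a value intact.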

robert

casper_the_ghost · September 1, 2007, 9:15pm

Query strings are allowed to use semicolons as delimiters, and a key
may appear more than once, so you also have to handle multiple values
per key. I recommend using the CGI library together with the URI
library:

irb(main):001:0> require 'uri'
=> true
irb(main):002:0> require 'cgi'
=> true
irb(main):003:0> CGI.parse(URI.parse('http://foo/?a=b&b=c').query)
=> {"a"=>["b"], "b"=>["c"]}
irb(main):004:0> CGI.parse(URI.parse('http://foo/?a=b;b=c').query)
=> {"a"=>["b"], "b"=>["c"]}
irb(main):005:0> CGI.parse(URI.parse('http://foo/?b=a;b=c').query)
=> {"b"=>["a", "c"]}
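
For illustration, the behaviour shown in that session (both "&" and ";" as separators, repeated keys collected into arrays) can be approximated by hand. This is only a sketch and not how the CGI library is implemented. The name parse_query is hypothetical, and URI.decode_www_form_component is an assumption about the Ruby version: it is not available in the 1.8 series used in this thread.

```ruby
require 'uri'

# Hypothetical helper: split a query string on "&" or ";" and group
# repeated keys, so every key maps to an Array of decoded values.
def parse_query(query)
  params = {}
  query.to_s.split(/[&;]/).each do |pair|
    next if pair.empty?                      # skip "a=1&&b=2" style gaps
    key, value = pair.split('=', 2)
    key = URI.decode_www_form_component(key)
    (params[key] ||= []) << URI.decode_www_form_component(value.to_s)
  end
  params
end

p parse_query(URI.parse('http://foo/?b=a;b=c').query)
```

Returning an array for every key, even when there is only one value, means callers never have to check whether they received a string or a list.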

casper_the_ghost · September 1, 2007, 9:48pm

> This would work if the string were a proper URL. But it is a
> hyperlink.

Use hpricot to extract the href, then feed it through URI and CGI.
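
A minimal sketch of that idea, under some assumptions: to stay within the standard library it uses a regular expression as a stand-in for Hpricot (a real HTML parser is more robust on full pages), and it uses URI.decode_www_form in place of CGI.parse, so it returns key/value pairs rather than a hash of arrays. The helper name href_params is hypothetical.

```ruby
require 'uri'

# Hypothetical helper: pull the href out of an anchor tag, then parse
# its query string into [key, value] pairs.
def href_params(anchor_html)
  # Stand-in for an HTML parser: capture a single- or double-quoted href.
  match = anchor_html.match(/href\s*=\s*(["'])(.*?)\1/i)
  return [] unless match

  query = URI.parse(match[2]).query
  query ? URI.decode_www_form(query) : []
end

link = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"
pairs = href_params(link)
puts pairs.assoc('cpnum').last   # prints 555
```

The point is the order of operations: isolate the href value first, and only then hand it to the URI parser.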

casper_the_ghost · September 1, 2007, 9:30pm

This would work if the string were a proper URL. But it is a
hyperlink.

casper_the_ghost · September 1, 2007, 9:55pm

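Sorry for the second reply. I took your suggestions and came up with
the following:

require 'uri'
require 'cgi'

str = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"

def get_cpnum(link)
  # The second space-separated piece of the tag is the href attribute.
  arrParts = link.split(' ')
  CGI.parse(URI.parse(arrParts[1]).query)['cpnum']
end

puts get_cpnum(str)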

casper_the_ghost · September 1, 2007, 10:51pm

> This would work if the string were a proper URL. But it is a
> hyperlink.

Your point? A hyperlink is a URL in the WWW context.

casper_the_ghost · September 2, 2007, 1:06am

If you try to parse it, URI throws an error.

casper_the_ghost · September 2, 2007, 2:00pm

Does it? This works for me:

irb(main):001:0> require 'uri'
=> true
irb(main):002:0> u=URI.parse('foo.bar/baz?x=2')
=> #<URI::Generic:0x3ffa0eda URL:foo.bar/baz?x=2>
irb(main):003:0> u.query
=> "x=2"
irb(main):004:0> u=URI.parse('baz?x=2')
=> #<URI::Generic:0x3ff9f15c URL:baz?x=2>
irb(main):005:0> u.query
=> "x=2"

Cheers

robert

casper_the_ghost · September 2, 2007, 3:36pm

I meant that if you try to parse the string

str = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"

it throws an error:

c:/ruby/lib/ruby/1.8/uri/common.rb:432:in `split': bad URI(is not URI?): <a href='showmono.asp?cpnum=555&monotype=full' target='main'> (URI::InvalidURIError)
        from c:/ruby/lib/ruby/1.8/uri/common.rb:481:in `parse'
        from uritest.rb:8
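
The error above comes from handing the whole anchor tag to URI.parse rather than just the href value. A small sketch of guarding against that case follows; the helper name safe_query is hypothetical.

```ruby
require 'uri'

# Hypothetical helper: return the query string of +text+, or nil when
# +text+ is not something URI.parse accepts (e.g. a whole <a> tag).
def safe_query(text)
  URI.parse(text).query
rescue URI::InvalidURIError
  nil
end

# The full tag is rejected by the parser, so nil comes back.
p safe_query("<a href='showmono.asp?cpnum=555&monotype=full' target='main'>")

# The bare href value is a valid relative reference and parses fine.
p safe_query("showmono.asp?cpnum=555&monotype=full")
```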

casper_the_ghost · September 2, 2007, 1:27am

Here’s what I ended up with:

require 'uri'
require 'cgi'
require 'hpricot'

# Returns the whole parameter hash, or only the values for key if given.
def get_query_value(link, key = '')
  doc = Hpricot(link)
  params = CGI.parse(URI.parse(doc.at("a")['href']).query)
  key.empty? ? params : params[key]
end

str = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"

p get_query_value(str)
puts get_query_value(str, 'cpnum')
puts get_query_value(str, 'monotype')

It allows me to ask for the complete hash or for a particular key.

Thanks,

Luis