Forum: Ruby SQLite3/Sinatra not returning results

Ab7126737dc26c07393d69d7fb6571e9?d=identicon&s=25 Alex Gutteridge (Guest)
on 2010-11-18 12:19
(Received via mailing list)
Hi,

This has been doing my head in all morning - I wonder if anyone can help
me work out where my problem is:

I have the following code within a Sinatra app:

$probe_db_file = "data/lumiHumanIDMapping.sqlite"
$probe_table = "HumanHT12_V4_0_R2_15002873_B"
$data_db_file  = "data/rpediff.db"

$probe_db = SQLite3::Database.new($probe_db_file)
$data_db  = SQLite3::Database.new($data_db_file)

get '/:gene' do |gene|

  STDOUT.puts gene.inspect
  STDOUT.puts gene.class
  STDOUT.puts gene == "PAX6"

  probes = $probe_db.execute("SELECT Probe_ID,Symbol from
#{$probe_table}
WHERE Symbol = ?",gene).map{|row| row[0]}

  STDOUT.puts "#{Time.now} - Gene: #{gene}"
  STDOUT.puts "#{Time.now} - Probes: #{probes.inspect}"

  gene = "PAX6"

  probes = $probe_db.execute("SELECT Probe_ID,Symbol from
#{$probe_table}
WHERE Symbol = ?",gene).map{|row| row[0]}

  STDOUT.puts "#{Time.now} - Gene: #{gene}"
  STDOUT.puts "#{Time.now} - Probes: #{probes.inspect}"

end

Running on my development machine (OSX, Ruby 1.8.7, Sinatra 1.1,
sqlite3-ruby 1.3.2) this logs (as expected):

"PAX6"
String
true
Thu Nov 18 11:12:09 +0000 2010 - Gene: PAX6
Thu Nov 18 11:12:09 +0000 2010 - Probes: ["ILMN_1789905",
"ILMN_2314140"]
Thu Nov 18 11:12:09 +0000 2010 - Gene: PAX6
Thu Nov 18 11:12:09 +0000 2010 - Probes: ["ILMN_1789905",
"ILMN_2314140"]

Running on my production machine (Linux, Ruby 1.9.2, Sinatra 1.1,
sqlite3-ruby 1.3.2) the same query logs:

"PAX6"
String
true
2010-11-18 11:08:48 +0000 - Gene: PAX6
2010-11-18 11:08:48 +0000 - Probes: []
2010-11-18 11:08:48 +0000 - Gene: PAX6
2010-11-18 11:08:48 +0000 - Probes: ["ILMN_1789905", "ILMN_2314140"]

So on this machine the first query returns an empty result for some
reason, even though the rows it should match *are* in $probe_table (as
you
can see I can find them if I set the 'gene' parameter 'manually'). All I
can think of is that somewhere in Sinatra/sqlite3-ruby the 'gene'
variable
is not quite the simple String it claims to be, but the fact that this
only
seems to happen on Ruby 1.9.2 has got me baffled! Both Probe_ID and
Symbol
are TEXT columns in Sqlite3 BTW.
D4febba21f420c09834596b6fef26614?d=identicon&s=25 zimbatm ... (zimbatm)
on 2010-11-25 12:35
Hi Alex,

the problem is probably related to the different ruby versions since
many things have changed between 1.8.7 and 1.9.2. Try installing ruby
1.9.2 with RVM ( http://rvm.beginrescueend.com ) on your development
machine and see if you have the same problem.

$ gem install rvm
$ rvm-install
$ rvm install ruby-1.9.2
$ rvm use --default ruby-1.9.2

Fix your .profile, make sure you don't export GEM_HOME in the
environment and that ~/.gemrc
doesn't list "gem: --user-install". Othewise, it works pretty well.

RVM is also able to switch ruby versions depending on the project you're
working on, which is handy if you have different projects which require
different ruby versions.

Cheers,
  zimbatm
Ab7126737dc26c07393d69d7fb6571e9?d=identicon&s=25 Alex Gutteridge (Guest)
on 2010-11-25 13:15
(Received via mailing list)
On Thu, 25 Nov 2010 20:35:13 +0900, "zimbatm ..." <jonas@pfenniger.name>
wrote:
> $ rvm use --default ruby-1.9.2
>   zimbatm
Thanks. It turned out to be an String encoding issue. Sinatra was
returning an "ASCII-8BIT" encoded string, while the SQLite3 db is
"US-ASCII" encoded. Manually setting the Sinatra string encoding to
"US-ASCII" got the DB query to match.
753dcb78b3a3651127665da4bed3c782?d=identicon&s=25 Brian Candler (candlerb)
on 2010-11-29 12:39
Alex Gutteridge wrote in post #963848:
> On Thu, 25 Nov 2010 20:35:13 +0900, "zimbatm ..." <jonas@pfenniger.name>
> wrote:
>> $ rvm use --default ruby-1.9.2
>>   zimbatm
> Thanks. It turned out to be an String encoding issue. Sinatra was
> returning an "ASCII-8BIT" encoded string, while the SQLite3 db is
> "US-ASCII" encoded. Manually setting the Sinatra string encoding to
> "US-ASCII" got the DB query to match.

Ergh. I can replicate this, with ruby 1.9.2p0 (rvm) + sqlite3-ruby-1.3.2
(gem) under Ubuntu Lucid x86_64.

ruby-1.9.2-p0 > require 'sqlite3'
 => true
ruby-1.9.2-p0 > db = SQLite3::Database.new("foo.db")
 => #<SQLite3::Database:0x00000002631bc0>
ruby-1.9.2-p0 > db.execute("create table foo (id integer auto_increment,
bar varchar(255))")
 => []
ruby-1.9.2-p0 > db.execute("insert into foo (bar) values ('hello')")
 => []
ruby-1.9.2-p0 > s1, s2, s3 = "hello",
"hello".force_encoding("US-ASCII"), "hello".force_encoding("ASCII-8BIT")
 => ["hello", "hello", "hello"]
ruby-1.9.2-p0 > s1 == s2
 => true
ruby-1.9.2-p0 > s1 == s3
 => true
ruby-1.9.2-p0 > db.execute("select * from foo where bar=?", s1)
 => [[nil, "hello"]]
ruby-1.9.2-p0 > db.execute("select * from foo where bar=?", s2)
 => [[nil, "hello"]]
ruby-1.9.2-p0 > db.execute("select * from foo where bar=?", s3)
 => []

So s1 == s3, but the two queries give different results?? This is
barking mad. Having said that, I can't find anything in the sqlite3-ruby
documentation which says how queries may be affected by encodings, so
the behaviour is undefined.

Sinatra's documentation also doesn't seem to mention the encoding that
your |gene| parameter will have, so you have to find out by
trial-and-error that it's ASCII-8BIT. You might have guessed this
because the data comes from a socket and sockets get ASCII-8BIT by
default - but Sinatra might have decided to parse the
Content-Type:...encoding parameter and apply that.

In ruby 1.8 the value "hello" is just 5 bytes, with no hidden dimension
to worry about.
Ee2809522b2e56d0d6b656486bc5e0db?d=identicon&s=25 Markus Fischer (Guest)
on 2011-03-27 23:15
(Received via mailing list)
Content preview:  On 29.11.2010 12:39, Brian Candler wrote: > Ergh. I
can replicate
    this, with ruby 1.9.2p0 (rvm) + sqlite3-ruby-1.3.2 > (gem) under
Ubuntu Lucid
    x86_64. > > ruby-1.9.2-p0 > require 'sqlite3' > => true >
ruby-1.9.2-p0 >
    db = SQLite3::Database.new("foo.db") > =>
#<SQLite3::Database:0x00000002631bc0>
    > ruby-1.9.2-p0 > db.execute("create table foo (id integer
auto_increment,
    > bar varchar(255))") > => [] > ruby-1.9.2-p0 > db.execute("insert
into foo
    (bar) values ('hello')") > => [] > ruby-1.9.2-p0 > s1, s2, s3 =
"hello",
   > "hello".force_encoding("US-ASCII"),
"hello".force_encoding("ASCII-8BIT")
    > => ["hello", "hello", "hello"] > ruby-1.9.2-p0 > s1 == s2 > =>
true > ruby-1.9.2-p0
    > s1 == s3 > => true > ruby-1.9.2-p0 > db.execute("select * from foo
where
    bar=?", s1) > => [[nil, "hello"]] > ruby-1.9.2-p0 >
db.execute("select *
   from foo where bar=?", s2) > => [[nil, "hello"]] > ruby-1.9.2-p0 >
db.execute("select
    * from foo where bar=?", s3) > => [] > > So s1 == s3, but the two
queries
    give different results?? This is > barking mad. Having said that, I
can't
    find anything in the sqlite3-ruby > documentation which says how
queries
   may be affected by encodings, so > the behaviour is undefined. [...]

 Content analysis details:   (-2.6 points, 5.0 required)

  pts rule name              description
 ---- ----------------------
--------------------------------------------------
 -1.0 ALL_TRUSTED            Passed through trusted hosts only via SMTP
 -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
                             [score: 0.0000]
  0.0 T_FILL_THIS_FORM_SHORT Fill in a short form with personal
information
  0.3 FILL_THIS_FORM_FRAUD_PHISH Answer suspicious question(s)
X-Cloudmark-Analysis: v=1.1
cv=kRmApA04uPli/d8B8u1M0swu98hronvs6VvAqI8uQP0= c=1 sm=0
a=svAFAT5c7LwA:10 a=n5_bRNUYKd4A:10 a=IkcTkHD0fZMA:10
a=ifuZSf0M0LJmHL7QMhgA:9 a=5xJxjBpEekpQfdk-FI8A:7
a=OicNPj6H6IkURmuq_doO-2SJYXsA:4 a=QEXdDO2ut3YA:10
a=HpAAvcLHHh0Zw7uRqdWCyQ==:117
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Precedence: bulk
Lines: 110
List-Id: ruby-talk.ruby-lang.org
List-Software: fml [fml 4.0.3 release (20011202/4.0.3)]
List-Post: <mailto:ruby-talk@ruby-lang.org>
List-Owner: <mailto:ruby-talk-admin@ruby-lang.org>
List-Help: <mailto:ruby-talk-ctl@ruby-lang.org?body=help>
List-Unsubscribe: <mailto:ruby-talk-ctl@ruby-lang.org?body=unsubscribe>
Received-SPF: none (Address does not pass the Sender Policy Framework)
  SPF=FROM;
  sender=markus@fischer.name;
  remoteip=::ffff:221.186.184.68;
  remotehost=carbon.ruby-lang.org;
  helo=carbon.ruby-lang.org;
  receiver=eq4.andreas-s.net;

On 29.11.2010 12:39, Brian Candler wrote:
> ruby-1.9.2-p0 > db.execute("insert into foo (bar) values ('hello')")
> ruby-1.9.2-p0 > db.execute("select * from foo where bar=?", s2)
>  => [[nil, "hello"]]
> ruby-1.9.2-p0 > db.execute("select * from foo where bar=?", s3)
>  => []
>
> So s1 == s3, but the two queries give different results?? This is
> barking mad. Having said that, I can't find anything in the sqlite3-ruby
> documentation which says how queries may be affected by encodings, so
> the behaviour is undefined.

I was just affected by that, Brian, rhanks for analyzing this problem in
the past! I'm running 1.9.2-p180 in my case. Now when I re-read the
other thread "A question about Ruby 1.9's "external encoding" when
Robert concludes "The rule as such is pretty clear IMHO" and "Now,
everything is clear" I feel like:

nothing.is.clear.to.me

What is this whole character encoding madness about? I never encountered
a language I stumbled into so many subtle problems with encodings left
and right (mainly working with Java in my case). The more I work with
ruby 1.9, the more confusing it gets. I'm getting worried. What am I
missing what others do know that I run into such troubles?

Getting back to this specific case, where's the problem exactly here? Is
it Sinatra? My source file? The sqlite database?

I wrote a simple testing app for the sqlite db and got this:

$ sqlite3 database.sqlite3 .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE user ( username varchar(255), password varchar(255), email
varchar(255));
INSERT INTO "user" VALUES('a','b','c');
COMMIT;


$ cat dbitest.rb
require 'dbi'
require 'pp'

dbi = DBI.connect('dbi:SQLite3:database.sqlite3')

sql = "SELECT * FROM user WHERE username = ? AND password = ?"

username = 'a'
password = 'b'

pp username.encoding, password.encoding, dbi.select_one(sql, username,
password)[0].encoding


$ ruby dbitest.rb
#<Encoding:US-ASCII>
#<Encoding:US-ASCII>
#<Encoding:UTF-8>


Within Sinatra it is as you said:

Relevant excerpt of the code:

 if @auth.provided? && @auth.basic? && @auth.credentials
     sql = "SELECT * FROM user WHERE username = ? AND password = ?"
     username = @auth.credentials[0]
     password = @auth.credentials[1]
     pp username.encoding, password.encoding, $dbi.select_one(sql,
username, password)
     username.force_encoding('US-ASCII')
     password.force_encoding('US-ASCII')
     pp username.encoding, password.encoding, $dbi.select_one(sql,
username, password)[0].encoding
 end


This is the output

#<Encoding:ASCII-8BIT>
#<Encoding:ASCII-8BIT>
nil
#<Encoding:US-ASCII>
#<Encoding:US-ASCII>
#<Encoding:UTF-8>

Stunning.

- Markus
E0c987f680cd640c14912ebfbf0f0f07?d=identicon&s=25 unknown (Guest)
on 2011-03-28 02:11
(Received via mailing list)
On Sun, Mar 27, 2011 at 5:14 PM, Markus Fischer <markus@fischer.name>
wrote:
>   pp username.encoding, password.encoding, $dbi.select_one(sql,
> #<Encoding:UTF-8>
> Getting back to this specific case, where's the
> problem exactly here? Is it Sinatra? My source
> file? The sqlite database?

The credentials in Sinatra are unpacked in Rack:

ruby 1.9.2p0:
> Encoding.default_external
=> #<Encoding:UTF-8>
> Encoding.default_internal
=> nil
> s = 'hello'
=> "hello"
> s.encoding
=> #<Encoding:UTF-8>
> [s].pack('m*')
=> "aGVsbG8=\n"
> [s].pack('m*').unpack('m*')
=> ["hello"]
> [s].pack('m*').unpack('m*').first.encoding
=> #<Encoding:ASCII-8BIT>

rack/blob/master/lib/rack/auth/basic.rb:
        def credentials
          @credentials ||= params.unpack("m*").first.split(/:/, 2)
        end

http://redmine.ruby-lang.org/issues/show/4279
pack.c (pack_unpack): the resulted string of unpack('M') must have
ASCII-8BIT encoding (and ENC_CODERANGE_VALID). [ruby-core:34482]

I don't know if there is any reasonable way for Rack to reliably
return a more correct encoding for the credentials, but that seems to
be where your incompatible encoding is coming from.
E0c987f680cd640c14912ebfbf0f0f07?d=identicon&s=25 unknown (Guest)
on 2011-03-28 02:12
(Received via mailing list)
> http://redmine.ruby-lang.org/issues/show/4279
> pack.c (pack_unpack): the resulted string of unpack('M') must have
> ASCII-8BIT encoding (and ENC_CODERANGE_VALID). [ruby-core:34482]

Oops, that's unpack('M'), Rack is using unpack('m').
Ee2809522b2e56d0d6b656486bc5e0db?d=identicon&s=25 Markus Fischer (Guest)
on 2011-03-28 14:18
(Received via mailing list)
Content preview:  On 28.03.2011 02:10, brabuhr@gmail.com wrote: > The
credentials
    in Sinatra are unpacked in Rack: > > ruby 1.9.2p0: >>
Encoding.default_external
    > => #<Encoding:UTF-8> >> Encoding.default_internal > => nil >> s =
'hello'
    > => "hello" >> s.encoding > => #<Encoding:UTF-8> >> [s].pack('m*')
> =>
   "aGVsbG8=\n" >> [s].pack('m*').unpack('m*') > => ["hello"] >>
[s].pack('m*').unpack('m*').first.encoding
    > => #<Encoding:ASCII-8BIT> > >
rack/blob/master/lib/rack/auth/basic.rb:
   > def credentials > @credentials ||=
params.unpack("m*").first.split(/:/,
   2) > end > > http://redmine.ruby-lang.org/issues/show/4279 > pack.c
(pack_unpack):
    the resulted string of unpack('M') must have > ASCII-8BIT encoding
(and ENC_CODERANGE_VALID).
    [ruby-core:34482] > > I don't know if there is any reasonable way
for Rack
    to reliably > return a more correct encoding for the credentials,
but that
    seems to > be where your incompatible encoding is coming from. [...]

 Content analysis details:   (-2.9 points, 5.0 required)

  pts rule name              description
 ---- ----------------------
--------------------------------------------------
 -1.0 ALL_TRUSTED            Passed through trusted hosts only via SMTP
 -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
                             [score: 0.0000]
X-Cloudmark-Analysis: v=1.1
cv=zlRBWuFCZaNL9+WHNm1pWLowY5Lx061w2zJBJiDkNAU= c=1 sm=0
a=aofHTkXiRO8A:10 a=NfwdtDP_n2AA:10 a=8nJEP1OIZ-IA:10 a=xqWC_Br6kY4A:10
a=pGLkceISAAAA:8 a=w2PP7KgtAAAA:8 a=VwUrYYIdhK9rIgZAn9gA:9
a=Dm_dIbcVq_-eBn_k6bYA:7 a=gAU3ZQRgsWUL-DqYwqxl0kMbEBcA:4
a=wPNLvfGTeEIA:10 a=MSl-tDqOz04A:10 a=HpAAvcLHHh0Zw7uRqdWCyQ==:117
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Precedence: bulk
Lines: 68
List-Id: ruby-talk.ruby-lang.org
List-Software: fml [fml 4.0.3 release (20011202/4.0.3)]
List-Post: <mailto:ruby-talk@ruby-lang.org>
List-Owner: <mailto:ruby-talk-admin@ruby-lang.org>
List-Help: <mailto:ruby-talk-ctl@ruby-lang.org?body=help>
List-Unsubscribe: <mailto:ruby-talk-ctl@ruby-lang.org?body=unsubscribe>
Received-SPF: none (Address does not pass the Sender Policy Framework)
  SPF=FROM;
  sender=markus@fischer.name;
  remoteip=::ffff:221.186.184.68;
  remotehost=carbon.ruby-lang.org;
  helo=carbon.ruby-lang.org;
  receiver=eq4.andreas-s.net;

On 28.03.2011 02:10, brabuhr@gmail.com wrote:
> =>  #<Encoding:UTF-8>
>          end
>
> http://redmine.ruby-lang.org/issues/show/4279
> pack.c (pack_unpack): the resulted string of unpack('M') must have
> ASCII-8BIT encoding (and ENC_CODERANGE_VALID). [ruby-core:34482]
>
> I don't know if there is any reasonable way for Rack to reliably
> return a more correct encoding for the credentials, but that seems to
> be where your incompatible encoding is coming from.

On 28.03.2011 02:12, brabuhr@gmail.com wrote:
 >> http://redmine.ruby-lang.org/issues/show/4279
 >> pack.c (pack_unpack): the resulted string of unpack('M') must have
 >> ASCII-8BIT encoding (and ENC_CODERANGE_VALID). [ruby-core:34482]
 >
 > Oops, that's unpack('M'), Rack is using unpack('m').
 >

Thanks, that very informative! Guess proper encoding handling just comes
with cost and confusion. Although it's very confusing, it actually looks
right what ruby is doing. When it decodes base64, how should it know
it's encoding? My example, involving PHP:

$ php -r 'echo base64_encode("this is a t
Please log in before posting. Registration is free and takes only a minute.
Existing account

NEW: Do you have a Google/GoogleMail, Yahoo or Facebook account? No registration required!
Log in with Google account | Log in with Yahoo account | Log in with Facebook account
No account? Register here.