SQLite3/Sinatra not returning results

Hi,

This has been doing my head in all morning - I wonder if anyone can help
me work out where my problem is:

I have the following code within a Sinatra app:

$probe_db_file = “data/lumiHumanIDMapping.sqlite”
$probe_table = “HumanHT12_V4_0_R2_15002873_B”
$data_db_file = “data/rpediff.db”

$probe_db = SQLite3::Database.new($probe_db_file)
$data_db = SQLite3::Database.new($data_db_file)

get ‘/:gene’ do |gene|

STDOUT.puts gene.inspect
STDOUT.puts gene.class
STDOUT.puts gene == “PAX6”

probes = $probe_db.execute(“SELECT Probe_ID,Symbol from
#{$probe_table}
WHERE Symbol = ?”,gene).map{|row| row[0]}

STDOUT.puts “#{Time.now} - Gene: #{gene}”
STDOUT.puts “#{Time.now} - Probes: #{probes.inspect}”

gene = “PAX6”

probes = $probe_db.execute(“SELECT Probe_ID,Symbol from
#{$probe_table}
WHERE Symbol = ?”,gene).map{|row| row[0]}

STDOUT.puts “#{Time.now} - Gene: #{gene}”
STDOUT.puts “#{Time.now} - Probes: #{probes.inspect}”

end

Running on my development machine (OSX, Ruby 1.8.7, Sinatra 1.1,
sqlite3-ruby 1.3.2) this logs (as expected):

“PAX6”
String
true
Thu Nov 18 11:12:09 +0000 2010 - Gene: PAX6
Thu Nov 18 11:12:09 +0000 2010 - Probes: [“ILMN_1789905”,
“ILMN_2314140”]
Thu Nov 18 11:12:09 +0000 2010 - Gene: PAX6
Thu Nov 18 11:12:09 +0000 2010 - Probes: [“ILMN_1789905”,
“ILMN_2314140”]

Running on my production machine (Linux, Ruby 1.9.2, Sinatra 1.1,
sqlite3-ruby 1.3.2) the same query logs:

“PAX6”
String
true
2010-11-18 11:08:48 +0000 - Gene: PAX6
2010-11-18 11:08:48 +0000 - Probes: []
2010-11-18 11:08:48 +0000 - Gene: PAX6
2010-11-18 11:08:48 +0000 - Probes: [“ILMN_1789905”, “ILMN_2314140”]

So on this machine the first query returns an empty result for some
reason, even though the rows it should match are in $probe_table (as
you
can see I can find them if I set the ‘gene’ parameter ‘manually’). All I
can think of is that somewhere in Sinatra/sqlite3-ruby the ‘gene’
variable
is not quite the simple String it claims to be, but the fact that this
only
seems to happen on Ruby 1.9.2 has got me baffled! Both Probe_ID and
Symbol
are TEXT columns in Sqlite3 BTW.

Hi Alex,

the problem is probably related to the different ruby versions since
many things have changed between 1.8.7 and 1.9.2. Try installing ruby
1.9.2 with RVM ( http://rvm.beginrescueend.com ) on your development
machine and see if you have the same problem.

$ gem install rvm
$ rvm-install
$ rvm install ruby-1.9.2
$ rvm use --default ruby-1.9.2

Fix your .profile, make sure you don’t export GEM_HOME in the
environment and that ~/.gemrc
doesn’t list “gem: --user-install”. Othewise, it works pretty well.

RVM is also able to switch ruby versions depending on the project you’re
working on, which is handy if you have different projects which require
different ruby versions.

Cheers,
zimbatm

On Thu, 25 Nov 2010 20:35:13 +0900, “zimbatm …” [email protected]
wrote:

$ rvm use --default ruby-1.9.2
zimbatm
Thanks. It turned out to be an String encoding issue. Sinatra was
returning an “ASCII-8BIT” encoded string, while the SQLite3 db is
“US-ASCII” encoded. Manually setting the Sinatra string encoding to
“US-ASCII” got the DB query to match.

Alex G. wrote in post #963848:

On Thu, 25 Nov 2010 20:35:13 +0900, “zimbatm …” [email protected]
wrote:

$ rvm use --default ruby-1.9.2
zimbatm
Thanks. It turned out to be an String encoding issue. Sinatra was
returning an “ASCII-8BIT” encoded string, while the SQLite3 db is
“US-ASCII” encoded. Manually setting the Sinatra string encoding to
“US-ASCII” got the DB query to match.

Ergh. I can replicate this, with ruby 1.9.2p0 (rvm) + sqlite3-ruby-1.3.2
(gem) under Ubuntu Lucid x86_64.

ruby-1.9.2-p0 > require ‘sqlite3’
=> true
ruby-1.9.2-p0 > db = SQLite3::Database.new(“foo.db”)
=> #SQLite3::Database:0x00000002631bc0
ruby-1.9.2-p0 > db.execute(“create table foo (id integer auto_increment,
bar varchar(255))”)
=> []
ruby-1.9.2-p0 > db.execute(“insert into foo (bar) values (‘hello’)”)
=> []
ruby-1.9.2-p0 > s1, s2, s3 = “hello”,
“hello”.force_encoding(“US-ASCII”), “hello”.force_encoding(“ASCII-8BIT”)
=> [“hello”, “hello”, “hello”]
ruby-1.9.2-p0 > s1 == s2
=> true
ruby-1.9.2-p0 > s1 == s3
=> true
ruby-1.9.2-p0 > db.execute(“select * from foo where bar=?”, s1)
=> [[nil, “hello”]]
ruby-1.9.2-p0 > db.execute(“select * from foo where bar=?”, s2)
=> [[nil, “hello”]]
ruby-1.9.2-p0 > db.execute(“select * from foo where bar=?”, s3)
=> []

So s1 == s3, but the two queries give different results?? This is
barking mad. Having said that, I can’t find anything in the sqlite3-ruby
documentation which says how queries may be affected by encodings, so
the behaviour is undefined.

Sinatra’s documentation also doesn’t seem to mention the encoding that
your |gene| parameter will have, so you have to find out by
trial-and-error that it’s ASCII-8BIT. You might have guessed this
because the data comes from a socket and sockets get ASCII-8BIT by
default - but Sinatra might have decided to parse the
Content-Type:…encoding parameter and apply that.

In ruby 1.8 the value “hello” is just 5 bytes, with no hidden dimension
to worry about.

http://redmine.ruby-lang.org/issues/show/4279
pack.c (pack_unpack): the resulted string of unpack(‘M’) must have
ASCII-8BIT encoding (and ENC_CODERANGE_VALID). [ruby-core:34482]

Oops, that’s unpack(‘M’), Rack is using unpack(‘m’).

Content preview: On 29.11.2010 12:39, Brian C. wrote: > Ergh. I
can replicate
this, with ruby 1.9.2p0 (rvm) + sqlite3-ruby-1.3.2 > (gem) under
Ubuntu Lucid
x86_64. > > ruby-1.9.2-p0 > require ‘sqlite3’ > => true >
ruby-1.9.2-p0 >
db = SQLite3::Database.new(“foo.db”) > =>
#SQLite3::Database:0x00000002631bc0
> ruby-1.9.2-p0 > db.execute(“create table foo (id integer
auto_increment,
> bar varchar(255))”) > => [] > ruby-1.9.2-p0 > db.execute(“insert
into foo
(bar) values (‘hello’)”) > => [] > ruby-1.9.2-p0 > s1, s2, s3 =
“hello”,

“hello”.force_encoding(“US-ASCII”),
“hello”.force_encoding(“ASCII-8BIT”)
=> [“hello”, “hello”, “hello”] > ruby-1.9.2-p0 > s1 == s2 > =>
true > ruby-1.9.2-p0
s1 == s3 > => true > ruby-1.9.2-p0 > db.execute(“select * from foo
where
bar=?”, s1) > => [[nil, “hello”]] > ruby-1.9.2-p0 >
db.execute(“select *
from foo where bar=?”, s2) > => [[nil, “hello”]] > ruby-1.9.2-p0 >
db.execute(“select
* from foo where bar=?”, s3) > => [] > > So s1 == s3, but the two
queries
give different results?? This is > barking mad. Having said that, I
can’t
find anything in the sqlite3-ruby > documentation which says how
queries
may be affected by encodings, so > the behaviour is undefined. […]

Content analysis details: (-2.6 points, 5.0 required)

pts rule name description



-1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP
-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
[score: 0.0000]
0.0 T_FILL_THIS_FORM_SHORT Fill in a short form with personal
information
0.3 FILL_THIS_FORM_FRAUD_PHISH Answer suspicious question(s)
X-Cloudmark-Analysis: v=1.1
cv=kRmApA04uPli/d8B8u1M0swu98hronvs6VvAqI8uQP0= c=1 sm=0
a=svAFAT5c7LwA:10 a=n5_bRNUYKd4A:10 a=IkcTkHD0fZMA:10
a=ifuZSf0M0LJmHL7QMhgA:9 a=5xJxjBpEekpQfdk-FI8A:7
a=OicNPj6H6IkURmuq_doO-2SJYXsA:4 a=QEXdDO2ut3YA:10
a=HpAAvcLHHh0Zw7uRqdWCyQ==:117
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Precedence: bulk
Lines: 110
List-Id: ruby-talk.ruby-lang.org
List-Software: fml [fml 4.0.3 release (20011202/4.0.3)]
List-Post: mailto:[email protected]
List-Owner: mailto:[email protected]
List-Help: mailto:[email protected]?body=help
List-Unsubscribe: mailto:[email protected]?body=unsubscribe
Received-SPF: none (Address does not pass the Sender Policy Framework)
SPF=FROM;
[email protected];
remoteip=::ffff:221.186.184.68;
remotehost=carbon.ruby-lang.org;
helo=carbon.ruby-lang.org;
receiver=eq4.andreas-s.net;

On 29.11.2010 12:39, Brian C. wrote:

ruby-1.9.2-p0 > db.execute(“insert into foo (bar) values (‘hello’)”)
ruby-1.9.2-p0 > db.execute(“select * from foo where bar=?”, s2)
=> [[nil, “hello”]]
ruby-1.9.2-p0 > db.execute(“select * from foo where bar=?”, s3)
=> []

So s1 == s3, but the two queries give different results?? This is
barking mad. Having said that, I can’t find anything in the sqlite3-ruby
documentation which says how queries may be affected by encodings, so
the behaviour is undefined.

I was just affected by that, Brian, rhanks for analyzing this problem in
the past! I’m running 1.9.2-p180 in my case. Now when I re-read the
other thread "A question about Ruby 1.9’s “external encoding” when
Robert concludes “The rule as such is pretty clear IMHO” and “Now,
everything is clear” I feel like:

nothing.is.clear.to.me

What is this whole character encoding madness about? I never encountered
a language I stumbled into so many subtle problems with encodings left
and right (mainly working with Java in my case). The more I work with
ruby 1.9, the more confusing it gets. I’m getting worried. What am I
missing what others do know that I run into such troubles?

Getting back to this specific case, where’s the problem exactly here? Is
it Sinatra? My source file? The sqlite database?

I wrote a simple testing app for the sqlite db and got this:

$ sqlite3 database.sqlite3 .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE user ( username varchar(255), password varchar(255), email
varchar(255));
INSERT INTO “user” VALUES(‘a’,‘b’,‘c’);
COMMIT;

$ cat dbitest.rb
require ‘dbi’
require ‘pp’

dbi = DBI.connect(‘dbi:SQLite3:database.sqlite3’)

sql = “SELECT * FROM user WHERE username = ? AND password = ?”

username = ‘a’
password = ‘b’

pp username.encoding, password.encoding, dbi.select_one(sql, username,
password)[0].encoding

$ ruby dbitest.rb
#Encoding:US-ASCII
#Encoding:US-ASCII
#Encoding:UTF-8

Within Sinatra it is as you said:

Relevant excerpt of the code:

if @auth.provided? && @auth.basic? && @auth.credentials
sql = “SELECT * FROM user WHERE username = ? AND password = ?”
username = @auth.credentials[0]
password = @auth.credentials[1]
pp username.encoding, password.encoding, $dbi.select_one(sql,
username, password)
username.force_encoding(‘US-ASCII’)
password.force_encoding(‘US-ASCII’)
pp username.encoding, password.encoding, $dbi.select_one(sql,
username, password)[0].encoding
end

This is the output

#Encoding:ASCII-8BIT
#Encoding:ASCII-8BIT
nil
#Encoding:US-ASCII
#Encoding:US-ASCII
#Encoding:UTF-8

Stunning.

  • Markus

On Sun, Mar 27, 2011 at 5:14 PM, Markus F. [email protected]
wrote:

pp username.encoding, password.encoding, $dbi.select_one(sql,
#Encoding:UTF-8
Getting back to this specific case, where’s the
problem exactly here? Is it Sinatra? My source
file? The sqlite database?

The credentials in Sinatra are unpacked in Rack:

ruby 1.9.2p0:

Encoding.default_external
=> #Encoding:UTF-8
Encoding.default_internal
=> nil
s = ‘hello’
=> “hello”
s.encoding
=> #Encoding:UTF-8
[s].pack(‘m*’)
=> “aGVsbG8=\n”
[s].pack(‘m*’).unpack(‘m*’)
=> [“hello”]
[s].pack(‘m*’).unpack(‘m*’).first.encoding
=> #Encoding:ASCII-8BIT

rack/blob/master/lib/rack/auth/basic.rb:
def credentials
@credentials ||= params.unpack(“m*”).first.split(/:/, 2)
end

http://redmine.ruby-lang.org/issues/show/4279
pack.c (pack_unpack): the resulted string of unpack(‘M’) must have
ASCII-8BIT encoding (and ENC_CODERANGE_VALID). [ruby-core:34482]

I don’t know if there is any reasonable way for Rack to reliably
return a more correct encoding for the credentials, but that seems to
be where your incompatible encoding is coming from.

Content preview: On 28.03.2011 02:10, [email protected] wrote: > The
credentials
in Sinatra are unpacked in Rack: > > ruby 1.9.2p0: >>
Encoding.default_external
> => #Encoding:UTF-8 >> Encoding.default_internal > => nil >> s =
‘hello’
> => “hello” >> s.encoding > => #Encoding:UTF-8 >> [s].pack(‘m*’)

=>
“aGVsbG8=\n” >> [s].pack(‘m*’).unpack(‘m*’) > => [“hello”] >>
[s].pack(‘m*’).unpack(‘m*’).first.encoding
=> #Encoding:ASCII-8BIT > >
rack/blob/master/lib/rack/auth/basic.rb:
def credentials > @credentials ||=
params.unpack(“m*”).first.split(/:/,

  1. end > > http://redmine.ruby-lang.org/issues/show/4279 > pack.c
    (pack_unpack):
    the resulted string of unpack(‘M’) must have > ASCII-8BIT encoding
    (and ENC_CODERANGE_VALID).
    [ruby-core:34482] > > I don’t know if there is any reasonable way
    for Rack
    to reliably > return a more correct encoding for the credentials,
    but that
    seems to > be where your incompatible encoding is coming from. […]

Content analysis details: (-2.9 points, 5.0 required)

pts rule name description



-1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP
-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
[score: 0.0000]
X-Cloudmark-Analysis: v=1.1
cv=zlRBWuFCZaNL9+WHNm1pWLowY5Lx061w2zJBJiDkNAU= c=1 sm=0
a=aofHTkXiRO8A:10 a=NfwdtDP_n2AA:10 a=8nJEP1OIZ-IA:10 a=xqWC_Br6kY4A:10
a=pGLkceISAAAA:8 a=w2PP7KgtAAAA:8 a=VwUrYYIdhK9rIgZAn9gA:9
a=Dm_dIbcVq_-eBn_k6bYA:7 a=gAU3ZQRgsWUL-DqYwqxl0kMbEBcA:4
a=wPNLvfGTeEIA:10 a=MSl-tDqOz04A:10 a=HpAAvcLHHh0Zw7uRqdWCyQ==:117
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Precedence: bulk
Lines: 68
List-Id: ruby-talk.ruby-lang.org
List-Software: fml [fml 4.0.3 release (20011202/4.0.3)]
List-Post: mailto:[email protected]
List-Owner: mailto:[email protected]
List-Help: mailto:[email protected]?body=help
List-Unsubscribe: mailto:[email protected]?body=unsubscribe
Received-SPF: none (Address does not pass the Sender Policy Framework)
SPF=FROM;
[email protected];
remoteip=::ffff:221.186.184.68;
remotehost=carbon.ruby-lang.org;
helo=carbon.ruby-lang.org;
receiver=eq4.andreas-s.net;

On 28.03.2011 02:10, [email protected] wrote:

=> #Encoding:UTF-8
end

http://redmine.ruby-lang.org/issues/show/4279
pack.c (pack_unpack): the resulted string of unpack(‘M’) must have
ASCII-8BIT encoding (and ENC_CODERANGE_VALID). [ruby-core:34482]

I don’t know if there is any reasonable way for Rack to reliably
return a more correct encoding for the credentials, but that seems to
be where your incompatible encoding is coming from.

On 28.03.2011 02:12, [email protected] wrote:

http://redmine.ruby-lang.org/issues/show/4279
pack.c (pack_unpack): the resulted string of unpack(‘M’) must have
ASCII-8BIT encoding (and ENC_CODERANGE_VALID). [ruby-core:34482]

Oops, that’s unpack(‘M’), Rack is using unpack(‘m’).

Thanks, that very informative! Guess proper encoding handling just comes
with cost and confusion. Although it’s very confusing, it actually looks
right what ruby is doing. When it decodes base64, how should it know
it’s encoding? My example, involving PHP:

$ php -r 'echo base64_encode("this is a t