Querying using HTTP

bodikp · April 15, 2009, 3:40pm

I’m querying a graphics database using http. On my server and on my PC,
I can successfully run a simple script that checks for the existence of
files in different formats: PDF, PNG, and TIFF. But, when my colleagues
run this same script, they get the error I display below. I’ve literally
copied the entire directory structure of my c:\ruby setup to their PCs,
and, they still get this error message. Can someone help me with this?

Thanks,
Peter

C:\Users\hv0797.INTDOM>ruby c:\scripts\checkorca.rb %1

C:/ruby/lib/ruby/1.9.1/net/http.rb:2212:in `error!': 400 "Bad Request" (Net::HTTPServerException)
        from C:/ruby/lib/ruby/1.9.1/net/http.rb:2221:in `value'
        from c:/scripts/checkorca.rb:23:in `block in <main>'
        from C:/ruby/lib/ruby/1.9.1/net/http.rb:564:in `start'
        from C:/ruby/lib/ruby/1.9.1/net/http.rb:453:in `start'
        from c:/scripts/checkorca.rb:12:in `<main>'

bodikp · April 16, 2009, 9:31am

Peter B. wrote:

I’m querying a graphics database using http.

… and ruby 1.9.1, it appears

On my server and on my PC,
I can successfully run a simple script that checks for the existence of
files in different formats: PDF, PNG, and TIFF. But, when my colleagues
run this same script, they get the error I display below.

ruby 1.9’s runtime behaviour of anything using String varies depending
on the environment it runs in. It is quite difficult to get it to behave
sanely, and is such a mess that I stick with ruby 1.8. I think this is
particularly likely to be your problem given that you are handling
binary data.

You could try looking at the body of the 400 response to see if it has
any more detail about what’s gone wrong (that is, check Response#body
before Response#value), or you could use wireshark to look at the actual
packets going back and forth. Compare what you see on the working
machine with the non-working one.
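
A minimal sketch of the "check the body before calling value" suggestion. The real script and server are not shown in this thread, so a throwaway local server stands in for the graphics database; the request path and the error text it returns are made up for illustration.

```ruby
require "net/http"
require "socket"

# Stand-in for the real server (an assumption): answers any request with a
# 400 and a short plain-text explanation.
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]
body_text = "missing required parameter: format" # hypothetical error text

server_thread = Thread.new do
  client = server.accept
  # Consume the request head up to the blank line that ends it.
  while (line = client.gets) && line != "\r\n"
  end
  client.write("HTTP/1.1 400 Bad Request\r\n" \
               "Content-Type: text/plain\r\n" \
               "Content-Length: #{body_text.bytesize}\r\n" \
               "Connection: close\r\n\r\n" \
               "#{body_text}")
  client.close
end

report = nil
error_class = nil
Net::HTTP.start("127.0.0.1", port) do |http|
  response = http.get("/check?file=example.pdf") # hypothetical path
  # Inspect status and body first ...
  report = "#{response.code} #{response.message}: #{response.body}"
  begin
    response.value # ... because this raises for any non-2xx response.
  rescue Net::ProtocolError => e
    error_class = e.class
  end
end

server_thread.join
server.close
puts report
```

Running the same few lines on a working and a non-working machine and comparing the printed report is a cheap first step before reaching for a packet capture.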

If you are reading any data from local files on disk before posting, you
could try File.open(name, "rb"). You could also try adding “# encoding:
UTF-8” to the top of your source file, and also running your script
using ruby -Kn script.rb
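
As a rough illustration of the "rb" suggestion, the sketch below writes the eight-byte PNG signature to a temporary file and reads it back in binary mode. The temporary file is only a stand-in for whatever graphics file the real script handles, and String#b assumes Ruby 2.0 or later.

```ruby
require "tempfile"

# The first eight bytes of any PNG file; 0x89 is not valid UTF-8 on its own.
png_signature = "\x89PNG\r\n\x1A\n".b

header = nil
Tempfile.create("sample") do |tmp|
  tmp.binmode
  tmp.write(png_signature)
  tmp.flush

  # "rb" = read-only, binary: no newline translation on Windows, and the
  # returned String is tagged ASCII-8BIT (raw bytes) regardless of locale.
  header = File.open(tmp.path, "rb") { |io| io.read }
end

puts header.encoding
puts header.bytesize
```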

If those don’t work, then I’d suggest you stick with ruby 1.8 for the
next year or so.

bodikp · April 16, 2009, 1:48pm

Thanks, Brian. Well, ironically, I’ve got 1.9 on my PC, and it works,
and I’m running 1.8.6 on my server, and it works there, too. It now works
on my assistant’s PC too, using 1.9: I increased her permissions on the
c:\ruby folder on her PC, and that seemed to do it. But our other two
colleagues still can’t get it to work; they keep getting the error above.
We’ve got them on 1.9 now. I’m going to try some of your suggestions.

bodikp · April 16, 2009, 4:55pm

On Apr 16, 2009, at 2:31 AM, Brian C. wrote:

run this same script, they get the error I display below.

ruby 1.9’s runtime behaviour of anything using String varies depending
on the environment it runs in.

That’s not accurate.

Certain encoding options have default settings relating to the
environment they run in, but none of that matters if you specify the
desired encodings for your source and/or IO objects. These defaults
are provided as conveniences so that simple scripting can fit in
naturally with the rest of the environment.

Removing the defaults would just mean more work for the programmer as
you would be forced to specify all encodings even in situations where
a default makes sense. I also don’t think it’s bad to say that a
programmer must specify the encoding of data they wish to read. How
in the world can we expect Ruby to get a gets() call right on a
UTF-16LE file without us providing a warning about what the data is?
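
To make the UTF-16LE point concrete, here is a small sketch (the file and its contents are invented for the example). The mode string "r:UTF-16LE:UTF-8" tells Ruby what the bytes on disk are and which encoding the program wants to work in.

```ruby
require "tempfile"

line = nil
Tempfile.create("utf16") do |tmp|
  # Write raw UTF-16LE bytes so the on-disk encoding is known for certain.
  File.binwrite(tmp.path, "R\u00E9sum\u00E9\n".encode("UTF-16LE"))

  # External encoding UTF-16LE (the file), internal encoding UTF-8 (the
  # strings handed back to the program).
  line = File.open(tmp.path, "r:UTF-16LE:UTF-8") { |io| io.gets }
end

p line
p line.encoding
```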

It is quite difficult to get it to behave sanely, and is such a mess
that I stick with ruby 1.8.

Ruby 1.8 had a single global variable that, when set, changed the
behavior of all code in the interpreter, including the stuff I didn’t
write. I hope that isn’t your idea of a “sane” system.

Ruby 1.9 probably does require us to learn the bare minimum about how
character encodings are handled. It’s about time. How many years
have we tried to get by with crossed fingers and a prayer that it
would just work out? Character encodings should have been required
reading long before now.

For those who are ready to learn the basics, the new Pickaxe has a
solid 13 page introduction. It doesn’t take long to work through and
it covers the important stuff. If you want to go farther, I’ve
covered character encoding basics, the Ruby 1.8 system, and the new
1.9 system a bit deeper in a series of posts to my blog:

http://blog.grayproductions.net/articles/understanding_m17n

If those don’t work, then I’d suggest you stick with ruby 1.8 for the
next year or so.

For years the Ruby community has begged for more speed and robust
character encoding support. The core team delivered that and much
more this January with a production release that’s substantially
faster and has a very powerful new encoding engine. To repay their
monumental efforts, we complain and urge people to stick with Ruby
1.8. We must truly be the most ungrateful lot of bums ever.

For what it’s worth, I believe Brian is wrong. I think the best thing
we can do as a community is to move everything to Ruby 1.9 as fast as
possible. If there are barriers to us doing that, we need to find ways
to tear them down. There are a lot more plusses than minuses, I
promise. Ruby 1.9 is ready for us. Come on in, the water is fine!

James Edward G. II

bodikp · April 16, 2009, 6:30pm

James G. wrote:

Certain encoding options have default settings relating to the
environment they run in, but none of that matters if you specify the
desired encodings for your source and/or IO objects.

Put another way: write extra code to defend against environment
pollution, and hope that you haven’t forgotten any places where it is
required.

It is quite difficult to get it to behave sanely, and is such a mess
that I stick with ruby 1.8.

Ruby 1.8 had a single global variable that, when set, changed the
behavior of all code in the interpreter, including the stuff I didn’t
write. I hope that isn’t your idea of a “sane” system.

Ruby 1.8 treated strings as sequences of 8-bit bytes unless explicitly
told otherwise. That is sane.

Here is an example of the sort of problems still being caused by String
in 1.9:
http://groups.google.com/group/rack-devel/browse_thread/thread/99628ed37ac5f5b

Ruby 1.9 probably does require us to learn the bare minimum about how
character encodings are handled.

Which is rather difficult if it’s not documented. Yes I know there have
been some third-party efforts, including your own, but I have yet to see
anything which is anywhere near complete.

It’s about time. How many years
have we tried to get by with crossed fingers and a prayer that it
would just work out? Character encodings should have been required
reading long before now.

Sure, people who process text need to understand character encodings.
But text is a small subset of data. When you’re processing JPEGs or PDFs
or ASN1 certificates or HTTP POSTs, you just want something that’s 8-bit
clean.

For years the Ruby community has begged for more speed and robust
character encoding support. The core team delivered that and much
more this January with a production release that’s substantially
faster and has a very powerful new encoding engine.

Nobody’s complaining about improved performance. I’m saying there is
still much pain to be had by using 1.9, and advising that people may
wish to avoid the pain until (hopefully) most of it has gone away.
Anyone who has had no pain with 1.9 or libraries which don’t work under
1.9 is free to speak up.

To repay their
monumental efforts, we complain and urge people to stick with Ruby
1.8. We must truly be the most ungrateful lot of bums ever.

Who’s “we”? Speaking only for myself, I didn’t ask for String to be
changed in this way. And does the amount of effort which went in mean
that I am forbidden from saying that I don’t like the result?

Regards,

Brian.

bodikp · April 16, 2009, 8:31pm

On Apr 16, 2009, at 11:30 AM, Brian C. wrote:

James G. wrote:

Ruby 1.8 had a single global variable that, when set, changed the
behavior of all code in the interpreter, including the stuff I didn’t
write. I hope that isn’t your idea of a “sane” system.

Ruby 1.8 treated strings as sequences of 8-bit bytes unless explicitly
told otherwise. That is sane.

I’m pretty sure you are in the minority with this opinion. You really
like this?

$ ruby -e 'p "Résumé"[0..1]'
"R\303"

How often is that going to be the desired result?
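
For comparison, a short sketch of the two behaviours side by side on a current Ruby. The word is written with \u escapes so the result does not depend on the source file's encoding, and String#b (Ruby 2.0 or later, an assumption about the reader's version) is used to get the byte-wise view that 1.8 gave by default.

```ruby
word = "R\u00E9sum\u00E9" # "Résumé"

chars = word[0..1]   # character indexing, as in Ruby 1.9 and later
bytes = word.b[0..1] # binary copy of the same data: indexing is now byte-wise

p chars
p bytes
p word.length   # counts characters
p word.bytesize # counts bytes
```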

There were a lot of complaints about Ruby’s encoding support over the
years. A lot. I’m pretty sure if we had all been saying, “Matz, we
love the it’s-all-bytes approach,” there would be no m17n. That just
wasn’t the case though.

Here is an example of the sort of problems still being caused by String
in 1.9:
http://groups.google.com/group/rack-devel/browse_thread/thread/99628ed37ac5f5b

And if we combed the Web for documented problems caused by the Ruby
1.8 system, do you think we would find a few of those? I would be
willing to bet I’ve seen one encoding related problem post every
couple of weeks I’ve been on Ruby T…

I just did a quick search in one place users are prone to report
issues with my FasterCSV library and about 47% of all the issues ever
reported were character encoding issues.

Ruby 1.9 probably does require us to learn the bare minimum about how
character encodings are handled.

Which is rather difficult if it’s not documented. Yes I know there have
been some third-party efforts, including your own, but I have yet to see
anything which is anywhere near complete.

Can you list what’s not yet covered in my blog series? I’m aware of
two very small things that I’ve never once seen used in the wild.
I’ll add those, but let’s say I feel my current coverage is about 98%
complete. How am I still failing to meet your needs?

… you just want something that’s 8-bit clean.

Ruby 1.9 has an encoding for that too and it’s very well documented.

For years the Ruby community has begged for more speed and robust
character encoding support. The core team delivered that and much
more this January with a production release that’s substantially
faster and has a very powerful new encoding engine.

Nobody’s complaining about improved performance.

That’s a relief. Now we just need to get everyone over to Ruby 1.9
and those issues will be a thing of the past. Thus, it would make me
happy if you stop telling people not to do that.

I’m saying there is still much pain to be had by using 1.9, and
advising that people may wish to avoid the pain until (hopefully)
most of it has gone away.
Anyone who has had no pain with 1.9 or libraries which don’t work under
1.9 is free to speak up.

I am speaking up. That’s the point.

I had quite a bit of pain when I adapted FasterCSV to be the standard
CSV library. There were two reasons for that. First, the m17n
implementation was still pretty raw and I ran into bugs and
complications. Those are almost completely resolved now. The second
reason was the lack of documentation, so I wrote some from what I had
learned in converting the code.

Now there’s a lot less pain.

To repay their monumental efforts, we complain and urge people to
stick with Ruby 1.8. We must truly be the most ungrateful lot of
bums ever.

Who’s “we”? Speaking only for myself, I didn’t ask for String to be
changed in this way. And does the amount of effort which went in mean
that I am forbidden from saying that I don’t like the result?

It means that I think your comments are doing harm to the 1.9
migration and I can’t find the good you are doing to balance that.

James Edward G. II

bodikp · April 16, 2009, 9:44pm

James G. wrote:

I’m pretty sure you are in the minority with this opinion.

Quite possibly

You really like this?

$ ruby -e 'p "Résumé"[0..1]'
"R\303"

How often is that going to be the desired result?

Well, if I were extracting the first two bytes from a JPEG header, that
would be exactly what I’d expect. I’ve very rarely wanted to extract the
first two characters from a string. I can think of one example: a
string truncation helper in a web page.

def trunc(string, maxlen=50)
  if string.length > maxlen
    string = string[0,maxlen-3] + "..."
  end
  string
end

I’ll certainly agree that’s something you’d want to do, and /.{,50}/u is
an ugly way of doing it. In any case, I’m not saying there shouldn’t be
any m17n support, or even that tagging strings with encodings is in
itself wrong, as long as the semantic implications are made clear.

The number one bugbear I have is that (unless you take a number of
specific steps to avoid it), program behaviour is inconsistent. You can
run the same program with exactly the same input data on two
different machines, and they will process it differently, possibly even
crashing in one case. If someone has a problem running your app, it’s
now insufficient just to ask what O/S and ruby version they are running
in order to be able to replicate the problem.

Consider an app which is bundled with HTML templates, which the app
reads using File.read(). The templates happen to be written using, say,
UTF-8. It all works fine on my machine, and passes all tests. However it
barfs when run on someone else’s machine, because their environment
variables are different.
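
A sketch of that scenario, with some assumptions: the two "machines" are simulated by changing Encoding.default_external inside one process, the template text is invented, and no default internal encoding is set.

```ruby
require "tempfile"

saved = Encoding.default_external
results = {}

begin
  Tempfile.create("template") do |tmp|
    # A UTF-8 template containing one non-ASCII character ("Straße").
    File.binwrite(tmp.path, "<p>Stra\u00DFe</p>".encode("UTF-8"))

    # Stand-ins for two machines whose locale settings differ.
    ["UTF-8", "ISO-8859-1"].each do |locale_encoding|
      Encoding.default_external = locale_encoding
      html = File.read(tmp.path) # no encoding given: the default decides
      results[locale_encoding] = [html.encoding.name, html.length]
    end
  end
ensure
  Encoding.default_external = saved
end

p results
```

The bytes read are identical in both runs; only the encoding tag, and therefore the character count and any later string operations, differ.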

I think that LC_ALL is a very poor predictor of what encoding a specific
file is in. Ruby doesn’t trust it for source files (it uses #encoding
tags instead), so why trust it for data?

Now, if the default external encoding were fixed as (say) UTF-8, that
would be more sane. The default behaviour would then be the same on any
machine where ruby is installed:

File#gets returns a string with encoding='UTF-8'
File#read returns a string with encoding='BINARY'

unless explicitly overridden, e.g. when the file is opened. So if these
hypothetical HTML templates are written in ISO-8859-15, you would be
forced to declare this in your program.

In any case, I’m used to having my data treated as binary unless I
explicitly ask otherwise. e.g.

$ echo "ßßß" | wc
1 1 7
$ echo "ßßß" | wc -m
4

[Ubuntu Hardy, default setup with LANG=en_GB.UTF-8]

Can you list what’s not yet covered in my blog series?

I’ve posted a bunch of lists before. Every time I try out some feature,
because it’s undocumented, the test turns up more questions than it
answers. Maybe I really should go ahead and document it all, but that
would be a very large project.

Trying out in irb used to be a good way to test ruby, but that’s no good
in ruby 1.9 because it’s not consistent with script behaviour. For
example:

$ irb19
irb(main):001:0> "foo".encoding
=> #<Encoding:US-ASCII>
irb(main):002:0> /foo/.encoding
=> #<Encoding:US-ASCII>
irb(main):003:0> "fooß".encoding
=> #<Encoding:UTF-8>
irb(main):004:0> /fooß/.encoding
=> #<Encoding:UTF-8>

Now try running this program:

p "foo".encoding
p /foo/.encoding
p "fooß".encoding
p /fooß/.encoding

It barfs on the multi-byte chars. That’s reasonable in the absence of
knowledge about the source file, so now add an #encoding line:

#encoding: UTF-8
p "foo".encoding
p /foo/.encoding
p "fooß".encoding
p /fooß/.encoding

and you still get a different answer to IRB. The first string gets an
encoding as UTF-8 instead of US-ASCII; and yet the /foo/ regexp gets an
encoding of US-ASCII in both cases.

This is compounded by the hidden state which remembers whether a
particular string is all 7-bit characters or not. That is, although
“foo” and “fooß” are both marked as having identical encoding UTF-8,
they are actually treated differently by the encoding rules. You have
to test using the #ascii_only? method. And yet a regexp literal
apparently follows a different rule. Except when you are in IRB.
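
One way to see the hidden 7-bit state at work is to combine each string with binary data. This is only a sketch of that one rule (the variable names are illustrative, and String#b assumes Ruby 2.0 or later); it does not cover the regexp or IRB differences described above.

```ruby
plain  = "foo".encode("UTF-8")
accent = "foo\u00DF" # "fooß", also UTF-8
raw    = "\xFF".b    # a single non-ASCII byte tagged as binary

same_tag = plain.encoding == accent.encoding
flags    = [plain.ascii_only?, accent.ascii_only?]

# Allowed: plain holds only 7-bit characters, so it is compatible with anything.
joined = plain + raw

# Refused: both sides hold non-ASCII data under different encoding tags.
clash = begin
  accent + raw
rescue Encoding::CompatibilityError => e
  e.class
end

p same_tag, flags, joined.encoding, clash
```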

It means that I think your comments are doing harm to the 1.9
migration and I can’t find the good you are doing to balance that.

I don’t think what I’m saying would stop any library author from
modifying their library to work with 1.9 if they so wish. They have to
make up their own minds.

I believe the worst long-term problems are likely to be C extensions. I
have seen no hints at all for C extension writers on how to handle
strings properly (especially the hidden ascii_only? state) so I believe
these are likely to have obscure bugs for some time.

Regards,

Brian.

bodikp · April 16, 2009, 10:12pm

On Apr 16, 2009, at 3:07 PM, James G. wrote:

one, plus some more. I’m sure trying to help you.
I meant “had now addressed…”

James Edward G. II

bodikp · April 16, 2009, 10:15pm

James G. wrote:

But you bundled those files. You know the encoding much better than
Ruby. Is it really too much to ask for?

html = File.read("my_template.html", external_encoding: "UTF-8")

Sure, if you remember everywhere this is needed. If you don’t, then
your program will work fine, and pass all your tests, until you run it
somewhere else and it dies.

bodikp · April 16, 2009, 10:07pm

On Apr 16, 2009, at 2:44 PM, Brian C. wrote:

James G. wrote:

Consider an app which is bundled with HTML templates, which the app
reads using File.read(). The templates happen to be written using, say,
UTF-8. It all works fine on my machine, and passes all tests. However it
barfs when run on someone else’s machine, because their environment
variables are different.

But you bundled those files. You know the encoding much better than
Ruby. Is it really too much to ask for?

html = File.read("my_template.html", external_encoding: "UTF-8")

That’s more correct than any magic behavior would be and self-
documenting to boot.

File#read returns a string with encoding=‘BINARY’

File.binread() was added for exactly this purpose.
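
A small sketch putting the two calls next to each other. The temporary file stands in for my_template.html and its contents are invented; it also assumes no default internal encoding is set.

```ruby
require "tempfile"

text = nil
raw = nil
Tempfile.create("my_template") do |tmp|
  File.binwrite(tmp.path, "<p>Stra\u00DFe</p>".encode("UTF-8"))

  # Declared encoding: the result no longer depends on the locale.
  text = File.read(tmp.path, external_encoding: "UTF-8")

  # Raw bytes: the result is tagged ASCII-8BIT (binary).
  raw = File.binread(tmp.path)
end

p [text.encoding, text.length]
p [raw.encoding, raw.bytesize]
```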

Can you list what’s not yet covered in my blog series?

I’ve posted a bunch of lists before.

Yeah, I’ve read those. I responded to your last one yesterday telling
you that I had no addressed all the concerns I saw in that one, plus
some more. I’m sure trying to help you.

I guess it’s time for a new list of what you’re still missing…

Trying out in irb used to be a good way to test ruby, but that’s no good
in ruby 1.9 because it’s not consistent with script behaviour.

While I agree that IRb may need some more integration, there have
always been some minor differences between how code runs in it and how
it runs in a real Ruby script. I don’t think this means 1.9 isn’t
ready for the masses.

James Edward G. II

bodikp · April 16, 2009, 10:23pm

On Apr 16, 2009, at 3:15 PM, Brian C. wrote:

James G. wrote:

But you bundled those files. You know the encoding much better than
Ruby. Is it really too much to ask for?

html = File.read("my_template.html", external_encoding: "UTF-8")

Sure, if you remember everywhere this is needed. If you don’t, then
your program will work fine, and pass all your tests, until you run it
somewhere else and it dies.

Well, I definitely don’t think this is the first case of that in Ruby
(or most other languages for that matter). Heck, fork() isn’t cross-
platform and I love fork().

James Edward G. II

bodikp · April 17, 2009, 2:27am

On Thursday 16 April 2009 15:23:05 James G. wrote:

Well, I definitely don’t think this is the first case of that in Ruby
(or most other languages for that matter). Heck, fork() isn’t cross-
platform and I love fork().

Indeed, and there are win32-specific things on Windows. Even something as
simple as pathnames isn’t universal unless you always use File.join – or
better yet, Pathname. How often do you do that, instead of just:

open 'foo/bar.txt'

This is a weak example, now that Windows supports / as well as \ as a
directory delimiter, but I think I’ve made my point. Even Java programs have
platform-specific quirks, and this one is quite avoidable.
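
For reference, the two alternatives to the hard-coded path look roughly like this (same example path as above; nothing is opened, the paths are only built):

```ruby
require "pathname"

# File.join inserts the separator between components.
joined = File.join("foo", "bar.txt")

# Pathname wraps the path in an object with path-aware operations.
path = Pathname.new("foo") + "bar.txt"

puts joined
puts path
puts path.basename
puts path.extname
```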

Given that most other software on a given system (including Perl) will obey a
default encoding, unless it has a specific reason to believe otherwise (like a
byte-order mark), I think it’s reasonable for Ruby to do the same. Your
suggestion to default to UTF-8 really only makes sense on English systems
(where encoding is likely to be set to that anyway) – and even that doesn’t
save you from having to specify binary for binary files.

For that matter, if you’ve written all your tests, and they pass on one
system, and fail on another, your tests are working as designed – in this
case, exposing a platform-specific bug, either in your program or the
interpreter.

bodikp · April 17, 2009, 8:15am

James G. wrote:

Well, I definitely don’t think this is the first case of that in Ruby
(or most other languages for that matter). Heck, fork() isn’t cross-
platform and I love fork().

But it’s not a different “platform”. Someone could be running exactly
the same version of Ruby under exactly the same operating system and
version, but with different localisation the program will break.

That’s how this thread started: the bemused OP wrote

| But, when my colleagues
| run this same script, they get the error I display below. I’ve literally
| copied the entire directory structure of my c:\ruby setup to their PCs,