Irb and ruby giving different results


#1

in IRB,
ASCII = (0..255).map{|c| c.chr }
PRINTABLE = ASCII.grep(/[[:print:]]/)
PRINTABLE.length

191

However, inside the ruby program below, PRINTABLE.length gives only 95!

#!/opt/local/bin/ruby
ASCII = (0..255).map{|c| c.chr }
puts(ASCII.length)
PRINTABLE = ASCII.grep(/[[:print:]]/)
puts(PRINTABLE.length)

-> 95 instead of 191

(Using ruby 1.8.7 on OS X 10.5.5, powerpc.) I ran both from the same Terminal.
Both use /opt/local/bin/ruby.

Why this difference? I ran irb with -f (so .irbrc would not be loaded) and
still got the same result, so it's not some require that is causing the
difference.

P.S. Sorry for cross-posting from the roguelike thread; this was getting
lost there.


#2

This won’t help much, but when I executed:

ASCII = (0..255).map{|c| c.chr }
PRINTABLE = ASCII.grep(/[[:print:]]/)
PRINTABLE.length

in irb, I got 95 on my ruby 1.8.6 (i386-mswin32) running on an XP box.

What were the 191 characters displayed when you computed the PRINTABLE
expression?

As a totally random theory, I wonder if [[:print:]] might take into
account what device is attached to stdout, recognize what your terminal
is capable of, and use that to decide what is printable or not.

It would be quite surprising (and perhaps unfortunate) if that's what's
going on, but it might explain what you saw.

A slightly more plausible explanation might be that [[:print:]] alters
its behavior based on the TERM environment variable. What is ENV["TERM"]
in the two cases?

That's all I've got. I warned you at the beginning that this wouldn't
help much.

–wpd


#3

I mentioned that I used the same terminal, to verify that it was not a
terminal issue. I tried both with TERM=screen (my usual), then
xterm-color, xterm-256color, and perhaps VT100 and VT200 as well.

One of the 191 characters, for example, is 165 or "\245", which is the
code generated by Alt-A on my Mac OS X machine (powerpc, 10.5.5, darwin).

(This is when I have not enabled "Use alt as meta"; if you don't know
what that is, just ignore it. It's a Mac default.)

Here’s the dump, since you asked:

irb(main):030:0> PRINTABLE
[" ", "!", "\"", "#", "$", "%", "&", "'", "(", ")", "*", "+", ",", "-",
".", "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";",
"<", "=", ">", "?", "@", "A", "B", "C", "D", "E", "F", "G", "H", "I",
"J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W",
"X", "Y", "Z", "[", "\\", "]", "^", "_", "`", "a", "b", "c", "d", "e",
"f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s",
"t", "u", "v", "w", "x", "y", "z", "{", "|", "}", "~", "\240", "\241",
"\242", "\243", "\244", "\245", "\246", "\247", "\250", "\251", "\252",
"\253", "\254", "\255", "\256", "\257", "\260", "\261", "\262", "\263",
"\264", "\265", "\266", "\267", "\270", "\271", "\272", "\273", "\274",
"\275", "\276", "\277", "\300", "\301", "\302", "\303", "\304", "\305",
"\306", "\307", "\310", "\311", "\312", "\313", "\314", "\315", "\316",
"\317", "\320", "\321", "\322", "\323", "\324", "\325", "\326", "\327",
"\330", "\331", "\332", "\333", "\334", "\335", "\336", "\337", "\340",
"\341", "\342", "\343", "\344", "\345", "\346", "\347", "\350", "\351",
"\352", "\353", "\354", "\355", "\356", "\357", "\360", "\361", "\362",
"\363", "\364", "\365", "\366", "\367", "\370", "\371", "\372", "\373",
"\374", "\375", "\376", "\377"]
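For what it's worth, the 191 entries in that dump split cleanly into two byte ranges: the 95 printable ASCII characters (0x20 space through 0x7e tilde) followed by every byte from octal \240 to \377 (0xa0 to 0xff). So the script's 95 is exactly the ASCII part, and irb is additionally accepting the whole upper half from 0xa0 up. A quick check of the arithmetic (the ranges are read off the listing, nothing else is assumed):

```ruby
# Byte ranges read off the irb dump: " " .. "~", then "\240" .. "\377".
ascii_printable = (0x20..0x7e).to_a # space through tilde
high_printable  = (0xa0..0xff).to_a # octal \240 through \377

puts ascii_printable.length                         # 95
puts high_printable.length                          # 96
puts ascii_printable.length + high_printable.length # 191
```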


#4

FWIW, I get 95 with irb187 under Ubuntu Dapper.

Looking at the source code, the [[:print:]] character class uses
isascii(c) && isprint(c).

man isprint says:

NOTE
The details of what characters belong into which class depend on the
current locale. For example, isupper() will not recognize an A-umlaut
(Ä) as an uppercase letter in the default C locale.

So look at what ENV.grep(/^LC/) shows. You could try setting
ENV['LC_ALL'] = 'C' in irb, or export LC_ALL=C before running it. Or try
'POSIX' instead of 'C'.

Finally, be completely sure that your irb is running the right ruby.
Check RUBY_VERSION within irb.


#5

RUBY_VERSION is 1.8.7 in both.

ENV.grep(/^LC/) shows nothing in either irb or ruby.
Setting ENV['LC_ALL'] to 'C', 'POSIX', etc. has no effect in either.

However, "echo $LC_ALL" at my prompt gives en_US.UTF-8.
So when I did LC_ALL='C', I got only 95 in both ruby and irb.

Is there any way I can get ruby to also give 191?
I tried ENV['LC_ALL'] = 'en_US.UTF-8' at the start of my ruby program, but
it had no effect. Anyway, thanks for pointing this out.


#6

Nit K. wrote:

> ENV.grep(/^LC/) shows nothing in either irb or ruby

My bad; try

ENV.select{|k,v| k=~/^LC/}
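Why the grep version comes back empty: ENV enumerates [name, value] pairs, so grep ends up testing the regexp against an Array, and Regexp#=== never matches a non-String. select lets you test just the name. A small sketch (the /\ALC_/ anchor is a slight tightening of /^LC/, chosen here for illustration):

```ruby
# ENV yields [name, value] pairs; Regexp#=== against an Array is false,
# so this prints [] no matter what the environment contains.
p ENV.grep(/^LC/)

# Test only the variable name instead, then print NAME=value lines.
lc_vars = ENV.select { |name, _value| name =~ /\ALC_/ }
lc_vars.each { |name, value| puts "#{name}=#{value}" }
```

On 1.8 select returns an array of pairs and on newer Rubies a Hash; the each block destructures both the same way.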

> Setting ENV['LC_ALL'] to 'C', 'POSIX', etc. has no effect in either.
>
> However, "echo $LC_ALL" at my prompt gives en_US.UTF-8.
> So when I did LC_ALL='C', I got only 95 in both ruby and irb.
>
> Is there any way I can get ruby to also give 191?

Perhaps then:

env LC_ALL=en_US.UTF-8 ruby foo.rb
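Setting ENV['LC_ALL'] inside the running program only changes the environment; the C library reads LC_ALL when setlocale(LC_ALL, "") is called, so it must be present before that happens. If you want to script the comparison from Ruby instead of the shell, here is a minimal sketch of the same `env LC_ALL=... ruby ...` launch. It assumes Ruby 1.9 or later (IO.popen with an environment hash and RbConfig.ruby do not exist in 1.8.7), and run_with_locale is a hypothetical helper name:

```ruby
require "rbconfig"

# Hypothetical helper: run a Ruby snippet in a child process whose
# environment has LC_ALL set, like `env LC_ALL=... ruby -e CODE`.
# Returns whatever the child wrote to stdout.
def run_with_locale(locale, code)
  IO.popen({ "LC_ALL" => locale }, [RbConfig.ruby, "-e", code], &:read)
end

# The child sees LC_ALL in its environment from the very start.
puts run_with_locale("en_US.UTF-8", 'puts ENV["LC_ALL"]')
```

This only controls what the child's environment contains; whether the C library actually switches locale still depends on something in that process calling setlocale().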

Also, looking through the source, it seems that ruby doesn't normally call
setlocale() by itself, but maybe some third-party library that irb
invokes is doing this for you. "readline" is a likely candidate. So you
could try:

require 'readline'

in your ruby file. Or check $LOADED_FEATURES in irb and try loading the
same modules in your ruby file.
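One way to do the $LOADED_FEATURES comparison without copying by hand: dump irb's list to a file, then have the script report which of those features it has not loaded itself. A sketch under assumptions: "irb_features.txt" and missing_features are hypothetical names, and both lists should come from the same Ruby installation (newer Rubies store absolute paths in $LOADED_FEATURES, so lists from different installs won't line up).

```ruby
# Step 1, in irb (hypothetical file name):
#   File.open("irb_features.txt", "w") { |f| f.puts $LOADED_FEATURES }
#
# Step 2, in the script: list features irb had loaded that this
# process has not.
def missing_features(saved_path, current = $LOADED_FEATURES)
  saved = File.readlines(saved_path).map { |line| line.chomp }
  saved - current
end

if File.exist?("irb_features.txt")
  missing_features("irb_features.txt").each { |feature| puts feature }
end
```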


#7

Very strange:

  1. ENV.select{|k,v| k=~/^LC/} gives en_US.UTF-8 in both irb and ruby. I
    get LC_ALL and LC_CTYPE.

  2. env LC_ALL=en_US.UTF-8 ruby foo.rb
    still gives 95.

  3. I copied $LOADED_FEATURES from irb and then tried this (I hope I have
    this correct):

["enumerator.so", "e2mmap.rb", "irb/init.rb", "irb/workspace.rb",
 "irb/context.rb", "irb/extend-command.rb", "irb/output-method.rb",
 "irb/notifier.rb", "irb/slex.rb", "irb/ruby-token.rb",
 "irb/ruby-lex.rb", "readline.bundle", "irb/input-method.rb",
 "irb/locale.rb", "irb.rb", "irb/completion.rb",
 "irb/ext/save-history.rb", "stringio.bundle", "yaml/error.rb",
 "syck.bundle", "yaml/ypath.rb", "yaml/basenode.rb", "yaml/syck.rb",
 "yaml/tag.rb", "yaml/stream.rb", "yaml/constants.rb", "rational.rb",
 "date/format.rb", "date.rb", "yaml/rubytypes.rb", "yaml/types.rb",
 "yaml.rb"].each do |rr|
  require rr
end

I still get 95.


#8

Nit K. wrote:

> I still get 95.

Possibly readline isn’t calling setlocale until you actually
invoke/initialise the library.

Here’s an alternative test. Install the RubyInline gem, and then stick
this in front of your test program:

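require 'rubygems'
require 'inline'

class MyTest
  inline do |builder|
    builder.include '<locale.h>'
    # An empty locale string makes setlocale() read LC_ALL / LC_* / LANG
    # from the environment; passing NULL would only query the current one.
    builder.c <<-EOC
      void set_locale(void) {
        setlocale(LC_ALL, "");
      }
    EOC
  end
end

MyTest.new.set_locale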

If that works, you can remove the dependency on the LC_ALL environment
variable by changing to: setlocale(LC_ALL, "en_US.UTF-8"); or whatever.

However, this dependence on the C stdlib's half-baked idea of "locale"
is very hairy. I understand why Ruby doesn't call setlocale() normally:
it means that at least the normal behaviour is (a) sane, and (b) not
affected randomly by global environment variable settings.

To be honest, if you want a character class which always matches 0x20 to
0x7e and 0xa0 to 0xff, then you might as well just say so directly:

[\x20-\x7e\xa0-\xff]
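Along those lines, a minimal sketch of the original test rewritten so it gives the same answer everywhere. Note one substitution: instead of the regexp above it compares byte values against integer ranges, which sidesteps both the C locale and, on newer Rubies, string-encoding questions around high bytes in a regexp. PRINTABLE_RANGES is a name introduced here for illustration.

```ruby
# Locale-independent version of the original test: spell out the byte
# ranges 0x20-0x7e and 0xa0-0xff instead of relying on [[:print:]].
PRINTABLE_RANGES = [0x20..0x7e, 0xa0..0xff]

ASCII = (0..255).map { |c| c.chr }
PRINTABLE = ASCII.select do |ch|
  byte = ch.unpack("C").first # the single byte value, 0..255
  PRINTABLE_RANGES.any? { |r| r.include?(byte) }
end

puts PRINTABLE.length # 191, whatever LC_ALL / LC_CTYPE say
```

unpack("C") is used rather than String#ord so the same code also runs on 1.8.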