Forum: Ruby irb and ruby giving different results

R. K. (Guest)
on 2008-11-05 05:03
in IRB,
ASCII     = (0..255).map{|c| c.chr }
PRINTABLE = ASCII.grep(/[[:print:]]/)
PRINTABLE.length
>>> 191

However, inside the ruby program PRINTABLE.length only gives 95 !! ???

#!/opt/local/bin/ruby
 ASCII     = (0..255).map{|c| c.chr }
 puts(ASCII.length)
 PRINTABLE = ASCII.grep(/[[:print:]]/)
 puts(PRINTABLE.length)
# -> 95 instead of 191

(Using ruby 1.8.7 on OS X 10.5.5, powerpc). Ran both from same Terminal.
Both use /opt/local/bin/ruby.

Why this difference? I ran irb with -f (so that irbrc would not be loaded)
and still got the same result, so it's not some require that is causing the
difference.

p.s. sorry for cross-posting from roguelike thread -- this is getting
lost there.
Patrick D. (Guest)
on 2008-11-05 14:50
(Received via mailing list)
This won't help much, but when I executed:

>
> ASCII     = (0..255).map{|c| c.chr }
> PRINTABLE = ASCII.grep(/[[:print:]]/)
> PRINTABLE.length
> >>> 191
>

in irb, I got 95 on my ruby 1.8.6 (i386-mswin32) running on an XP box.

What were the 191 characters displayed when you computed the PRINTABLE
expression?

As a totally random theory, I wonder if [[:print:]] might take into account
what device is attached to stdout, recognize what your terminal is
capable of, and use that to decide what is printable or not.

It would be quite surprising (and, perhaps, unfortunate) if that's what's
going on, but it might explain what you saw.

A slightly more plausible explanation might be that [[:print:]] alters its
behavior based on the TERM environment variable.  What is ENV["TERM"] in the
two cases?

That's all I've got.  I warned you at the beginning that this wouldn't help
much.

--wpd
R. K. (Guest)
on 2008-11-05 15:10
Patrick D. wrote:
> This won't help much, but when I executed:
>
>>
>> ASCII     = (0..255).map{|c| c.chr }
>> PRINTABLE = ASCII.grep(/[[:print:]]/)
>> PRINTABLE.length
>> >>> 191
>>
>
> in irb, I got 95 on my ruby 1.8.6 (i386-mswin32) running on an XP box.
>
> What were the 191 characters displayed when computed the PRINTABLE
> expression?
>
>
> A slightly more plausible explanation might be that [[:print:]] alters
> its
> behavior based on the TERM environment variable.  What is ENV["TERM"] in
> the
> two cases?
>
> That's all I've got.  I warned you at the beginning that this wouldn't
> help
> much.
>
> --wpd

I mentioned that I used the same terminal, to verify that it was not a
terminal issue. I tried both with TERM=screen (my usual), then
xterm-color, xterm-256color and perhaps VT100 and VT200 as well.

One of the 191 characters, for example, is 165 or "\245", which is
the code generated by Alt-A on my Mac (OS X 10.5.5, powerpc, darwin).

(This is when I have *not* enabled "Use alt as meta". If you don't know
what that is, just ignore it; it's a Mac default.)

Here's the dump, since you asked:

irb(main):030:0> PRINTABLE
[" ", "!", "\"", "#", "$", "%", "&", "'", "(", ")", "*", "+", ",", "-",
".", "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";",
"<", "=", ">", "?", "@", "A", "B", "C", "D", "E", "F", "G", "H", "I",
"J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W",
"X", "Y", "Z", "[", "\\", "]", "^", "_", "`", "a", "b", "c", "d", "e",
"f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s",
"t", "u", "v", "w", "x", "y", "z", "{", "|", "}", "~", "\240", "\241",
"\242", "\243", "\244", "\245", "\246", "\247", "\250", "\251", "\252",
"\253", "\254", "\255", "\256", "\257", "\260", "\261", "\262", "\263",
"\264", "\265", "\266", "\267", "\270", "\271", "\272", "\273", "\274",
"\275", "\276", "\277", "\300", "\301", "\302", "\303", "\304", "\305",
"\306", "\307", "\310", "\311", "\312", "\313", "\314", "\315", "\316",
"\317", "\320", "\321", "\322", "\323", "\324", "\325", "\326", "\327",
"\330", "\331", "\332", "\333", "\334", "\335", "\336", "\337", "\340",
"\341", "\342", "\343", "\344", "\345", "\346", "\347", "\350", "\351",
"\352", "\353", "\354", "\355", "\356", "\357", "\360", "\361", "\362",
"\363", "\364", "\365", "\366", "\367", "\370", "\371", "\372", "\373",
"\374", "\375", "\376", "\377"]
Brian C. (Guest)
on 2008-11-05 16:29
Nit K. wrote:
> in IRB,
> ASCII     = (0..255).map{|c| c.chr }
> PRINTABLE = ASCII.grep(/[[:print:]]/)
> PRINTABLE.length
>>>> 191
>
> However, inside the ruby program PRINTABLE.length only gives 95 !! ???
>
> #!/opt/local/bin/ruby
>  ASCII     = (0..255).map{|c| c.chr }
>  puts(ASCII.length)
>  PRINTABLE = ASCII.grep(/[[:print:]]/)
>  puts(PRINTABLE.length)
> # -> 95 instead of 191
>
> (Using ruby 1.8.7 on OS X 10.5.5, powerpc). Ran both from same Terminal.
> Both use /opt/local/bin/ruby.
>
> Why this difference?

FWIW, I get 95 with irb187 under Ubuntu Dapper.

Looking at source code, the [[:print:]] character class uses isascii(c)
&& isprint(c)

man isprint says:

NOTE
       The details of what characters belong into which class depend on the
       current locale.  For example, isupper() will not recognize an A-umlaut
       (Ä) as an uppercase letter in the default C locale.

So look at what ENV.grep(/^LC/) shows. You could try setting
ENV['LC_ALL']='C' in irb, or export LC_ALL=C before running it. Or try
'POSIX' instead of 'C'.

Finally, be completely sure that your irb is running the right ruby.
Check RUBY_VERSION within irb.
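
For what it's worth, the two counts line up with plain byte ranges. This is only a sketch, and the ranges are inferred from the dump posted earlier in this thread (space through "~", then "\240" through "\377"), not taken from Ruby's source. The variable names are made up for illustration.

```ruby
# Byte ranges inferred from the PRINTABLE dump earlier in the thread.
ascii_printable = (0x20..0x7e).to_a   # " " through "~"
high_printable  = (0xa0..0xff).to_a   # "\240" through "\377"

puts ascii_printable.length                          # 95
puts ascii_printable.length + high_printable.length  # 191
```

So 95 looks like "printable ASCII only", and 191 looks like the same set plus the upper range that a non-C locale may also treat as printable.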
R. K. (Guest)
on 2008-11-05 16:53
Brian C. wrote:
> FWIW, I get 95 with irb187 under Ubuntu Dapper.
>
> Looking at source code, the [[:print:]] character class uses isascii(c)
> && isprint(c)
>
> man isprint says:
>
> NOTE
>        The  details  of  what characters belong into which class depend
> on the
>        current locale.  For example, isupper() will not recognize an
> A-umlaut
>        (Ä) as an uppercase letter in the default C locale.
>
> So look at what ENV.grep(/^LC/) shows. You could try setting
> ENV['LC_ALL']='C' in irb, or export LC_ALL=C before running it. Or try
> 'POSIX' instead of 'C'.
>
> Finally, be completely sure that your irb is running the right ruby.
> Check RUBY_VERSION within irb.

1.8.7 both.

ENV.grep(/^LC/) shows nothing in either irb or ruby.
Setting ENV['LC_ALL'] to 'C', 'POSIX', etc. has no effect in either.

However, "echo $LC_ALL" at my prompt gives en_US.UTF-8.
So when I set LC_ALL='C', I get only 95 in both ruby and irb.

Is there any way I can get ruby to also give 191?
I tried ENV['LC_ALL']='en_US.UTF-8' at the start of my ruby program but it
had no effect. Anyway, thanks for pointing this out.
Brian C. (Guest)
on 2008-11-05 17:13
Nit K. wrote:
> ENV.grep(/^LC/) show nothing in both irb and ruby

My bad; try

ENV.select{|k,v| k=~/^LC/}

> ENV['LC_ALL']='C' 'POSIX' etc has no effect in both
>
> However, "echo $LC_ALL" on my prompt gives en_US.UTF-8.
> So when i did LC_ALL='C', i get only 95 in both ruby and irb.
>
> Is there any way i get can ruby to also give 191 ?

Perhaps then:

  env LC_ALL=en_US.UTF-8 ruby foo.rb

Also, looking through source: it seems that ruby doesn't normally call
setlocale() by itself, but maybe some third-party library which irb is
invoking is doing this for you. "readline" is a likely candidate. So you
could try:

  require 'readline'

in your ruby file. Or check $LOADED_FEATURES in irb and try loading the
same modules in your ruby file.
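
A small sketch of why the earlier ENV.grep(/^LC/) came back empty while ENV.select works: ENV enumerates [name, value] pairs, and a Regexp does not match an Array. Including LANG in the pattern is my own addition here (the thread only looked at LC_*), and the variable names are illustrative.

```ruby
# grep tests each [name, value] pair against the pattern; a Regexp
# never matches an Array, so this is empty even when LC_* is set.
pairs_matched = ENV.grep(/^LC/)

# select passes the name and value separately, so the name can be tested.
# LANG is included as an assumption; it is not mentioned in the thread.
locale_vars = ENV.select { |name, _value| name =~ /\A(LC_|LANG)/ }

locale_vars.each { |name, value| puts "#{name}=#{value}" }
```

Depending on the Ruby version, ENV.select may return an array of pairs or a Hash; the each loop above should work with either.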
R. K. (Guest)
on 2008-11-05 17:41
Brian C. wrote:
> Nit K. wrote:
>> ENV.grep(/^LC/) show nothing in both irb and ruby
>
> My bad; try
>
> ENV.select{|k,v| k=~/^LC/}
>
>> ENV['LC_ALL']='C' 'POSIX' etc has no effect in both
>>
>> However, "echo $LC_ALL" on my prompt gives en_US.UTF-8.
>> So when i did LC_ALL='C', i get only 95 in both ruby and irb.
>>
>> Is there any way i get can ruby to also give 191 ?
>
> Perhaps then:
>
>   env LC_ALL=en_US.UTF-8 ruby foo.rb
>
> Also, looking through source: it seems that ruby doesn't normally call
> setlocale() by itself, but maybe some third-party library which irb is
> invoking is doing this for you. "readline" is a likely candidate. So you
> could try:
>
>   require 'readline'
>
> in your ruby file. Or check $LOADED_FEATURES in irb and try loading the
> same modules in your ruby file.

Very strange:

1. ENV.select{|k,v| k=~/^LC/} gives en_US.UTF-8 in both irb and ruby. I
get both LC_ALL and LC_CTYPE.

2. env LC_ALL=en_US.UTF-8 ruby foo.rb
still gives 95.

3. I copied $LOADED_FEATURES, and then tried this (I hope I have it
correct):

["enumerator.so", "e2mmap.rb", "irb/init.rb", "irb/workspace.rb",
"irb/context.rb", "irb/extend-command.rb", "irb/output-method.rb",
"irb/notifier.rb", "irb/slex.rb", "irb/ruby-token.rb",
"irb/ruby-lex.rb", "readline.bundle", "irb/input-method.rb",
"irb/locale.rb", "irb.rb", "irb/completion.rb",
"irb/ext/save-history.rb", "stringio.bundle", "yaml/error.rb",
"syck.bundle", "yaml/ypath.rb", "yaml/basenode.rb", "yaml/syck.rb",
"yaml/tag.rb", "yaml/stream.rb", "yaml/constants.rb", "rational.rb",
"date/format.rb", "date.rb", "yaml/rubytypes.rb", "yaml/types.rb",
"yaml.rb"].each do |rr|

    require rr
end

I still get 95.
Brian C. (Guest)
on 2008-11-08 00:12
Nit K. wrote:
> I still get 95.

Possibly readline isn't calling setlocale until you actually
invoke/initialise the library.

Here's an alternative test. Install the RubyInline gem, and then stick
this in front of your test program:

require 'rubygems'
require 'inline'

class MyTest

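  inline do |builder|
    builder.include '<locale.h>'
    # An empty string picks the locale up from the environment;
    # passing 0 (NULL) would only query the current locale.
    builder.c '
    void set_locale(void) {
       setlocale(LC_ALL, "");
    }'
  end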
end

MyTest.new.set_locale

If that works, you can remove the dependency on the LC_ALL environment
variable by changing to: setlocale(LC_ALL, "en_US.UTF-8"); or whatever.

However, this dependence on the C stdlib's half-baked idea of "locale"
is very hairy. I understand why Ruby doesn't call setlocale() normally -
it means that at least the normal behaviour is (a) sane, and (b) not
affected randomly by global environment variable settings.

To be honest, if you want a character class which always matches 0x20 to
0x7e and 0xa0 to 0xff, then you might as well just say so directly:

    [\x20-\x7e\xa0-\xff]
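
A minimal sketch of the same idea without any regex or locale involved, selecting by byte value directly. The lambda name printable_byte is hypothetical, and the two ranges are simply the ones from the character class above.

```ruby
# Hypothetical helper: true for bytes 0x20-0x7e and 0xa0-0xff.
printable_byte = lambda do |byte|
  (0x20..0x7e).include?(byte) || (0xa0..0xff).include?(byte)
end

ascii = (0..255).map { |c| c.chr }

# unpack("C") reads the single byte of each string as an unsigned integer.
printable = ascii.select { |s| printable_byte.call(s.unpack("C").first) }

puts printable.length   # 191
```

Because this never consults isprint(), it should give the same answer in irb and in a script regardless of LC_ALL.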