Is there a bug in the ActiveRecord method validates_format_o


#1

Hi All,

I think that I might have found a bug in the validates_format_of
method. Below is my code and the test case that threw an error.
Please let me know if there is a bug in my code that I just didn’t
catch.

Here is the code for my User model. As you can see, it has one
attribute called name. I want names to be at least 6 characters long
and to consist only of letters, numbers, dots (.), underscores (_)
and at symbols (@).

class User < ActiveRecord::Base
  validates_presence_of :name
  validates_uniqueness_of :name
  validates_format_of :name, :with => /\A[.\w@]{6,}\z/
end
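To rule out the model itself, the same regex can be tried outside of ActiveRecord. This is only a minimal sketch in plain Ruby; the names NAME_FORMAT and valid_name? are made up for illustration and are not part of Rails.

```ruby
# Hypothetical stand-alone check; NAME_FORMAT and valid_name? are
# illustrative names, not part of ActiveRecord.
NAME_FORMAT = /\A[.\w@]{6,}\z/

# Returns true when the whole string matches the format, false otherwise.
# String#=~ returns nil when there is no match.
def valid_name?(name)
  !(name =~ NAME_FORMAT).nil?
end

puts valid_name?("bob.smith@x")  # only letters, a dot and an @
puts valid_name?("abc~def")      # contains a ~
puts valid_name?("abc")          # shorter than 6 characters
```

If a name behaves differently here than it does through user.valid?, the difference is more likely in the environment (for example the encoding settings) than in the regex.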

Here is the code for a unit test that I wrote. This unit test tries
to create a fairly comprehensive set of invalid names by creating
several tests for each ASCII character that shouldn’t be part of a
valid name such as ~ or `. None of the bad names that are created
should be valid.

def test_invalid_name
  # Collect every single-byte character that is not allowed in a name.
  bad_characters = []
  for i in 0...255
    bad_characters << i.chr unless i.chr =~ /[.\w@]/
  end

  # Put each bad character at the start, the end, the middle, and all three.
  bad_names = []
  bad_characters.each do |c|
    bad_names << (c + "abcdef")
    bad_names << ("abcdef" + c)
    bad_names << ("abc" + c + "def")
    bad_names << (c + "abc" + c + "def" + c)
  end

  # None of the generated names should pass validation.
  bad_names.each do |name|
    user = User.new(:name => name, :password => "password",
                    :password_confirmation => "password")
    assert !user.valid?
    assert user.errors.invalid?(:name), "Name:" + name
  end
end

When I ran this test, it failed for bad characters that are extended
ASCII characters (i.e., characters that have a value between 128 and
255).

For example, user.valid? had a value of true when the name was
Çabcdef. The extended ASCII value of Ç is 128. user.valid? should have
had a value of false because the name doesn’t match the regex that I
used in the validates_format_of method call. In fact, when I evaluate
user.name =~ /\A[.\w@]{6,}\z/ directly, it returns nil (no match).

Can anyone tell me if there is a bug in my code or not? Any help
would be appreciated!


#2

From the documentation on Ruby regular expressions, I find:

/[single character]
Set the character encoding, character should be one of ‘neus’: none,
EUC, UTF-8 or SJIS.

So, perhaps what you’re looking for is of the syntax:

/expr/u

See if it works.
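As a rough sketch of what the trailing flag does: the example below assumes Ruby 1.9 or later, which is newer than the $KCODE-based Ruby this thread is about, so treat it as an illustration rather than a description of your setup. There, /u pins the regex to UTF-8, but \w still only covers ASCII word characters.

```ruby
# Assumes Ruby 1.9+; behaviour under Ruby 1.8 and $KCODE may differ.
plain = /\A[.\w@]{6,}\z/
utf8  = /\A[.\w@]{6,}\z/u

puts plain.fixed_encoding?  # is the regex tied to one encoding?
puts utf8.fixed_encoding?
puts utf8.encoding

# "\u00C7" is Ç written as an escape so the source stays ASCII-only.
name = "\u00C7abcdef"
puts((name =~ utf8).inspect)
```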


#3

Hi,

Thanks for the advice. I actually figured out how to make it work. I
changed the regex for the validates_format_of code.

Here’s the original code:
validates_format_of :name, :with => /\A[.\w@]{6,}\z/

Here’s the new code:
validates_format_of :name, :with => /\A[A-Za-z0-9_.@]{6,}\z/

When I used the new code, all tests passed. It’s really strange,
though, because [\w] is supposed to equal [A-Za-z0-9_].
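One way to check whether the two character classes really are the same in a given environment is to compare them byte by byte. The helper below is a hypothetical sketch (differing_bytes is not a Ruby or Rails method); it builds the characters with i.chr, the same way the test does.

```ruby
# Returns the byte values (0..255) for which exactly one of the two
# regexes matches the single-character string built with Integer#chr.
def differing_bytes(a, b)
  (0..255).select do |i|
    c = i.chr
    (c =~ a).nil? != (c =~ b).nil?
  end
end

diff = differing_bytes(/[.\w@]/, /[A-Za-z0-9_.@]/)
puts diff.inspect
```

An empty list means the classes agree for every single byte in that environment. A non-empty list shows exactly which byte values \w accepts beyond the explicit class, which is worth running inside the test environment itself.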


#4

Thanks for the info! I think that there must be a problem with the
UnitTest code then. I tried to force all RoR environments to use a
$KCODE value of “n” by setting that value in the environment.rb file.
But the tests still fail when I use [\w@.]. In my particular case,
it’s not a big deal. But it is sort of annoying to think that the
test harness may have some bugs in it since it’s supposed to help you
catch bugs in your own code!


#5

Not when $KCODE == "UTF8". Your tests must be running in a different
environment somehow…

irb(main):001:0> $KCODE = "u"
=> "u"
irb(main):002:0> (0...255).select { |c| c.chr =~ /[\w@.]/ }.map { |c| c.chr }
=> [".", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "@", "A",
"B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O",
"P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "_", "a", "b",
"c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p",
"q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "\302", "\303",
"\304", "\305", "\306", "\307", "\310", "\311", "\312", "\313",
"\314", "\315", "\316", "\317", "\320", "\321", "\322", "\323",
"\324", "\325", "\326", "\327", "\330", "\331", "\332", "\333",
"\334", "\335", "\336", "\337", "\340", "\341", "\342", "\343",
"\344", "\345", "\346", "\347", "\350", "\351", "\352", "\353",
"\354", "\355", "\356", "\357", "\360", "\361", "\362", "\363",
"\364", "\365", "\366", "\367", "\370", "\371", "\372", "\373",
"\374", "\375"]