Formatting a long regex: can a character class [] be split over lines?

Hello,
I am wondering whether it is possible to split a character class ([…]) in
Ruby regex over multiple lines.

I know that the /x option makes Ruby ignore whitespace in the pattern, so I can write:

email_format = /\A(
                  [A-Za-z\d\!\#\$\%\&\'\*\+\-\/\=\?\^\_\`\{\|\}\~]+
                  \.)*
                  [A-Za-z\d\!\#\$\%\&\'\*\+\-\/\=\?\^\_\`\{\|\}\~]+
                  @([a-z\d\-]+\.)+[a-z\d\-]+\z/x

However, if I try to split inside a character class:

name_format = /\A[A-Za-z\d
                  \!\#\$\%\&\'\*\+\-\/\=\?\^\_\`\{\|\}\~]\z/x

I get the warning:

warning: character class has duplicated range

(apparently it is about the space character being included multiple
times inside []).
I want spaces and newlines inside [] to be ignored so that I can format the
class over multiple lines. Is this possible?

Thanks,

Alexey.

http://es.w3support.net/index.php?db=so&id=150095


Jose Calderon-Celis

2011/5/1 Alexey M.

Alexey M. wrote in post #996071:

Hello,
I am wondering whether it is possible to split a character class ([…]) in
Ruby regex over multiple lines.

I know that the /x option makes Ruby ignore whitespace in the pattern, so I can write:

email_format = /\A(
                  [A-Za-z\d\!\#\$\%\&\'\*\+\-\/\=\?\^\_\`\{\|\}\~]+
                  \.)*
                  [A-Za-z\d\!\#\$\%\&\'\*\+\-\/\=\?\^\_\`\{\|\}\~]+
                  @([a-z\d\-]+\.)+[a-z\d\-]+\z/x

However, if I try to split inside a character class:

name_format = /\A[A-Za-z\d
                  \!\#\$\%\&\'\*\+\-\/\=\?\^\_\`\{\|\}\~]\z/x

I get the warning:

warning: character class has duplicated range

(apparently it is about the space character being included multiple
times inside []).

I don’t get that warning with ruby 1.9.2.

I want spaces and newlines inside [] to be ignored so that I can format the
class over multiple lines. Is this possible?

Thanks,

Alexey.

  1. Never write a regex with thousands of escapes. Are you aware that
    inside a character class, most of the special regex characters lose
    their special meaning? The exceptions are \, ], a ^ at the start, and
    a - between two characters.

  2. Break up long regexes into smaller pieces.

# The hyphen goes last so that it is literal rather than part of a "+-/" range.
my_char_class = '[A-Za-z\d!#$%&\'*+/=?^_`{|}~-]'

Ruby also lets you choose delimiters for a string (%q) or a regex (%r)
that are not part of the character class, e.g. a period or the @ symbol.
However, delimiters like that are too confusing, so I just escaped the
single quote mark inside the string. One escape is all that’s needed to
properly form the string. If needed, which is not the case here, you
could also call Regexp.escape() on the string.

my_regex = /
\A
#{my_char_class}
[.]
#{my_char_class}
\z
/x

if my_regex.match("?./")
  puts 'yes'
end

--output:--
yes
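
To illustrate point 1, here is a minimal sketch (the names literal_class, range_class and hyphen_last are made up for this example, and match? assumes Ruby 2.4 or later). It shows that most metacharacters are taken literally inside [], but that a hyphen between two characters still forms a range, so "+-/" also accepts a comma and a period:

```ruby
# Inside [], characters such as . * + ? $ | ( ) need no backslash.
literal_class = /\A[.*+?$|()]+\z/
p literal_class.match?('(.*)+$|?')   # true: every character is taken literally
p literal_class.match?('abc')        # false: "." is not a wildcard here

# A hyphen between two characters is still a range operator.
range_class = /\A[+-\/]\z/           # "+" through "/", i.e. + , - . /
p range_class.match?(',')            # true, which is probably not intended

# Putting the hyphen last makes it a literal hyphen.
hyphen_last = /\A[+\/-]\z/
p hyphen_last.match?(',')            # false
p hyphen_last.match?('-')            # true
```

Escaping the hyphen (\-) would work just as well; moving it to the end simply avoids one more backslash.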

You could also use a here doc and avoid having to escape any characters
inside the string:

str = <<'LOTS_OF_SYMBOLS'
[A-Za-z\d!#$%&'*+/=?^_`{|}~-]
LOTS_OF_SYMBOLS

puts str.chomp

--output:--
[A-Za-z\d!#$%&'*+/=?^_`{|}~-]
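
As a rough sketch of how the here doc string could then be used in a pattern (symbols_class and atom are hypothetical names, and the hyphen is placed last so that it is not read as a range), the trailing newline has to be removed before interpolating:

```ruby
# The quoted heredoc keeps every character exactly as typed;
# chomp drops the trailing newline so it does not end up in the regex.
symbols_class = <<'LOTS_OF_SYMBOLS'.chomp
[A-Za-z\d!#$%&'*+/=?^_`{|}~-]
LOTS_OF_SYMBOLS

# Interpolate the class and repeat it one or more times.
atom = /\A#{symbols_class}+\z/

p atom.match?("o'neil+tag")   # true
p atom.match?("a b")          # false: a space is not in the class
```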

It’s also possible to escape a newline:

name_format = /\A[A-Za-z\d\
!#$%&'*+\/=?^_`{|}~-]\z/x

p name_format

--output:--
/\A[A-Za-z\d!#$%&'*+\/=?^_`{|}~-]\z/x

…but then you can’t indent the second line, or else the character class
will contain a bunch of spaces (whitespace inside [] is not ignored, even
with /x).
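
One possible way around the indentation problem, sketched here on the assumption that building the class from string pieces is acceptable (CLASS_CHARS is an illustrative name, and match? assumes Ruby 2.4 or later): join the pieces first, then interpolate the result between the brackets.

```ruby
# Each piece is a single-quoted string, so the source can be indented freely
# and no whitespace ends up inside the brackets.
CLASS_CHARS = [
  'A-Za-z\d',
  '!#$%&\'*+/=?^_`{|}~',
  '-'                      # hyphen last, so it is literal and not a range
].join

name_format = /
  \A
  [#{CLASS_CHARS}]+
  \z
/x

p name_format.match?("alexey_m")   # true
p name_format.match?("alexey m")   # false: the space is not part of the class

# Whitespace inside [] is significant even with the x flag:
p(/\A[a b]\z/x.match?(" "))        # true
```

The last line is why the original two-line class picked up the indentation: every space of the indent became a member of the class.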