Regex to match whitespace, but not newline

luislavena · July 10, 2011, 10:06pm

Hi,

How would I write a regex to match whitespace which occurs at the end of
a line, but not to match the newline itself?

i.e.
This should match: “abcdefg \n”
This shouldn’t: “abcdefg\n”

I tried: line.match(/[!\S\n]$/), i.e. match not-not-whitespace or
not-newline, but this didn’t work.

Would be grateful for any help.

jimb · July 10, 2011, 10:26pm

how about this -

a = "abcdefg \n"
b = "abcdefg\n"

c = [a, b]

c.each { |e|
  if e =~ / \n/
    puts "index #{c.index(e)} matches"
  else
    puts "index #{c.index(e)} doesn't match"
  end
}

j

jimb · July 10, 2011, 10:31pm

It may be easier to use grep with the regex. Assuming var is an array
containing your two examples:

var.grep(/\s\n/) # returns what you want.
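
A minimal sketch of the grep suggestion above, assuming the lines are held in an array (the post does not say what var is). The method name lines_with_trailing_whitespace and the sample data are made up for illustration. Enumerable#grep returns every element that the pattern matches.

```ruby
# Hypothetical helper: keep only the lines where some whitespace
# character sits directly in front of a newline.
def lines_with_trailing_whitespace(lines)
  lines.grep(/\s\n/)
end

var = ["abcdefg \n", "abcdefg\n", "abcdefg\t\n"]
p lines_with_trailing_whitespace(var)
# prints ["abcdefg \n", "abcdefg\t\n"]
```

Because \s covers tabs as well as spaces, the tab case is picked up too.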

jimb · July 10, 2011, 10:35pm

I had originally written a simple regex using the above method:

do_stuff if line.match(/ \n$/)

But unfortunately this doesn’t pick up on tabs (e.g. “abcdefg\t\n”
doesn’t match), and I was just wondering if there was a “tidier” way to
do it.

jimb · July 10, 2011, 10:49pm

No probs.

jimb · July 10, 2011, 10:38pm

Ah, the grep method does what I was after. Thanks muchly!

jimb · July 11, 2011, 2:38am

Jim B. wrote in post #1009976:

Hi,

How would I write a regex to match whitespace which occurs at the end of
a line, but not to match the newline itself?

data = [
  "abcdefg \n",
  "abcdefg\t\n",
  "abc \ndefg\t\n",
  "abcdefg\n",
  "abcdefg",
]

data.each do |str|
  if str.match(/
      \w+   # letter, number, underscore 1 or more times
      \s+   # whitespace 1 or more times
      \n    # newline
      \z    # end of string
    /x)
    puts "#{str.inspect} --> match"
  else
    puts "#{str.inspect} --> no match"
  end
end

--output:--
"abcdefg \n" --> match
"abcdefg\t\n" --> match
"abc \ndefg\t\n" --> match
"abcdefg\n" --> no match
"abcdefg" --> no match

jimb · July 11, 2011, 2:55am

…and for this string:

"abc\t\ndefg\n" --> no match

jimb · July 11, 2011, 3:01am

Jim B. wrote in post #1009976:

Hi,

How would I write a regex to match whitespace which occurs at the end of
a line, but not to match the newline itself?

i.e.
This should match: “abcdefg \n”
This shouldn’t: “abcdefg\n”

Note that you’ve phrased the question wrong. You first state that you
don’t want to match the newline itself, but then you say that
“abcdefg \n” should match. Well, that string has a newline at the end,
so the newline is part of the match. What you seem to be asking is how
to match strings that have whitespace before the newline. In other
words, the end of line pattern has to be:

whitespace newline

To require at least one whitespace you write: \s+

…followed by one newline: \n

…followed by the end of the string: \z

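But then you need to consider what you want to happen in this case:

“abcdefg\n\n”

Because \s also matches a newline, \s+ \n \z matches that string: the
first newline counts as the whitespace.

If the pattern you want is actually:

whitespace that is not a newline
newline
end of string

Then note that in Ruby \s is shorthand for: [ \t\r\n\f\v]
…so if spaces and tabs are the only whitespace you care about,
whitespace minus newlines is just: [ \t]

And the pattern becomes: \w+ [ \t]+ \n \z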
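
To see the difference between the two patterns, here is a small sketch. The constant names LOOSE and STRICT are made up for illustration, and match? assumes Ruby 2.4 or later.

```ruby
LOOSE  = /\w+\s+\n\z/     # \s also matches "\n", so "\n\n" is accepted
STRICT = /\w+[ \t]+\n\z/  # only spaces or tabs may precede the final newline

["abcdefg\n\n", "abcdefg \n", "abcdefg\t\n", "abcdefg\n"].each do |str|
  puts "#{str.inspect}: loose=#{str.match?(LOOSE)} strict=#{str.match?(STRICT)}"
end
```

The first string is the only one where the two patterns disagree: LOOSE accepts it, STRICT rejects it.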

jimb · July 11, 2011, 3:46am

Jim B. wrote in post #1009976:

I tried: line.match(/[!\S\n]$/), i.e. match not-not-whitespace

not-not-whitespace is equal to whitespace, right? \S is anything but
whitespace, and if you negate that, you just get whitespace, which is
the same as: \s

not-newline, but this didn’t work.

Regexes are made up of two things: special regex characters such as
*, +, \s, [], etc.; and regular characters. The special regex
characters have special meaning, and therefore do not match themselves.
Regular characters just match themselves. An exclamation mark is not a
special regex character, so it is a regular character, and if it
appears in a regex, it just matches itself; it does not negate
anything.
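
A quick check of that point, as a rough sketch: inside a character class the "!" is just one more member of the set, so the pattern from the question accepts a line ending in "!", in any non-whitespace character, or in a newline. The constant name ATTEMPT and the sample strings are illustrative only.

```ruby
# The pattern from the original question. Inside [...] the "!" is a
# literal, so the class means: "!" OR any non-whitespace OR a newline.
ATTEMPT = /[!\S\n]$/

["abcdefg \n", "abcdefg\n", "wow!", "   "].each do |str|
  result = str =~ ATTEMPT ? "match" : "no match"
  puts "#{str.inspect} --> #{result}"
end
```

Both strings from the question match, which is why the pattern cannot tell them apart. Only the all-spaces string fails.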

Would be grateful for any help.

The way to negate pre-defined character groups like \s, \d is to
capitalize them: \S, \D

The way to negate a custom character group is like this: [^acd14!]. The
caret at the front of the character class negates the group, so the
character class matches any character that is not one of the characters
inside the brackets.

The way to negate a single character is like this: [^a], which matches
any character that is not an ‘a’.
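
The two negation forms can also be combined. One idiom, not mentioned in this thread, is [^\S\n]: "any character that is neither non-whitespace nor a newline", which works out to whitespace other than a newline. The sketch below is only an illustration: the names TRAILING_WS and trailing_whitespace? are made up, and match? assumes Ruby 2.4 or later.

```ruby
# [^\S\n] = not (non-whitespace or newline) = whitespace except "\n".
# In Ruby, $ matches at the end of each line, just before the newline.
TRAILING_WS = /[^\S\n]+$/

# Hypothetical helper: true if any line in the string ends in whitespace.
def trailing_whitespace?(line)
  line.match?(TRAILING_WS)
end

puts trailing_whitespace?("abcdefg \n")   # prints true
puts trailing_whitespace?("abcdefg\t\n")  # prints true
puts trailing_whitespace?("abcdefg\n")    # prints false

# The newline is not part of the match, so it survives a substitution.
p "abcdefg \t\n".sub(TRAILING_WS, "")     # prints "abcdefg\n"
```

Note that this class still includes "\r", so a line ending in "\r\n" would count as having trailing whitespace.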

jimb · July 11, 2011, 9:23pm

7stud,
What an excellent reply.
Thank you very much for that.
I read through everything you wrote and implemented your suggestions.
Now I definitely have a robust solution.
Thanks again!