Redefining word boundaries?


#1

Is it possible to redefine (temporarily) the meaning of \b in RegExp
situations?

Here’s the scenario: I’ve got some data that looks something like,
“ABCD.E”, which I stuff into an array as I encounter each new
instance. Eventually, I will join(", ") these together having
surrounded each of them with single quotes, for use in an SQL “IN
(…)” clause.

At first, I tried doing this with gsub(/\b/, "'"), but found out the
hard way that the period was wreaking havoc with my intended results,
and I got this: 'ABCD'.'E' instead.
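
For anyone reproducing this, the behaviour follows from how \b is defined: it matches between a \w character (letter, digit, or underscore) and anything else, and a period is not in \w. A minimal sketch, where the method name is only for illustration:

```ruby
# \b is a zero-width match wherever a \w character meets a non-\w
# character or the edge of the string, so "." produces two extra boundaries.
def quote_at_word_boundaries(str)
  str.gsub(/\b/, "'")
end

puts quote_at_word_boundaries("ABCD.E")   # prints 'ABCD'.'E'
puts quote_at_word_boundaries("ABCD_E")   # prints 'ABCD_E' (underscore is in \w)
```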

My next attempt was to do it by hand, like this:
gsub(/^/, "'").gsub(/$/, "'"). This worked, but seems a bit like a hack.

So my question is whether there’s any way to temporarily declare that
a period is NOT to be considered a word boundary.

Thanks, in advance,
jmb-d


#2

On 6/17/07, jmb-d removed_email_address@domain.invalid wrote:

My next attempt was to do it by hand, like this:
gsub(/^/, "'").gsub(/$/, "'"). This worked, but seems a bit like a hack.

Does this give you the proper format?

arr = ["ABCD.E","FGHI.J","KLMN.O"]
puts arr.map{|x| "'#{x}'"}.join(",")

Harry

A Look into Japanese Ruby List in English
http://www.kakueki.com/


#3

On Jun 16, 9:06 pm, jmb-d removed_email_address@domain.invalid wrote:

Is it possible to redefine (temporarily) the meaning of \b in RegExp
situations?

I think the only way to do this would be to spell out yourself all the
circumstances in which a \b could occur. For instance, the 6
cases that I can think of are: ^\w, \W\w, \s\w, \w$, \w\W, \w\s. This
is a pain; as you can see here, I've done only four and it's already very ugly:

"ABCD.E foo FG.HI".gsub(/^(\w)/, "'\\1").gsub(/(\w)$/, "\\1'").gsub(/(\w)(\s)/, "\\1'\\2").gsub(/(\s)(\w)/, "\\1'\\2")
=> "'ABCD.E' 'foo' 'FG.HI'"

Alternatively, you could pre- and post-process: replace "." with a
placeholder made only of word characters, such as "zyzyzy", so that it
no longer creates a boundary, then swap it back to "." when you're done
inserting the ticks:

"ABCD.E foo FG.HI".gsub(/\./, "zyzyzy").gsub(/\b/, "'").gsub(/zyzyzy/, ".")
=> "'ABCD.E' 'foo' 'FG.HI'"

Neither is a very nice solution.

  • Byron
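
A third option, sketched here under the assumption of Ruby 1.9 or later (where the regex engine supports lookbehind), is to leave \b alone and write a custom boundary with lookarounds. TOKEN_BOUNDARY is a hypothetical name, and treating "\w plus period" as the token alphabet is an assumption based on the sample data in this thread:

```ruby
# Assumption: Ruby 1.9+ (lookbehind support). A "token character" here is
# anything in \w plus the period, so "." no longer splits a token.
TOKEN_BOUNDARY = /
  (?<![\w.])(?=[\w.])   # start of a token: no token char before, one after
  |
  (?<=[\w.])(?![\w.])   # end of a token: token char before, none after
/x

puts "ABCD.E foo FG.HI".gsub(TOKEN_BOUNDARY, "'")
# prints 'ABCD.E' 'foo' 'FG.HI'
```

Unlike the placeholder approach, this cannot collide with a "zyzyzy" that happens to occur in the data, but it does treat a leading or trailing period as part of the token.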

#4

On Jun 17, 12:03 am, “Harry K.” removed_email_address@domain.invalid wrote:

Does this give you the proper format?

arr = ["ABCD.E","FGHI.J","KLMN.O"]
puts arr.map{|x| "'#{x}'"}.join(",")

Harry –

Yes, that’s the result I’m looking for.

Thanks!
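
For completeness, the map/join approach can be wrapped up for the IN (...) clause described in the first post. This is only a sketch: sql_in_list is a hypothetical helper name, and doubling embedded single quotes is an added assumption about how the values should be escaped, not something discussed in the thread:

```ruby
# Hypothetical helper: wrap each value in single quotes and join with ", ".
# Embedded single quotes are doubled, the usual way to escape them
# inside an SQL string literal.
def sql_in_list(values)
  values.map { |v| "'" + v.gsub("'", "''") + "'" }.join(", ")
end

items = ["ABCD.E", "FGHI.J", "KLMN.O"]
puts "IN (#{sql_in_list(items)})"
# prints IN ('ABCD.E', 'FGHI.J', 'KLMN.O')
```

Where the database library supports bound parameters, those are generally a safer choice than building the literal list by hand.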