DRY gsub

Josselin · September 25, 2007, 10:39pm

On 1/12/07, Josselin [email protected] wrote:

I wrote the following Ruby statements… I get the result I need, but I
tried to DRY it up for 2 hours without success:
  d = d.gsub(/\r\n/,' ')   # replace CRLF line break by space
  d = d.gsub(/;/,' ')  # replace semicolon by space
  d = d.gsub(/,/,' ') # replace comma by space
  a  = d.split(' ')   # split into components, space as divider
tfyl

How about a = d.gsub(/\r\n|;|,/, ' ').split(' ')?

Josselin · September 25, 2007, 10:41pm

Josselin wrote:

Joss

I would probably go with

a = d.chomp.split(/[\s,;]/)

Best regards,
Henrik S.

Josselin · September 25, 2007, 10:41pm

On 1/12/07, Josselin [email protected] wrote:

Joss

I know you got lots of answers but what about

a = d.gsub(/;|,/," ").split

If I am not mistaken it will work on Unix too.

HTH
Robert
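
A quick way to check the Unix claim, as a sketch rather than a definitive test: split with no argument breaks on runs of whitespace, and both "\r" and "\n" count as whitespace, so the line-ending style should not matter. The method name split_fields and the sample strings are made up for illustration.

```ruby
# Hypothetical helper wrapping the one-liner from the post above.
def split_fields(d)
  d.gsub(/;|,/, " ").split
end

# Hypothetical sample inputs; only the line endings differ.
windows = "a;b,c\r\nd e"
unix    = "a;b,c\nd e"

p split_fields(windows)   # => ["a", "b", "c", "d", "e"]
p split_fields(unix)      # => ["a", "b", "c", "d", "e"]
```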

Josselin · September 25, 2007, 10:42pm

On Jan 12, 2007, at 11:35 AM, Robert D. wrote:

tfyl

Joss

I know you got lots of answers but what about

a = d.gsub(/;|,/," ").split

No need for a Regexp there:

a = d.tr(";,", " ").split

James Edward G. II

Josselin · September 25, 2007, 10:38pm

[email protected] wrote:

a = d.split(/(?:\r\n|[;, ])/)

Hope that helps.

Out of interest, what does the ?: do in there? I’ve googled, etc, honest!

It says, “Please don’t capture the stuff in parentheses here, because
that changes what split returns”.

irb(main):001:0> s = "a b,c"
=> "a b,c"
irb(main):002:0> s.split( /( |,)/ )
=> ["a", " ", "b", ",", "c"]
irb(main):003:0> s.split( /(?: |,)/ )
=> ["a", "b", "c"]

Josselin · September 25, 2007, 10:43pm

On 1/12/07, James Edward G. II [email protected] wrote:

  d = d.gsub(/,/,' ') # replace comma by space
No need for a Regexp there:

a = d.tr(";,", " ").split

James Edward G. II

Nice one (I thought I had the optimal version, naaa),
and faster too, of course:

robert@PC:~/log/ruby/ML 18:46:00
535/35 > ruby split.rb split.rb
Rehearsal ---------------------------------------------
regex       4.891000   0.000000   4.891000 (  4.602000)
translate   3.812000   0.000000   3.812000 (  3.685000)
------------------------------------ total: 8.703000sec

                user     system      total        real
regex       5.016000   0.000000   5.016000 (  4.669000)
translate   3.859000   0.000000   3.859000 (  3.805000)

Cheers
Robert
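
The split.rb script itself was not posted, so the following is only a guess at what such a comparison could look like, using Benchmark.bmbm from the standard library (which produces the Rehearsal layout seen above). The input string, the iteration count, and the method names with_regex and with_tr are all assumptions.

```ruby
require "benchmark"

# Hypothetical input and iteration count; the original split.rb was not shown.
SAMPLE = "alpha;beta,gamma delta\r\nepsilon " * 10
N      = 1_000

# The gsub-based version from earlier in the thread.
def with_regex(d)
  d.gsub(/;|,/, " ").split
end

# The tr-based version suggested by James Edward G. II.
def with_tr(d)
  d.tr(";,", " ").split
end

Benchmark.bmbm do |x|
  x.report("regex")     { N.times { with_regex(SAMPLE) } }
  x.report("translate") { N.times { with_tr(SAMPLE) } }
end
```

Absolute timings will differ by machine and Ruby version; the sketch only compares the two approaches on the same input.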

Josselin · September 25, 2007, 10:44pm

On 1/12/07, Josselin [email protected] wrote:

I wrote the following Ruby statements… I get the result I need, but I
tried to DRY it up for 2 hours without success:
  d = d.gsub(/\r\n/,' ')   # replace CRLF line break by space
  d = d.gsub(/;/,' ')  # replace semicolon by space
  d = d.gsub(/,/,' ') # replace comma by space
  a  = d.split(' ')   # split into components, space as divider

what about:

a = d.gsub(/\r\n|;|,/, ' ').split(' ')

or

a = d.split(/\r\n|;|,| /)

(not tested, I’m too lazy/busy to write the tests now)
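
Since the two alternatives above were posted untested, here is a small check, assuming a made-up input in which delimiters sit next to each other. It suggests the two are not interchangeable: split(' ') collapses runs of spaces, while splitting on the alternation yields an empty string between adjacent delimiters.

```ruby
d = "a, b;\r\nc"   # hypothetical input with adjacent delimiters

via_gsub  = d.gsub(/\r\n|;|,/, ' ').split(' ')
via_split = d.split(/\r\n|;|,| /)

p via_gsub    # => ["a", "b", "c"]
p via_split   # => ["a", "", "b", "", "c"]
```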

Josselin · September 25, 2007, 10:44pm

Josselin wrote:

Joss

Specific to this example, everyone else is right, and the best way is to
consolidate the regex or simply use a condensed split call. However, in
the general case, you could do this

[ /\r\n/ , /;/ , /,/ ].inject(d) { |s, reg| s.gsub(reg, ' ') }.split(' ')
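
One way the general case might be extended, offered as a sketch: pair each pattern with its own replacement so that the inject pipeline still works when the substitutions differ. RULES and apply_rules are hypothetical names, not something from the thread.

```ruby
# Hypothetical rule table: each pattern is paired with its own replacement.
RULES = [
  [/\r\n/, " "],   # line breaks become spaces
  [/[;,]/, " "],   # semicolons and commas become spaces
  [/\s+/,  " "],   # squeeze any run of whitespace to one space
]

# Thread the string through every rule in order.
def apply_rules(str, rules = RULES)
  rules.inject(str) { |s, (pattern, replacement)| s.gsub(pattern, replacement) }
end

p apply_rules("a;b,\r\nc")             # => "a b c"
p apply_rules("a;b,\r\nc").split(" ")  # => ["a", "b", "c"]
```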

Josselin · September 25, 2007, 10:44pm

On 1/12/07, Simon S. [email protected] wrote:

On 1/12/07, [email protected] [email protected] wrote:
[snip]

Out of interest, what does the ?: do in there? I’ve googled, etc, honest!

(?: ) is a non-capturing group

example if you want to match a repeating pattern,
but don’t want the repeating stuff in your output

"abcde xyx xyx xyx abcde".scan(/(?:xyx ){2,}.*(b.*d)/)
#=> [["bcd"]]

if you use ( ) then it shows up in the output

"abcde xyx xyx xyx abcde".scan(/(xyx ){2,}.*(b.*d)/)
#=> [["xyx ", "bcd"]]

Josselin · September 25, 2007, 10:48pm

Phrogz wrote:

James Edward G. II wrote:

a = d.split(/(?:\r\n|[;, ])/)

Way more elegant. Way to see beyond the step-by-step process to the end
goal.

Except that there’s no need for the non-capturing group, so
(simplifying, not golfing):

a = d.split( /\r\n|[;, ]/ )

Unless, of course, you have a string like this:
d = "foo; bar\r\n\r\nwhee"
and you only wanted [ “foo”, “bar”, “whee” ], in which case:
a = d.split(/(?:\r\n|[;, ])+/)

\Man, I’m the king of multiple posting today
\Fark slashies ftw!
\\Back to work

Josselin · September 25, 2007, 10:45pm

On Sat, Jan 13, 2007 at 09:51:47AM +0900, James B. wrote:

"\r\n",
";",
","
].map{ |c| Regexp.escape(c) }.join("|"))

a = d.split(DELIMITERS)

Unless these chunks of code are right next to each other, it may be hard
to know the purpose for the delimiters or what’s driving the split.

The cleaned up version includes the delimiters in an array of individual
strings. Your original complaint was about readability and code
maintenance. While I agree that a long literal Regexp can be hard to read
and hard to maintain, you can achieve the same efficiency as that Regexp
without sacrificing readability using the solution above. Perhaps the
following would make you happier?

module Whatever
  DELIMITERS = [
    " ",
    "\r\n",
    ";",
    ","
  ]

  def split_string(str)
    # Build the regexp once and reuse it on later calls
    @delimiter_regexp ||= Regexp.new(DELIMITERS.map { |c| Regexp.escape(c) }.join("|"))
    str.split(@delimiter_regexp)
  end
  extend self
end

a = Whatever.split_string(d)

(If you want to make it even fancier so you can modify DELIMITERS at
runtime you’ll have to do something clever with hashes.)

If the code above does not fulfill what you were intending, please do
explain why; if I've missed the point, I'd like to know it and to try
again at understanding.

James B.
–Greg
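
The post leaves "something clever with hashes" unspecified, so the following is only one possible reading: keep the delimiter list mutable and cache one compiled regexp per distinct list, keyed by a frozen copy of it. The module name DelimiterSplitter and its layout are assumptions; Regexp.union is used here because it escapes plain strings on its own.

```ruby
# Hypothetical variant: delimiters can change at runtime, and each distinct
# delimiter list gets its own cached regexp.
module DelimiterSplitter
  @delimiters = [" ", "\r\n", ";", ","]
  @regexps    = {}

  class << self
    attr_reader :delimiters   # mutable, e.g. DelimiterSplitter.delimiters << "|"

    def split_string(str)
      key = delimiters.dup.freeze                 # stable hash key
      regexp = (@regexps[key] ||= Regexp.union(key))
      str.split(regexp)
    end
  end
end

p DelimiterSplitter.split_string("a;b c")   # => ["a", "b", "c"]
DelimiterSplitter.delimiters << "|"
p DelimiterSplitter.split_string("a|b;c")   # => ["a", "b", "c"]
```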

Josselin · September 25, 2007, 10:50pm

On 1/12/07, Phrogz [email protected] wrote:

\Man, I’m the king of multiple posting today
\Fark slashies ftw!
\\Back to work

+1

Josselin · September 25, 2007, 10:49pm

Josselin [email protected] writes:

I wrote the following Ruby statements… I get the result I need, but I
tried to DRY it up for 2 hours without success:
  d = d.gsub(/\r\n/,' ')   # replace CRLF line break by space
  d = d.gsub(/;/,' ')  # replace semicolon by space
  d = d.gsub(/,/,' ') # replace comma by space
  a  = d.split(' ')   # split into components, space as divider

What’s wrong with:

a = d.split(/\r\n|[;, ]/)

Or do you need d to be mangled as before?

Although I probably would do something even shorter like this:

a = d.split(/[;,\s]+/)

However, for certain inputs that won’t give exactly the same as your
initial multi-step procedure.
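
One input where the results appear to differ, as an illustration: a string that starts with a delimiter. split(' ') ignores leading whitespace, whereas splitting on a regexp yields an empty first field. The method names multi_step and one_shot are made up for this sketch.

```ruby
# The original multi-step procedure.
def multi_step(d)
  d.gsub(/\r\n/, ' ').gsub(/;/, ' ').gsub(/,/, ' ').split(' ')
end

# The shorter regexp-based split.
def one_shot(d)
  d.split(/[;,\s]+/)
end

d = " a;b"   # hypothetical input with a leading space

p multi_step(d)   # => ["a", "b"]
p one_shot(d)     # => ["", "a", "b"]
```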

Also, any time you write:

d = d.gsub(…)

You’re probably better off with:

d.gsub!(…)
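
One caveat worth keeping in mind with the bang form, shown as a small sketch: gsub! returns nil when it makes no substitution, so its return value should not be assigned back or chained. The sample strings are made up.

```ruby
d = "a b".dup                # .dup keeps the string mutable even with frozen literals

result = d.gsub!(/;/, " ")   # no semicolon present, so nothing is replaced
p result                     # => nil
p d                          # => "a b"

# Safer pattern: call the bang methods as separate statements and
# ignore their return values.
e = "x;y,z".dup
e.gsub!(/;/, " ")
e.gsub!(/,/, " ")
p e                          # => "x y z"
```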

Josselin · September 25, 2007, 10:52pm

James Edward G. II wrote:

a = d.split(/(?:\r\n|[;, ])/)

Way more elegant. Way to see beyond the step-by-step process to the end
goal.

Josselin · September 25, 2007, 10:52pm

I wrote the following Ruby statements… I get the result I need, but I
tried to DRY it up for 2 hours without success:
  d = d.gsub(/\r\n/,' ')   # replace CRLF line break by space
  d = d.gsub(/;/,' ')  # replace semicolon by space
  d = d.gsub(/,/,' ') # replace comma by space
  a  = d.split(' ')   # split into components, space as divider

There’s nothing in these four lines of code that violates the idea of
DRY. There is no repeated code. Multiple calls to the same method
are perfectly OK.

xoa