Forum: Ruby DRY gsub...

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Josselin (Guest)
on 2007-01-19 17:31
(Received via mailing list)
I wrote the following Ruby statements. I get the result I need, but I
tried to DRY it for 2 hours without success:

      d = d.gsub(/\r\n/,' ')   # get rid of carriage return
      d = d.gsub(/;/,' ')  # replace semicolon by space
      d = d.gsub(/,/,' ') # replace comma by space
      a  = d.split(' ')   # split into components, space as divider

tfyl

Joss
Josselin (Guest)
on 2007-09-26 00:29
(Received via mailing list)
On 2007-01-12 17:11:54 +0100, "Phrogz" <removed_email_address@domain.invalid> 
said:

> method repeatedly but with different parameters is not "repeating
> code more compact (but not necessarily golfing).
Thanks to all of you... As a newbie I'll try to keep this kind of useful
comment in mind: DRY vs. WET
(it's now engraved...)
unknown (Guest)
on 2007-09-26 00:29
(Received via mailing list)
> a = d.split(/(?:\r\n|[;, ])/)
>
> Hope that helps.

Out of interest, what does the ?: do in there? I've googled, etc,
honest!

Cheers,
  Benjohn
James B. (Guest)
on 2007-09-26 00:29
(Received via mailing list)
Gregory S. wrote:

>
> Cleaned up:

The whole point was *not* to clean it up, but to make obvious what and
why something was happening in the code.

Brevity is the soul of wit, but it can play havoc with code maintenance.

>
> DELIMITERS = Regexp.new([
>   " ",
>   "\r\n",
>   ";",
>   ","
> ].map{ |c| Regexp.escape(c) }.join("|"))
>
> a = d.split(DELIMITERS)
>

Unless these chunks of code are right next to each other, it may be hard
to know the purpose for the delimiters or what's driving the split.
Simon S. (Guest)
on 2007-09-26 00:30
(Received via mailing list)
On 1/12/07, removed_email_address@domain.invalid 
<removed_email_address@domain.invalid> wrote:
> > a = d.split(/(?:\r\n|[;, ])/)
> >
> > Hope that helps.
>
> Out of interest, what does the ?: do in there? I've googled, etc, honest!

(?: )   is a non-capturing group

it's not necessary here:
"ab;c,xx\r\nyy zz".split(/\r\n|[;, ]/)
#=> ["ab", "c", "xx", "yy", "zz"]
James G. (Guest)
on 2007-09-26 00:30
(Received via mailing list)
On Jan 12, 2007, at 10:00 AM, Josselin wrote:

> I wrote the following Ruby statements. I get the result I need, but
> I tried to DRY it for 2 hours without success:
>
>      d = d.gsub(/\r\n/,' ')   # get rid of carriage return
>      d = d.gsub(/;/,' ')  # replace semicolon by space
>      d = d.gsub(/,/,' ') # replace comma by space
>      a  = d.split(' ')   # split into components, space as divider

a = d.split(/(?:\r\n|[;, ])/)

Hope that helps.

James Edward G. II
Bruno M. (Guest)
on 2007-09-26 00:32
(Received via mailing list)
Bira wrote:
>
> How about a = d.gsub(/\r\n|;|,/,' ').split(' ') ?

Or a = d.split(/\r\n|,|;| /) ?
Gerardo S. Gómez Garrido (Guest)
on 2007-09-26 00:33
(Received via mailing list)
2007/1/12, removed_email_address@domain.invalid 
<removed_email_address@domain.invalid>:
> > a = d.split(/(?:\r\n|[;, ])/)
> >
> > Hope that helps.
>
> Out of interest, what does the ?: do in there? I've googled, etc, honest!

( )  groups and capture
(?:  )  groups but does not capture

Non-capturing groupings:
http://perldoc.perl.org/perlretut.html

It's irrelevant in this case anyways
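To make the capture/non-capture distinction concrete outside of split, here is a small sketch using `String#match` and `MatchData#captures`; the `key=value` input is made up for illustration:

```ruby
s = "key=value"

# Two capturing groups: both matched substrings are recorded.
m1 = s.match(/(key)=(\w+)/)
p m1.captures   # => ["key", "value"]

# (?:...) still groups the alternation, but records nothing,
# so only the second group's text is captured.
m2 = s.match(/(?:key|name)=(\w+)/)
p m2.captures   # => ["value"]
```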
James B. (Guest)
on 2007-09-26 00:33
(Received via mailing list)
Phrogz wrote:
> method repeatedly but with different parameters is not "repeating
> yourself".


Looking at this, and some of the suggested alternatives, I can see how
it would get tedious to add more characters to the "replace with space"
set.

The use of compact regular expressions doesn't make the code easier to
read or maintain.

It may be useful to define the set of special characters, then use that
to drive a string transformation.

REPLACE_WITH_SPACE = %w{
   \r\n
   ;
   ,
}.map{ |c| Regexp.new(c) }

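class String
  # Return a copy of the string with every REPLACE_WITH_SPACE match
  # turned into a space; the receiver itself is left untouched.
  def swap_to_spaces
    s = self.dup
    REPLACE_WITH_SPACE.each do |re|
      s.gsub!(re, ' ')
    end
    s
  end
end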


a =  d.swap_to_spaces.split( ' ' )



Or something along those lines.

--
James B.

http://www.ruby-doc.org    - Ruby Help & Documentation
http://www.rubystuff.com   - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com  - Playing with Better Toys
Gregory S. (Guest)
on 2007-09-26 00:33
(Received via mailing list)
On Sat, Jan 13, 2007 at 02:04:31AM +0900, James B. wrote:
[...]
> REPLACE_WITH_SPACE = %w{
>      end
>      s
>    end
> end
>
>
> a =  d.swap_to_spaces.split( ' ' )
>
>
>
> Or something along those lines.

Cleaned up:

DELIMITERS = Regexp.new([
  " ",
  "\r\n",
  ";",
  ","
].map{ |c| Regexp.escape(c) }.join("|"))

a = d.split(DELIMITERS)
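Ruby's standard `Regexp.union` performs the escape-and-join step itself: given strings, it escapes each one and combines them with `|`. A minimal sketch of the same delimiter list built that way (the sample input is invented):

```ruby
# Regexp.union escapes each string and joins them as alternatives.
DELIMITERS = Regexp.union(" ", "\r\n", ";", ",")

d = "ab;c,xx\r\nyy zz"
a = d.split(DELIMITERS)
p a   # => ["ab", "c", "xx", "yy", "zz"]
```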

> James B.
--Greg
Robert K. (Guest)
on 2007-09-26 00:34
(Received via mailing list)
On 12.01.2007 17:26, removed_email_address@domain.invalid wrote:
>> a = d.split(/(?:\r\n|[;, ])/)
>>
>> Hope that helps.
>
> Out of interest, what does the ?: do in there? I've googled, etc, honest!

It's a non-capturing group. You cannot retrieve the characters it matched,
which at times is more efficient because the regex engine does not need to
do the bookkeeping and storing of the group.

  robert
Phrogz (Guest)
on 2007-09-26 00:35
(Received via mailing list)
Josselin wrote:
> I wrote the following Ruby statements. I get the result I need, but I
> tried to DRY it for 2 hours without success:
>
>       d = d.gsub(/\r\n/,' ')   # get rid of carriage return
>       d = d.gsub(/;/,' ')  # replace semicolon by space
>       d = d.gsub(/,/,' ') # replace comma by space
>       a  = d.split(' ')   # split into components, space as divider

BTW, that is already reasonably DRY, in my opinion. Calling the same
method repeatedly but with different parameters is not "repeating
yourself". It would be WET (hrm...Way Extra Toomuchcode) if you had
something like:

d = d.gsub( /\r\n/, ' ' )
e = e.gsub( /\r\n/, ' ' )
f = f.gsub( /\r\n/, ' ' )
g = g.gsub( /\r\n/, ' ' )
etc.

It's just semantics, but IMO what you're asking for is to make your
code more compact (but not necessarily golfing).
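The WET example repeats one call across several *variables*, and that is the case where a loop earns its keep. A sketch, where `d`, `e`, `f` and `g` and their contents are placeholders:

```ruby
d, e, f, g = "a\r\nb", "c\r\nd", "e", "f\r\n"

# One call site instead of four copies of the same gsub line.
d, e, f, g = [d, e, f, g].map { |s| s.gsub(/\r\n/, ' ') }

p [d, e, f, g]   # => ["a b", "c d", "e", "f "]
```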
Phrogz (Guest)
on 2007-09-26 00:35
(Received via mailing list)
Josselin wrote:
> I wrote the following Ruby statements. I get the result I need, but I
> tried to DRY it for 2 hours without success:
>
>       d = d.gsub(/\r\n/,' ')   # get rid of carriage return
>       d = d.gsub(/;/,' ')  # replace semicolon by space
>       d = d.gsub(/,/,' ') # replace comma by space
>       a  = d.split(' ')   # split into components, space as divider

a = d.gsub( /\r\n|[;,]/, ' ' ).split
James G. (Guest)
on 2007-09-26 00:36
(Received via mailing list)
On Jan 12, 2007, at 10:26 AM, removed_email_address@domain.invalid wrote:

>> a = d.split(/(?:\r\n|[;, ])/)
>>
>> Hope that helps.
>
> Out of interest, what does the ?: do in there? I've googled, etc,
> honest!

(?: ... ) is just ( ... ) without capturing the contents into a
variable.

James Edward G. II
Jan S. (Guest)
on 2007-09-26 00:36
(Received via mailing list)
On 1/12/07, Daniel M. <removed_email_address@domain.invalid> wrote:
>
> However, for certain inputs that won't give exactly the same as your
> initial multi-step procedure.
>
> Also, any time you write:
>
>   d = d.gsub(...)
>
> You're probably better off with:
>
>   d.gsub!(...)

...unless you don't want to modify the original object passed in as an
argument: gsub! changes the string in place, so the caller will see the
modifications as well.
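A small sketch of that trade-off (the method names are hypothetical): `gsub!` edits the caller's string in place, and it returns `nil` when nothing was replaced, so `d = d.gsub!(...)` is never a safe drop-in for `d = d.gsub(...)`.

```ruby
# Hypothetical helpers contrasting the two styles.
def clean_in_place(str)
  str.gsub!(/[;,]/, ' ')   # mutates the caller's object
  str
end

def clean_copy(str)
  str.gsub(/[;,]/, ' ')    # returns a new string; caller's object untouched
end

original = "a;b".dup
clean_copy(original)
p original                 # => "a;b"
clean_in_place(original)
p original                 # => "a b"

# gsub! returns nil when there was nothing to replace.
p "abc".dup.gsub!(/;/, ' ')   # => nil
```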
Henrik S. (Guest)
on 2007-09-26 00:37
(Received via mailing list)
Josselin wrote:
> Joss
>

I would probably go with

a = d.chop.split(/[\s,;]/)

Best regards,
   Henrik S.
James G. (Guest)
on 2007-09-26 00:37
(Received via mailing list)
On Jan 12, 2007, at 10:30 AM, Phrogz wrote:

>
> a = d.split( /\r\n|[;, ]/ )

You're right, it's not needed.  I'm just in the habit of always
surrounding | options of a regex with grouping to control their
scope.  I guess I've been bitten by those matching issues one time
too many.

James Edward G. II
Alex Y. (Guest)
on 2007-09-26 00:38
(Received via mailing list)
Josselin wrote:
> I wrote the following Ruby statements. I get the result I need, but I
> tried to DRY it for 2 hours without success:
>
>      d = d.gsub(/\r\n/,' ')   # get rid of carriage return
>      d = d.gsub(/;/,' ')  # replace semicolon by space
>      d = d.gsub(/,/,' ') # replace comma by space
>      a  = d.split(' ')   # split into components, space as divider
>
a = d.split(/(?:\r\n)|(?:[;, ])/)
Simon S. (Guest)
on 2007-09-26 00:38
(Received via mailing list)
On 1/12/07, Bira <removed_email_address@domain.invalid> wrote:
>
> How about a = d.gsub(/\r\n|;|,/,' ').split(' ') ?

If you don't care about \r then maybe this

"ab;c,xx\r\nyy zz".scan(/[^ ;,\n]+/)
#=> ["ab", "c", "xx\r", "yy", "zz"]
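If you do care about the `\r`, one option (a sketch, not something posted in the thread) is to exclude all whitespace with `\s` instead of only space and `\n`. For ordinary text this appears to match the original gsub-then-`split(' ')` pipeline, including skipping leading separators:

```ruby
d = "ab;c,xx\r\nyy zz"

# Keep runs of characters that are neither whitespace nor ; or ,
a = d.scan(/[^\s;,]+/)
p a   # => ["ab", "c", "xx", "yy", "zz"]

# Leading separators do not produce an empty first element.
p " ;x".scan(/[^\s;,]+/)   # => ["x"]
```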
Josselin (Guest)
on 2007-09-26 00:38
(Received via mailing list)
On 2007-01-12 17:05:34 +0100, Bira <removed_email_address@domain.invalid> said:

>
> How about a = d.gsub(/\r\n|;|,/,' ').split(' ') ?

Thanks! I did not notice that I could use '|' inside the gsub pattern;
I was stuck on character classes like [...].
Phrogz (Guest)
on 2007-09-26 00:38
(Received via mailing list)
removed_email_address@domain.invalid wrote:
> > a = d.split(/(?:\r\n|[;, ])/)
> >
> > Hope that helps.
>
> Out of interest, what does the ?: do in there? I've googled, etc, honest!

It says, "Please don't capture the stuff in parentheses here, because
that changes what split returns".

irb(main):001:0> s = "a b,c"
=> "a b,c"
irb(main):002:0> s.split( /( |,)/ )
=> ["a", " ", "b", ",", "c"]
irb(main):003:0> s.split( /(?: |,)/ )
=> ["a", "b", "c"]
Bira (Guest)
on 2007-09-26 00:39
(Received via mailing list)
On 1/12/07, Josselin <removed_email_address@domain.invalid> wrote:
> I wrote the following Ruby statements. I get the result I need, but I
> tried to DRY it for 2 hours without success:
>
>       d = d.gsub(/\r\n/,' ')   # get rid of carriage return
>       d = d.gsub(/;/,' ')  # replace semicolon by space
>       d = d.gsub(/,/,' ') # replace comma by space
>       a  = d.split(' ')   # split into components, space as divider
>
> tfyl

How about a = d.gsub(/\r\n|;|,/,' ').split(' ') ?
Henrik S. (Guest)
on 2007-09-26 00:41
(Received via mailing list)
Josselin wrote:
> Joss
>
I would probably go with

a = d.chomp.split(/[\s,;]/)
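A sketch of why the correction matters, using made-up input: `chop` always removes the last character (or a trailing `\r\n` pair), whereas `chomp` removes only a trailing line ending. Also note that `[\s,;]` without `+` matches `\r` and `\n` separately, so an embedded `\r\n` leaves an empty string in the result:

```ruby
p "abc".chop        # => "ab"   (a real character is lost)
p "abc".chomp       # => "abc"  (nothing to strip)
p "abc\r\n".chomp   # => "abc"

d = "a;b\r\nc\r\n"
p d.chomp.split(/[\s,;]/)    # => ["a", "b", "", "c"]
p d.chomp.split(/[\s,;]+/)   # => ["a", "b", "c"]
```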

Best regards,
   Henrik S.
Robert D. (Guest)
on 2007-09-26 00:41
(Received via mailing list)
On 1/12/07, Josselin <removed_email_address@domain.invalid> wrote:
>
> Joss


I know you got lots of answers but what about

a = d.gsub(/;|,/," ").split

If I am not mistaken it will work with Unix line endings too, since split
with no argument splits on any whitespace, \r and \n included.



HTH
Robert
James G. (Guest)
on 2007-09-26 00:42
(Received via mailing list)
On Jan 12, 2007, at 11:35 AM, Robert D. wrote:

>> tfyl
>>
>> Joss
>
>
> I know you got lots of answers but what about
>
> a = d.gsub(/;|,/," ").split

No need for a Regexp there:

a = d.tr(";,", " ").split
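`String#tr` maps characters one for one; when the replacement list is shorter than the source list, Ruby pads it with its last character, so both `;` and `,` become a space. A quick sketch with invented input:

```ruby
d = "a;b,c d"
p d.tr(";,", " ")           # => "a b c d"
p d.tr(";,", " ").split     # => ["a", "b", "c", "d"]

# split with no argument also treats \r and \n as whitespace.
p "x\r\ny".tr(";,", " ").split   # => ["x", "y"]
```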

James Edward G. II
Robert D. (Guest)
on 2007-09-26 00:43
(Received via mailing list)
On 1/12/07, James Edward G. II <removed_email_address@domain.invalid> wrote:
> >>       d = d.gsub(/,/,' ') # replace comma by space
>
> No need for a Regexp there:
>
> a = d.tr(";,", " ").split
>
> James Edward G. II
>
Nice one (I thought I had it optimal, naaa), and faster too of course :)

robert@PC:~/log/ruby/ML 18:46:00
535/35 > ruby split.rb split.rb
Rehearsal ---------------------------------------------
regex       4.891000   0.000000   4.891000 (  4.602000)
translate   3.812000   0.000000   3.812000 (  3.685000)
------------------------------------ total: 8.703000sec

                user     system      total        real
regex       5.016000   0.000000   5.016000 (  4.669000)
translate   3.859000   0.000000   3.859000 (  3.805000)
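The split.rb script itself is not shown in the post. A comparable benchmark might look like the sketch below; the input string and iteration count are assumptions, and timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

d = "ab;c,xx\r\nyy zz" * 10   # hypothetical test input
n = 10_000                     # hypothetical iteration count

# Both approaches should agree before we bother timing them.
raise "results differ" unless d.gsub(/;|,/, " ").split == d.tr(";,", " ").split

# bmbm runs a rehearsal pass first, matching the output format above.
Benchmark.bmbm do |x|
  x.report("regex")     { n.times { d.gsub(/;|,/, " ").split } }
  x.report("translate") { n.times { d.tr(";,", " ").split } }
end
```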

Cheers
Robert
Mike H. (Guest)
on 2007-09-26 00:44
(Received via mailing list)
Josselin wrote:

> Joss
>
>
>
Specific to this example, everyone else is right, and the best way is to
consolidate the regex or simply use a condensed split call.  However, in
the general case, you could do this

[ /\r\n/ , /;/ , /,/].inject(d) { |s,reg| s.gsub(reg,' ') }.split(' ')
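Extending that general case a little: if each pattern should carry its own replacement, the list can hold pattern/replacement pairs. The `REPLACEMENTS` table here is hypothetical:

```ruby
# Hypothetical table of [pattern, replacement] pairs, applied in order.
REPLACEMENTS = [
  [/\r\n/, ' '],
  [/;/,    ' '],
  [/,/,    ' '],
]

d = "ab;c,xx\r\nyy zz"
# Thread the string through each substitution, then split on whitespace.
a = REPLACEMENTS.inject(d) { |s, (re, rep)| s.gsub(re, rep) }.split(' ')
p a   # => ["ab", "c", "xx", "yy", "zz"]
```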
Jan S. (Guest)
on 2007-09-26 00:44
(Received via mailing list)
On 1/12/07, Josselin <removed_email_address@domain.invalid> wrote:
> I wrote the following Ruby statements. I get the result I need, but I
> tried to DRY it for 2 hours without success:
>
>       d = d.gsub(/\r\n/,' ')   # get rid of carriage return
>       d = d.gsub(/;/,' ')  # replace semicolon by space
>       d = d.gsub(/,/,' ') # replace comma by space
>       a  = d.split(' ')   # split into components, space as divider

what about:

a = d.gsub(/\r\n|;|,/,' ').split(' ')

or

a = d.split(/\r\n|;|,| /)

(not tested, I'm too lazy/busy to write the tests now)
Simon S. (Guest)
on 2007-09-26 00:44
(Received via mailing list)
On 1/12/07, Simon S. <removed_email_address@domain.invalid> wrote:
> On 1/12/07, removed_email_address@domain.invalid <removed_email_address@domain.invalid> 
wrote:
[snip]
> > Out of interest, what does the ?: do in there? I've googled, etc, honest!
>
> (?: )   is a non-capturing group

For example, if you want to match a repeating pattern
but don't want the repeated part in your output:

"abcde xyx xyx xyx abcde".scan(/(?:xyx ){2,}.*(b.*d)/)
#=> [["bcd"]]



If you use ( ) instead, it shows up in the output:

"abcde xyx xyx xyx abcde".scan(/(xyx ){2,}.*(b.*d)/)
#=> [["xyx ", "bcd"]]
Gregory S. (Guest)
on 2007-09-26 00:45
(Received via mailing list)
On Sat, Jan 13, 2007 at 09:51:47AM +0900, James B. wrote:
> >  "\r\n",
> >  ";",
> >  ","
> >].map{ |c| Regexp.escape(c) }.join("|"))
> >
> >a = d.split(DELIMITERS)
>
> Unless these chunks of code are right next to each other, it may be hard
> to know the purpose for the delimiters or what's driving the split.

The cleaned-up version includes the delimiters in an array of individual
strings. Your original complaint was about readability and code
maintenance. While I agree that a long literal Regexp can be hard to
read and hard to maintain, you can achieve the same efficiency as that
Regexp without sacrificing readability using the solution above. Perhaps
the following would make you happier?

module Whatever
  DELIMITERS = [
    " ",
    "\r\n",
    ";",
    ","
  ]

  def split_string(str)
    @delimiter_regexp ||= Regexp.new(DELIMITERS.map { |c| Regexp.escape(c) }.join("|"))
    str.split(@delimiter_regexp)
  end
  extend self
end

a = Whatever.split_string(d)

(If you want to make it even fancier so you can modify DELIMITERS at
runtime you'll have to do something clever with hashes.)
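One possible reading of that remark, as a hedged sketch: cache one compiled Regexp per distinct delimiter list in a Hash keyed on a frozen copy of the list, so appending to `DELIMITERS` at runtime picks up a fresh regex. The `Splitter` module and its method are hypothetical.

```ruby
module Splitter
  DELIMITERS = [" ", "\r\n", ";", ","]   # deliberately not frozen, so it can change

  # Hash from a delimiter list to its compiled Regexp.
  @cache = {}

  def self.split_string(str)
    key = DELIMITERS.dup.freeze          # shallow snapshot of the current list
    re = (@cache[key] ||= Regexp.union(key))
    str.split(re)
  end
end

p Splitter.split_string("a;b c")   # => ["a", "b", "c"]
Splitter::DELIMITERS << "|"
p Splitter.split_string("a|b")     # => ["a", "b"]
```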

If the code above does not fulfill what you were intending, please do
explain why; if I've missed the point, I'd like to know it and to try
again at understanding.

> James B.
--Greg
Phrogz (Guest)
on 2007-09-26 00:48
(Received via mailing list)
Phrogz wrote:
> James Edward G. II wrote:
> > a = d.split(/(?:\r\n|[;, ])/)
>
> Way more elegant. Way to see beyond the step-by-step process to the end
> goal.

Except that there's no need for the non-capturing group, so
(simplifying, not golfing):

a = d.split( /\r\n|[;, ]/ )

Unless, of course, you have a string like this:
d = "foo; bar\r\n\r\nwhee"
and you only wanted [ "foo", "bar", "whee" ], in which case:
a = d.split(/(?:\r\n|[;, ])+/)
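The `+` version is one place where the group really is needed: the quantifier has to apply to the whole alternation, and making the group non-capturing keeps the delimiters out of the result. A sketch using the example string above:

```ruby
d = "foo; bar\r\n\r\nwhee"

p d.split(/\r\n|[;, ]/)        # => ["foo", "", "bar", "", "whee"]
p d.split(/(?:\r\n|[;, ])+/)   # => ["foo", "bar", "whee"]

# With a capturing group, split also returns the text the group
# captured in each match, so delimiters leak into the result.
p d.split(/(\r\n|[;, ])+/)
```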

\Man, I'm the king of multiple posting today
\\Fark slashies ftw!
\\\Back to work
Daniel M. (Guest)
on 2007-09-26 00:49
(Received via mailing list)
Josselin <removed_email_address@domain.invalid> writes:

> I wrote the following Ruby statements. I get the result I need, but I
> tried to DRY it for 2 hours without success:
>
>      d = d.gsub(/\r\n/,' ')   # get rid of carriage return
>      d = d.gsub(/;/,' ')  # replace semicolon by space
>      d = d.gsub(/,/,' ') # replace comma by space
>      a  = d.split(' ')   # split into components, space as divider
>

What's wrong with:

  a = d.split(/\r\n|[;, ]/)

Or do you need d to be mangled as before?

Although I probably would do something even shorter like this:

  a = d.split(/[;,\s]+/)

However, for certain inputs that won't give exactly the same as your
initial multi-step procedure.
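A few inputs where the one-liners and the original four-step version disagree. The `original` method is just Josselin's code wrapped in a method for comparison; the method name and sample inputs are ours:

```ruby
# Josselin's original steps, wrapped in a (hypothetically named) method.
def original(d)
  d = d.gsub(/\r\n/, ' ')
  d = d.gsub(/;/, ' ')
  d = d.gsub(/,/, ' ')
  d.split(' ')          # awk-style split: skips leading whitespace, collapses runs
end

s = "a; b"
p original(s)               # => ["a", "b"]
p s.split(/\r\n|[;, ]/)     # => ["a", "", "b"]  (empty field between ';' and ' ')
p s.split(/[;,\s]+/)        # => ["a", "b"]

t = ";a"
p original(t)               # => ["a"]
p t.split(/[;,\s]+/)        # => ["", "a"]  (leading separator gives an empty first field)
```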

Also, any time you write:

  d = d.gsub(...)

You're probably better off with:

  d.gsub!(...)
Jason M. (Guest)
on 2007-09-26 00:50
(Received via mailing list)
On 1/12/07, Phrogz <removed_email_address@domain.invalid> wrote:
>
>
> \Man, I'm the king of multiple posting today
> \\Fark slashies ftw!
> \\\Back to work


+1


:)
Andy L. (Guest)
on 2007-09-26 00:52
(Received via mailing list)
>>
>> I wrote the following Ruby statements. I get the result I need, but I
>> tried to DRY it for 2 hours without success:
>>
>>       d = d.gsub(/\r\n/,' ')   # get rid of carriage return
>>       d = d.gsub(/;/,' ')  # replace semicolon by space
>>       d = d.gsub(/,/,' ') # replace comma by space
>>       a  = d.split(' ')   # split into components, space as divider

There's nothing in these four lines of code that violates the idea of
DRY.  There is no repeated code.  Multiple calls to the same method
are perfectly OK.

xoa
Phrogz (Guest)
on 2007-09-26 00:52
(Received via mailing list)
James Edward G. II wrote:
> a = d.split(/(?:\r\n|[;, ])/)

Way more elegant. Way to see beyond the step-by-step process to the end
goal.