Forum: Ruby DRY gsub...

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Josselin (Guest)
on 2007-01-19 16:31
(Received via mailing list)
I wrote the following Ruby statements. I get the result I need, but I
tried to DRY it up for 2 hours without success:

      d = d.gsub(/\r\n/,' ')   # replace CRLF line endings with a space
      d = d.gsub(/;/,' ')  # replace semicolon with a space
      d = d.gsub(/,/,' ') # replace comma with a space
      a  = d.split(' ')   # split into components, space as divider

tfyl

Joss
Josselin (Guest)
on 2007-09-25 22:29
(Received via mailing list)
On 2007-01-12 17:11:54 +0100, "Phrogz" <gavin@refinery.com> said:

> method repeatedly but with different parameters is not "repeating
> code more compact (but not necessarily golfing).
Thanks to all of you. As a newbie I will try to keep this kind of useful
comment in mind: DRY vs WET
(it's now engraved...)
unknown (Guest)
on 2007-09-25 22:29
(Received via mailing list)
> a = d.split(/(?:\r\n|[;, ])/)
>
> Hope that helps.

Out of interest, what does the ?: do in there? I've googled, etc,
honest!

Cheers,
  Benjohn
James Britt (Guest)
on 2007-09-25 22:29
(Received via mailing list)
Gregory Seidman wrote:

>
> Cleaned up:

The whole point was *not* to clean it up, but to make obvious what and
why something was happening in the code.

Brevity is the soul of wit, but it can play havoc with code maintenance.

>
> DELIMITERS = Regexp.new([
>   " ",
>   "\r\n",
>   ";",
>   ","
> ].map{ |c| Regexp.escape(c) }.join("|"))
>
> a = d.split(DELIMITERS)
>

Unless these chunks of code are right next to each other, it may be hard
to know the purpose for the delimiters or what's driving the split.
Simon Strandgaard (Guest)
on 2007-09-25 22:30
(Received via mailing list)
On 1/12/07, benjohn@fysh.org <benjohn@fysh.org> wrote:
> > a = d.split(/(?:\r\n|[;, ])/)
> >
> > Hope that helps.
>
> Out of interest, what does the ?: do in there? I've googled, etc, honest!

(?: )   is a non-capturing group

It's not necessary here:
"ab;c,xx\r\nyy zz".split(/\r\n|[;, ]/)
#=> ["ab", "c", "xx", "yy", "zz"]
James Gray (bbazzarrakk)
on 2007-09-25 22:30
(Received via mailing list)
On Jan 12, 2007, at 10:00 AM, Josselin wrote:

> I wrote the following ruby statements..  I get the result I need ,
> I tried to DRY it for 2 hours without being successfull ,
>
>      d = d.gsub(/\r\n/,' ')   # get rid of carriage return
>      d = d.gsub(/;/,' ')  # replace column by space
>      d = d.gsub(/,/,' ') # replace comma by space
>      a  = d.split(' ')   # split into component , space as divider

a = d.split(/(?:\r\n|[;, ])/)

Hope that helps.

James Edward Gray II
Bruno Michel (Guest)
on 2007-09-25 22:32
(Received via mailing list)
Bira wrote:
>
> How about a = d.gsub(/\r\n|;|,/,' ').split(' ') ?

Or a = d.split(/\r\n|,|;| /) ?
Gerardo Santana Gómez Garrido (Guest)
on 2007-09-25 22:33
(Received via mailing list)
2007/1/12, benjohn@fysh.org <benjohn@fysh.org>:
> > a = d.split(/(?:\r\n|[;, ])/)
> >
> > Hope that helps.
>
> Out of interest, what does the ?: do in there? I've googled, etc, honest!

( )  groups and captures
(?:  )  groups but does not capture

Non-capturing groupings:
http://perldoc.perl.org/perlretut.html

It's irrelevant in this case anyway.
James Britt (Guest)
on 2007-09-25 22:33
(Received via mailing list)
Phrogz wrote:
> method repeatedly but with different parameters is not "repeating
> yourself".


Looking at this, and some of the suggested alternatives, I can see how
it would get tedious to add more characters to the "replace with space"
set.

The use of compact regular expressions doesn't make the code easier to
read or maintain.

It may be useful to define the set of special characters, then use that
to  drive a string transformation.

REPLACE_WITH_SPACE = %w{
   \r\n
   ;
   ,
}.map{ |c| Regexp.new(c) }

class String
    def swap_to_spaces
      s = self.dup
      REPLACE_WITH_SPACE.each do |re|
          s.gsub!( re, ' ')
      end
      s
    end
end


a =  d.swap_to_spaces.split( ' ' )



Or something along those lines.

--
James Britt

http://www.ruby-doc.org    - Ruby Help & Documentation
http://www.rubystuff.com   - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com  - Playing with Better Toys
Gregory Seidman (Guest)
on 2007-09-25 22:33
(Received via mailing list)
On Sat, Jan 13, 2007 at 02:04:31AM +0900, James Britt wrote:
[...]
> REPLACE_WITH_SPACE = %w{
>      end
>      s
>    end
> end
>
>
> a =  d.swap_to_spaces.split( ' ' )
>
>
>
> Or something along those lines.

Cleaned up:

DELIMITERS = Regexp.new([
  " ",
  "\r\n",
  ";",
  ","
].map{ |c| Regexp.escape(c) }.join("|"))

a = d.split(DELIMITERS)

> James Britt
--Greg
Robert Klemme (Guest)
on 2007-09-25 22:34
(Received via mailing list)
On 12.01.2007 17:26, benjohn@fysh.org wrote:
>> a = d.split(/(?:\r\n|[;, ])/)
>>
>> Hope that helps.
>
> Out of interest, what does the ?: do in there? I've googled, etc, honest!

It's a non capturing group.  You cannot get the characters from it which
is at times more efficient because the RX engine does not need to do the
bookkeeping and storing of the group.
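
To make the difference concrete, here is a small sketch using MatchData#captures. The sample string "key=value" is just an assumed input for illustration:

```ruby
# Both groups capture, so both pieces are stored.
capturing = "key=value".match(/(\w+)=(\w+)/).captures      # => ["key", "value"]

# The first group only groups; nothing is stored for it.
non_capturing = "key=value".match(/(?:\w+)=(\w+)/).captures # => ["value"]
```

The non-capturing version still has to match the same text; it just does not keep the first piece around.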

  robert
Phrogz (Guest)
on 2007-09-25 22:35
(Received via mailing list)
Josselin wrote:
> I wrote the following ruby statements..  I get the result I need , I
> tried to DRY it for 2 hours without being successfull ,
>
>       d = d.gsub(/\r\n/,' ')   # get rid of carriage return
>       d = d.gsub(/;/,' ')  # replace column by space
>       d = d.gsub(/,/,' ') # replace comma by space
>       a  = d.split(' ')   # split into component , space as divider

BTW, that is already reasonably DRY, in my opinion. Calling the same
method repeatedly but with different parameters is not "repeating
yourself". It would be WET (hrm...Way Extra Toomuchcode) if you had
something like:

d = d.gsub( /\r\n/, ' ' )
e = e.gsub( /\r\n/, ' ' )
f = f.gsub( /\r\n/, ' ' )
g = g.gsub( /\r\n/, ' ' )
etc.

It's just semantics, but IMO what you're asking for is to make your
code more compact (but not necessarily golfing).
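
For the genuinely WET case above, one possible way to remove the repetition is to write the substitution once and map it over the strings. This is only a sketch; the sample values of d, e, f and g are made up:

```ruby
# Sample inputs (hypothetical values, just for illustration).
d, e, f, g = "a\r\nb", "c\r\nd", "e\r\nf", "g\r\nh"

# The substitution is written once and applied to every string.
d, e, f, g = [d, e, f, g].map { |s| s.gsub(/\r\n/, ' ') }
```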
Phrogz (Guest)
on 2007-09-25 22:35
(Received via mailing list)
Josselin wrote:
> I wrote the following ruby statements..  I get the result I need , I
> tried to DRY it for 2 hours without being successfull ,
>
>       d = d.gsub(/\r\n/,' ')   # get rid of carriage return
>       d = d.gsub(/;/,' ')  # replace column by space
>       d = d.gsub(/,/,' ') # replace comma by space
>       a  = d.split(' ')   # split into component , space as divider

d = d.gsub( /\r\n|[;,]/, ' ' ).split
James Gray (bbazzarrakk)
on 2007-09-25 22:36
(Received via mailing list)
On Jan 12, 2007, at 10:26 AM, benjohn@fysh.org wrote:

>> a = d.split(/(?:\r\n|[;, ])/)
>>
>> Hope that helps.
>
> Out of interest, what does the ?: do in there? I've googled, etc,
> honest!

(?: ... ) is just ( ... ) without capturing the contents into a
variable.

James Edward Gray II
Jan Svitok (Guest)
on 2007-09-25 22:36
(Received via mailing list)
On 1/12/07, Daniel Martin <martin@snowplow.org> wrote:
>
> However, for certain inputs that won't give exactly the same as your
> initial multi-step procedure.
>
> Also, any time you write:
>
>   d = d.gsub(...)
>
> You're probably better off with:
>
>   d.gsub!(...)

...unless you don't want to modify the original object that was passed
in as an argument; with gsub! the caller will see the modifications as
well. (I'm not sure if this is a proper English construct ;-)
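
A minimal sketch of that aliasing effect. The helper names tidy_copy and tidy_in_place are hypothetical, chosen only for this illustration:

```ruby
def tidy_copy(str)
  str.gsub(/;/, ' ')   # builds a new string; the caller's object is untouched
end

def tidy_in_place(str)
  str.gsub!(/;/, ' ')  # mutates the caller's object (returns nil if nothing changed)
  str
end

original = "a;b".dup           # dup so this also works with frozen string literals
copy = tidy_copy(original)     # copy is "a b"
after_copy = original.dup      # snapshot: original is still "a;b" here

tidy_in_place(original)        # now original itself is "a b"
```

Note that gsub! returns nil when no substitution happened, so its return value should not be chained blindly.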
Henrik Schmidt (Guest)
on 2007-09-25 22:37
(Received via mailing list)
Josselin wrote:
> Joss
>

I would probably go with

a = d.chop.split(/[\s,;]/)

Best regards,
   Henrik Schmidt
James Gray (bbazzarrakk)
on 2007-09-25 22:37
(Received via mailing list)
On Jan 12, 2007, at 10:30 AM, Phrogz wrote:

>
> a = d.split( /\r\n|[;, ]/ )

You're right, it's not needed.  I'm just in the habit of always
surrounding | options of a regex with grouping to control their
scope.  I guess I've been bitten by those matching issues one time
too many.
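
A sketch of the kind of scoping issue that habit guards against. The patterns and the test word "catalog" are assumed examples, not taken from the thread:

```ruby
# Without grouping, each anchor binds to only one alternative:
# this reads as (\Acat) OR (dog\z).
loose = /\Acat|dog\z/

# With a non-capturing group, both anchors apply to the whole alternation.
tight = /\A(?:cat|dog)\z/

loose_hit = ("catalog" =~ loose)  # => 0, because "catalog" starts with "cat"
tight_hit = ("catalog" =~ tight)  # => nil, the whole string is neither "cat" nor "dog"
```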

James Edward Gray II
Alex Young (regularfry)
on 2007-09-25 22:38
(Received via mailing list)
Josselin wrote:
> I wrote the following ruby statements..  I get the result I need , I
> tried to DRY it for 2 hours without being successfull ,
>
>      d = d.gsub(/\r\n/,' ')   # get rid of carriage return
>      d = d.gsub(/;/,' ')  # replace column by space
>      d = d.gsub(/,/,' ') # replace comma by space
>      a  = d.split(' ')   # split into component , space as divider
>
a = d.split(/(\r\n)|([;, ])/)
Simon Strandgaard (Guest)
on 2007-09-25 22:38
(Received via mailing list)
On 1/12/07, Bira <u.alberton@gmail.com> wrote:
>
> How about a = d.gsub(/\r\n|;|,/,' ').split(' ') ?

If you don't care about \r then maybe this

"ab;c,xx\r\nyy zz".scan(/[^ ;,\n]+/)
#=> ["ab", "c", "xx\r", "yy", "zz"]
Josselin (Guest)
on 2007-09-25 22:38
(Received via mailing list)
On 2007-01-12 17:05:34 +0100, Bira <u.alberton@gmail.com> said:

>
> How about a = d.gsub(/\r\n|;|,/,' ').split(' ') ?

Thanks. I did not notice that I could use '|' inside the gsub;
I got stuck on [. and ...]
Phrogz (Guest)
on 2007-09-25 22:38
(Received via mailing list)
benj...@fysh.org wrote:
> > a = d.split(/(?:\r\n|[;, ])/)
> >
> > Hope that helps.
>
> Out of interest, what does the ?: do in there? I've googled, etc, honest!

It says, "Please don't capture the stuff in parentheses here, because
that changes what split returns".

irb(main):001:0> s = "a b,c"
=> "a b,c"
irb(main):002:0> s.split( /( |,)/ )
=> ["a", " ", "b", ",", "c"]
irb(main):003:0> s.split( /(?: |,)/ )
=> ["a", "b", "c"]
Bira (Guest)
on 2007-09-25 22:39
(Received via mailing list)
On 1/12/07, Josselin <josselin@wanadoo.fr> wrote:
> I wrote the following ruby statements..  I get the result I need , I
> tried to DRY it for 2 hours without being successfull ,
>
>       d = d.gsub(/\r\n/,' ')   # get rid of carriage return
>       d = d.gsub(/;/,' ')  # replace column by space
>       d = d.gsub(/,/,' ') # replace comma by space
>       a  = d.split(' ')   # split into component , space as divider
>
> tfyl

How about a = d.gsub(/\r\n|;|,/,' ').split(' ') ?
Henrik Schmidt (Guest)
on 2007-09-25 22:41
(Received via mailing list)
Josselin wrote:
> Joss
>
I would probably go with

a = d.chomp.split(/[\s,;]/)

Best regards,
   Henrik Schmidt
Robert Dober (Guest)
on 2007-09-25 22:41
(Received via mailing list)
On 1/12/07, Josselin <josselin@wanadoo.fr> wrote:
>
> Joss


I know you got lots of answers but what about

a = d.gsub(/;|,/," ").split

If I am not mistaken it will work on Unix too.



HTH
Robert
James Gray (bbazzarrakk)
on 2007-09-25 22:42
(Received via mailing list)
On Jan 12, 2007, at 11:35 AM, Robert Dober wrote:

>> tfyl
>>
>> Joss
>
>
> I know you got lots of answers but what about
>
> a = d.gsub(/;|,/," ").split

No need for a Regexp there:

a = d.tr(";,", " ").split

James Edward Gray II
Robert Dober (Guest)
on 2007-09-25 22:43
(Received via mailing list)
On 1/12/07, James Edward Gray II <james@grayproductions.net> wrote:
> >>       d = d.gsub(/,/,' ') # replace comma by space
>
> No need for a Regexp there:
>
> a = d.tr(";,", " ").split
>
> James Edward Gray II
>
Nice one (I thought I had it optimal, nah),
and faster too of course :)

robert@PC:~/log/ruby/ML 18:46:00
535/35 > ruby split.rb split.rb
Rehearsal ---------------------------------------------
regex       4.891000   0.000000   4.891000 (  4.602000)
translate   3.812000   0.000000   3.812000 (  3.685000)
------------------------------------ total: 8.703000sec

                user     system      total        real
regex       5.016000   0.000000   5.016000 (  4.669000)
translate   3.859000   0.000000   3.859000 (  3.805000)

Cheers
Robert
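
The split.rb script itself was not posted. A rough sketch of what such a comparison could look like with the standard Benchmark library follows; the sample input and the iteration count are assumptions, so the timings will not match the numbers above:

```ruby
require 'benchmark'

# Assumed sample input and iteration count; the original script is not shown.
d = "ab;c,xx yy;zz"
n = 10_000

# bmbm runs a rehearsal pass first, then the measured pass.
Benchmark.bmbm do |x|
  x.report("regex")     { n.times { d.gsub(/;|,/, " ").split } }
  x.report("translate") { n.times { d.tr(";,", " ").split } }
end
```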
Mike Harris (gfunk911)
on 2007-09-25 22:44
(Received via mailing list)
Josselin wrote:

> Joss
>
>
>
Specific to this example, everyone else is right, and the best way is to
consolidate the regex or simply use a condensed split call.  However, in
the general case, you could do this

[ /\r\n/ , /;/ , /,/].inject(d) { |s,reg| s.gsub(reg,' ') }.split(' ')
Jan Svitok (Guest)
on 2007-09-25 22:44
(Received via mailing list)
On 1/12/07, Josselin <josselin@wanadoo.fr> wrote:
> I wrote the following ruby statements..  I get the result I need , I
> tried to DRY it for 2 hours without being successfull ,
>
>       d = d.gsub(/\r\n/,' ')   # get rid of carriage return
>       d = d.gsub(/;/,' ')  # replace column by space
>       d = d.gsub(/,/,' ') # replace comma by space
>       a  = d.split(' ')   # split into component , space as divider

what about:

a = d.gsub(/\r\n|;|,/,' ').split(' ')

or

a = d.split(/\r\n|;|,| /)

(not tested, I'm too lazy/busy to write the tests now)
Simon Strandgaard (Guest)
on 2007-09-25 22:44
(Received via mailing list)
On 1/12/07, Simon Strandgaard <neoneye@gmail.com> wrote:
> On 1/12/07, benjohn@fysh.org <benjohn@fysh.org> wrote:
[snip]
> > Out of interest, what does the ?: do in there? I've googled, etc, honest!
>
> (?: )   is a non-capturing group

example if you want to match a repeating pattern,
but don't want the repeating stuff in your output

"abcde xyx xyx xyx abcde".scan(/(?:xyx ){2,}.*(b.*d)/)
#=> [["bcd"]]



if you use ( ) then it shows up in the output

"abcde xyx xyx xyx abcde".scan(/(xyx ){2,}.*(b.*d)/)
#=> [["xyx ", "bcd"]]
Gregory Seidman (Guest)
on 2007-09-25 22:45
(Received via mailing list)
On Sat, Jan 13, 2007 at 09:51:47AM +0900, James Britt wrote:
> >  "\r\n",
> >  ";",
> >  ","
> >].map{ |c| Regexp.escape(c) }.join("|"))
> >
> >a = d.split(DELIMITERS)
>
> Unless these chunks of code are right next to each other, it may be hard
> to know the purpose for the delimiters or what's driving the split.

The cleaned up version includes the delimiters in an array of individual
strings. Your original complaint was about readability and code
maintenance. While I agree that a long literal Regexp can be hard to read
and hard to maintain, you can achieve the same efficiency of that Regexp
without sacrificing readability using the solution above. Perhaps the
following would make you happier?

module Whatever
  DELIMITERS = [
    " ",
    "\r\n",
    ";",
    ","
  ]

  def split_string(str)
    @delimiter_regexp ||= Regexp.new(DELIMITERS.map{ |c| Regexp.escape(c) }.join("|"))
    str.split(@delimiter_regexp)
  end
  extend self
end

a = Whatever.split_string(d)

(If you want to make it even fancier so you can modify DELIMITERS at
runtime you'll have to do something clever with hashes.)
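
One possible reading of "something clever with hashes", as a sketch only: cache each compiled Regexp in a Hash keyed by the delimiter list it was built from. The module name Splitter is hypothetical, and Regexp.union is swapped in for the map/escape/join above because it escapes plain strings itself:

```ruby
module Splitter                      # hypothetical name
  @delimiters = [" ", "\r\n", ";", ","]
  @cache = {}                        # delimiter list => compiled Regexp

  class << self
    attr_reader :delimiters

    def split_string(str)
      # Key on a copy of the list, so later changes to @delimiters
      # cannot disturb keys already stored in the hash.
      re = (@cache[@delimiters.dup] ||= Regexp.union(@delimiters))
      str.split(re)
    end
  end
end

before = Splitter.split_string("a;b c")   # => ["a", "b", "c"]
Splitter.delimiters << "|"                # change the delimiters at runtime
after = Splitter.split_string("a|b;c")    # => ["a", "b", "c"]
```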

If the code above does not fulfill what you were intending, please do
explain why; if I've missed the point, I'd like to know it and to try
again at understanding.

> James Britt
--Greg
Phrogz (Guest)
on 2007-09-25 22:48
(Received via mailing list)
Phrogz wrote:
> James Edward Gray II wrote:
> > a = d.split(/(?:\r\n|[;, ])/)
>
> Way more elegant. Way to see beyond the step-by-step process to the end
> goal.

Except that there's no need for the non-capturing group, so
(simplifying, not golfing):

a = d.split( /\r\n|[;, ]/ )

Unless, of course, you have a string like this:
d = "foo; bar\r\n\r\nwhee"
and you only wanted [ "foo", "bar", "whee" ], in which case:
a = d.split(/(?:\r\n|[;, ])+/)
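
Side by side, using that same sample string (a quick sketch of the two behaviours):

```ruby
d = "foo; bar\r\n\r\nwhee"

# One delimiter per split point: adjacent delimiters leave empty strings.
single = d.split(/\r\n|[;, ]/)        # => ["foo", "", "bar", "", "whee"]

# The + treats a run of delimiters as a single split point.
runs = d.split(/(?:\r\n|[;, ])+/)     # => ["foo", "bar", "whee"]
```

Here the group is actually needed, since the + has to apply to the whole alternation.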

\Man, I'm the king of multiple posting today
\\Fark slashies ftw!
\\\Back to work
Daniel Martin (Guest)
on 2007-09-25 22:49
(Received via mailing list)
Josselin <josselin@wanadoo.fr> writes:

> I wrote the following ruby statements..  I get the result I need , I
> tried to DRY it for 2 hours without being successfull ,
>
>      d = d.gsub(/\r\n/,' ')   # get rid of carriage return
>      d = d.gsub(/;/,' ')  # replace column by space
>      d = d.gsub(/,/,' ') # replace comma by space
>      a  = d.split(' ')   # split into component , space as divider
>

What's wrong with:

  a = d.split(/\r\n|[;, ]/)

Or do you need d to be mangled as before?

Although I probably would do something even shorter like this:

  a = d.split(/[;,\s]+/)

However, for certain inputs that won't give exactly the same as your
initial multi-step procedure.
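
One such input, as a sketch: a string that starts with a delimiter. The helper name multi_step is hypothetical and simply wraps the original four statements:

```ruby
# The original multi-step procedure, wrapped in a hypothetical helper.
def multi_step(d)
  d.gsub(/\r\n/, ' ').gsub(/;/, ' ').gsub(/,/, ' ').split(' ')
end

d = ";a,b"
stepwise = multi_step(d)         # => ["a", "b"]      split(' ') ignores leading whitespace
one_shot = d.split(/[;,\s]+/)    # => ["", "a", "b"]  a regexp split keeps the leading empty field
```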

Also, any time you write:

  d = d.gsub(...)

You're probably better off with:

  d.gsub!(...)
Jason Mayer (slamboy)
on 2007-09-25 22:50
(Received via mailing list)
On 1/12/07, Phrogz <gavin@refinery.com> wrote:
>
>
> \Man, I'm the king of multiple posting today
> \\Fark slashies ftw!
> \\\Back to work


+1


:)
Andy Lester (Guest)
on 2007-09-25 22:52
(Received via mailing list)
>>
>> I wrote the following ruby statements..  I get the result I need , I
>> tried to DRY it for 2 hours without being successfull ,
>>
>>       d = d.gsub(/\r\n/,' ')   # get rid of carriage return
>>       d = d.gsub(/;/,' ')  # replace column by space
>>       d = d.gsub(/,/,' ') # replace comma by space
>>       a  = d.split(' ')   # split into component , space as divider

There's nothing in these four lines of code that violates the idea of
DRY.  There is no repeated code.  Multiple calls to the same method
are perfectly OK.

xoa
Phrogz (Guest)
on 2007-09-25 22:52
(Received via mailing list)
James Edward Gray II wrote:
> a = d.split(/(?:\r\n|[;, ])/)

Way more elegant. Way to see beyond the step-by-step process to the end
goal.