Is there a replacement for sub?

Michael_WSRyder · July 20, 2007, 9:05pm

Martin DeMello wrote:

to an object.
pointing to it

b = a.sub(" ", "") # => b is now "ab c d e f", leaves a unchanged

a = a.sub(" ", "") # => a is now "ab c d e f", the old string is garbage collected

a = "a b c d e f"
b = a
a = a.sub(" ", "") # => a is now "ab c d e f", b is still "a b c d e f"

Wrong. Unless you use b = a.dup b will be the same as a as it is
pointing to the same object. The result of the sub operation will
change b as well as a.

Note the last bit carefully - sub! modifies the object, and returns

10.times {a = a.sub(" ", '')} # creates a series of 10 strings, the intermediate ones being GCd

try this:

a = "a b c d e f"
intermediate = [a]
10.times {|i| intermediate[i+1] = intermediate[i].sub(" ", '')}
p intermediate

I guess my problem is that the sub operations return a nil if they don’t
make a change rather than the unchanged object. This seems to be
inconsistent as you get the changed object back if it works but nothing
if it doesn’t. Maybe I need to write a version of sub or a method for
it that returns the string regardless of whether it was changed or not.

Michael_WSRyder · July 20, 2007, 6:23pm

On Jul 20, 2007, at 3:15 AM, Michael W. Ryder wrote:

Is there nothing in regular expressions where you can tell it to do
something up to n times?

OK, is this what you’re looking for?

result = []
7.times do |n|
  result << "a b c d e f".sub(/(\S\s){0,#{n}}/) { |m| m.delete(" ") }
end
result # =>
# ["a b c d e f", "ab c d e f", "abc d e f", "abcd e f", "abcde f",
#  "abcdef", "abcdef"]

Regards, Morton

Michael_WSRyder · July 20, 2007, 9:17pm

Michael W. Ryder wrote:

Martin DeMello wrote:

a = "a b c d e f"
b = a
a = a.sub(" ", "") # => a is now "ab c d e f", b is still "a b c d e f"

Wrong.

Nope.

Unless you use b = a.dup b will be the same as a as it is
pointing to the same object.

Yes.

The result of the sub operation will
change b as well as a.

No, sub won’t change the actual string - that’s what sub! would do. It
will
create a new string which will be assigned to a, at which point a and b
won’t
be pointing at the same object anymore.

I guess my problem is that the sub operations return a nil if they don’t
make a change rather than the unchanged object.

They don’t. sub! (the destructive method) will return nil, sub will
return the
unchanged object.

Maybe I need to write a version of sub or a method for
it that returns the string regardless of whether it was changed or not.

sub does that already.
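A minimal sketch of the distinction argued over above, using only core String methods. The variable names are purely illustrative. It shows that rebinding `a` to the result of `sub` leaves `b` alone, while `sub!` changes the one object that both names refer to.

```ruby
# Non-bang: sub builds a new String and `a` is rebound to it.
a = "a b c d e f"
b = a                 # a and b name the same object
a = a.sub(" ", "")
puts a                # ab c d e f
puts b                # a b c d e f  (untouched)
puts a.equal?(b)      # false: two different objects now

# Bang: sub! edits the one shared object in place.
c = "a b c d e f".dup # dup so the example also runs with frozen string literals
d = c
c.sub!(" ", "")
puts d                # ab c d e f  (d sees the change)
puts c.equal?(d)      # true: still one object
```

Use `b = a.dup` before a bang call when the original contents need to be kept.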

Michael_WSRyder · July 20, 2007, 9:24pm

On 7/20/07, Michael W. Ryder wrote:

a = "a b c d e f"
b = a
a = a.sub(" ", "") # => a is now "ab c d e f", b is still "a b c d e f"

Wrong. Unless you use b = a.dup b will be the same as a as it is
pointing to the same object. The result of the sub operation will
change b as well as a.

The original author was correct. b points to “a b c d e”
a gets replaced with “ab c d e” and b still points to the original
string.

You’re thinking something like

a = [1,3,4]
b = a
a[0] = 5

a and b are both [5,3,4]

irb(main):001:0> a = "a b c d e"
=> "a b c d e"
irb(main):002:0> b = a
=> "a b c d e"
irb(main):003:0> a = a.sub(" ", "")
=> "ab c d e"
irb(main):004:0> a
=> "ab c d e"
irb(main):005:0> b
=> "a b c d e"
irb(main):006:0>

Matt

Michael_WSRyder · July 20, 2007, 10:17pm

Sebastian H. wrote:

Unless you use b = a.dup b will be the same as a as it is

My mistake. Too many tests with and without the ! operator. I guess
that my test used sub! instead of sub. This just proves my point that
the whole thing is too confusing.

Michael_WSRyder · July 20, 2007, 10:22pm

Hi –

On Sat, 21 Jul 2007, Michael W. Ryder wrote:

be pointing at the same object anymore.

My mistake. Too many tests with and without the ! operator. I guess that my
test used sub! instead of sub. This just proves my point that the whole
thing is too confusing.

Keep in mind that the ! means that the method is the “dangerous”
version. In this case, the danger consists of both the changing of
the object in place, and the returning of nil when there’s no change.
Any time you see a ! in a method name you’ve been warned that there’s
probably something you need to be particularly aware of and careful
about.

David

Michael_WSRyder · July 22, 2007, 11:11am

On 22.07.2007 04:36, Michael W. Ryder wrote:

b = a

that my test used sub! instead of sub. This just proves my point
returns nil if nothing is done. C has the ++ operator which can be very
dangerous but it never returns nil. It may be safer to use a = a + 1 or
even better b = a + 1, but I don’t have to worry that a++ will return 0
unless a was -1 before. I realize that the ! operator is a shortcut, but
if it is going to be a shortcut that is all it should be.

It seems there is some confusion around: the only “! operator” there is
in Ruby is the logical “not” as in “if ! (x > 10) then…”. The
exclamation mark you are talking about is a part of the method
identifier just like the “s”.

The current
way it is used with sub and gsub is that sometimes it returns the
results in the specified object and other times it wipes out the
object.

There is no wiping out going on. gsub!, sub! and other methods are
defined to return “self” (the object you invoke the method on) if there
were changes and “nil” if there were no changes. That’s the contract
and it’s documented. Moreover that’s pretty consistently adhered to
although there are other methods that change the receiver that do not
have an exclamation mark in their identifier.
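A short sketch of that contract, assuming nothing beyond core `String#sub!` and `String#sub`. The variable names are illustrative only.

```ruby
str = "a-b".dup             # dup so the example also runs with frozen string literals

first  = str.sub!("-", "")  # a replacement happened: returns str itself
second = str.sub!("-", "")  # nothing left to replace: returns nil
copy   = str.sub("-", "")   # non-bang: always a String, here an unchanged copy

p first.equal?(str)  # true
p second             # nil
p copy               # "ab"

# Safe regardless of whether anything matched: mutate, then use the variable.
str.sub!("-", "")
p str.length         # 2
```

The nil is only a problem when the return value of the bang call is used directly. Calling it as a statement and then reading the variable works in both cases.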

This makes it Very hard to write a program where you are not
always sure of the data, which in my case is a lot of the time.
For example if I was cleaning a data file from a customer where he used
dashes in a Social Security Number and I only wanted to remove up to the
first two dashes in the number, knowing that if more are present that
the number is invalid, I would use a variant of sub. I would not want
to have to program for cases where there were no or one dash in the
number.

You don’t have to. Just do

2.times { ssn.sub!(/-/, '') }

or

ssn.sub!(/-([^-]*)-/, '\1')
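As a rough sketch of the loop form applied to the Social Security Number case, here is a hypothetical helper (the name `strip_ssn_dashes` is not from the thread). It ignores the return value of `sub!` entirely, so zero, one or two dashes are all handled without extra checks.

```ruby
# Hypothetical helper: removes at most the first two dashes from a copy.
def strip_ssn_dashes(ssn)
  cleaned = ssn.dup
  2.times { cleaned.sub!(/-/, "") }  # a nil return is simply discarded
  cleaned
end

puts strip_ssn_dashes("123-45-6789")   # 123456789
puts strip_ssn_dashes("12345-6789")    # 123456789
puts strip_ssn_dashes("123456789")     # 123456789
puts strip_ssn_dashes("12-3-45-6789")  # 12345-6789 (a leftover dash marks it invalid)
```

One caveat on the single-regex form: it only matches when at least two dashes are present, so an input with exactly one dash would be left untouched by it.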

True, I could spend a lot of time trying to catch every
abnormality, but then where is the advantage in using a language like
Ruby, when I could do less work with a “primitive” language that does
what I want without having to check what it is doing.

As I said, you do not have to. It seems you haven’t accustomed yourself
to Ruby or maybe OO programming in general. It took me quite some time
to grok OO when I was first exposed to it (using Borland’s Turbo Pascal
at the time), but once you get the hang of it things fall into place
very nicely.

I am not saying that Ruby is a “bad” language, as it has a lot of very
nice features. My problem is that I am not always expecting some of the
idiosyncrasies of the language. When I see something like sub! I expect
that it will return my string with the substitution made or the
unchanged string, not a nil.

As you say, it’s /your/ problem. I think this will go away pretty
soon as you get accustomed to the language.

Kind regards

robert

Michael_WSRyder · July 22, 2007, 11:45am

Hi –

On Sun, 22 Jul 2007, Michael W. Ryder wrote:

b = a
The result of the sub operation will change b as well as a.
whole thing is too confusing.
nil if nothing is done.
Think about the implications of that, though: what you’re saying, in
effect, is that Ruby should adapt itself in great detail to everyone’s
background – which really means that Ruby should do everything the
same way all other languages (since someone has a background in
every language) do it. Of course, there’s no such thing

It’s not that Ruby is above criticism, of course; but it’s best to
take it on its own terms as much as possible, and you’ll find that
things are very well planned and thought through. The ! isn’t
supposed to be a trap or a surprise; on the contrary, it’s an explicit
indicator of exactly what you’re concerned about – namely, the fact
that the behavior of this method may do “dangerous” things that the
non-! version doesn’t do.

Have a look at: Coming From Ruby for
more encouragement along these lines

C has the ++ operator which can be very dangerous but it never
returns nil.

It also doesn’t tell you, right in its name, that you should be
cautious about danger.

It may be safer to use a = a + 1 or even better b = a + 1, but I
don’t have to worry that a++ will return 0 unless a was -1 before. I
realize that the ! operator is a shortcut, but if it is going to be
a shortcut that is all it should be.

It’s not a shortcut; as Robert K. said, it’s just the last character
in the method name. It has no language-level significance, but the
contract in place is that it means “dangerous”.

David

Michael_WSRyder · July 22, 2007, 4:42am

[email protected] wrote:

a = a.sub(" ", “”) # => a is now “ab c d e f”, b is still "a b c d
The result of the sub operation will change b as well as a.
the whole thing is too confusing.

Keep in mind that the ! means that the method is the “dangerous”
version. In this case, the danger consists of both the changing of
the object in place, and the returning of nil when there’s no change.
Any time you see a ! in a method name you’ve been warned that there’s
probably something you need to be particularly aware of and careful
about.

Unfortunately, my background has always been that a function never
returns nil if nothing is done. C has the ++ operator which can be very
dangerous but it never returns nil. It may be safer to use a = a + 1 or
even better b = a + 1, but I don’t have to worry that a++ will return 0
unless a was -1 before. I realize that the ! operator is a shortcut, but
if it is going to be a shortcut that is all it should be. The
current way it is used with sub and gsub is that sometimes it returns
the results in the specified object and other times it wipes out the
object. This makes it Very hard to write a program where you are not
always sure of the data, which in my case is a lot of the time.
For example if I was cleaning a data file from a customer where he used
dashes in a Social Security Number and I only wanted to remove up to the
first two dashes in the number, knowing that if more are present that
the number is invalid, I would use a variant of sub. I would not want
to have to program for cases where there were no or one dash in the
number. True, I could spend a lot of time trying to catch every
abnormality, but then where is the advantage in using a language like
Ruby, when I could do less work with a “primitive” language that does
what I want without having to check what it is doing.
I am not saying that Ruby is a “bad” language, as it has a lot of very
nice features. My problem is that I am not always expecting some of the
idiosyncrasies of the language. When I see something like sub! I expect
that it will return my string with the substitution made or the
unchanged string, not a nil.

Michael_WSRyder · September 25, 2007, 11:00pm

Hi –

On Sun, 22 Jul 2007, Bernard K. wrote:

Date:

the pattern. gsub will not do this - you need to run sub in a loop.
for sub! I would override the sub! method as follows
anything so long as the object is in the desired state.

For an array such as arr = [1,2,3,4,5,6], I can safely chain the bang!
methods without worrying about nil returns from flatten! and uniq!

arr.flatten!.uniq!.sort! # no "undefined method `uniq!' for nil:NilClass (NoMethodError)",
                         # which the built-in flatten! would cause by returning nil
p arr # => [1, 2, 3, 4, 5, 6]

I would very, very strongly advise you, and everyone else, not to do
this. You will break any code (inside the standard library and/or any
other code you load, or any code that uses your code) that depends on
the documented behavior of these methods. You may not like how sub!
and friends work, but it’s a very bad idea to make the decision to
change them on behalf of everyone else, over and above the language
documentation.

David

Michael_WSRyder · September 25, 2007, 11:05pm

[email protected] wrote:

n.times { self.sub!(pattern, replacement) }
  n.times { @str = @str.sub(pattern, replacement) }
As I mentioned in an earlier post times seems to work fine with zero
or negative values not changing the string. The replacement code
seems “cleaner” but I may be missing some gotcha that your code prevents.
Again, thanks for improving my knowledge of Ruby.

You are absolutely correct!!! Your improvement does what you expect it
to do.

Michael_WSRyder · September 25, 2007, 11:04pm

On 7/23/07, bbiker wrote:

My primary objection to having a nil return is that it prevents me from
safely chaining bang! methods.

The nil return is counter-intuitive and violates the Principle of
Least Surprise

Yours yes, mine not at all

… unless gsub!(…, …)

is quite useful to me.

As I said before given an array such as arr = [1, 2, 3, 4, 5, 6], I
can do

new_arr = arr.flatten.uniq.sort => [1, 2, 3, 4, 5, 6]

Intuitively I would think that I should be able to do
arr.flatten!.uniq!.sort!; however because of the nil return by
#flatten!, a NoMethodError is raised by #uniq! since the nil object
does not have a uniq! method.
That is a different case, I adhere to your POV on this one.

Note that not all bang! methods return nil when nothing was changed in
the receiver. Array#sort! returns self if self was already sorted.
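To make the inconsistency concrete, here is a small sketch with the built-in Array methods only. It also shows two ways to get the chained result without touching the core classes: chain the non-bang versions, or call the bang versions as separate statements.

```ruby
arr = [1, 2, 3, 4, 5, 6]
p arr.flatten!            # nil: nothing to flatten
p arr.uniq!               # nil: no duplicates removed
p arr.sort!.equal?(arr)   # true: sort! returns the receiver even when unchanged

nested = [[3, 1], [2, 3]]

# Option 1: chain the non-bang methods and keep the result.
sorted_copy = nested.flatten.uniq.sort
p sorted_copy             # [1, 2, 3]

# Option 2: mutate in place, one statement per bang method.
nested.flatten!
nested.uniq!
nested.sort!
p nested                  # [1, 2, 3]
```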

Hopefully Matz might be reading this thread and might consider
changing the behavior of bang! methods returns.
Why should he? He might consider it if it were a largely discussed
and backed up RCR, there seems some way to go…

Robert

We’re on a mission from God. ~ Elwood,

Michael_WSRyder · September 25, 2007, 11:05pm

bbiker wrote:

def subn(pattern, replacement, n = 1)
return self if n < 1
@str = self.sub(pattern, replacement)
(n-1).times { @str = @str.sub(pattern, replacement) }

Is there any reason you used the above three lines instead of:
@str = self
n.times { @str = @str.sub(pattern, replacement) }

As I mentioned in an earlier post times seems to work fine with zero or
negative values not changing the string. The replacement code seems
“cleaner” but I may be missing some gotcha that your code prevents.
Again, thanks for improving my knowledge of Ruby.

@str

end
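For comparison, a sketch of the shorter form written as a standalone method rather than a String instance method. The signature with `str` as the first argument is an assumption made here so that no core class is reopened. It relies on `Integer#times` running the block zero times for zero or negative `n`.

```ruby
# Apply sub up to n times and return the result; the input is never mutated.
def subn(str, pattern, replacement, n = 1)
  result = str
  n.times { result = result.sub(pattern, replacement) }
  result
end

puts subn("a b c d e f", " ", "")       # ab c d e f
puts subn("a b c d e f", " ", "", 2)    # abc d e f
puts subn("a b c d e f", " ", "", 10)   # abcdef
puts subn("a b c d e f", " ", "", 0)    # a b c d e f
puts subn("a b c d e f", " ", "", -3)   # a b c d e f
```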

Michael_WSRyder · September 25, 2007, 11:06pm

[email protected] wrote:

Subject:

=> “a b c d e f”
I have similar problem with Array#flatten!, Array#uniq! since it
self
for nil:NilClass (NoMethodError)" because flatten! would have returned

What I planned to do was get rid of the alias and change the name of the
new sub! method to something like subf! for just that reason. I still
prefer this version much better than the original version as there are
no surprises. It is very hard to change 30+ years of practice
overnight.

Michael_WSRyder · September 25, 2007, 11:05pm

Bernard K. wrote:

Date:

the pattern. gsub will not do this - you need to run sub in a loop.

for sub! I would override the sub! method as follows

class String
  alias old_sub! sub!
  def sub!(pattern, replacement)
    self.old_sub!(pattern, replacement)
    self
  end
end

And this can easily be expanded to do what I was trying to do in the
first place! Adding the following method with your replacement method:

class String
  def subn!(pattern, replacement, n=1)
    n.times do
      self.sub!(pattern, replacement)
    end
    self # return the string; n.times on its own would return n
  end
end

With this I can specify a.subn!(' ', '', 10) and it will return ‘abcdef’
just like I was looking for. It will work with zero or negative values
returning ‘a b c d e f’, again like I would expect. Of course if you
enter a non-integer value it will error out in its current state but the
error is from the times method.

As a user of a method, I really do not care if the method did not have
to do anything so long as the object is in the desired state.

I too hate surprises.

Michael_WSRyder · September 25, 2007, 11:08pm

On Jul 23, 2:53 pm, “Michael W. Ryder” wrote:

From:

must have missed something obvious!!!
made.
end
this. You will break any code (inside the standard library and/or any
David
self # or return self
Note that sub and sub! are the original definitions
definitions for when I want to do single replacements that are free of
surprises. Maybe I will someday be able to use the original definitions
with confidence but when I am in the middle of a programming run I hate
to stop and look up language definitions to avoid problems. It just
ruins the flow of thought.

for single substitution you can use subn!(pattern, replacement) since
n defaults to 1. Ditto for subn.

Michael_WSRyder · September 25, 2007, 11:08pm

From: bbiker

str = "hello"

str.downcase!.capitalize! # => NoMethodError: undefined method `capitalize!' for nil:NilClass

My point is that method chaining is a common ruby idiom and nil

returns get in the way!!!

imnsh opinion,

bang methods do not look nice on chains. In fact, they look like they
cut chains

use non-bang methods instead. syntax is very clean.

these are some stupid examples,

irb(main):001:0> "asdf".downcase
=> "asdf"
irb(main):002:0> "asdf".downcase!
=> nil
irb(main):003:0> "asdf".downcase.capitalize.capitalize
=> "Asdf"
irb(main):004:0> "asdf".downcase!
=> nil
irb(main):005:0> "asdf".downcase!.capitalize!
NoMethodError: undefined method `capitalize!' for nil:NilClass
from (irb):5
from :0
irb(main):006:0> "asdfX".downcase!.capitalize!
=> "Asdfx"
irb(main):007:0> "asdfX".downcase!.capitalize!.capitalize!
=> nil
irb(main):008:0> "asdfX".downcase!.capitalize!.capitalize!.capitalize!
NoMethodError: undefined method `capitalize!' for nil:NilClass
from (irb):8
from :0
irb(main):009:0> "asdfX".downcase.capitalize.capitalize.capitalize
=> "Asdfx"

kind regards -botp

Michael_WSRyder · September 25, 2007, 11:09pm

On Jul 23, 4:38 pm, “Robert D.” wrote:

… unless gsub!(…, …)

is quite useful to me.

well how about this counter example?

str = "hello"

str.downcase!.capitalize! # => NoMethodError: undefined method `capitalize!' for nil:NilClass

My point is that method chaining is a common ruby idiom and nil
returns get in the way!!!

I believe that there was/is a thread regarding chaining when a method
returns nil.

Hopefully Matz might be reading this thread and might consider
changing the behavior of bang! methods returns.

Why should he? He might consider it if it were a largely discussed
and backed up RCR, there seems some way to go…

That’s for sure. I know that I am whistling in the wind. :<o)

Michael_WSRyder · September 25, 2007, 11:10pm

Hi –

On Mon, 23 Jul 2007, Michael W. Ryder wrote:

What I planned to do was get rid of the alias and change the name of the new
sub! method to something like subf! for just that reason. I still prefer
this version much better than the original version as there are no surprises.
It is very hard to change 30+ years of practice overnight.

I suspect you’re not giving yourself enough credit. Anyway, soon
it won’t be overnight anymore. Meanwhile, I think doing it in an
additive way is definitely better, though not entirely clash-proof.

David
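A sketch of what the additive approach might look like. `subf!` is the name floated earlier in the thread, but the body here is an assumption. The built-in `sub!` keeps its documented behavior, so other code that depends on the nil return is unaffected.

```ruby
class String
  # Like sub!, but always returns the receiver, changed or not.
  # A new name is added; the built-in sub! is left untouched.
  def subf!(pattern, replacement)
    sub!(pattern, replacement)
    self
  end
end

s = "a b c".dup                         # dup so the example also runs with frozen string literals
puts s.subf!(" ", "").subf!(" ", "")    # abc
puts s.subf!(" ", "").upcase            # ABC (no match, but still chainable)
p "abc".dup.sub!(" ", "")               # nil: the original contract still holds
```

As noted above this is not entirely clash-proof: another library could define a `String#subf!` of its own.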

Michael_WSRyder · September 25, 2007, 11:09pm

[email protected] wrote:

Fri, 20 Jul 2007 20:49:07 +0900

I was reading this whole thread and I kind of think to be dreaming, I
(This is where perl’s “continue matching where you left off” would
have been a nice optimisation)

The primary problem is that sub! returns nil when no substitutions are
made.

I have similar problem with Array#flatten!, Array#uniq! since it causes
problem when chaining!
My solution has been to override these bang! methods to return self even
when the bang! method did not have to change the receiver.

for sub! I would override the sub! method as follows

class String
  alias old_sub! sub!
  def sub!(pattern, replacement)
    self.old_sub!(pattern, replacement)
    self
  end
end

As a user of a method, I really do not care if the method did not have
to do anything so long as the object is in the desired state.

For an array such as arr = [1,2,3,4,5,6], I can safely chain the bang!
methods without worrying about nil returns from flatten! and uniq!

arr.flatten!.uniq!.sort! # no "undefined method `uniq!' for nil:NilClass (NoMethodError)",
                         # which the built-in flatten! would cause by returning nil
p arr # => [1, 2, 3, 4, 5, 6]