Code Review: StringGsubEach

tfpt review “/shelveset:StringGsubEach;REDMOND\dremy”
Comment :
Changes to for string.gsub and string.each (and each_line) to run
clean. The most significant change is to track version in
MutableString. This version is bumped on any mutation and functions
that iterate (each, each_line, gsub) check the version before and after
to assure there has been no version change during the iteration. After
these changes all gsub, each, and each_line specs run clean (no
excludes). Note that although the specs run clean the each behavior
does not match MRI. The spec test contains a new line in the iterating
string (“hello\nworld”) and MRI does throw a runtime exception if this
string is iterated. However if there is no new line in the string MRI
does not throw an exception if the underlying string is mutated. This
seems inconsistent but worth noting.

String#gsub with pattern and block sets $~ for access from the block
String#gsub with pattern and block restores $~ after leaving the block
String#gsub with pattern and block sets $~ to MatchData of last match
and nil when there’s none for access from outside
String#gsub with pattern and block raises a RuntimeError if the string
is modified while substituting
String#each raises a RuntimeError if the string is modified while
substituting
String#each_line raises a RuntimeError if the string is modified while
substituting

Looks good.

Why isn’t MutableString.version called MutableString.Version? That would
be the consistent naming convention, right?

The current versioning implementation is not thread-safe. A thread-safe
implementation would require using InterlockedCompareAndExchange to
update version, right? If we are deferring nailing of the threading
story, do we have a bug to track the issue so that we can revisit this?

Similarly, what is the semantics of freezing? The MutableStringOps
methods like Append call MutableStringOps.RequireNotFrozenAndIncrVersion
and then do the update without a lock. Since atomicity is not
guaranteed, there could be race with another thread freezing the object
half way through the update which could result in weird behavior (for
eg, if the update is a multiple step operation).

This change does not have to be blocked on this issue, but if we can
decide how we want thread-safety to work, we can start acting on it
instead of building up debt in the library.

Should incrementing the version check for overflow? If not, a comment
would be useful to make the intent obvious that overflow would cause an
issue only in extreme cases that will surely never happen for real since
most updates will be pretty quick. OTOH, if any of the updates runs user
code, then overflow is a real issue, though still quite unlikely.

Can MutableStringOps.RequireNotFrozenAndIncrVersion and
MutableStringOps.RequireNoVersionChange be methods on MutableString?
Inspecting/updating the version outside of MutableString seems to break
the encapsulation pretty badly.

Thanks,
Shri

Hmm… now that I think about it, it’s really only “thread safe enough”
if you restrict yourself to 2 threads. With three threads performing
simultaneous access, I can work out a sequence that breaks.

Our current threading story is to ensure that shared state is protected,
but we haven’t really done any work to protect individual objects from
threading issues.

The version implementation is “thread safe enough” because the absolute
value of the version isn’t important, only the fact that it’s changed.
With an initial value of 1, simultaneous updates from two threads could
result in a value of 2 instead of 3 – but this still tells the iterator
that a change has occurred. In any event, I’m pretty sure that the
subsequent mutation of the string object itself isn’t thread-safe at
all.

I agree with moving the helper functions onto the MutableString class
itself and then making the version private.

I don’t think MutableString needs to be thread safe. User should ensure
that a string instance is not shared among threads or use
synchronization primitives of ‘thread’ library to access it. It would
maybe make sense to make frozen string thread-safe since it is supposed
to be read-only.

Like Dictionary<K,V> is not thread safe. It does increment its version
like this: this.version++

We should ensure that runtime structures (RubyExecutionContext,
RubyModule, …) are thead-safe (which they are not now, we need to fix
that). Other than that any mutable structure is thread-unsafe unless
specified otherwise (e.g. Queue is thread-safe).

RW overflow: I thought we compile in “checked” mode, so that all
operations are checked for overflow/underflow unless marked unchecked.
I’ve just double-checked and apparently we don’t. I think we should flip
the bit.

RE encapsulation - I agree. The version advancing and checks should be
on MutableString. String freezing also needs some improvements, so let’s
file a bug for both.

Tomas

If we use Interlocked.Increment and Interlocked.Decrement, it will
automatically wrap the value around without throwing an exception.

I don’t generally think about cache coherency issues, so Shri is
absolutely right. I assume a lock statement would generate the
appropriate memory barriers?

If MRI strings are not meant to be thread-safe, we don’t have to worry
about it. Has anyone confirmed that the intent is to move the
synchronization responsibility to user code? Note that Python lists
guarantee thread-safety and so IronPython had to do work to support
that.

Regarding Curt’s comment, I think the code is broken even with 2 threads
(if we care about thread-safety). If you have a multi-processor machine,
and the version field and the string data are on separate cache lines,
then when the first thread on the first processor does a mutating
action, only the string data cache line might get written to main
memory. The second thread on the second processor could pick up the
mutated data but not the new version number. Beware of hand-rolling your
own synchronization primitives. Unless you have read all the ugly
details of memory models, you are very likely to get it wrong. See
Concurrency: What Every Dev Must Know About Multithreaded Apps | Microsoft Learn. Just use a lock.

Thanks,
Shri

Another good reference by the same author is at