Forum: IronRuby Code Review: EncodingsFinal

Tomas Matousek (Guest)
on 2009-03-13 19:36
(Received via mailing list)
Attachment: EncodingsFinal.diff (300 KB)
tfpt review "/shelveset:EncodingsFinal;REDMOND\tomat"

Outer DLR:

- Adds Invariant, Ensures, Result, Parameter and Out stubs to
ContractUtils, mimicking the Dev10 contracts. These let us specify
post-conditions and object invariants in code rather than in comments.

Ruby:

- Implements infrastructure for the $KCODE variable. Only three
encodings can be assigned to $KCODE (UTF8, SJIS, EUC). They are
implemented as special encodings ("k-codings", the RubyEncoding.KCode*
singletons) and need to be special-cased. For example, String#size on
a string containing a single 2-byte UTF-8 character returns 1 if the
string's encoding is UTF8, but 2 if it is KCodeUTF8. This emulates MRI
1.8, where strings have no associated encoding.

- $KCODE is generally considered obsolete and is not available in the
Silverlight build.

- Replaces the List<byte> and StringBuilder representations of
MutableString with byte[] and char[]. Reimplements the basic
char/byte/string buffer operations and moves them to Utils.cs.

- Improves the implementation of MutableString.GetHashCode: the hash
code is now cached on the string until the string is modified. The
calculation includes the encoding if the string contains any non-ASCII
characters; otherwise the encoding is not part of the hash code.
- Adds support for multi-byte identifiers in source code if the file
has a non-binary encoding or k-coding. Any non-ASCII character is
treated as a lowercase letter for the purpose of identifier
classification (constant, global variable, instance variable, class
variable, local variable, method name).

- Fixes \xXX escapes in encoded strings: consecutive escaped bytes can
form a single character or part of a character. In both cases the
string's representation is switched to binary so that no information
is lost. StringContentBuilder takes care of constructing such strings.
At runtime, a string whose suffix is an incomplete character can be
concatenated with a string containing the missing part of that
character, and together these bytes might form a valid character.

- Adds a bunch of unit tests for MutableString and encodings.

- Reimplements String#dump and String#inspect to handle encoded
strings correctly, and moves the implementation to MutableString so
that it can also serve as the debug view for MutableString.

- Fixes specs: one spec set $KCODE to UTF8 and did not restore it,
which affected subsequent specs.
