Code Review: EncodingsFinal

tfpt review “/shelveset:EncodingsFinal;REDMOND\tomat”

Outer DLR:

  •      Adds Invariant, Ensures, Result, Parameter and Out stubs to
ContractUtils, mimicking Dev10 contracts. These allow us to specify
post-conditions and object invariants in code rather than in comments.


  •      Implements infrastructure for the $KCODE variable. Only three
encodings can be assigned to $KCODE (UTF8, SJIS, EUC). These encodings
are implemented as special encodings (aka “k-codings”, the
RubyEncoding.KCode* singletons) and need to be special-cased. For
example, String#size on a string containing a single 2-byte UTF8
character returns 1 if its encoding is UTF8, but 2 if it is KCodeUTF8.
This emulates MRI 1.8, where strings have no associated encoding.

  •      $KCODE is in general considered obsolete and is not available
in the Silverlight build.

  •      Replaces the List<byte> and StringBuilder representations of
MutableString with byte[] and char[]. Reimplements basic
char/byte/string buffer operations and moves them to Utils.cs.

  •      Improves the implementation of MutableString.GetHashCode: the
hash code is now cached on the string until the string is modified. The
hash code calculation includes the encoding if there are any non-ASCII
characters in the string. Otherwise the encoding is not part of the
hash code.

  •      Adds support for multi-byte identifiers in source code if the
file has a non-binary encoding or a k-coding. Any non-ASCII character is
considered a lower case letter for the purpose of identifier
classification (constant, global var, instance var, class var, local,
method name).

  •      Fixes \xXX escapes in encoded strings: subsequent escaped
bytes can form a single character or part of a character. In both cases
the string’s representation is switched to binary so that no information
is lost. StringContentBuilder takes care of constructing such strings.
At runtime, a string that ends with an incomplete character can be
concatenated with a string that starts with the missing part of the
character, and together these bytes might form a valid character.

  •      Adds a bunch of unit tests for MutableString and encodings.

  •      Reimplements String#dump and String#inspect to handle encoded
strings correctly. Moves the implementation to MutableString so that we
can use it as a debug view for MutableString as well.

  •      Fixes specs: KCODE was set to UTF8 by one spec and not
restored, which affected subsequent specs.
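
To illustrate the String#size difference described in the $KCODE item above, here is a minimal Ruby sketch. It does not use $KCODE or the RubyEncoding.KCode* singletons. It only assumes that a k-coded string counts bytes (as in MRI 1.8) while a UTF8-encoded string counts characters. The size_under helper and its kcoded: parameter are hypothetical names introduced for this example.

```ruby
# Hypothetical helper, not part of the change under review.
# kcoded: true  -> byte-oriented size, as assumed for KCodeUTF8 / MRI 1.8
# kcoded: false -> character-oriented size, as for a UTF8-encoded string
def size_under(str, kcoded:)
  kcoded ? str.bytesize : str.size
end

e_acute = "\u00E9" # one character, encoded as the two bytes C3 A9

puts size_under(e_acute, kcoded: false) # character count
puts size_under(e_acute, kcoded: true)  # byte count
```

The same string content gives two different answers, which is why the k-codings cannot simply be treated as aliases of the regular encodings.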
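
The hash code caching described in the MutableString.GetHashCode item above could look roughly like the following Ruby sketch. It is not the actual C# implementation. The CachedHashString class, its byte-array storage, the encoding name strings and the computations counter are all assumptions made for illustration.

```ruby
# Hypothetical model of a mutable string with a cached hash code.
class CachedHashString
  attr_reader :computations # counts real hash calculations, for demonstration

  def initialize(bytes, encoding_name)
    @bytes = bytes.dup
    @encoding_name = encoding_name
    @hash_cache = nil
    @computations = 0
  end

  # Any mutation drops the cached hash code.
  def append(more_bytes)
    @bytes.concat(more_bytes)
    @hash_cache = nil
    self
  end

  # Returns the cached value, computing it only when the cache is empty.
  def content_hash
    @hash_cache ||= compute_hash
  end

  private

  def compute_hash
    @computations += 1
    if @bytes.all? { |b| b < 0x80 }
      @bytes.hash                   # pure ASCII: encoding is ignored
    else
      [@bytes, @encoding_name].hash # non-ASCII: encoding takes part
    end
  end
end
```

Leaving the encoding out for pure ASCII content means that two ASCII strings with equal bytes hash identically whatever encoding they carry.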
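
The identifier classification rule from the multi-byte identifiers item above can be sketched as follows. This is not the tokenizer code from the change. The classify_identifier function and the symbols it returns are hypothetical, and locals and method names are folded into one category because telling them apart needs parser context.

```ruby
# Hypothetical classifier: only an ASCII upper case first letter makes a
# constant. Every non-ASCII character is treated as a lower case letter.
def classify_identifier(name)
  case name
  when /\A\$/     then :global_variable
  when /\A@@/     then :class_variable
  when /\A@/      then :instance_variable
  when /\A[A-Z]/  then :constant
  else                 :local_or_method
  end
end

puts classify_identifier("Foo")          # ASCII upper case start
puts classify_identifier("\u00C9clair")  # starts with a non-ASCII letter
```

Under this rule an identifier that starts with a non-ASCII upper case letter is still classified as a local or method name, not as a constant.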
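
The runtime case from the \xXX escapes item above, where an incomplete character suffix meets its missing part, can be reproduced with plain Ruby strings. This sketch does not use StringContentBuilder or MutableString. It assumes that keeping the pieces in binary form and reinterpreting the joined bytes is a fair model of the behavior, and the helper names complete_in_utf8? and join_halves are hypothetical.

```ruby
# Hypothetical helper: do these bytes form valid UTF-8 on their own?
def complete_in_utf8?(str)
  str.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# Hypothetical helper: concatenate as binary so no byte is lost, then
# reinterpret the combined bytes as UTF-8.
def join_halves(prefix, suffix)
  (prefix.b + suffix.b).force_encoding(Encoding::UTF_8)
end

head = "\xC3".b # first byte of a two-byte UTF-8 sequence
tail = "\xA9".b # the missing continuation byte

puts complete_in_utf8?(head)               # the prefix alone
puts join_halves(head, tail).valid_encoding? # the two parts together
```

Neither half is a valid character by itself, which is why switching to a binary representation avoids losing the bytes before the second half arrives.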