Code Review: EncodingsFinal


#1

tfpt review "/shelveset:EncodingsFinal;REDMOND\tomat"

Outer DLR:

  •      Adds Invariant, Ensures, Result, Parameter and Out stubs to ContractUtils that mimic Dev10 contracts. These allow us to specify post-conditions and object invariants in code rather than in comments.

Ruby:

  •      Implements the infrastructure for the $KCODE variable. Only 3 encodings can be assigned to $KCODE (UTF8, SJIS, EUC). These encodings are implemented as special encodings (aka “k-codings”, the RubyEncoding.KCode* singletons) and need to be special-cased. For example, String#size on a string containing a single 2-byte UTF-8 character returns 1 if the string's encoding is UTF8, but 2 if it is KCodeUTF8. This emulates MRI 1.8, where strings have no associated encoding.

  •      $KCODE is generally considered obsolete and is not available in the Silverlight build.

  •      Replaces the List<byte> and StringBuilder representations of MutableString with byte[] and char[]. Reimplements basic char/byte/string buffer operations and moves them to Utils.cs.

  •      Improves the implementation of MutableString.GetHashCode: the hash code is now cached on the string until the string is modified. The hash code calculation includes the encoding if the string contains any non-ASCII characters; otherwise the encoding is not part of the hash.

  •      Adds support for multi-byte identifiers in source code if the file has a non-binary encoding or a k-coding. Any non-ASCII character is considered a lowercase letter for the purpose of identifier classification (constant, global variable, instance variable, class variable, local variable, method name).

  •      Fixes \xXX escapes in encoded strings: consecutive escaped bytes can form a single character or part of a character. In both cases the string's representation is switched to binary so that no information is lost. StringContentBuilder takes care of constructing such strings. At runtime, a string with an incomplete character suffix can be concatenated with a string containing the missing part of the character, and together these bytes might form a valid character.

  •      Adds a bunch of unit tests for MutableString and encodings.

  •      Reimplements String#dump and String#inspect to handle encoded strings correctly. Moves the implementation to MutableString so that it can also serve as the debug view for MutableString.

  •      Fixes specs: one spec set $KCODE to UTF8 and did not restore it, which affected subsequent specs.
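The $KCODE bullet above notes that String#size returns 1 for a single 2-byte UTF-8 character under a UTF8 encoding, but 2 under KCodeUTF8. Modern MRI has no $KCODE, so the following is only an approximation of the two views using built-in String methods. A UTF-8 string counts characters. Its binary copy (String#b) counts bytes, as MRI 1.8 strings did. The helper name `ruby18_size` is hypothetical and is not part of IronRuby.

```ruby
# A single 2-byte UTF-8 character: "é" is encoded as 0xC3 0xA9.
s = "\u00e9"

# Hypothetical helper approximating MRI 1.8 / KCodeUTF8 semantics:
# the string is treated as raw bytes, so size is a byte count.
def ruby18_size(str)
  str.b.size # String#b returns a binary (ASCII-8BIT) copy
end

puts s.size          # => 1 (character count under UTF-8)
puts ruby18_size(s)  # => 2 (byte count, as in MRI 1.8)
```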
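The GetHashCode bullet above combines two ideas. First, the hash is cached until the next mutation. Second, the encoding is mixed into the hash only when the content is not pure ASCII. The following Ruby sketch illustrates both ideas under stated assumptions. `SketchString`, its byte-array storage, `append` and `hash_cached?` are hypothetical stand-ins and are not the MutableString API.

```ruby
# Hypothetical stand-in for MutableString: raw bytes plus an encoding.
class SketchString
  attr_reader :bytes, :encoding

  def initialize(bytes, encoding)
    @bytes = bytes.dup    # Array of Integers in 0..255
    @encoding = encoding  # an Encoding object
    @hash_cache = nil     # nil means "not computed yet"
  end

  def ascii_only?
    @bytes.all? { |b| b < 0x80 }
  end

  # Any mutation must invalidate the cached hash.
  def append(byte)
    @bytes << byte
    @hash_cache = nil
    self
  end

  # Computed once, then cached. The encoding only contributes when the
  # content has non-ASCII bytes, so ASCII strings in different
  # encodings hash alike.
  def hash
    @hash_cache ||= ascii_only? ? @bytes.hash : [@bytes, @encoding].hash
  end

  def hash_cached?
    !@hash_cache.nil?
  end

  # Equality consistent with #hash, so instances work as Hash keys.
  def eql?(other)
    other.is_a?(SketchString) && bytes == other.bytes &&
      (ascii_only? || encoding == other.encoding)
  end
end
```

Because the hash and equality rules agree, an ASCII key stored under one encoding can be found with the same bytes in another encoding. Non-ASCII keys still require a matching encoding.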
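The multi-byte identifier bullet above states the classification rule: any non-ASCII character counts as a lowercase letter. This Ruby sketch applies that rule to a name's sigil and first letter. `identifier_kind` is a hypothetical helper, not the IronRuby tokenizer. It also does not try to tell locals from method names, which requires parser context.

```ruby
# Hypothetical classifier following the rule in the review: only ASCII
# A-Z starts a constant; any non-ASCII first letter behaves like a-z.
def identifier_kind(name)
  case name
  when /\A\$/     then :global_var
  when /\A@@/     then :class_var   # must be tested before a single @
  when /\A@/      then :instance_var
  when /\A[A-Z]/  then :constant
  else                 :local_or_method # a-z, '_' or ANY non-ASCII letter
  end
end

puts identifier_kind("Foo")       # => constant
puts identifier_kind("\u00d1ame") # => local_or_method ("Ñame")
```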
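The \xXX bullet above describes a string that ends with an incomplete character and becomes valid once the missing bytes are concatenated. The same effect can be reproduced in modern MRI with binary strings and String#force_encoding. This demonstrates the byte-level behavior only; it does not show how StringContentBuilder is implemented.

```ruby
# The two bytes of the UTF-8 encoding of "é" (0xC3 0xA9), kept as binary
# strings so that neither byte is lost or rejected.
head = "\xC3".b
tail = "\xA9".b

# On its own, the lead byte is not a valid UTF-8 character.
puts head.dup.force_encoding(Encoding::UTF_8).valid_encoding? # => false

# Concatenated, the bytes form one valid character.
joined = (head + tail).force_encoding(Encoding::UTF_8)
puts joined.valid_encoding? # => true
puts joined == "\u00e9"     # => true
```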
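The String#dump bullet above says dump should handle encoded strings correctly. A quick MRI check shows what that means: the same two bytes dump as one \u escape when the string is UTF-8, and as two \x escapes when it is binary. Only standard String methods are used; the exact hex letter case of the escapes is not asserted here.

```ruby
utf8   = "\u00e9" # one character, two bytes
binary = utf8.b   # the same two bytes with no character boundaries

puts utf8.dump    # the character is escaped as a single \u sequence
puts binary.dump  # each byte is escaped separately with \x
```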
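The spec fix above comes from global state leaking between specs. One common way to prevent that kind of leak is to save the global and restore it in an `ensure` clause, so it is reset even if the spec raises. The global `$sketch_kcode` and the helper `with_kcode` are hypothetical; a stand-in global is used because $KCODE no longer has an effect in modern MRI.

```ruby
$sketch_kcode = "NONE" # hypothetical stand-in for $KCODE

# Hypothetical helper: run the block with the global temporarily changed,
# restoring the previous value even if the block raises.
def with_kcode(value)
  saved = $sketch_kcode
  $sketch_kcode = value
  yield
ensure
  $sketch_kcode = saved
end
```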

Tomas