Code Review: EncodingsFinal

tfpt review “/shelveset:EncodingsFinal;REDMOND\tomat”

Outer DLR:

  •      Adds Invariant, Ensures, Result, Parameter and Out stubs to
         ContractUtils, mimicking Dev10 contracts. These allow us to specify
         post-conditions and object invariants in code rather than in comments.


  •      Implements infrastructure for the $KCODE variable. Only three
         encodings can be assigned to $KCODE (UTF8, SJIS, EUC). These are
         implemented as special encodings (aka “k-codings”, the
         RubyEncoding.KCode* singletons) and need to be special-cased. For
         example, String#size on a string containing a single 2-byte UTF8
         character returns 1 if the string's encoding is UTF8, but 2 if it is
         KCodeUTF8. This emulates MRI 1.8, where strings have no associated
         encoding.

  •      $KCODE is generally considered obsolete and is not available in
         the Silverlight build.

  •      Replaces the List<byte> and StringBuilder representations of
         MutableString with byte[] and char[]. Reimplements the basic
         char/byte/string buffer operations and moves them to Utils.cs.

  •      Improves the implementation of MutableString.GetHashCode - the
         hash code is now cached on the string until the string is modified.
         The hash code calculation includes the encoding if there are any
         non-ASCII characters in the string. Otherwise the encoding is not
         part of the hash code.

  •      Adds support for multi-byte identifiers in source code if the
         file has a non-binary encoding or a k-coding. Any non-ASCII character
         is considered a lower-case letter for the purpose of identifier
         classification (constant, global var, instance var, class var, local,
         method name).

  •      Fixes \xXX escapes in encoded strings - subsequent escaped bytes
         can form a single character or only part of a character. In both
         cases the string’s representation is switched to binary so that no
         information is lost. StringContentBuilder takes care of constructing
         such strings. At runtime a string with an incomplete character suffix
         can be concatenated with a string that holds the missing part of the
         character, and together these bytes might form a valid character.

  •      Adds a bunch of unit tests for MutableString and encodings.
  •      Reimplements String#dump and String#inspect to handle encoded
         strings correctly. Moves the implementation to MutableString so that
         we can use it as a debug view for MutableString as well.

  •      Fixes specs - KCODE was set to UTF8 by one spec and not restored,
         which affected subsequent specs.
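
The String#size difference described in the $KCODE item can be approximated in a current Ruby, where $KCODE no longer has any effect. The sketch below is only an analogy and is not the IronRuby implementation: it assumes that tagging the same bytes as binary behaves like a k-coded string for the purpose of counting, and size_with_encoding is a hypothetical helper name.

```ruby
# Hypothetical helper: report String#size for the same bytes under a given
# encoding tag. The bytes themselves are never changed.
def size_with_encoding(str, encoding)
  str.dup.force_encoding(encoding).size
end

e_acute = "\u00E9" # one character, encoded as two bytes in UTF-8

# Encoding-aware string: counts characters (like a UTF8-encoded string).
char_aware = size_with_encoding(e_acute, Encoding::UTF_8)

# Assumption: a binary-tagged string stands in for the k-coded case,
# where size counts bytes as in MRI 1.8.
byte_wise = size_with_encoding(e_acute, Encoding::BINARY)

puts char_aware # 1
puts byte_wise  # 2
```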

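The hash code caching described for MutableString.GetHashCode can be sketched roughly as follows. This is an illustration in Ruby, not the C# code under review: the class CachedHashString, its byte-array storage and the computations counter are all assumptions made up for the example.

```ruby
# Hypothetical stand-in for a mutable string that caches its hash code.
class CachedHashString
  attr_reader :computations # how many times the hash was actually computed

  def initialize(bytes, encoding_name)
    @bytes = bytes.dup
    @encoding_name = encoding_name
    @cached = nil
    @computations = 0
  end

  # Any mutation invalidates the cached hash code.
  def append(more_bytes)
    @bytes.concat(more_bytes)
    @cached = nil
    self
  end

  # Returns the cached value, computing it only when needed.
  def hash_code
    @cached ||= compute_hash
  end

  private

  def compute_hash
    @computations += 1
    ascii_only = @bytes.all? { |b| b < 0x80 }
    # The encoding takes part in the hash only when non-ASCII bytes exist.
    ascii_only ? @bytes.hash : [@bytes, @encoding_name].hash
  end
end

a = CachedHashString.new([0x61, 0x62], "UTF-8")
b = CachedHashString.new([0x61, 0x62], "Shift_JIS")
puts a.hash_code == b.hash_code # true: ASCII-only content ignores encoding
```

Leaving the encoding out for ASCII-only content lets equal ASCII strings with different encodings land in the same hash bucket, which is presumably the motivation for the rule in the review notes.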

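The \xXX item says that a string ending in an incomplete character can later be completed by concatenation. A minimal sketch of that situation in a current Ruby is shown below, assuming binary-tagged strings as a stand-in for the binary representation mentioned above. The helper name join_as_utf8 is hypothetical and nothing here reflects how StringContentBuilder works internally.

```ruby
# Hypothetical helper: concatenate two byte strings and view the result as UTF-8.
def join_as_utf8(head, tail)
  (head.b + tail.b).force_encoding(Encoding::UTF_8)
end

# U+20AC (euro sign) is E2 82 AC in UTF-8. Split it across two strings.
head = "\xE2\x82".b # incomplete character: first two of three bytes
tail = "\xAC".b     # the missing final byte

# On its own, the prefix is not a valid UTF-8 character.
puts head.dup.force_encoding(Encoding::UTF_8).valid_encoding? # false

# Keeping the raw bytes means no information is lost, so the halves
# form a valid character once they are joined.
joined = join_as_utf8(head, tail)
puts joined.valid_encoding? # true
puts joined == "\u20AC"     # true
```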