Foreign characters in strings

Hi there,

I have a test that checks that a string has been URI.unescape’d. I’m
testing two different strings: one contains foreign (accented)
characters, and the other contains only ‘regular’ English characters.
Here are the tests:

describe Song, "should be valid" do
  before(:each) do
    @song = Song.new
  end

  it "when location= has been sanitized dealing with foreign characters" do
    @song.location = "file://localhost/Users/dave/Music/iTunes/iTunes%20Music/Sigur%20Ros/Agaetis%20Byrjun/05%20Ny%CC%81%20batteri%CC%81.m4a"
    @song.save
    @song.location.should == "/Users/dave/Music/iTunes/iTunes Music/Sigur Ros/Agaetis Byrjun/05 Ný batterí.m4a"
  end

  it "when location= has been sanitized dealing with non-foreign characters" do
    @song.location = "file://localhost/Users/dave/Music/iTunes/iTunes%20Music/Mogwai/Mr.%20Beast/Travel%20Is%20Dangerous.mp3"
    @song.save
    @song.location.should eql("/Users/dave/Music/iTunes/iTunes Music/Mogwai/Mr. Beast/Travel Is Dangerous.mp3")
  end
end

Here’s the error I get when running the example:

when location= has been sanitized dealing with foreign characters
expected: "/Users/dave/Music/iTunes/iTunes Music/Sigur Ros/Agaetis Byrjun/05 Ný batterí.m4a",
     got: "/Users/dave/Music/iTunes/iTunes Music/Sigur Ros/Agaetis Byrjun/05 Ný batterí.m4a" (using ==)

And for what it’s worth, here’s the location= method:

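def location=(location)
  # gsub (not gsub!): gsub! returns nil when nothing is replaced,
  # which would store nil for a location without the prefix.
  self[:location] = URI.unescape(location).gsub("file://localhost", "")
end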

Obviously those strings are not equal, since the example failed, but
when I view the output using TextMate’s RSpec bundle, I see this
(screenshot):
http://antrover.com/data/rspec_results.png

The strings look identical in TextMate’s RSpec bundle; only when I copy
and paste the output into another text editor does it become clear that
they are not equal. This leads me to believe there’s an encoding issue
somewhere, but I’m not sure where. TextMate and the terminal are both
set to UTF-8.
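One way to see the difference without relying on how an editor renders the text is to list the code points of each string. The sketch below assumes a current Ruby (1.9 or later, for `\u` escapes and `String#codepoints`; the thread itself used 1.8), uses only the fragment "Ný batterí" written two ways, and introduces a hypothetical helper name, `code_points`.

```ruby
# Two strings that render identically as "Ný batterí" but differ underneath.
precomposed = "N\u00FD batter\u00ED"   # ý = U+00FD, í = U+00ED (one code point each)
decomposed  = "Ny\u0301 batteri\u0301" # y / i followed by U+0301 COMBINING ACUTE ACCENT

# Hypothetical helper: render a string as space-separated U+XXXX code points.
def code_points(str)
  str.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
end

puts code_points(precomposed)
puts code_points(decomposed)
p precomposed == decomposed # => false
```

The decomposed line shows an extra `U+0301` after the `y` and the `i`, which is exactly the `%CC%81` sequence in the escaped URL.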

When I copy the above error message from TextMate’s RSpec bundle into
vim, I see this:

expected: "/Users/dave/Music/iTunes/iTunes Music/Sigur Ros/Agaetis Byrjun/05 Ný batterí.m4a",
     got: "/Users/dave/Music/iTunes/iTunes Music/Sigur Ros/Agaetis Byrjun/05 Ny?~A batteri?~A.m4a" (using ==)

Anyone have any ideas on how to make this (simple!) test pass with
flying colors?

Thank you,
Dave

On Mar 22, 2008, at 4:17 pm, Dave wrote:

Anyone have any ideas on how to make this (simple!) test pass with
flying colors?

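You will need to use a Unicode library. Judging by the lengths of the
strings being compared, I think what is happening is that the ý and í
characters are being entered in different ways. The escaped URL
contains %CC%81, which is UTF-8 for U+0301 COMBINING ACUTE ACCENT, so
after URI.unescape each of them is a letter followed by a separate
accent (total length 84), whereas your test string encodes each as a
single precomposed character (total length 82). (The actual length of
the string is 80; I counted by hand.) Those totals are byte counts: in
UTF-8 a precomposed ý or í takes two bytes, while a letter plus a
combining accent takes three.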
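The 80/82/84 figures can be reproduced in a current Ruby, where `String#length` counts code points and `String#bytesize` counts bytes (in the Ruby 1.8 used in the thread, `length` returned the byte count, which is why the spec reported 82 and 84). `String#grapheme_clusters` (Ruby 2.5 or later, an assumption about the reader's Ruby) gives the "counted by hand" figure. The two literals are the sanitized path written with precomposed and with decomposed accents.

```ruby
# Sanitized path as the spec literal writes it: precomposed ý (U+00FD) and í (U+00ED).
expected = "/Users/dave/Music/iTunes/iTunes Music/Sigur Ros/Agaetis Byrjun/05 N\u00FD batter\u00ED.m4a"
# Same path as unescaped from %CC%81: y and i each followed by U+0301.
actual = "/Users/dave/Music/iTunes/iTunes Music/Sigur Ros/Agaetis Byrjun/05 Ny\u0301 batteri\u0301.m4a"

# length = code points, bytesize = UTF-8 bytes, grapheme_clusters = user-perceived characters.
[["expected", expected], ["actual", actual]].each do |label, str|
  puts format("%-8s chars=%d bytes=%d graphemes=%d",
              label, str.length, str.bytesize, str.grapheme_clusters.length)
end
```

This prints:

    expected chars=80 bytes=82 graphemes=80
    actual   chars=82 bytes=84 graphemes=80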

I’m no Unicode expert, but I do know that some characters can be
encoded either in the form <é> or in the form <e + ´>.
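The two forms are "canonically equivalent" in Unicode terms, and normalization converts between them: NFC composes <e + ´> into <é>, NFD decomposes <é> into <e + ´>. A minimal sketch, assuming Ruby 2.2 or later, where `String#unicode_normalize` is built in (no separate Unicode library needed):

```ruby
composed   = "\u00E9"  # é as a single code point (U+00E9)
decomposed = "e\u0301" # e followed by U+0301 COMBINING ACUTE ACCENT

p composed == decomposed                                                 # => false
p composed.unicode_normalize(:nfc) == decomposed.unicode_normalize(:nfc) # => true
p composed.unicode_normalize(:nfd).codepoints                            # => [101, 769]
```

Comparing after normalizing both sides to the same form is what makes the two spellings test equal.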

Without resorting to a Unicode library, you would have to write

URI.should_receive(:unescape).with("…")

to test it, which is more of a unit test than a functional test.

Hope this helps.

Ashley

Here’s the code I wrote as a test:

Error:
Song when location= has been sanitized dealing with foreign character
expected: 82, got: 84 (using ==)

$KCODE = "u"

require 'uri'

class Song
  attr_reader :location

  def location=(location)
    # gsub (not gsub!): gsub! returns nil when nothing is replaced.
    @location = URI.unescape(location).gsub("file://localhost", "")
  end
end

describe Song do
  before(:each) do
    @song = Song.new
  end

  it "when location= has been sanitized dealing with foreign characters" do
    location = "file://localhost/Users/dave/Music/iTunes/iTunes%20Music/Sigur%20Ros/Agaetis%20Byrjun/05%20Ny%CC%81%20batteri%CC%81.m4a"
    @song.location = location

    expected_sanitized_location = "/Users/dave/Music/iTunes/iTunes Music/Sigur Ros/Agaetis Byrjun/05 Ný batterí.m4a"
    @song.location.length.should == expected_sanitized_location.length
    @song.location.should == expected_sanitized_location
  end

  it "when location= has been sanitized dealing with non-foreign characters" do
    location = "file://localhost/Users/dave/Music/iTunes/iTunes%20Music/Mogwai/Mr.%20Beast/Travel%20Is%20Dangerous.mp3"
    @song.location = location

    expected_sanitized_location = "/Users/dave/Music/iTunes/iTunes Music/Mogwai/Mr. Beast/Travel Is Dangerous.mp3"
    @song.location.length.should == expected_sanitized_location.length
    @song.location.should eql(expected_sanitized_location)
  end
end
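For readers on a current Ruby, here is a minimal sketch of one way to act on the "use a Unicode library" suggestion: normalize the unescaped path to NFC in the setter, so it compares equal to a test literal written with precomposed characters. Assumptions: Ruby 2.5 or later (for `String#unicode_normalize` and `String#delete_prefix`); `URI.unescape` was removed in Ruby 3.0, so this swaps in `URI.decode_www_form_component`, which also turns `+` into a space (unlike `URI.unescape`). The plain `Song` class is a hypothetical stand-in, not the poster's ActiveRecord model.

```ruby
require "uri"

# Hypothetical stand-in for the thread's Song model (no ActiveRecord).
class Song
  PREFIX = "file://localhost"

  attr_reader :location

  def location=(url)
    # Percent-decode into a UTF-8 string. NOTE: unlike the old URI.unescape,
    # this also decodes "+" as a space.
    path = URI.decode_www_form_component(url).delete_prefix(PREFIX)
    # Compose "y" + U+0301 into U+00FD (and so on) so comparisons are stable.
    @location = path.unicode_normalize(:nfc)
  end
end

song = Song.new
song.location = "file://localhost/Users/dave/Music/iTunes/iTunes%20Music/Sigur%20Ros/Agaetis%20Byrjun/05%20Ny%CC%81%20batteri%CC%81.m4a"
expected = "/Users/dave/Music/iTunes/iTunes Music/Sigur Ros/Agaetis Byrjun/05 N\u00FD batter\u00ED.m4a"
p song.location == expected # => true
```

Normalizing in the setter means every stored location is NFC. If you would rather keep exactly what iTunes reported, the alternative is to leave the setter alone and normalize both sides of the comparison in the spec instead.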