The future of the character-encodings library

dubstep · March 16, 2011, 2:27pm

Hi!

As some of you know, the character-encodings library is a bit stale.
It currently can’t be used from Ruby 1.9 (you may ask yourself why you
would, I suppose) because the Encoding namespace is taken; there
have been some compilation problems where gcc on Cygwin/MinGW doesn’t
support the visibility attribute; and the tests depend on an ancient
version of RSpec. I am in the process of fixing these wrongs, but I
need your help.

The big problem for me is figuring out how to namespace it. But
before anyone tries to come up with a solution, let me describe my
vision of this library’s future.

Character-encodings will be a library that allows you to deal with
UTF-8-encoded Strings in Ruby 1.8 and with collation, normalization,
Unicode-table lookup and other Unicode-specific tasks in Ruby 1.[89].
My original vision was that this library would support many more
encodings, but the internet has spoken and UTF-8 is the future. (I
also had a hope that Ruby programmers were going to begin namespacing
their projects a bit better, but Ruby programmers prefer libraries
called “Hpricot” over libraries called “Parsers::HTML”.) Ruby 1.9 adds
support for a range of encodings that I’m not at all interested in and
I think that this library needs to be more focused to have any sort of
future.

Therefore, I would like to rename the library and its namespaces to
reflect this change. The apt name “Unicode” is, sadly, already taken.
I was thinking of “Runicode”, but that’s perhaps a bit lame.

A second question is one of API design. How should you, from Ruby
1.8, be able to create a UTF-8-aware String? Currently you write
either u"äbc" or +"äbc". I don’t like this style anymore. I don’t
want to pollute Kernel or String unnecessarily. I would like to be
able to provide an API that would allow you to run the same .rb file
in both 1.8 and 1.9 and get the same results. This is, perhaps, not
possible, given that 1.9 uses a dizzying array of methods to determine
the encoding of a String. One could, of course, make Kernel#u a no-op
for 1.9. Could any of the users of this library please provide me
with some input on this point.
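
To make the Kernel#u idea concrete, here is a minimal sketch of what a no-op on 1.9 could look like. It is an illustration under assumptions, not the library's current code: Runicode::UTF8String is a hypothetical class name for whatever the 1.8 code path would return, and the check on String#encoding is just one possible way to tell the two versions apart.

```ruby
# -*- coding: utf-8 -*-

module Kernel
  if String.method_defined?(:encoding)
    # Ruby 1.9 and later: a String already carries its encoding,
    # so u simply hands the String back unchanged.
    def u(str)
      str
    end
  else
    # Ruby 1.8: wrap the bytes in a UTF-8-aware object.
    # Runicode::UTF8String is a hypothetical name, not an existing API.
    def u(str)
      Runicode::UTF8String.new(str)
    end
  end
end

s = u("äbc")
puts s.length
```

The same file would then load on both versions, though whether the results are identical still depends on how closely the 1.8 object mimics 1.9's String.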

I’m looking forward to receiving your input!

Nikolai_W · March 16, 2011, 9:47pm

On Mar 16, 2011, at 6:26 AM, Nikolai W. wrote:

Could any of the users of this library please provide me with some input on this
point.

I’m looking forward to receiving your input!

There don’t appear to be many users of character-encodings:

https://rubygems.org/gems/character-encodings

Nikolai_W · March 16, 2011, 11:23pm

Eric, could you please reply to all in the future? I have “skip” set
for this mailing list as, as you point out below, it’s rather high in
noise. It makes it rather hard to stitch things together when I can’t
easily reply to your reply.

Eric H. wrote:

On Mar 16, 2011, at 6:26 AM, Nikolai W. wrote:

Could any of the users of this library please provide me with some input on
this point.

I’m looking forward to receiving your input!

There don’t appear to be many users of character-encodings:

https://rubygems.org/gems/character-encodings

I don’t see how this is relevant, but thank you for pointing out my
failure in selling and maintaining my library.

Nikolai_W · March 17, 2011, 12:56am

On Mar 16, 2011, at 3:15 PM, Nikolai W. wrote:

Eric, could you please reply to all in the future?

No. I don’t know two of the email addresses in your To header so I
can’t judge if my response is topical for them.

The third appears to be a mailing list to which I am not subscribed. I
don’t wish to fend off possible “you must subscribe” bounces.

I have “skip” set for this mailing list

I don’t know what this means.

I think it means that you don’t want to see messages from this mailing
list. If this is true why did you post to it?

as, as you point out below, it’s rather high in noise.

I don’t see where I made this assertion.

It makes it rather hard to stitch things together when I can’t easily reply to
your reply.

I don’t see why I should be inconvenienced to make it easier for you to
see responses you do not want to see.

I don’t see how this is relevant, but thank you for pointing out my
failure in selling and maintaining my library.

I was attempting to suggest that since there aren’t many downloads for
your gem maybe there’s no need for you to continue to maintain it in its
current form (if at all).

Some of the functionality of your gem has been taken up by ruby 1.9.
Anyone seriously considering handling encodings other than US ASCII
should move to 1.9. I would rebuild character-encodings atop 1.9 if I
were the maintainer and had such a need.

Due to the low number of downloads you have an excellent opportunity to
throw out your existing API and rebuild your library to integrate well
with the encoding features of ruby 1.9.

I don’t see why you would consider a low number of downloads to be any
failure on your part. I simply made a statement of fact. I have many,
many gems that nobody uses and I no longer maintain. It would be
ridiculous for me to attempt to attach any judgements to such a fact.

Nikolai_W · March 17, 2011, 11:45am

On Thu, Mar 17, 2011 at 00:56, Eric H. wrote:

On Mar 16, 2011, at 3:15 PM, Nikolai W. wrote:

Eric, could you please reply to all in the future?

No. I don’t know two of the email addresses in your To header so I can’t judge
if my response is topical for them.

But I do and I made the judgment call for you.

The third appears to be a mailing list to which I am not subscribed. I don’t
wish to fend off possible “you must subscribe” bounces.

That is a valid point. I should have cross-posted my request for help
instead.

I have “skip” set for this mailing list

I think it means that you don’t want to see messages from this mailing list.

Correct.

If this is true why did you post to it?

Because I wanted this to reach as many (interested) people as
possible. If I’m going to make a big change here I want as many to
know about it as possible.

I know that people have used the library in the past, especially in
back-ends, which makes it a lot harder to know how many users I
actually have. I have, believe it or not, even been paid (minute
amounts) to work on this library. I figured that perhaps there were
some hidden users that I didn’t know about that were still using it
and I therefore posted to the most public Ruby forum that I know of.

as, as you point out below, it’s rather high in noise.

I don’t see where I made this assertion.

You implicitly made (I thought at the time, see below) it by saying
that the library in question doesn’t have that many users and, as
such, my posting wasn’t relevant to the majority of the readers of
this list. This low level of relevancy is something that I have
judged to be the case for many topics on this list.

It makes it rather hard to stitch things together when I can’t easily reply to
your reply.

I don’t see why I should be inconvenienced to make it easier for you to see
responses you do not want to see.

The inconvenience that you would have to endure by pressing Reply to
all and removing the char-encodings list from the Cc list must surely
not be as great as that which you have put me through by not including
me in the Cc list so that I would receive your response to the posting
that I made (that I, of course, do want).

Either way, this is a moot point, as I’ve now set noskip. (I was
hoping that either those that replied would include me or that the
mailing list software would be intelligent enough to not skip replies
to my postings. I was wrong.)

I don’t see how this is relevant, but thank you for pointing out my
failure in selling and maintaining my library.

I was attempting to suggest that since there aren’t many downloads for your gem
maybe there’s no need for you to continue to maintain it in its current form (if
at all).

Then, for my sake, please say so. A short – easily interpreted as
snide – remark like that can easily be misinterpreted.

Some of the functionality of your gem has been taken up by ruby 1.9. Anyone
seriously considering handling encodings other than US ASCII should move to 1.9.
I would rebuild character-encodings atop 1.9 if I were the maintainer and had
such a need.

To what need are you referring?

The whole point of the library was to provide UTF-8 support for 1.8.

I now want to shift focus to both providing support for UTF-8 for
those of us stuck with 1.8 (due to 1.9’s horrendous I/O and require
performance on Windows) and as an extension to 1.9’s built-in Unicode
support.

Looking at 1.9 it is now (because it sure wasn’t in 2006 when I began
developing this library) clear that Ruby won’t be supporting a lot of
features that would be desirable. You can, for example, not easily
perform collation, normalization, or character-class lookup. Even
such a thing as String#upcase doesn’t seem to be able to do the right
thing. I might be doing something wrong, but

# -*- coding: utf-8 -*-

puts "äbc".upcase

prints "äBC", not "ÄBC".
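
As a rough illustration of why a Unicode-table lookup is needed here, this is a minimal sketch of a table-based upcase. EXTRA_UPCASE and upcase_with_table are hypothetical names made up for this example, and the three-entry table is only a stand-in; a real table would be generated from the Unicode character database.

```ruby
# -*- coding: utf-8 -*-

# Hypothetical, deliberately tiny mapping table for a few non-ASCII letters.
EXTRA_UPCASE = { "ä" => "Ä", "ö" => "Ö", "å" => "Å" }

def upcase_with_table(str)
  # Look each character up in the table first and fall back to
  # String#upcase for everything the table does not cover.
  str.each_char.map { |c| EXTRA_UPCASE.fetch(c) { c.upcase } }.join
end

puts upcase_with_table("äbc")  # prints "ÄBC"
```

This only covers one-to-one mappings; cases such as "ß", whose uppercase form is more than one character long, would need a richer table.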

Due to the low number of downloads you have an excellent opportunity to throw
out your existing API and rebuild your library to integrate well with the encoding
features of ruby 1.9.

I know that this type of behavior is popular in the Ruby community,
but I wanted to give my, albeit few, users a chance to have their say
on this matter.

I don’t see why you would consider a low number of downloads to be any failure
on your part. I simply made a statement of fact.

There are many statements of fact that one can make that are often
best not made.

As I already noted above, you need to contextualize such a statement
so that it’s not open for interpretation. Had you written something
along the lines of “Judging from the statistics at rubygems.org,
perhaps you can get away with your proposed changes without too many
users becoming upset?”, I would have known what you were trying to get
at. As you wrote it, it only stands as a pointless remark.

I have many, many gems that nobody uses and I no longer maintain.

I actually wish to continue maintaining this library and I actually do
have active users.

It would be ridiculous for me to attempt to attach any judgements to such a
fact.

Am I ridiculous for not sharing your level of detachment from your work?

I don’t know if you actually looked at the source code, but it’s
actually quite a few lines of (sometimes rather complex) code, and for
me to throw it away without at least considering its future utility is
not something that I could easily do.