On 12/14/05, Eero S. wrote:
Jacob F. wrote:
I tried integrating that, but the 150 char limit meant that most – if
not all – of the bodies couldn’t be translated. Having translated
titles would be a slight improvement, but not enough. Thanks for the
You could split the strings into 150-char sections and loop over
them sending a new request for each. A Google translation might
not have such limits (quality of translation notwithstanding).
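The split-and-loop suggestion could be sketched roughly as follows. This is only an illustration, not the snippet discussed in the thread: `chunk_text` and `translate_in_chunks` are made-up names, the block passed in stands in for whatever actually sends one request to the translation service, and the 150-character limit is taken from the discussion above.

```ruby
# Split text into pieces of at most `limit` characters, preferring to break
# at whitespace so that words are not cut in half. A single word longer than
# `limit` is cut hard at the limit.
def chunk_text(text, limit = 150)
  chunks = []
  rest = text.strip
  until rest.empty?
    if rest.length <= limit
      chunks << rest
      break
    end
    # Look one character past the limit so a space right at the boundary counts.
    window = rest[0, limit + 1]
    cut = window.rindex(/\s/)
    cut = limit if cut.nil? || cut.zero?
    chunks << rest[0, cut].rstrip
    rest = rest[cut..-1].lstrip
  end
  chunks
end

# `translate` is a hypothetical block that takes one chunk and returns its
# translation (for example, by issuing one HTTP request per chunk).
def translate_in_chunks(text, limit = 150, &translate)
  chunk_text(text, limit).map { |piece| translate.call(piece) }.join(" ")
end
```

Breaking at whitespace keeps each request self-contained, though translating sentence fragments independently will likely hurt quality compared with translating the whole body at once.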
Well, truth be told, I couldn’t even get the snippet working for me,
regardless of the 150ish-char limit. The oddity is this: I can echo
out the URI I’m going to request, paste that URI into my browser and
get a page with a decent translation. However, if I take that same URI
and run it through wget (or Ruby’s open-uri) I get a different result.
And the difference isn’t just in encoding; there are differences in
the returned HTML!
For instance, the URI
relating to the Japanese string: “Webrick DoSè??å¼±æ?§ã«ã¤ã?ã¦”. If I put that
URI in my browser, the resulting page says “About Webrick DoS vulnerability”.
If I fetch that same URI via wget, the HTML returned has “Webrick DoS
^@ <86> ^@ Ã?Â± ^@ SECT. ^@ ^@ ^@ ^@ ^@ <84> ^@ ^@” instead (looking at
it in a non-unicode vim over putty). Even stranger, the page in my
browser has a hidden input with name=“q” and value=“About Webrick DoS
vulnerability”. The HTML returned by wget has a hidden input in the
same position (surrounding HTML identical) but with name=“kls” and
Anyone have any ideas what’s going on?
The limit imposed by Babelfish is actually 150 words. However, since this
is a GET request, URI raises an exception about the URL being too long
well before 150 words are actually reached.
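Since both a word limit and a URL-length limit are in play, one way to respect both might be to group words while watching the size of the percent-encoded text. The sketch below is an assumption-laden illustration: `chunk_words` is a made-up helper, the `max_encoded` budget of 1500 bytes is an arbitrary placeholder (the thread does not say at what length the exception is raised), and splitting on whitespace is a poor notion of "word" for Japanese text, which normally has no spaces.

```ruby
require "uri"

# Group whitespace-separated words into chunks that contain at most
# `max_words` words and whose percent-encoded form is at most `max_encoded`
# bytes. Both limits are illustrative assumptions, not documented values.
# A single word that alone exceeds `max_encoded` is still emitted on its own.
def chunk_words(text, max_words: 150, max_encoded: 1500)
  chunks = []
  current = []
  text.split.each do |word|
    candidate = (current + [word]).join(" ")
    too_big = current.size >= max_words ||
              URI.encode_www_form_component(candidate).bytesize > max_encoded
    if too_big && !current.empty?
      chunks << current.join(" ")
      current = []
    end
    current << word
  end
  chunks << current.join(" ") unless current.empty?
  chunks
end
```

Measuring the encoded size matters here because each UTF-8 Japanese character typically expands to nine ASCII characters (`%XX%XX%XX`) in a query string, so the URL grows much faster than the character count suggests.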