[ANN] How to spy on the Japanese Rubists

"Their Ruby code is surrounded by a shroud of Japanese symbols. You
know there is gold in there, but it’s left to the reader to interpret
the purpose of the article. Happy Japanese man? Cranky Japanese man?

"The creator of Ruby is Japanese, the Rubist magazine is in Japanese,
and a great many users of Ruby are Japanese, yet I can’t understand a
word they are saying.

“Wonder no longer…”

http://drnicwilliams.com/2006/08/29/ann-spy-on-the-japanese-rubists/

Cheers
Nic

Dr Nic wrote:

"Their Ruby code is surrounded by a shroud of Japanese symbols. You
know there is gold in there, but it’s left to the reader to interpret
the purpose of the article. Happy Japanese man? Cranky Japanese man?

"The creator of Ruby is Japanese, the Rubist magazine is in Japanese,
and a great many users of Ruby are Japanese, yet I can’t understand a
word they are saying.

“Wonder no longer…”

http://drnicwilliams.com/2006/08/29/ann-spy-on-the-japanese-rubists/

Cheers
Nic

From the translation:

  • Q. [ru] [bi] [ma], [tsu] [te] “green onion [ma]!”It is stealing?
  • A. it is different. Perhaps. Of as for the people who thought the “[ru] [bi] [ma]” “the green onion [ma]!”You did not know, until (or, it is said, you did not become aware).

Yeah, that works great. Thanks for sharing!

I’m just kidding. Thanks for that link. I already had a firefox
plugin, but the bookmarklet is nice for whole pages.

William C. wrote:

I’m just kidding. Thanks for that link. I already had a firefox
plugin, but the bookmarklet is nice for whole pages.

What is the FF extension?

I like FoxLingo
https://addons.mozilla.org/firefox/2444/
http://www.concisefreeware.com/foxlingo.php

and rikaichan
https://addons.mozilla.org/firefox/2471/
http://www.polarcloud.com/rikaichan/

– Tom.


“Nothing will ever be attempted, if all
possible objections must first be
overcome.” - Samuel Johnson

“Luck is what happens when
preparation meets opportunity.” - Seneca

Dr Nic wrote:

William C. wrote:

I’m just kidding. Thanks for that link. I already had a firefox
plugin, but the bookmarklet is nice for whole pages.

What is the FF extension?

I use gTranslate. https://addons.mozilla.org/firefox/918/

Max M. wrote:

Translating the Rubist Magazine FAQ came up with the following gem:

Q. there is no article of the ~~ in the [ru] [bi] [ma] why?
Motivation bean jam?

I too get Motivation bean jam sometimes… it’s not something I would
talk about in mixed company, though.

Different cultures there Max. They love the bean jam. Talk about it
constantly. In southern Osaka there is a joke that starts, “3 bean jam
factory workers walk into a bar…”.

Translating the Rubist Magazine FAQ came up with the following gem:

Q. there is no article of the ~~ in the [ru] [bi] [ma] why?
Motivation bean jam?

I too get Motivation bean jam sometimes… it’s not something I would
talk about in mixed company, though.

;)

Max

Paul R. wrote:

Why is machine translation so bad anyway? It’s only changing one set
of vocabulary and grammar for an equivalent set - it would seem that
the biggest problem with most tools out there is ridiculously small
foreign language dictionaries and tiny grammar rule sets. Very odd,
and probably quite easy to fix.

I think language translation is up there with AI in terms of complexity.
Even if a machine can translate 80% (made up number) of grammar
rulesets, the results are still going to give you Bean Jam-like results,
every time. Why? We’re picky about this stuff - we’ll spot incorrect
grammar and terminology and laugh at it and wonder “Why is machine
translation so hard?” :)

Trivial trivia: India - home of 1.2b people - has 1100 different spoken
dialects.

On 29 Aug 2006, at 23:50, Max M. wrote:

Translating the Rubist Magazine FAQ came up with the following gem:

Q. there is no article of the ~~ in the [ru] [bi] [ma] why?
Motivation bean jam?

I too get Motivation bean jam sometimes… it’s not something I would
talk about in mixed company, though.

That’s such a great name for a Ruby blog…

In fact, we could turn ‘Engrish’ back on itself and do thousands of
new t-shirts off this tool. I’d wear a t-shirt depicting Motivation
bean jam. Maybe.

Why is machine translation so bad anyway? It’s only changing one set
of vocabulary and grammar for an equivalent set - it would seem that
the biggest problem with most tools out there is ridiculously small
foreign language dictionaries and tiny grammar rule sets. Very odd,
and probably quite easy to fix.

Paul R. wrote:

On 29 Aug 2006, at 23:50, Max M. wrote:

Translating the Rubist Magazine FAQ came up with the following gem:

Q. there is no article of the ~~ in the [ru] [bi] [ma] why?
Motivation bean jam?

I too get Motivation bean jam sometimes… it’s not something I would
talk about in mixed company, though.

That’s such a great name for a Ruby blog…

In fact, we could turn ‘Engrish’ back on itself and do thousands of
new t-shirts off this tool. I’d wear a t-shirt depicting Motivation
bean jam. Maybe.

Why is machine translation so bad anyway? It’s only changing one set
of vocabulary and grammar for an equivalent set - it would seem that
the biggest problem with most tools out there is ridiculously small
foreign language dictionaries and tiny grammar rule sets. Very odd,
and probably quite easy to fix.

Their language is based more upon sounds than most. Each character
represents a sound, and then sounds together create a word. Our letters
do have sounds, but the complete sound is only made with a combination
of letters. ‘rubima’ (Ruby Magazine) can only be written 1 way in their
language, and apparently also means ‘motivation bean jam’. Any time
they shorten something like that, it’s almost assured to also mean
something else. The translator has no way of knowing this was a short
form of other words, and does its best to translate.

Back to technology - now I need a RSS feed proxy that auto-translates
RSS feeds before my feedreader gets them. Netvibes won’t be happy if I
send it off to Google for translating :)

The Google Translator returns a split frame panel. The content frame’s
url is:
/translate_p?hl=en&ie=UTF-8&oe=UTF-8&langpair=ja&u=TARGETURL

And that redirects to /translate_c?..

(Note that the en has been removed from the langpair).

But theoretically, if we pass a RSS feed to
/translate_c?hl=en&ie=UTF-8&oe=UTF-8&langpair=ja&u=FEED.XML

it should autotranslate each time it loads.

I passed it: 家庭内インフラ管理者の独り言(はなずきんの日記っぽいの)
(a random feed I found)

Unfortunately, Google Translate doesn’t seem to work on RSS feeds. Pity.
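
For anyone who wants to poke at the URL scheme described above, here is a minimal Ruby sketch that builds the content-frame URL for a given page or feed. The helper name translate_url and the host translate.google.com are assumptions (the post only gives the path and query string), and as noted, the service did not appear to translate RSS feeds, so this only constructs the URL.

```ruby
require "uri"

# Hypothetical helper: builds the translate_c URL described in the post.
# The host is an assumption; the post only shows the path and the query.
def translate_url(target_url, host: "translate.google.com", path: "/translate_c")
  query = URI.encode_www_form(
    "hl"       => "en",     # interface language
    "ie"       => "UTF-8",  # input encoding
    "oe"       => "UTF-8",  # output encoding
    "langpair" => "ja",     # as observed in the post, without the "en" part
    "u"        => target_url
  )
  "http://#{host}#{path}?#{query}"
end

# example.com stands in for a real page or feed address.
puts translate_url("http://example.com/feed.xml")
```

URI.encode_www_form percent-encodes the target address, so a feed URL that carries its own query string does not get mixed up with the translator's parameters.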

Dr Nic wrote:

Unfortunately, Google Translate doesn’t seem to work on RSS feeds. Pity.

I’ve summarised this all here:
http://drnicwilliams.com/2006/08/30/translation-of-rss-feeds-a-failure/

Looks like it ignores the feed as bad input and merely redirects you
back to the feed.

Anyone know anyone at Google?

Their language is based more upon sounds than most.

Well, Japanese does have very regular pronunciation: there are
relatively few syllables, around 50, and these tend to be pronounced
much more consistently than in other languages.

Each character
represents a sound, and then sounds together create a word.
Our letters
do have sounds, but the complete sound is only made with a combination
of letters.

Sort of, actually there are three Japanese ‘character sets’ which are
used for different purposes.

Kanji, are the Chinese pictorial characters, each Kanji stands for a
word or a concept. One of the neat things about Kanji is that
speakers of languages which use Chinese characters can often read
written material despite the fact that the writer and reader speak
different languages.

There are two character sets (kana) in which each character represents
a syllable. Hiragana is used for writing Japanese words often in
combination with Kanji. Japanese children usually learn hiragana
first, since there are far fewer symbols than Kanji. The forms of the
Hiragana characters are derived from Kanji, and look curvier than…
Katakana, which covers the same syllables as Hiragana, but is used
primarily for writing words borrowed or adapted from other languages.

There’s also romanji which is the english/european alphabet, which is
used to directly quote foreign names, and sometimes as a
transliteration of katakana/hiragana. Foreign words tend to get
modified to match the closest Japanese pronunciation so “Miss America”
would get rendered as Mi-su A-me-ri-ka in romanji, with the Japanese
‘mi’ being pronounced something like ‘me’, the Japanese ‘me’ like the
english ‘may,’ and the Japanese ‘ri’ something between ‘ree’ and
‘lee,’ actually a sound close to ‘dee.’
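
To make the one-character-per-syllable idea concrete, here is a small Ruby sketch that maps hyphen-separated syllables to katakana. The table is a hand-picked subset covering only the examples in this thread, and the helper name to_katakana is made up for illustration; a real converter would need the full syllabary plus rules for long vowels and doubled consonants.

```ruby
# Illustrative subset of the katakana syllabary; only the syllables
# needed for the examples in this thread are included.
KATAKANA = {
  "a"  => "ア", "ka" => "カ", "su" => "ス",
  "bi" => "ビ", "ma" => "マ", "mi" => "ミ",
  "me" => "メ", "ri" => "リ", "ru" => "ル"
}.freeze

# Hypothetical helper: converts hyphen-separated syllables to katakana.
# Raises KeyError for any syllable missing from the table above.
def to_katakana(syllables)
  syllables.split("-").map { |s| KATAKANA.fetch(s.downcase) }.join
end

puts to_katakana("Mi-su")       # ミス
puts to_katakana("A-me-ri-ka")  # アメリカ
puts to_katakana("ru-bi-ma")    # ルビマ
```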

‘rubima’ (Ruby Magazine) can only be written 1 way in their
language,

Actually, I’m pretty sure that rubima comes from a popular style of
jargon used by young Japanese, and among various Japanese enthusiast
groups which comes from abbreviating, usually English, words and
phrases. There are at least two words for magazine(periodical) in
Japanese, ma-ga-jin which would be written in katakana, or zasshi
which would be written in kanji.

and apparently also means ‘motivation bean jam’. Any time
they shorten something like that, it’s almost assured to also mean
something else. The translator has no way of knowing this was a short
form of other words, and does its best to translate.

Well maybe it’s coming from there. I’ve discussed this with my
Japanese-American wife, and she couldn’t figure out how. The only
thing I can think of as being bean jam would be anko, which is a
popular stiff jelly-like sweet made from the azuki bean.

I suspect that it’s really coming from a translation of something
else, perhaps someone’s name, sort of like a rote translator of
Italian, translating my last name to “of Christmas.”

But then, I could be wrong.


Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Paul R. wrote:

That’s such a great name for a Ruby blog…

In fact, we could turn ‘Engrish’ back on itself and do thousands of new
t-shirts off this tool. I’d wear a t-shirt depicting Motivation bean
jam. Maybe.

Heh heh heh… :)

Why is machine translation so bad anyway? It’s only changing one set of
vocabulary and grammar for an equivalent set - it would seem that the
biggest problem with most tools out there is ridiculously small foreign
language dictionaries and tiny grammar rule sets. Very odd, and
probably quite easy to fix.

Not to be unkind, but that’s very naive. I can understand people
thinking that way… but only until they’ve studied the field in
detail, or better yet, tried it themselves.

So try it. If you fail, you’ll have learned. If you succeed, you’ll
be rich and famous.

Hal

On 8/30/06, Hal F. wrote:

Paul R. wrote:

So try it. If you fail, you’ll have learned. If you succeed, you’ll
be rich and famous.

There’s a famous story about an early attempt at machine translation.

Sometime in the 1960s the US Air Force funded a project for
Russian/English translation.

They decided that good test cases could be obtained by taking famous
quotations, translating them to Russian and then back again.

Two of the tests were:

“Out of sight, out of mind.”
and
“The spirit is willing, but the flesh is weak”

and they came back respectively as:
“Invisible idiot”
and
“The vodka is strong, but the meat is rotten.”

T’aint just a matter of grammar and vocabulary.

I’ve been trying to program my meatware to do NL translation for 40 or
more years with limited success.


Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Hi,

At Thu, 31 Aug 2006 02:35:55 +0900,
Rick DeNatale wrote in [ruby-talk:211552]:

There’s also romanji which is the english/european alphabet, which is

Japanese say it romaji, without ‘n’.

and apparently also means ‘motivation bean jam’. Any time
they shorten something like that, it’s almost assured to also mean
something else. The translator has no way of knowing this was a short
form of other words, and does its best to translate.

Well maybe it’s coming from there. I’ve discussed this with my
Japanese-American wife, and she couldn’t figure out how. The only
thing I can think of as being bean jam would be anko, which is a
popular stiff jelly-like sweet made from the azuki bean.

It came from “yaruki an’noka?”, which means “do you have the
motivation?”. “an’noka” is rough expression for “arunoka”.

On 8/30/06, William C. wrote:

Their language is based more upon sounds than most. Each character
represents a sound, and then sounds together create a word. Our letters
do have sounds, but the complete sound is only made with a combination
of letters. ‘rubima’ (Ruby Magazine) can only be written 1 way in their
language, and apparently also means ‘motivation bean jam’. Any time
they shorten something like that, it’s almost assured to also mean
something else. The translator has no way of knowing this was a short
form of other words, and does its best to translate.

Frankly, I think this is absurd. I’m studying Japanese myself, and
from an online dictionary (http://kanjidict.stc.cx) their word for
‘motivation’ is ‘shigeki’ or ‘mochibeshon’ (imported from English and
mangled in the way the Japanese tend to mangle borrowed foreign
words), and as Mr. DeNatale has mentioned, ‘anko’ means red bean jam.
Apparently the only common words that begin with ‘ru-bi’ are their
imported word Ruby itself, and the chemical element Rubidium
(ルビジウム,
ru-bi-ji-u-mu). There are also apparently no common Kanji that have the
reading ru-bi or bi-ma, and of the 26 or so kanji that have the
reading (either on-yomi or kun-yomi) ‘ru’ none of them have a meaning
even remotely close to any of ‘motivation’ or ‘bean’ or ‘jam’
(however, interestingly enough, there are apparently several kanji
with the reading ‘ru’ that mean ‘precious stone’ or ‘lapis lazuli’).
Someone’s translation software is really screwed up if it rendered
ru-bi-ma as motivation bean paste, however it was written.

Dido S. wrote:

Someone’s translation software is really screwed up if it rendered
ru-bi-ma as motivation bean paste, however it was written.

But the bookmarklet idea is fun though :)

Also, I added some javascript for ppl to add to their own sites to allow
easy translation of their own site. -
http://drnicwilliams.com/2006/08/30/foreign-tourists-to-your-websites-part-2/

Then the Japanese can laugh at the translations in their forum… :)

On 8/30/06, [email protected] wrote:

something else. The translator has no way of knowing this was a short
form of other words, and does its best to translate.

Well maybe it’s coming from there. I’ve discussed this with my
Japanese-American wife, and she couldn’t figure out how. The only
thing I can think of as being bean jam would be anko, which is a
popular stiff jelly-like sweet made from the azuki bean.

It came from “yaruki an’noka?”, which means “do you have the
motivation?”. “an’noka” is rough expression for “arunoka”.

Domo-arrigato Nobu-sensei.

I wonder where the “bean jam” translation is coming from.


Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

IPMS/USA Region 12 Coordinator
http://ipmsr12.denhaven2.com/

Visit the Project Mercury Wiki Site
http://www.mercuryspacecraft.com/

On 31 Aug 2006, at 14:25, Michal S. wrote:

Heh, it’s not so.

Have you ever tried to learn a foreign language?

Only Romance languages - French is particularly easy if you’re
natively English and know a little about construction of sentences
from Latin.

For one, I heard that Eskimos use some tens of various words for
different kinds of snow and ice. You get the idea.

They’re called inuits, not eskimos - calling somebody an eskimo is
like calling them a nigger, i.e. highly offensive - but I’m aware of
what you mean by the snow/ice thing.

However, this is just social slang, all cultures have it, and there
is more slang for those things that culture is obsessed by. Think how
many different terms there are in western culture for genitals and
having sex and getting drunk (yes, I know this is a sad statement on
western culture) - it’s exactly the same thing. A big enough
dictionary takes care of it.

Even in areas where the needs for precision weren’t very different the
slicing often happens at different places. So one word would be
translated as different words in different contexts, often in both
ways.

That is just basic grammar though - if you know that a pronoun
references an antecedent, and you can spot the antecedent, you can
deal with the pronoun correctly. If you are able to identify the
subject and predicate of a sentence and identify a passive or active
voice, spot tense, etc. then you can translate easily.

This isn’t hard - a 2-year old baby can do it with very primitive
language skills.

My point is that current translation technology doesn’t even attempt
to try and do that.

And the grammar is far from equivalent. I already found English
sentences that I can understand (or so I think) but which would need
several sentences to be explained in my native language.
And these are both Indo-European languages. Japanese grammar is much
more interesting :)

Maybe translation should move away from the literal word-for-word
translation of words and move toward being able to express an
identical idea or thought.

The only correct translation can come from analyzing the meaning of a
sentence in context of the previous text, and constructing sentence(s)
in the other language with similar meaning. Of course, this is nearly
impossible to do with a computer.

Shouldn’t be - all a brain is, is a computer - and I know that gcc
can parse syntax within a context far more accurately than I can.
What you actually need is a compiler of natural languages - a very,
very big syntax parsing mechanism - and then we have all the right
tools we need. If somebody is able to write a tool that can translate
ruby into C accurately, I don’t see why somebody can’t write a tool
that can translate Japanese into English using a similar set of methods.

Anyway, enough of that, we’re way off topic. Interesting discussion. :)