A question for people with English OS

Can you view Japanese documents on the internet with an English OS
without special settings or is it garbled text?
This may seem like a silly question but I have always used a Japanese
OS so I do not know.

Is it just about the browser? Or is this a thing of the past?

Harry

On May 6, 2007, at 12:16 AM, Harry K. wrote:

http://www.kakueki.com/ruby/list.html
A Look into Japanese Ruby List in English

Generally, these days the answer is yes. All widely used modern OSes
use Unicode natively. Windows XP and Vista, OS X and Ubuntu all ship
with multiple languages supported. They’re all basically language
agnostic, meaning you can switch system languages as well as browser
encodings. Switching the system language may require logging out and
in again or rebooting, depending on the OS.
Japanese is included, fonts and all, in the standard installations.
The main problem is that browsers do not always detect the encoding.
UTF-8 should be used on all web sites from now on, but many Japanese
sites still encode their content as Shift-JIS or EUC-JP. So the problem
usually lies with the content authors. In theory a Shift-JIS page
should show up fine even if the browser’s default encoding is UTF-8,
as long as the page declares its encoding, but often it is necessary
to manually try different encodings.
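To make the point about missed encodings concrete, here is a minimal Python sketch of what happens when the same bytes are read under different assumptions. The sample string and the render helper are made up for illustration; real browsers do considerably more than this.

```python
def render(raw: bytes, assumed: str) -> str:
    """Decode bytes the way a viewer would if it assumed this encoding.

    Invalid byte sequences become U+FFFD, the Unicode replacement character.
    """
    return raw.decode(assumed, errors="replace")


# Hypothetical sample text; any Japanese string would do.
text = "日本語のページ"

# What a site that still uses a legacy encoding sends over the wire.
sjis_bytes = text.encode("shift_jis")

# The right assumption recovers the text exactly.
correct = render(sjis_bytes, "shift_jis")

# Wrongly assuming UTF-8 yields replacement characters, because
# Shift-JIS byte pairs are not valid UTF-8 sequences.
as_utf8 = render(sjis_bytes, "utf-8")

# Wrongly assuming a Western single-byte encoding never fails outright,
# but every byte turns into an unrelated Latin character (mojibake).
as_latin1 = render(sjis_bytes, "latin-1")
```

Only the first result matches the original text, which is why a page that does not say which encoding it uses leaves the reader cycling through the browser's encoding menu.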
One problem is that various fonts may not implement some of the
standard characters in the Unicode range for Japanese. Also, various
browser plugins, such as Flash, generally don’t play well with
Unicode.
Some sites are also built with older, more obscure or
platform-specific encodings, and ‘mojibake’ is often all you can get.
Some sites built by individuals using WYSIWYG applications may even
end up with pages containing multiple, conflicting encodings.
It’s getting there, but the word on Unicode isn’t completely out
there, and not only in Japan. Many application developers worldwide
still do not make the effort or even realize they can.
One more point of contention is that different mobile phones in Japan
often still use different encodings, thus perpetuating some of
the trouble. E-mail client apps are also often troublesome with badly
formed XML/XHTML or non-Unicode encodings.
Supporting a broad range of Unicode does have a little more overhead
than the old encodings, but not much.
If you go with UTF-8 for web sites, regardless of the language, you
should be visible to most modern viewers.
See http://www.unicode.org or the W3C’s site for more on it.

On 5/6/07, John J. wrote:

If you go with UTF-8 for web sites, regardless of the language, you
should be visible to most modern viewers.
see
http://www.unicode.org
for more on it.
or the w3c’s site.

Thanks for the information and the link.
I’ll be using that. I need to study this topic.

I have no immediate plans to use Japanese text at my web site.
But I have some links to Japanese pages (and plan to add more) and
wanted to know if the visitor could see the Japanese text.
I guess that is out of my control but I wanted to know.

Thank you.

Harry

On 05/05/07, Harry K. wrote:

Can you view Japanese documents on the internet with an English OS
without special settings or is it garbled text?
This may seem like a silly question but I have always used a Japanese
OS so I do not know.

Is it just about the browser? Or is this a thing of the past?

It is not a thing of the past. You need Japanese fonts. Most OSes or
distributions install some, but some do not. This part works in
most cases, though, and users of obscure distributions are responsible
for their choice, I’d guess ;)

On the other hand, many web page authors fail to specify the encoding
properly. This doesn’t matter for English and a few Western languages.
For most languages that use Latin characters the problem is not
critical; the page is still readable. And many browsers will
autodetect the character set given a hint about what language to expect.
But this really hurts for Japanese and other non-Latin scripts. Of
course, I do not set up my browser to try and guess what Japanese
encoding would fit the gibberish I received. There are about five
encodings to try, and only one of them shows some readable characters.
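The "try about five encodings" routine can be sketched in Python like this. The candidate list is my own assumption of what usually turns up on Japanese pages, not something taken from any browser.

```python
# Assumed candidate list: encodings commonly seen on Japanese pages.
CANDIDATES = ["utf-8", "euc_jp", "shift_jis", "iso2022_jp"]


def plausible_encodings(raw: bytes) -> list[str]:
    """Return the candidates under which the bytes decode without error."""
    survivors = []
    for name in CANDIDATES:
        try:
            raw.decode(name)  # strict decoding: any invalid byte raises
        except UnicodeDecodeError:
            continue
        survivors.append(name)
    return survivors


# Hypothetical page body, stored as EUC-JP.
page = "日本語".encode("euc_jp")
survivors = plausible_encodings(page)
```

A clean decode is only a necessary condition. Plain ASCII text survives under every candidate, and some byte strings are valid in more than one encoding, so a human (or a smarter heuristic) still has to look at the result.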

So I would guess that about half of the problem is poorly designed web
pages.

Of course, when you install a Japanese font and view a page that
specifies the encoding properly the page is displayed. You asked
about the ability to read the page which requires special skills of
the reader. So in most cases the proper setup does not help much
anyway ;)

Thanks

Michal

On May 6, 2007, at 10:01 AM, Harry K. wrote:

Thanks for the information and the link.
Harry

Sure Harry, no prob.
One thing you can do is view the source of those sites you link to.
That will tell you what encoding is being used. The best thing you
can do is e-mail the webmaster of the site to encourage UTF-8.
Like Ruby, it’s one of those technologies that spreads slowly at times.
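If you would rather not read the source by eye, the same check can be roughed out in Python. This is only a sketch: the regular expression is a simplification I made up for illustration, not a real HTML parser, and it only looks at the first few kilobytes of the page.

```python
import re

# Simplified pattern: a <meta ...> tag containing charset=NAME.
# It covers both the old http-equiv form and the short <meta charset> form.
META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def declared_charset(html: bytes):
    """Return the charset named in a <meta> tag, lowercased, or None."""
    match = META_CHARSET.search(html[:4096])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


# Hypothetical page heads for illustration.
old_style = b'<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=EUC-JP"></HEAD>'
no_declaration = b"<HTML><HEAD><TITLE>ruby-list</TITLE></HEAD>"
```

A result of None only means the page itself says nothing. The server may still announce the encoding in its HTTP headers.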

On May 7, 2007, at 9:08 PM, Harry K. wrote:

Harry

http://www.kakueki.com/ruby/list.html
A Look into Japanese Ruby List in English

Japanese. But the page itself is created with bad old HTML with
capital letters in the elements.
It contains no DOCTYPE declaration.
Also the page contains no character set encoding declaration.
It’s basically up to the user-agent (browser application) in this
case to parse it and guess correctly.
User-agents (mostly browsers) often have a ‘quirks mode’ where
they’re really pretty amazingly good at rendering a badly formed page.
When the doctype and encoding are not specified, your results will
either be something readable or a complete mess, or something in
between.
Fortunately the page itself is simple enough that the problems it has
as an HTML document do not prevent viewing.

Whoever hosts that site (I’ve visited it before) should really spend
a few minutes to update the thing.
If you are having trouble viewing it, or others are, the best bet is
to try a different browser.
It works fine in Safari, which uses WebKit (derived from KHTML).
It works fine in Firefox (Gecko) as well.
It even works in Opera.

If it works in those, you can’t ask for much more.
Older browsers may have more trouble.
But good modern browsers are free so there is no reason to support
ancient (by computer standards) technology.

On 5/7/07, Michal S. wrote:

But this really hurts for Japanese and other non-Latin scripts. Of
anyway ;)

Thanks

Michal

Thanks for the input.

Would you look at this without changing any settings and tell me if
you see Japanese or gibberish?

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-list/43471

Thank you.

Harry

http://www.kakueki.com/ruby/list.html
A Look into Japanese Ruby List in English

Harry K. wrote:

Can you view Japanese documents on the internet with an English OS
without special settings or is it garbled text?
This may seem like a silly question but I have always used a Japanese
OS so I do not know.

Is it just about the browser? Or is this a thing of the past?

Harry

The user/reader machine needs to install Chinese/Japanese fonts to see
them, otherwise, they would be all question marks. I am currently
working on a machine without those fonts installed, therefore, I cannot
read any Chinese etc… :(

On 07/05/07, John J. wrote:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-list/43471
It contains no DOCTYPE declaration.

Actually I hate that they specified lowercase letters for elements, and
I do not care about DOCTYPE declarations. XHTML Strict is so limited I
never figured out how it could possibly work. It is probably good enough
for such simple pages, but that’s not the point I guess.

However, the page probably does also have incorrect colors. It looks
like the background of some parts is specified, but not the body
background nor the text and link color. It renders gray on gray for
me.

Also the page contains no character set encoding declaration.

It may be sent in the server headers. It probably is, because the
encoding is EUC-JP and Firefox would not figure that out without a
header somewhere.
It is questionable if server headers or in-page headers are better.
Both have their strengths and limitations.
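When both are present, the HTML 4.01 specification gives the HTTP Content-Type header priority over the in-page META declaration. A rough Python sketch of that precedence follows; the function names are mine, and the header parsing borrows the standard library's email module since HTTP uses the same header syntax.

```python
from email.message import Message


def charset_from_header(content_type: str):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Returns the charset lowercased, or None when the parameter is absent.
    return msg.get_content_charset()


def effective_charset(header_value, meta_charset):
    """Pick the encoding to use: server header first, in-page META second."""
    if header_value:
        from_header = charset_from_header(header_value)
        if from_header:
            return from_header
    return meta_charset.lower() if meta_charset else None
```

For example, effective_charset("text/html; charset=EUC-JP", "Shift_JIS") picks the server's answer, which is one of the limitations of server headers: a misconfigured server overrides a correct page.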

I have also seen links to that page already. I had no problems except
the colors and the fact it is in a language I do not understand ;)

Thanks

Michal

On 5/7/07, Roseanne Z. wrote:

The user/reader machine needs to install Chinese/Japanese fonts to see
them, otherwise, they would be all question marks. I am currently
working on a machine without those fonts installed, therefore, I cannot
read any Chinese etc… :(


Posted via http://www.ruby-forum.com/.

OK. I guess I get it.
It’s about the fonts. That was pointed out earlier but I missed it.

Thank you.

Harry

On May 8, 2007, at 12:21 AM, Harry K. wrote:

Harry

A Look into Japanese Ruby List in English

This is also true. If you don’t have any fonts for a particular
language, you won’t be able to view it. Generally speaking, those
that need them do have them, or can get them. They’re included on the
install disk with Windows XP and Vista, and OS X installs them by
default. Both of these OSes take internationalization very seriously.
Linux/BSD/other Unixes are more of a mixed bag, but support is there.

@Michal
Like it or not, XHTML is here to stay. It is actually very easy
because you don’t have so many attributes crowding your elements.
There is lots of software to validate it. It’s intended to be a form of
XML, so presentation is left to CSS style sheets.

XHTML and CSS are really really easy to learn.

On 5/8/07, John J. wrote:

Linux/BSD/other unixes are more of a mixed bag but support is there.

@Michal
Like it or not, xhtml is here to stay. It is actually very easy
because you don’t have so many attributes crowding your elements.
Lots of software to validate it. It’s intended to be a form of XML so
it uses CSS style sheets.

XHTML and CSS are really really easy to learn.

Thanks, everybody.
I appreciate it.

Harry

On 5/7/07, Roseanne Z. wrote:

The user/reader machine needs to install Chinese/Japanese fonts to see
them, otherwise, they would be all question marks. I am currently
working on a machine without those fonts installed, therefore, I cannot
read any Chinese etc… :(



Now I am confused.
Some people can see Japanese and some can not.

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-list/43471

This link shows you question marks?
How about this?

http://d.hatena.ne.jp/nappa_zzz/20070429

Thank you.

Harry

XHTML is not one thing; there are several varieties at this time.
XHTML 1.0 Strict and all future versions have no frames. That kind of
functionality is kind of broken anyway (the reason I don’t like RDoc is
the frames). One of the worst problems is bookmarking a page that has
frames. Search engines have a tough time indexing such things as
well. With CSS you can create the same thing with more control, and it
degrades much more gracefully.

I understand your thinking about elements being easier to separate
from content visually with upper-case. But this is what a good text
editor with colors is for.

I use OS X with all the fonts installed. The biggest problem I have is
that many sites do not set the encoding for the pages. I guess someone
using a Japanese computer to visit a Japanese web site would not find
this a problem as the default will be to assume that the site is
Japanese but for me it can be pretty much impossible to view some sites
unless I am prepared to try every possible encoding (and then find out
that they are really written in Chinese or Korean).

I understand that IE can make a pretty good guess at the encoding if it
does not look like Latin but Safari does not seem to hack it.

Fonts and encoding and you are made.

Peter H. wrote:

I use OS X with all the fonts installed. The biggest problem I have is
that many sites do not set the encoding for the pages. I guess someone
using a Japanese computer to visit a Japanese web site would not find
this a problem as the default will be to assume that the site is
Japanese but for me it can be pretty much impossible to view some sites
unless I am prepared to try every possible encoding (and then find out
that they are really written in Chinese or Korean).

I understand that IE can make a pretty good guess at the encoding if it
does not look like Latin but Safari does not seem to hack it.

Browsers aren’t supposed to guess. That IE guesses simply means that IE
has yet another bug, born out of Microsoft’s typical arrogant refusal to
follow standards.

Servers that do not identify the correct encoding are bugged, too. Bitch
to the webmasters until they fix it. Their sites are broken, and should
be fixed. Period.


John W. Kennedy
“Give up vows and dogmas, and fixed things, and you may grow like That.
…you may come to think a blow bad, because it hurts, and not because
it humiliates. You may come to think murder wrong, because it is
violent, and not because it is unjust.”
– G. K. Chesterton. “The Ball and the Cross”

On 07/05/07, John J. wrote:

This is also true. If you don’t have any fonts for a particular
language, you won’t be able to view it. Generally speaking, those
that need them do have them, or can get them. They’re included on the
install disk with Windows XP and Vista and OS X installs them by
default. Both of these OS’s take internationalization very seriously.
Linux/BSD/other unixes are more of a mixed bag but support is there.

I am not sure about the permissions on the Windows fonts folder.
Explorer offers to install fonts if it finds a page that cannot be
displayed but you may need special privileges for that.
On modern unix-like (OS X, most *BSD, Linux) systems you can put fonts
in your home folder. Firefox uses fontconfig on systems that use X11
so it can find your fonts both on OS X and most unixes.

@Michal
Like it or not, xhtml is here to stay. It is actually very easy
because you don’t have so many attributes crowding your elements.
Lots of software to validate it. It’s intended to be a form of XML so
it uses CSS style sheets.

XHTML and CSS are really really easy to learn.

I do not say that they are hard to learn or that XHTML is harder than
HTML. I am not against moving functionality from HTML to CSS either. I
liked element names in uppercase because it made them stand out. And I
do not like removing functionality. Frames in all forms are
unsupported or deprecated in XHTML as far as I know.

Thanks

Michal

On 10/05/07, John W. Kennedy wrote:

does not look like Latin but Safari does not seem to hack it.

Browsers aren’t supposed to guess. That IE guesses simply means that IE
has yet another bug, born out of Microsoft’s typical arrogant refusal to
follow standards.

Servers that do not identify the correct encoding are bugged, too. Bitch
to the webmasters until they fix it. Their sites are broken, and should
be fixed. Period.

It may be viewed as refusal to follow standards and encouraging bad
webmaster practices (using some proprietary Windows encoding and
relying on Explorer to guess right). On the other hand, it could be
seen as an attempt to remove some burden from the users. A web browser
developer may implement a scheme for guessing the encoding on sites
that do not specify it, but cannot fix the sites.

However, the right implementation would also include a big fat warning
about the encoding being guessed. This serves both to let the user
know that the site is deficient and may be displayed incorrectly and
to remind the web developer that it should be fixed.
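As a rough illustration of "guess, but say so", here is a Python sketch. The function name, the fallback list and the warning wording are all hypothetical; no browser is claimed to work this way.

```python
def decode_page(raw: bytes, declared=None,
                fallbacks=("utf-8", "euc_jp", "shift_jis")):
    """Return (text, warning); warning is None when the declaration was used."""
    if declared:
        try:
            return raw.decode(declared), None
        except (LookupError, UnicodeDecodeError):
            pass  # unknown or wrong declaration: fall through to guessing
    for name in fallbacks:
        try:
            text = raw.decode(name)
        except UnicodeDecodeError:
            continue
        return text, f"encoding missing or wrong; guessed {name}"
    # Last resort: latin-1 maps every byte, so this never raises.
    return raw.decode("latin-1"), "encoding unknown; text is probably garbled"


# Hypothetical page body stored as EUC-JP.
body = "日本語".encode("euc_jp")
text_ok, warn_ok = decode_page(body, declared="euc-jp")
text_guess, warn_guess = decode_page(body)
```

The caller decides how loudly to show the warning. The point is only that the guess and the fact that it was a guess travel together.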

Thanks

Michal

John J. wrote:

xhtml is not one thing there are several varieties at this time. xhtml
1.0 strict and all future versions have no frames.

Neither does HTML 4 strict. Or HTML 3.2. W3C has /never/ wanted frames.

On May 10, 2007, at 7:13 PM, Michal S. wrote:

to remind the web developer that it should be fixed.

Thanks

Michal

True, it would be nice if the user-agent could alert the user that the
encoding is being guessed at and is actually unknown, but…
how many average users even know what an encoding is? Few. Many
programmers don’t even know about it. It’s a situation caused by
history. In the past, systems were more expensive and less powerful.
Compromises were frequently made. (The Y2K bug was a result of the same
problem.) Now there is a sense of unlimited potential for storage and
processing power, but there is still the legacy content, and content is
still being produced following many different standards.
Some of the problems with web development are also a result of the
first browser war between Netscape and Microsoft.
At one time there were dozens of ASCII variants out there as well,
hard-coded into machines. Eventually this stuff will all be replaced
by Unicode. It has the momentum. The current problem is still partial
implementations and different implementations, even at the OS level.
I don’t know about Windows, but much of this functionality is a service
in OS X; it is provided freely to the programmer by the system.
Still, there are programmers on OS X who just avoid the issue. Cocoa
makes it easy for developers to support all the languages it
supports in Unicode. TextMate, the premier Ruby/Rails editor on OS X,
is a great example of a program that avoids the issue. It doesn’t
support CJK properly. Odd, because TextEdit does, because it receives
all of that support free from the system.

Windows must surely have a similar offering to its developers.

Still there are lots of crusty developers out there who don’t see an
issue with this.

So we continue to get a lot of accepted problems with language
support.

The first scripting language to really start to push Unicode will be
the winner in the future.