It seems that there is no parameter for the function h() (html_escape())
to indicate the character encoding being used? PHP's htmlspecialchars()
function, by contrast, accepts about a dozen possible encodings, such as
UTF-8, Chinese Big5, Chinese GB2312, Russian, and Japanese.
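A minimal sketch of calling h() with no encoding argument, assuming the h() in question is Ruby's ERB::Util.html_escape (h is an alias for it in ERB::Util). The call takes only the string. The sample input is UTF-8 because that is Ruby's default source encoding. Current Ruby versions also escape the single quote ' (as &#39;), so the sample avoids it.

```ruby
require "erb"

# ERB::Util.html_escape takes only the string; there is no charset argument.
# The literal below is UTF-8, Ruby's default source encoding.
input  = '<a title="日本語">Ünïcödé & 中文</a>'
output = ERB::Util.html_escape(input)

puts output                 # &lt;a title=&quot;日本語&quot;&gt;Ünïcödé &amp; 中文&lt;/a&gt;
puts output.encoding        # the result keeps the input's encoding (UTF-8)
puts output.valid_encoding? # and is still well-formed UTF-8
```

The non-ASCII text comes through untouched, even though the function was never told which encoding the string uses.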
I think, though, that h() will work for UTF-8, since h() will only touch
the 4 special characters
< > & "
and replace them with &lt;, &gt;, &amp;, and &quot;. Those 4 characters
are all in the 0x00 to 0x7F range, and h() will leave every other byte
intact (unchanged).

Now, a character in UTF-8 can be 1 to 4 bytes long. Any ASCII character
is represented as a single byte in the 0x00 to 0x7F range, equal to its
own code. Every code point from U+0080 upward (the 0x80 to 0xFF range
included) takes 2 to 4 bytes, and every one of those bytes, the lead
byte as well as the continuation bytes, is in the 0x80 to 0xFF range
(see UTF-8 http://en.wikipedia.org/wiki/Utf-8 ).

So when h() replaces those 4 ASCII characters, it will match them only
where they appear as 1-byte characters. It will pass over every byte of
a multibyte character, because those bytes are all in the 0x80 to 0xFF
range and therefore can never be mistaken for one of the 4 special
characters. The job of replacing those 4 characters is done with no side
effect whatsoever on the non-ASCII characters.
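The argument above can be sketched in Ruby. `escape_bytes` and `multibyte_bytes_all_high?` are hypothetical helpers written for illustration, not part of any library and not how h() is actually implemented. The first applies the 4 replacements one byte at a time without ever consulting the string's encoding. The second checks the UTF-8 property the argument relies on, by encoding every code point from U+0080 to U+10FFFF (surrogates excluded, since UTF-8 cannot encode them) and confirming that each one takes 2 to 4 bytes, all of them 0x80 or above.

```ruby
# Hypothetical byte-level escaper: maps the 4 special bytes to entities.
SPECIAL_BYTES = {
  0x26 => "&amp;",  # &
  0x3C => "&lt;",   # <
  0x3E => "&gt;",   # >
  0x22 => "&quot;"  # "
}.freeze

def escape_bytes(str)
  out = String.new(encoding: Encoding::BINARY)
  str.each_byte do |b|
    replacement = SPECIAL_BYTES[b]
    if replacement
      out << replacement # one of the 4 special bytes: emit its entity
    else
      out << b           # any other byte (including 0x80..0xFF): copy verbatim
    end
  end
  # Relabel the bytes with the input's encoding; no bytes are converted.
  out.force_encoding(str.encoding)
end

# Hypothetical check of the UTF-8 property: every code point >= U+0080
# encodes to 2..4 bytes, each in 0x80..0xFF, so none can equal a special byte.
def multibyte_bytes_all_high?
  (0x80..0x10FFFF).all? do |cp|
    next true if (0xD800..0xDFFF).cover?(cp) # surrogates are not encodable
    bytes = cp.chr(Encoding::UTF_8).bytes
    bytes.size.between?(2, 4) && bytes.all? { |b| b >= 0x80 }
  end
end

s = '<p class="x">日本語 & café</p>'
e = escape_bytes(s)
puts e                         # &lt;p class=&quot;x&quot;&gt;日本語 &amp; café&lt;/p&gt;
puts e.valid_encoding?         # true
puts multibyte_bytes_all_high? # true
```

The exhaustive check visits about 1.1 million code points, so it may take a moment to run; a handful of sample characters would make the same point more quickly but less conclusively.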