Regexp to match CJK characters

cafebabe · October 28, 2006, 11:24am

How can I write a regexp to match CJK characters?
Thanks in advance:)

cafebabe · October 28, 2006, 6:25pm

Cafe B. wrote:

How can I write a regexp to match CJK characters?
Thanks in advance:)

print "Yes!" if varname =~ /^CJK$/

If this is not what you wanted, you will simply have to write a longer
post.

cafebabe · October 28, 2006, 6:35pm

Paul L. wrote:

Cafe B. wrote:

How can I write a regexp to match CJK characters?
Thanks in advance:)

print "Yes!" if varname =~ /^CJK$/

If this is not what you wanted, you will simply have to write a longer post.

CJK = (I think) Chinese, Japanese, Korean. “CJK characters” usually
refers to the encodings you use for those - Big5, JIS, Unicode, etc.

David V.

cafebabe · October 28, 2006, 7:04pm

David V. wrote:

CJK = (I think) Chinese, Japanese, Korean. “CJK characters” usually
refers to the encodings you use for those - Big5, JIS, Unicode, etc.

David V.

Yes, so how can I write the regexp? Thanks a lot.

cafebabe · October 28, 2006, 7:08pm

Actually, I want to strip all Chinese characters from a string which
contains English, Chinese and other characters.

cafebabe · October 28, 2006, 10:11pm

Cafe B. wrote:
| David V. wrote:
|> CJK = (I think) Chinese, Japanese, Korean. “CJK characters” usually
|> refers to the encodings you use for those - Big5, JIS, Unicode, etc.
| Yes, so how can I write the regexp? Thanks a lot.

Which encoding?

Jupp

cafebabe · October 29, 2006, 2:56am

Josef ‘Jupp’ Schugt wrote:

Cafe B. wrote:
| David V. wrote:
|> CJK = (I think) Chinese, Japanese, Korean. “CJK characters” usually
|> refers to the encodings you use for those - Big5, JIS, Unicode, etc.
| Yes, so how can I write the regexp? Thanks a lot.

Which encoding?

Jupp

UTF-8

and

$KCODE='u'
require_dependency 'jcode'

thanks

cafebabe · October 29, 2006, 4:27pm

On 10/29/06, Cafe B. wrote:

UTF-8

and

$KCODE='u'
require_dependency 'jcode'

You may need to use the Oniguruma patch. I believe this is necessary
to give regular expressions support for character sets other than
plain ASCII.

http://www.geocities.jp/kosako3/oniguruma/

If you’re using Gentoo, all you need to do is remerge Ruby with the
cjk use flag turned on. For other systems, you may need to download
and apply the patch manually. See the Oniguruma site for more details.
If you’re using a 1.9 Ruby, Oniguruma is already built-in.

cafebabe · November 7, 2006, 10:44am

Hi,

In message “Re: regexp to match CJK characters”
on Mon, 30 Oct 2006 00:26:49 +0900, “Dido S.” writes:

|You may need to use the Oniguruma patch. I believe this is necessary
|to give regular expressions support for character sets other than
|plain ASCII.

The regular expression engine that comes with 1.8 does support UTF-8.

						matz.

cafebabe · November 7, 2006, 10:44am

Hi,

In message “Re: regexp to match CJK characters”
on Mon, 30 Oct 2006 12:33:08 +0900, “Kevin J.” writes:

|> The regular expression engine that comes with 1.8 does support UTF-8.
|
|Does this mean, though, that you must match on an escaped character
|(\u1234) or on a ‘real’ character?

You don’t have to escape if you specify -Ku or $KCODE='u'.

						matz.
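
To illustrate the point about not needing escapes: a sketch that puts literal UTF-8 characters directly in the pattern. It assumes the source file is saved as UTF-8. On 1.8 you would run it with -Ku or set $KCODE='u' as described above; on 1.9 and later the magic comment (or the UTF-8 default source encoding in 2.0+) plays that role. The constant and method names are made up for illustration, and the range 一-龥 (U+4E00 to U+9FA5) covers only the commonly used CJK Unified Ideographs, not kana or Hangul.

```ruby
# encoding: utf-8

# Literal character range, no escapes: U+4E00 (一) through U+9FA5 (龥).
COMMON_IDEOGRAPH = /[一-龥]/

# Hypothetical helper: true if the string contains at least one ideograph
# from the range above.
def contains_ideograph?(text)
  !!(text =~ COMMON_IDEOGRAPH)
end

puts contains_ideograph?("日本語")
puts contains_ideograph?("hello")
```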

cafebabe · November 7, 2006, 10:44am

The regular expression engine that comes with 1.8 does support UTF-8.

Does this mean, though, that you must match on an escaped character
(\u1234) or on a ‘real’ character?

Kev