Regexp to match CJK characters

How can I write a regexp to match CJK characters?
Thanks in advance:)

Cafe B. wrote:

> How can I write a regexp to match CJK characters?
> Thanks in advance:)

print "Yes!" if varname =~ /^CJK$/

If this is not what you wanted, you will simply have to write a longer
post.

Paul L. wrote:

> Cafe B. wrote:
>
>> How can I write a regexp to match CJK characters?
>> Thanks in advance:)
>
> print "Yes!" if varname =~ /^CJK$/
>
> If this is not what you wanted, you will simply have to write a longer post.

CJK = (I think) Chinese, Japanese, Korean. “CJK characters” usually
refers to the encodings you use for those - Big5, JIS, Unicode, etc.

David V.
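To illustrate why the encoding matters, here is a small sketch, assuming Ruby 1.9 or later (where strings carry an encoding and `String#encode` exists). It prints the bytes of one CJK character in several encodings; a byte-level regexp written for one encoding will not match the same text stored in another. The `bytes_in` helper and `SAMPLE_CHAR` constant are hypothetical names.

```ruby
# encoding: utf-8
# Sketch (Ruby 1.9+): the same CJK character is stored as different bytes
# depending on the encoding, so a byte-oriented regexp is encoding-specific.
SAMPLE_CHAR = "中" # U+4E2D

# Hypothetical helper: hex dump of SAMPLE_CHAR transcoded to encoding_name.
def bytes_in(encoding_name)
  SAMPLE_CHAR.encode(encoding_name).bytes.map { |b| format("%02X", b) }.join(" ")
end

# Print one line per encoding: name, then the character's bytes in hex.
%w[UTF-8 Big5 Shift_JIS EUC-JP].each do |enc|
  puts "#{enc.ljust(9)} #{bytes_in(enc)}"
end
```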

David V. wrote:

> CJK = (I think) Chinese, Japanese, Korean. “CJK characters” usually
> refers to the encodings you use for those - Big5, JIS, Unicode, etc.
>
> David V.

Yes, so how can I write the regexp? Thanks a lot.

Actually, I want to strip all Chinese characters from a string that
contains English, Chinese and other characters.
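A minimal sketch of that, assuming Ruby 1.9 or later (whose Oniguruma-based regexps support Unicode script properties) and a UTF-8 input string. `\p{Han}` matches Han ideographs, which Japanese kanji and Korean hanja share with Chinese, so it cannot tell the languages apart; full-width punctuation such as `，` is not Han and is left in place. The `strip_han` name is hypothetical.

```ruby
# encoding: utf-8
# Sketch (Ruby 1.9+): remove every Han ideograph from a UTF-8 string.
# Note: \p{Han} also matches Japanese kanji and Korean hanja.
def strip_han(text)
  text.gsub(/\p{Han}/, "")
end

puts strip_han("Hello 你好, world 世界! 123") # prints: Hello , world ! 123
```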


Cafe B. wrote:
| David V. wrote:
|> CJK = (I think) Chinese, Japanese, Korean. “CJK characters” usually
|> refers to the encodings you use for those - Big5, JIS, Unicode, etc.
| Yes, so how can I write the regexp? Thanks a lot.

Which encoding?

Jupp

Josef ‘Jupp’ Schugt wrote:

> Which encoding?

UTF-8

and

$KCODE = 'u'
require_dependency 'jcode'

Thanks.

On 10/29/06, Cafe B. wrote:

> UTF-8
>
> and
>
> $KCODE = 'u'
> require_dependency 'jcode'

You may need to use the Oniguruma patch. I believe this is necessary
to give regular expressions support for character sets other than
plain ASCII.

http://www.geocities.jp/kosako3/oniguruma/

If you’re using Gentoo, all you need to do is re-emerge Ruby with the
cjk USE flag turned on. For other systems, you may need to download
and apply the patch manually. See the Oniguruma site for more details.
If you’re using Ruby 1.9, Oniguruma is already built in.

Hi,

In message “Re: regexp to match CJK characters”
on Mon, 30 Oct 2006 00:26:49 +0900, “Dido S.” writes:

|You may need to use the Oniguruma patch. I believe this is necessary
|to give regular expressions support for character sets other than
|plain ASCII.

The regular expression engine that comes with 1.8 does support UTF-8.

						matz.

Hi,

In message “Re: regexp to match CJK characters”
on Mon, 30 Oct 2006 12:33:08 +0900, “Kevin J.” writes:

|> The regular expression engine that comes with 1.8 does support UTF-8.
|
|Does this mean, though, that you must match on an escaped character
|(\u1234) or on a ‘real’ character?

You don’t have to escape if you specify -Ku or $KCODE = 'u'.

						matz.
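For comparison, a sketch of the two spellings in Ruby 1.9 or later, where the source encoding (here UTF-8) roughly takes over the role that `-Ku` / `$KCODE = 'u'` played in 1.8: a literal character and its `\u` escape build equivalent patterns. The `[一-龥]` range is a common idiom for the core CJK Unified Ideographs (U+4E00 to U+9FA5); the variable names are hypothetical.

```ruby
# encoding: utf-8
# Sketch (Ruby 1.9+): a literal UTF-8 character and its \u escape are
# interchangeable in a regexp.
literal_char  = /中/
escaped_char  = /\u4E2D/

# Same idea for a range: literal endpoints vs. escaped endpoints.
literal_range = /[一-龥]/
escaped_range = /[\u4E00-\u9FA5]/

# Both patterns should agree on every input.
["中文", "日本語", "abc"].each do |s|
  puts "#{s}: literal=#{!!(s =~ literal_range)} escaped=#{!!(s =~ escaped_range)}"
end
```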

> The regular expression engine that comes with 1.8 does support UTF-8.

Does this mean, though, that you must match on an escaped character
(\u1234) or on a ‘real’ character?

Kev