Issue #8129 has been reported by zmoazeni (Zach Moazeni).

----------------------------------------
Bug #8129: String#index has drastically different performance when a single unicode character is included
https://bugs.ruby-lang.org/issues/8129

Author: zmoazeni (Zach Moazeni)
Status: Open
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: 2.0.0-p0

I created a simple ruby script:

```
#! /usr/bin/env ruby

raise "need a file name" unless ARGV[0]

contents = File.read(ARGV[0])

# String#[] with an Integer reads the single character at that index
326_000.times do |i|
  contents[(i + 23) % contents.size]
end
```

And I uploaded two files below. One is all ASCII characters and the other has a single Unicode character in the first line (an "em dash"). String#index has dramatically different performance for the two strings.

Locally, I'm seeing ~1.5 seconds with all_ascii.css and ~30 seconds with one_unicode.css on 1.9.3-p385. It gets worse with Ruby 2.0: all_ascii.css still takes ~1 sec, but one_unicode.css takes ~2.5 minutes!

Any idea why the performance is so dramatically different between the two?
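The two CSS attachments are not part of this archive. As a rough way to reproduce the effect, the sketch below builds two hypothetical in-memory inputs with the same number of characters (one all ASCII, one starting with a single em dash) and times the reporter's loop on each with Ruby's standard `Benchmark` module. The input text, sizes, and iteration count are arbitrary assumptions, and absolute timings will vary by machine and Ruby version. Note that `contents.size` is evaluated on every pass; in CRuby, a string that is not ASCII-only has its characters counted to answer that call, so it is linear as well.

```ruby
require "benchmark"

# Hypothetical stand-ins for all_ascii.css and one_unicode.css: the same
# CSS rule repeated, with the second copy starting with one em dash (U+2014)
# instead of "a", so both have the same number of characters.
def make_inputs(approx_chars)
  ascii = "a { color: red; }\n" * (approx_chars / 18)
  unicode = "\u2014" + ascii[1..-1]
  [ascii, unicode]
end

# The reporter's loop: read one character at a wrapping offset.
# contents.size is re-evaluated on each pass, as in the original script.
def run(contents, iterations)
  iterations.times do |i|
    contents[(i + 23) % contents.size]
  end
end

ascii, unicode = make_inputs(60_000)
iterations = 10_000

Benchmark.bm(12) do |x|
  x.report("all ASCII")   { run(ascii, iterations) }
  x.report("one em dash") { run(unicode, iterations) }
end
```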
[ruby-trunk - Bug #8129][Open] String#index has drastically different performance when a single unicode character is included
on 2013-03-20 00:25
on 2013-03-20 00:41
Issue #8129 has been updated by charliesome (Charlie Somerville).

Status changed from Open to Rejected

When all the characters in a string are ASCII characters (single bytes), the byte index for any given character can be calculated in constant time. When the string contains multibyte characters, finding the byte index given a character index becomes O(n), since the string has to be walked from the beginning.

If you need fast character indexing, try splitting the string into an array of characters.

----------------------------------------
https://bugs.ruby-lang.org/issues/8129#change-37748
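A minimal sketch of the array-of-characters suggestion, on hypothetical input: split the text once with `String#chars`, then index the resulting Array, whose `#[]` does not depend on character widths. The helper name `sample` and the input are illustrative assumptions; the helper also reads the length once rather than on every lookup.

```ruby
# Hypothetical input: one em dash (U+2014) followed by ASCII CSS.
TEXT = "\u2014" + "a { color: red; }\n" * 3_000

# Same access pattern as the report, with the length read only once.
def sample(seq, iterations)
  size = seq.size
  Array.new(iterations) { |i| seq[(i + 23) % size] }
end

# One linear pass up front; afterwards Array#[] is constant time.
chars = TEXT.chars

from_string = sample(TEXT, 5_000)   # in CRuby, each lookup walks TEXT from the start
from_array  = sample(chars, 5_000)  # each lookup is a direct Array access

puts from_string == from_array      # => true
```

The trade-off is memory: `chars` allocates a separate String object per character, so this suits repeated random access over text that fits comfortably in memory.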
on 2013-03-20 00:45
Issue #8129 has been updated by nobu (Nobuyoshi Nakada).

Description updated

----------------------------------------
https://bugs.ruby-lang.org/issues/8129#change-37749
on 2013-03-20 00:53
Issue #8129 has been updated by nobu (Nobuyoshi Nakada).

You may want to:

* use a regexp, e.g. scan.
* convert to a fixed-width wide-char encoding, i.e., UTF-32LE or UTF-32BE.

----------------------------------------
https://bugs.ruby-lang.org/issues/8129#change-37750
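Both of these suggestions can be sketched on made-up input. `scan(/./m)` splits the text into characters in a single pass (the `m` flag lets `.` match newlines), and `encode("UTF-32LE")` produces a string where every character is exactly 4 bytes, so CRuby can turn a character index into a byte offset by multiplication. Characters sliced from the UTF-32LE copy are themselves UTF-32LE, which is not ASCII-compatible, so the hypothetical `char_at` helper converts each one back to UTF-8 before it is compared with ordinary literals.

```ruby
# Hypothetical input: one em dash (U+2014), then ASCII CSS.
TEXT = "\u2014" + "a { color: red; }\n" * 3_000

# 1. Let a regexp walk the string once: scan(/./m) returns every character
#    (the m flag makes "." match newlines as well).
scanned = TEXT.scan(/./m)

# 2. Re-encode into a fixed-width encoding: every UTF-32LE character is
#    exactly 4 bytes, so a character index maps to a byte offset directly.
wide = TEXT.encode("UTF-32LE")

# Hypothetical helper: characters sliced from `wide` are UTF-32LE strings,
# so convert them back to UTF-8 before comparing with ordinary literals.
def char_at(wide, i)
  wide[i].encode("UTF-8")
end

puts scanned.size == TEXT.size         # => true
puts wide.bytesize == 4 * TEXT.size    # => true
puts char_at(wide, 1) == scanned[1]    # => true
```

`scan(/./m)` yields the same Array as `chars`; the UTF-32 route keeps a single String at the cost of four bytes per character.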
on 2013-03-20 01:00
Issue #8129 has been updated by zmoazeni (Zach Moazeni).

Thanks for the feedback, guys. This came up from https://github.com/kschiess/parslet/issues/73, which heavily uses String#index (http://www.ruby-doc.org/core-2.0/String.html#method-i-index) by passing a position to search from as the source content is consumed.

----------------------------------------
https://bugs.ruby-lang.org/issues/8129#change-37751
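The access pattern described here (calling `String#index` with a start position that advances as input is consumed) can be sketched as `words_with_index` below. Each call receives a character offset, which CRuby must translate into a byte offset before searching a non-ASCII string, and the byte position of the match must be translated back into a character index for the return value. `words_with_scanner` shows one alternative that is not mentioned in the thread: the standard library's `StringScanner` keeps its cursor as a byte position, so each scan resumes where the last one stopped. Both helpers and the input are hypothetical; this is not parslet's code.

```ruby
require "strscan"

# Hypothetical input: an em dash (U+2014), then 5,000 ASCII words.
SOURCE = "\u2014 " + "abc " * 5_000

# The pattern from the thread: String#index with a start position that
# advances as input is consumed. Positions are character offsets, so for a
# non-ASCII string CRuby converts them to byte offsets on every call.
def words_with_index(str)
  words = []
  pos = 0
  while (start = str.index(/\w+/, pos))
    word = $~[0]            # String#index with a Regexp sets $~
    words << word
    pos = start + word.size
  end
  words
end

# An alternative not mentioned in the thread: StringScanner tracks its
# position in bytes, so each scan_until resumes where the last one ended.
def words_with_scanner(str)
  scanner = StringScanner.new(str)
  words = []
  while scanner.scan_until(/\w+/)
    words << scanner.matched  # just the /\w+/ part, not the skipped text
  end
  words
end

puts words_with_index(SOURCE) == words_with_scanner(SOURCE)  # => true
```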