Hi,
I find I periodically need to iterate over slices of a string.
Enumerable has the useful each_slice method, but in Ruby 1.9, I don’t
see an equivalent for the String class.
So I’ve monkey-patched String a bit like this:
# Monkeypatch String to add some each_slice methods:
class String
  # Like Enumerable#each_slice() only it yields a string
  # of chars characters (the slice):
  def each_slice(chars)
    self.scan(/.{1,#{chars}}/m).each do |s|
      yield s
    end
  end

  # Like Enumerable#each_slice() only it yields an array
  # of Fixnum bytes from the string (the slice):
  def each_byteslice(bytes)
    self.bytes.to_a.each_slice(bytes) do |s|
      yield s
    end
  end

  # Like Enumerable#each_slice() only it yields a binary
  # string of specified bytes (the slice):
  def each_bslice(bytes)
    if encoding == Encoding::BINARY
      str = self
    else
      str = self.dup.force_encoding(Encoding::BINARY)
    end
    str.scan(/.{1,#{bytes}}/m).each do |s|
      yield s
    end
  end
end
So now for the question. Is there a better way to accomplish
something similar? I’m not debating whether to do it as a monkey
patch or not; that’s irrelevant to me. But is there a more efficient
way to slice up strings and iterate over fixed-size chunks?
One alternative each_bslice implementation I tried used
str.bytes.to_a.map(&:chr).each_slice(x){|c| p c.join} but it was a bit
slower in benchmarks versus the str.scan method.
Aaron out.
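One more variant that might be worth benchmarking against str.scan is stepping through the string by offset instead of going through a regex. This is only a sketch: the helper names each_char_slice and each_byte_slice are made up for illustration (written as plain methods rather than a String patch), and String#byteslice assumes Ruby 1.9.3 or later. I have not measured it, so treat any speed expectation as unverified.

```ruby
# Hypothetical helpers, not from the original post.

# Yields successive substrings of at most `chars` characters.
def each_char_slice(str, chars)
  0.step(str.length - 1, chars) { |i| yield str[i, chars] }
end

# Yields successive binary substrings of at most `bytes` bytes.
# Assumes String#byteslice is available (Ruby 1.9.3+).
def each_byte_slice(str, bytes)
  bin = str.encoding == Encoding::BINARY ? str : str.dup.force_encoding(Encoding::BINARY)
  0.step(bin.bytesize - 1, bytes) { |i| yield bin.byteslice(i, bytes) }
end

each_char_slice("hello world", 3) { |s| p s }
```

Note that character indexing on a string with multibyte characters may be slower than byte indexing, so the character version could well lose to scan on non-ASCII input.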
On 06.04.2011 19:52, Aaron D. Gifford wrote:
Use Enumerators:
irb(main):001:0> str = "ÄÄÄÖÖÖÜÜÜ"
=> "ÄÄÄÖÖÖÜÜÜ"
irb(main):002:0> str.chars.each_slice(3){|x| p x}
["Ä", "Ä", "Ä"]
["Ö", "Ö", "Ö"]
["Ü", "Ü", "Ü"]
=> nil
=> nil
irb(main):003:0> str.bytes.each_slice(3){|x| p x}
[195, 132, 195]
[132, 195, 132]
[195, 150, 195]
[150, 195, 150]
[195, 156, 195]
[156, 195, 156]
=> nil
irb(main):004:0>
Vale,
Marvin
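The enumerator calls above yield arrays, so if a string per slice is wanted they need to be joined or packed again. A minimal sketch of that extra step, using only core methods (the variable names are just for illustration):

```ruby
str = "ÄÄÄÖÖÖÜÜÜ"

# Character slices, rejoined into strings of up to 3 characters each.
char_slices = str.chars.each_slice(3).map(&:join)

# Byte slices, packed back into binary strings of up to 3 bytes each.
# A slice boundary can fall in the middle of a multibyte character.
byte_slices = str.bytes.each_slice(3).map { |b| b.pack("C*") }

p char_slices
p byte_slices.length
```

With a 2-byte-per-character string and a slice size of 3 bytes, every other byte slice starts in the middle of a character, which is why the byte version only makes sense for binary data.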
In reply to Quintus:
Yes, I agree, that can work.
As I said in my original post:
One alternative each_bslice implementation I tried used
str.bytes.to_a.map(&:chr).each_slice(x){|c| p c.join} but it was a bit
slower in benchmarks versus the str.scan method.
That implementation did use enumerators. But it was slower than
str.scan. Hence my asking if there was a better (faster/more
efficient) way.
I didn’t try benchmarking str.chars.each_slice vs str.scan. I’ll have
to check that out. Thanks for pointing that out to me!
Aaron out.
Looking more closely at str.scan vs. str.chars.each_slice for
string slicing, it appears that the best one to use depends on what
form of slice one needs.
If I need a string yielded that is a substring (a slice) vs. an array
of characters or array of bytes, then the scan method is consistently
faster on my machine. However, if I want an array of characters or
bytes, then str.chars.each_slice or str.bytes.each_slice is faster.
Most of the time for me, however, I need a substring slice.
Aaron out.
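For anyone who wants to repeat the comparison, here is a rough timing sketch. It is not the benchmark used above: the test data, the slice size, and the time_it helper are all assumptions, and Process.clock_gettime needs Ruby 2.1 or later. Results will vary by machine and Ruby version.

```ruby
# Hypothetical timing helper: returns elapsed seconds for the block.
def time_it
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end

data = "abcdefghij" * 10_000   # assumed sample input, 100,000 characters
n = 16                         # assumed slice size

scan_out = []
chars_out = []

# Substring slices via scan.
scan_secs = time_it { data.scan(/.{1,#{n}}/m) { |s| scan_out << s } }

# Substring slices via chars.each_slice, joining each array back into a string.
chars_secs = time_it { data.chars.each_slice(n) { |a| chars_out << a.join } }

puts "scan:             #{scan_secs}"
puts "chars.each_slice: #{chars_secs}"
```

Both variants are made to produce strings here so the comparison covers the substring case; dropping the join gives the array-of-characters case.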
Aaron D. Gifford wrote in post #991274:
Hi,
I find I periodically need to iterate over slices of a string.
Enumerable has the useful each_slice method, but in Ruby 1.9, I don’t
see an equivalent for the String class.
There’s also slice!():
str = "hello world"
while str.size > 0
  substr = str.slice!(0, 3) #(offset, length)
  puts "-->#{substr}<--"
  p substr.split(//)
end
--output:--
-->hel<--
["h", "e", "l"]
-->lo <--
["l", "o", " "]
-->wor<--
["w", "o", "r"]
-->ld<--
["l", "d"]
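One thing to keep in mind is that slice!() modifies its receiver, so str is empty once the loop above finishes. If the original string is still needed, the same loop can run on a copy. A small sketch, with each_slice_nondestructive being a made-up name for illustration:

```ruby
# Hypothetical helper: yields slices of up to `len` characters
# without touching the caller's string.
def each_slice_nondestructive(str, len)
  rest = str.dup                      # work on a copy; slice! is destructive
  yield rest.slice!(0, len) until rest.empty?
end

str = "hello world"
each_slice_nondestructive(str, 3) { |s| puts "-->#{s}<--" }
puts str   # still "hello world"
```

I have not timed this against scan, so whether the copy and the repeated slice!() calls are competitive on large strings is an open question.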