String#each_*slice* methods (like Enumerable#each_slice)

Hi,

I find I periodically need to iterate over slices of a string.
Enumerable has the useful each_slice method, but in Ruby 1.9, I don’t
see an equivalent for the String class.

So I’ve monkey-patched String a bit like this:

# Monkeypatch String to add some each_slice methods:
class String

  # Like Enumerable#each_slice() only it yields a string
  # of chars characters (the slice):
  def each_slice(chars)
    self.scan(/.{1,#{chars}}/m).each do |s|
      yield s
    end
  end

  # Like Enumerable#each_slice() only it yields an array
  # of Fixnum bytes from the string (the slice):
  def each_byteslice(bytes)
    self.bytes.to_a.each_slice(bytes) do |s|
      yield s
    end
  end

  # Like Enumerable#each_slice() only it yields a binary
  # string of specified bytes (the slice):
  def each_bslice(bytes)
    if encoding == Encoding::BINARY
      str = self
    else
      str = self.dup.force_encoding(Encoding::BINARY)
    end
    str.scan(/.{1,#{bytes}}/m).each do |s|
      yield s
    end
  end

end
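One thing the patch above doesn't do is mirror Enumerable#each_slice when called without a block (Enumerable returns an Enumerator; these methods would raise LocalJumpError). A minimal sketch of that, assuming a separate, hypothetical method name so it doesn't clash with the patch, and using scan's block form so no intermediate array of matches is built:

```ruby
# Hypothetical variant of the character-slicing method above.
class String
  def each_char_slice(chars)
    # Without a block, return an Enumerator over the same slices,
    # the way Enumerable#each_slice does.
    return enum_for(:each_char_slice, chars) unless block_given?

    # String#scan with a block yields each match directly instead of
    # collecting them into an array first; /m lets "." match newlines.
    scan(/.{1,#{chars}}/m) { |s| yield s }
    self
  end
end

p "abcdefg".each_char_slice(3).to_a  # => ["abc", "def", "g"]
```

Returning the Enumerator means the usual chaining works, e.g. `str.each_char_slice(4).map(&:upcase)`.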

So now for the question. Is there a better way to accomplish
something similar? I’m not debating whether to do it as a monkey
patch or not; that’s irrelevant to me. But is there a more efficient
way to slice up strings and iterate over fixed-size chunks?

One alternative each_bslice implementation I tried used
str.bytes.to_a.map(&:chr).each_slice(x){|c| p c.join} but it was a bit
slower in benchmarks versus the str.scan method.
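On Ruby 1.9.3 or later there is also String#byteslice, which allows a regex-free version of each_bslice that just walks byte offsets. A sketch under that assumption (the method name is hypothetical, and this approach has not been benchmarked against scan here):

```ruby
# Hypothetical regex-free take on each_bslice (needs String#byteslice, 1.9.3+).
class String
  def each_bslice_by_index(bytes)
    raise ArgumentError, "slice size must be positive" unless bytes > 0
    return enum_for(:each_bslice_by_index, bytes) unless block_given?

    # Slice a BINARY (ASCII-8BIT) view so every chunk is a binary string,
    # as in each_bslice; dup first so the receiver's encoding is untouched.
    str = encoding == Encoding::BINARY ? self : dup.force_encoding(Encoding::BINARY)

    offset = 0
    while offset < str.bytesize
      yield str.byteslice(offset, bytes)  # the final chunk may be shorter
      offset += bytes
    end
    self
  end
end

# "\u00C4\u00D6" is "ÄÖ": four bytes in UTF-8.
"\u00C4\u00D6".each_bslice_by_index(3) { |chunk| p chunk.bytes.to_a }
```

Each chunk is a substring of the binary copy, so no per-byte `chr` strings or `join` calls are needed.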

Aaron out.

On 06.04.2011 19:52, Aaron D. Gifford wrote:

Use Enumerators:

irb(main):001:0> str = "ÄÄÄÖÖÖÜÜÜ"
=> "ÄÄÄÖÖÖÜÜÜ"
irb(main):002:0> str.chars.each_slice(3){|x| p x}
["Ä", "Ä", "Ä"]
["Ö", "Ö", "Ö"]
["Ü", "Ü", "Ü"]
=> nil
irb(main):003:0> str.bytes.each_slice(3){|x| p x}
[195, 132, 195]
[132, 195, 132]
[195, 150, 195]
[150, 195, 150]
[195, 156, 195]
[156, 195, 156]
=> nil
irb(main):004:0>
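If substrings rather than arrays are wanted, the character arrays from this approach can be joined back together. A small sketch (each_char is used instead of chars only so the same line works whether chars returns an Enumerator, as in 1.9, or an Array):

```ruby
# encoding: utf-8
str = "ÄÄÄÖÖÖÜÜÜ"

# each_char yields one-character strings; each_slice(3) groups them into
# arrays, and join turns each array back into a substring.
slices = str.each_char.each_slice(3).map(&:join)
slices.each { |s| p s }
# slices == ["ÄÄÄ", "ÖÖÖ", "ÜÜÜ"]
```

The join step is the extra work that the scan approach avoids, since scan produces the substrings directly.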

Vale,
Marvin

Replying to Quintus:

Yes, I agree, that can work.

As I said in my original post:

One alternative each_bslice implementation I tried used
str.bytes.to_a.map(&:chr).each_slice(x){|c| p c.join} but it was a bit
slower in benchmarks versus the str.scan method.

That implementation did use enumerators, but it was slower than
str.scan. Hence my asking if there was a better (faster/more
efficient) way.

I didn’t try benchmarking str.chars.each_slice vs str.scan. I’ll have
to check that out. Thanks for pointing that out to me!

Aaron out.

Looking more closely at str.scan vs. str.chars.each_slice for
string slicing, it appears that the best one to use depends on what
form of slice one needs.

If I need each slice yielded as a substring rather than as an array
of characters or bytes, then the scan method is consistently
faster on my machine. However, if I want an array of characters or
bytes, then str.chars.each_slice or str.bytes.each_slice is faster.
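For anyone wanting to reproduce the comparison, here is a sketch of a harness using the standard Benchmark library. The input text and slice size are arbitrary assumptions, and no timings are claimed here since they will vary by machine and Ruby version:

```ruby
require 'benchmark'

# Hypothetical test input: about 100 KB of ASCII text, sliced 16 chars at a time.
text = "abcdefghij" * 10_000
n    = 16
re   = /.{1,#{n}}/m

Benchmark.bm(20) do |bm|
  # Substrings straight from the regex engine.
  bm.report("scan (substrings)") { text.scan(re) { |s| s } }
  # Character arrays, then joined back into substrings.
  bm.report("each_char + join") { text.each_char.each_slice(n) { |a| a.join } }
  # Character arrays only, no join.
  bm.report("each_char (arrays)") { text.each_char.each_slice(n) { |a| a } }
end
```

The first two rows produce the same substrings, so they are the fair comparison when a substring slice is what's needed; the third row only applies when arrays are acceptable.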

Most of the time for me, however, I need a substring slice.

Aaron out.

Aaron D. Gifford wrote in post #991274:

Hi,

I find I periodically need to iterate over slices of a string.
Enumerable has the useful each_slice method, but in Ruby 1.9, I don’t
see an equivalent for the String class.

There’s also slice!():

str = "hello world"

while str.size > 0
  substr = str.slice!(0, 3)   # (offset, length); also removes those chars from str
  puts "-->#{substr}<--"
  p substr.split(//)
end

--output:--
-->hel<--
["h", "e", "l"]
-->lo <--
["l", "o", " "]
-->wor<--
["w", "o", "r"]
-->ld<--
["l", "d"]
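Note that slice! consumes str as it goes, so the string is empty when the loop ends. If the original needs to survive, a sketch of the same loop that steps through character offsets with plain String#[] (offset, length) instead:

```ruby
str = "hello world"

# Visit offsets 0, 3, 6, 9 and take up to 3 characters from each;
# str itself is never modified.
0.step(str.size - 1, 3) do |offset|
  substr = str[offset, 3]
  puts "-->#{substr}<--"
  p substr.split(//)
end
```

This prints the same output as the slice! version, and `str` still holds "hello world" afterwards.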