Hi,
I have text that is pretty large. What I want to do is split it into an
array of no more than 250 characters per line but finishing on a full
word at the end of each line. So if the last word was “ruby” but ‘b’
was the 250th character in the line then I want the line to cut off
before ruby and then start at ruby for the next line. What I have at
the minute is below, which splits the text into an array of lines 250
characters long, but as I say it slices through words at the end of the
250 character limit.
mytext.scan(/.{1,250}/m)
Any ideas?
JB
On 03/06/10 10:19, John B. wrote:
mytext.scan(/.{1,250}/m)
This is a remarkably subtle problem.
The key is to read the input a “word” at a time, and pack words into
the output only if there is room on the current line.
A word break is any newline or whitespace - see below.
Issues to consider:
What if the input has a word longer than the output line? Should it be
split?
What if the input has multiple spaces? Should they be preserved or
collapsed into one?
What about paragraph breaks?
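A minimal sketch of the word-at-a-time approach, to make the questions above concrete. The helper name wrap_words is hypothetical (not from the thread), and the sketch picks one answer to each question as an assumption: runs of whitespace collapse to a single space, paragraph breaks are not preserved, and a word longer than the line is hard-split.

```ruby
# Hypothetical helper: greedy word packing.
# Assumptions (choose differently if your data needs it):
#   - any run of whitespace, including newlines, becomes one space
#   - paragraph breaks are therefore lost
#   - a word longer than max_len is cut into max_len-sized pieces
def wrap_words(text, max_len)
  lines = []
  current = ""
  text.split.each do |word|
    # A word that can never fit on one line is hard-split.
    while word.length > max_len
      lines << current unless current.empty?
      current = ""
      lines << word[0, max_len]
      word = word[max_len..-1]
    end
    if current.empty?
      current = word
    elsif current.length + 1 + word.length <= max_len
      # There is room for a space plus the word on the current line.
      current = current + " " + word
    else
      # No room: close the current line and start a new one.
      lines << current
      current = word
    end
  end
  lines << current unless current.empty?
  lines
end

p wrap_words("the quick brown fox jumps", 10)
# => ["the quick", "brown fox", "jumps"]
```

For the original question the call would be wrap_words(mytext, 250).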
Regards
Ian
Hi –
On Thu, 3 Jun 2010, John B. wrote:
mytext.scan(/.{1,250}/m)
Any ideas?
The \b anchor (word boundary) might help you:
mytext.scan(/.{1,250}\b/m)
at least as a first approximation. You’ll still have some edge cases
and probably have to massage the output though.
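To illustrate the kind of edge cases meant here, a small sketch with a short limit so the effect is easy to see. The scan_wrap helper at the end is a hypothetical variant (not from the thread) that breaks on whitespace instead of \b; it assumes limit is at least 2 and lets an over-long word through unsplit.

```ruby
limit = 10  # small on purpose; the question uses 250

# With \b, pieces keep their trailing whitespace ...
p "the quick brown fox".scan(/.{1,#{limit}}\b/m)
# => ["the quick ", "brown fox"]

# ... and trailing punctuation is silently dropped, because there is no
# word boundary between "." and the end of the string.
p "Hello world.".scan(/.{1,20}\b/m)
# => ["Hello world"]

# Hypothetical variant: each piece starts and ends on a non-space character
# and must be followed by whitespace or the end of the string. The second
# alternative (\S+) catches one-character words and words longer than limit.
def scan_wrap(text, limit)
  text.scan(/\S.{0,#{limit - 2}}\S(?=\s|\z)|\S+/m)
end

p scan_wrap("the quick brown fox", limit)
# => ["the quick", "brown fox"]
p scan_wrap("Hello world.", 20)
# => ["Hello world."]
```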
David
–
David A. Black, Senior Developer, Cyrus Innovation Inc.
On Thu, Jun 3, 2010 at 10:19 AM, John B. wrote:
I have text that is pretty large. What i want to do is split it into an
array of no more than 250 characters per line but finishing on a full
word at the end of each line. So if the last word was “ruby” but ‘b’
was the 250th character in the line then i want the line to cut off
before ruby and then start at ruby for the next line. … Any ideas?
Using as an example the text below (setting to one side my slight
surprise that Warren Buffett knows the quote by Jacobi), and ignoring
any edge cases and subtle issues mentioned by Ian H. and David A. Black,
an idea might be:
- go through the text adding 250 (or whatever) to the “latest” end
  position;
- use rindex and David Black’s suggestion of the \b anchor
  to search backwards for the “first” previous word boundary;
- a refinement is to use that as a “first guess” for the next split
  position, and then see if any edge cases or issues raised by Ian H.
  should make the split position “earlier”. (But once you’re doing
  that, it might perhaps be better to use Ian H.'s suggestion
  of going forwards one word at a time and checking if it will fit,
  or if there is a paragraph break, etc.)
def split_on_words_with_max_line_length( text, max_line_length )
  text_last_index_plus_1 = text.length
  ii = nil ; jj = 0; aa = []
  while true
    ii = jj; jj = ii + max_line_length
    if jj < text_last_index_plus_1 then
      # search backwards from jj for the nearest word boundary
      ww = text.rindex( %r{\b}m, jj )
      # done like this in case jj needed for edge cases;
      # if the boundary would not move us past ii (a word longer than
      # max_line_length), keep the hard cut at jj so the loop terminates
      jj = ww if ww && ww > ii
    else
      jj = text_last_index_plus_1
    end
    aa << text[ ii ... jj ].strip
    break unless jj < text_last_index_plus_1
  end
  aa
end
text =
  "Carl Gustav Jacob Jacobi - Wikipedia" \
  " Carl Gustav Jacob Jacobi (10 December 1804 - 18 February 1851)" \
  " was a Prussian mathematician, widely considered to be" \
  " the most inspiring teacher of his time and one of" \
  " the greatest mathematicians of all time." \
  " ... It was in algebraic development that Jacobi's peculiar power" \
  " mainly lay, and he made important contributions of this kind" \
  " to many areas of mathematics ... One of his maxims was:" \
  " 'Invert, always invert' ('man muss immer umkehren')," \
  " expressing his belief that the solution of many hard problems" \
  " can be clarified by re-expressing them in inverse form." \
  "\n\n" \
  "http://www.ibtimes.com/articles/20100227/" \
  "invert-always-invert-buffet-advises-shareholders.htm" \
  " In his annual letter to shareholders, legendary investor" \
  " Warren Buffett outlined a few approaches to investing" \
  " and business management that should be avoided." \
  " Buffett cited Jacobi, a mathematician, who advised problem solvers" \
  " to \"invert, always invert\". In other words, instead of trying" \
  " to find ways of doing something successfully, first find methods" \
  " that are likely fail and avoid them."
max_line_length = 102 # or whatever
puts text
puts; puts split_on_words_with_max_line_length( text, max_line_length )