I am trying to extract some information from a snippet of HTML
(<table class="listing sortable" id="packages_list"> <thead> - Pastebin.com), and I'm using doc.inner_text to get the
important parts. When I do so, I get an odd amount of spacing
(pino 0.2.11 - Pastebin.com). Is there a way to get rid of
all that extra spacing so I can just print the output and have it look
clean? Possibly something like
pino
0.2.11-ubuntu0~lucid
troorl
(2010-07-04)
pino
0.2.10-ubuntu0~karmic
troorl
(2010-05-27)
that? Or can I get each piece of text and add it to an array? If I do
that while it still has all that odd spacing, is the spacing part of the
variable, or is it just the text?
You can remove two or more consecutive "\n" like this:
irb(main):001:0> s =<<EOS
irb(main):002:0" test
irb(main):003:0"
irb(main):004:0" test2
irb(main):005:0" sdfsdf
irb(main):006:0" werwer
irb(main):007:0"
irb(main):008:0"
irb(main):009:0"
irb(main):010:0"
irb(main):011:0" sdfsdfsd
irb(main):012:0" sdfer234
irb(main):013:0" EOS
=> "test\n\ntest2\nsdfsdf\nwerwer\n\n\n\n\nsdfsdfsd\nsdfer234\n"
irb(main):019:0> s.gsub /\n\n+/, "\n"
=> "test\ntest2\nsdfsdf\nwerwer\nsdfsdfsd\nsdfer234\n"
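On the array part of the question: the spacing is part of the string itself (inner_text returns the newlines, tabs and blanks as ordinary characters), so it has to be trimmed before or while splitting. A minimal sketch, assuming the text looks roughly like the sample below; the clean_lines name and the sample string are made up for illustration and are not taken from the pastebin:

```ruby
# Sample text standing in for the output of doc.inner_text (assumed shape).
text = "\n   pino\n\n\t0.2.11-ubuntu0~lucid\n   troorl \n\n\n (2010-07-04)\n"

# Hypothetical helper: split into lines, trim each one,
# and drop the lines that are empty after trimming.
def clean_lines(text)
  text.lines.map(&:strip).reject(&:empty?)
end

fields = clean_lines(text)
puts fields
```

Each element of fields is then one piece of text with no surrounding whitespace, so puts prints them one per line.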
Use the String methods s.strip!, s.gsub! and s.squeeze!, as in
the following snippet:
# no-white.rb - remove empty lines and sequences of blanks
# from a text file
fh = File.open('6HWDs5dm.txt')
while !fh.eof
  line = fh.readline.chomp
  # remove leading and trailing blanks
  line.strip!
  # skip empty lines
  next if line == ''
  # convert tab chars to blanks
  line.gsub!(/\t/, ' ')
  # substitute a single blank for a sequence of blanks
  line.squeeze!(' ')
  # add code to process line if needed
  puts line
end
fh.close
exit(0)
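Since doc.inner_text gives you a String rather than a file, the same steps can probably be applied directly to that string. A rough sketch using the non-destructive versions of the same methods; squeeze_blanks is a made-up name, and the sample input is only an assumption about what the text looks like:

```ruby
# Hypothetical helper: applies the strip / tab-to-blank / squeeze steps
# to every line of a string and drops the lines that end up empty.
def squeeze_blanks(text)
  cleaned = text.each_line.map do |line|
    line.strip.gsub(/\t/, ' ').squeeze(' ')
  end
  cleaned.reject(&:empty?).join("\n")
end

# Assumed sample input, not the real page content.
puts squeeze_blanks("  pino \t 0.2.11  \n\n\n   troorl\n")
```

The non-bang methods return new strings, so the original text is left untouched.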