Forum: Ruby Group several lines into one line

Dirk Dre (dad)
on 2009-04-27 20:25
After sitting on this problem for hours without much progress, it's
time I asked for your help. By the way, I'm quite new to Ruby.

I have a file that looks like this:

00-04-00;Austragungssystem für längliche oder
00-04-00;quadratische Lagerräume inklusive 3-
00-04-00;poligen Wielandstecker/Gegenstecker
00-04-00;Technische Daten:
00-04-00;Elektrischer Anschluss: 230V / 50Hz
00002274;Wilo Temperaturfühler TF
00002274;Temperaturschalter mit Einstellknopf
00002274;einschließlich 2 Stück Federspannbändern
00002274;zum Anlegen an Rohre bis DN 100.
00002274;Max. Betriebsspannung: 250 V
00002274;Max. Schaltleistung: 4 A
00002274;Schutzart: IP 43
00002274;Schaltbereich: 30 oC bis 90 oC
00002274;Fabrikat: WILO
00002274;Typ: Temperaturschalter TF

What I need is to group lines with the same serial number
(e.g. 00-04-00, 00002274) into one line like this:

00-04-00;Austragungssystem für längliche oder quadratische Lagerräume
inklusive 3-poligen Wielandstecker/Gegenstecker Technische Daten:
Elektrischer Anschluss: 230V / 50Hz

I need several lines combined into one line with the matching serial
number in front.

I searched for several hours on the internet and in several forums.
Maybe I was using the wrong search terms, but I'm at a loss here.

Thanks in advance.
Michael Furmaniuk (mfurmaniuk)
on 2009-04-27 20:52
Dirk Dre wrote:
> I need several lines combined to one line with the matching serialnumber
> in front.
I'm not good at coming up with code examples off the top of my head,
but maybe you could use a regex to match the serial numbers, then a
split to pull off the portion you want, and then combine the matched
portions.
Jesús Gabriel y Galán (Guest)
on 2009-04-27 21:00
(Received via mailing list)
On Mon, Apr 27, 2009 at 8:25 PM, Dirk Dre <dad@pulf.de> wrote:
> 00002274;Wilo Temperaturfühler TF
> what i need is to group lines with the same serialnumber
> (eg. 00-04-00, 00002274) into one line like this:
>
> 00-04-00;Austragungssystem für längliche oder quadratische Lagerräume
> inklusive 3-poligen Wielandstecker/Gegenstecker Technische Daten:
> Elektrischer Anschluss: 230V / 50Hz


Something like this (untested) might get you started:

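s = "00-40..." # your string
# Each new serial number starts out with an empty string.
h = Hash.new { |hash, key| hash[key] = "" }
# String#each was removed in Ruby 1.9; each_line iterates line by line.
s.each_line do |line|
  key, value = line.split(";")
  h[key] << value.chomp # append this line's text to its serial number
end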

This will give you a hash where the keys are the serial numbers and
the values are the concatenated parts that correspond to each serial
number.
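
For reference, here is a minimal self-contained sketch of the hash approach above, written for Ruby 1.9 and later. The method name group_by_serial and the shortened sample lines are only for illustration, and the guard for lines without a semicolon is an addition of this sketch.

```ruby
# Hypothetical helper: collects the text parts per serial number.
def group_by_serial(input)
  # The default block stores a fresh, mutable empty string for each new key.
  grouped = Hash.new { |hash, key| hash[key] = +"" }
  input.each_line do |line|
    key, value = line.split(";")
    next if value.nil? # skip blank lines and lines without a semicolon
    grouped[key] << value.chomp
  end
  grouped
end

# Shortened sample taken from the data in the first post.
sample = <<~DATA
  00-04-00;Technische Daten:
  00-04-00;Elektrischer Anschluss: 230V / 50Hz
  00002274;Fabrikat: WILO
DATA

group_by_serial(sample).each { |serial, text| puts "#{serial};#{text}" }
```

Note that the parts are appended without any separator, so the text of consecutive lines runs together unless the input lines keep their trailing spaces.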

Jesus.
Joel VanderWerf (Guest)
on 2009-04-27 21:24
(Received via mailing list)
Jesús Gabriel y Galán wrote:
>> 00-04-00;Elektrischer Anschluss: 230V / 50Hz
>>
> s = "00-40..." #your string
> h = Hash.new {|h,k| h[k] = ""}
> s.each do |line|
>    key, value = line.split(";")
>    h[key] << value.chomp
> end

That won't quite work if there is a second semicolon on the line. Try:

   key, value = line.scan(/([^;]*);(.*)/).first
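
To make the difference concrete, here is a small sketch that compares the two ways of parsing a line. The sample line with a second semicolon is made up for illustration, as are the method names parse_with_split and parse_with_scan.

```ruby
# Hypothetical sample line with a second semicolon inside the description.
SAMPLE_LINE = "00002274;Schutzart: IP 43; Fabrikat: WILO\n"

# Plain split cuts at every semicolon; the multiple assignment keeps
# only the first two pieces and silently drops the rest.
def parse_with_split(line)
  key, value = line.split(";")
  [key, value]
end

# scan with two capture groups: everything before the first semicolon,
# then the rest of the line (the dot does not match the newline).
def parse_with_scan(line)
  key, value = line.scan(/([^;]*);(.*)/).first
  [key, value]
end

p parse_with_split(SAMPLE_LINE)
p parse_with_scan(SAMPLE_LINE)
```

With the split version the text after the second semicolon is lost, while the scan version keeps it as part of the value.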
Rob Biedenharn (Guest)
on 2009-04-27 21:37
(Received via mailing list)
On Apr 27, 2009, at 2:59 PM, Jesús Gabriel y Galán wrote:

>> 00-04-00;Elektrischer Anschluss: 230V / 50Hz
>>
> s = "00-40..." #your string
> Jesus.
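current_serial = nil
texts = []
File.open(outputfilename, 'w') do |out|
  File.foreach(filename) do |line|
    # Split at the first ';' only, so the text may itself contain ';'.
    serial, text = line.chomp.split(';', 2)
    # A new serial number starts: write out the group collected so far.
    if current_serial && serial != current_serial
      out.puts "#{current_serial};#{texts.join(' ')}"
      texts = []
    end
    current_serial = serial
    texts << text
  end
  # Write out the last group, which no following serial number flushed.
  if current_serial
    out.puts "#{current_serial};#{texts.join(' ')}"
  end
end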

Two things of note: The second argument in split(';', 2) limits the
result to 2 items, so if there happens to be a ';' later in the line it
isn't treated as a place to split. Keeping the items in a hash
doesn't guarantee the order on the way back out in Ruby 1.8 (from
Ruby 1.9 on, hashes keep insertion order), but I'm also assuming
that all the lines with a given serial number are together.

You have to initialize your filename and outputfilename, of course.
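
For input that fits in memory, the same idea of merging consecutive lines can also be sketched with Enumerable#chunk. This is a variation for illustration, not the code above: merge_consecutive is a made-up name, it works on any collection of lines rather than on files, and it relies on the same assumption that lines with the same serial number are adjacent.

```ruby
# Hypothetical helper: merges runs of consecutive lines that share a serial number.
def merge_consecutive(lines)
  lines
    .map { |line| line.chomp.split(";", 2) }   # [serial, text] pairs
    .chunk { |pair| pair.first }               # group consecutive pairs by serial
    .map do |serial, pairs|
      texts = pairs.map { |_serial, text| text }
      "#{serial};#{texts.join(' ')}"
    end
end

# Shortened sample taken from the data in the first post.
sample = [
  "00-04-00;Technische Daten:\n",
  "00-04-00;Elektrischer Anschluss: 230V / 50Hz\n",
  "00002274;Fabrikat: WILO\n",
  "00002274;Typ: Temperaturschalter TF\n"
]

puts merge_consecutive(sample)
```

Blank lines produce a nil key, which chunk drops, so they do not end up in the output. To read from a file, File.foreach(filename) without a block returns an enumerator that can be passed in directly.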
-Rob

Rob Biedenharn    http://agileconsultingllc.com
Rob@AgileConsultingLLC.com
Dirk Dre (dad)
on 2009-04-28 02:36
Thanks a lot to all.

Rob,
your solution works like a charm.

Now I'll just do some heavy commenting, to understand it all and for
reference.
Robert Schaaf (Guest)
on 2009-05-01 03:04
(Received via mailing list)
The desired output cannot be achieved unless your log preserves
trailing spaces.  Thus:

00002274;Temperaturschalter mit Einstellknopfeinschließlich 2
Stück  Federspannbändernzum Anlegen an Rohre bis DN 100.Max.
Betriebsspannung: 250 VMax. Schaltleistung: 4 ASchutzart: IP
43Schaltbereich: 30 oC bis 90 oCFabrikat: WILOTyp: Temperaturschalter TF

My suggestion is to pad with a space unless the line ends in a hyphen.
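
A minimal sketch of that rule, assuming the text parts for one serial number have already been collected into an array and carry no trailing whitespace. The name join_parts is made up for illustration.

```ruby
# Hypothetical helper: joins text parts with a space, except after a part
# that ends in a hyphen, where the next part is attached directly.
def join_parts(parts)
  parts.each_with_object(+"") do |part, result|
    # No space before the first part or right after a trailing hyphen.
    result << " " unless result.empty? || result.end_with?("-")
    result << part
  end
end

puts join_parts(["inklusive 3-", "poligen Wielandstecker/Gegenstecker"])
puts join_parts(["Max. Betriebsspannung: 250 V", "Max. Schaltleistung: 4 A"])
```

The first call attaches the second part directly to the hyphen, while the second call separates the parts with a single space.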

Picayune perhaps, but aesthetics (readability) count.

Bob Schaaf