Group several lines into one line


#1

After working on this problem for hours without much progress, I need
your help. By the way, I’m quite new to Ruby.

I have a file that looks like this:

00-04-00;Austragungssystem für längliche oder
00-04-00;quadratische Lagerräume inklusive 3-
00-04-00;poligen Wielandstecker/Gegenstecker
00-04-00;Technische Daten:
00-04-00;Elektrischer Anschluss: 230V / 50Hz
00002274;Wilo Temperaturfühler TF
00002274;Temperaturschalter mit Einstellknopf
00002274;einschließlich 2 Stück Federspannbändern
00002274;zum Anlegen an Rohre bis DN 100.
00002274;Max. Betriebsspannung: 250 V
00002274;Max. Schaltleistung: 4 A
00002274;Schutzart: IP 43
00002274;Schaltbereich: 30 oC bis 90 oC
00002274;Fabrikat: WILO
00002274;Typ: Temperaturschalter TF

What I need is to group lines with the same serial number
(e.g. 00-04-00, 00002274) into one line like this:

00-04-00;Austragungssystem für längliche oder quadratische Lagerräume
inklusive 3-poligen Wielandstecker/Gegenstecker Technische Daten:
Elektrischer Anschluss: 230V / 50Hz

I need several lines combined into one line with the matching serial
number in front.

I searched for several hours on the internet and in several forums.
Maybe I was using the wrong search terms, but I’m at a loss here.

Thanks in advance.


#2

Dirk D. wrote:

I need several lines combined into one line with the matching serial
number in front.

I’m not good at coming up with code examples off the top of my head,
but maybe use a regex to match the serial numbers, then a split to pull
off the portion you want, and then combine the matched portions.


#3

On Mon, Apr 27, 2009 at 8:25 PM, Dirk D. wrote:

00002274;Wilo Temperaturfühler TF
What I need is to group lines with the same serial number
(e.g. 00-04-00, 00002274) into one line like this:

00-04-00;Austragungssystem für längliche oder quadratische Lagerräume
inklusive 3-poligen Wielandstecker/Gegenstecker Technische Daten:
Elektrischer Anschluss: 230V / 50Hz

Something like this (untested) might get you started:

s = "00-40…" # your string
h = Hash.new { |hash, key| hash[key] = "" }
# String#each was removed in Ruby 1.9; each_line works in old and new versions
s.each_line do |line|
  key, value = line.split(";")
  h[key] << value.chomp
end

This will give you a hash where the keys are the serial numbers and
the values, the concatenated parts that correspond to that serial
number.
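
For reference, here is a minimal, self-contained sketch of the same hash idea that can be run as is. The helper name group_by_serial and the sample data are only illustrative, and unlike the snippet above it collects the fragments in an array and joins them with a space, which is a choice of this sketch rather than part of the original suggestion.

```ruby
# Sample input: a few lines taken from the original post.
data = <<~TEXT
  00-04-00;Technische Daten:
  00-04-00;Elektrischer Anschluss: 230V / 50Hz
  00002274;Fabrikat: WILO
  00002274;Typ: Temperaturschalter TF
TEXT

# Hypothetical helper: collects the text fragments per serial number.
def group_by_serial(text)
  groups = Hash.new { |hash, key| hash[key] = [] }
  text.each_line do |line|
    serial, fragment = line.chomp.split(";", 2)
    next if fragment.nil? # skip blank lines and lines without a semicolon
    groups[serial] << fragment
  end
  groups
end

# Print one combined line per serial number.
group_by_serial(data).each do |serial, fragments|
  puts "#{serial};#{fragments.join(' ')}"
end
```

Since Ruby 1.9 a Hash iterates in insertion order, so the serial numbers come out in the order they first appear in the input.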

Jesus.


#4

Jesús Gabriel y Galán wrote:

00-04-00;Elektrischer Anschluss: 230V / 50Hz

s = "00-40…" # your string
h = Hash.new { |hash, key| hash[key] = "" }
# String#each was removed in Ruby 1.9; each_line works in old and new versions
s.each_line do |line|
  key, value = line.split(";")
  h[key] << value.chomp
end

That won’t quite work if there is a second semicolon on the line,
because everything after it is dropped. Try:

key, value = line.scan(/([^;]*);(.*)/).first
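
To make the difference concrete, here is a small sketch comparing the two approaches on a line with a second semicolon. The input line is made up for illustration, and the pattern is assumed to be /([^;]*);(.*)/ (everything up to the first semicolon, then the rest of the line).

```ruby
# Made-up input line with a second semicolon in the text part.
line = "00002274;Max. Schaltleistung: 4 A; Schutzart: IP 43\n"

# Plain split yields three pieces; the assignment keeps only the first two.
key, value = line.chomp.split(";")
puts value        # prints "Max. Schaltleistung: 4 A"

# The scan pattern captures everything after the first semicolon.
scan_key, scan_value = line.chomp.scan(/([^;]*);(.*)/).first
puts scan_value   # prints "Max. Schaltleistung: 4 A; Schutzart: IP 43"
```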


#5

Thanks a lot to all.

Rob,
your solution works like a charm.

Now I’ll just do some heavy commenting to understand it all and for
reference.


#6

On Apr 27, 2009, at 2:59 PM, Jesús Gabriel y Galán wrote:

00-04-00;Elektrischer Anschluss: 230V / 50Hz

s = "00-40…" # your string
Jesus.

current_serial = nil
texts = []
File.open(outputfilename, 'w') do |out|
  File.foreach(filename) do |line|
    # Split only at the first ';' so later semicolons stay in the text
    serial, text = line.chomp.split(';', 2)
    # A new serial number starts: write out the finished group
    if current_serial && serial != current_serial
      out.puts "#{current_serial};#{texts.join(' ')}"
      texts = []
    end
    current_serial = serial
    texts << text
  end
  # Write out the last group
  if current_serial
    out.puts "#{current_serial};#{texts.join(' ')}"
  end
end

Two things of note: The second argument to split(';', 2) limits the
result to 2 items, so if there happens to be a ';' later in the line it
isn’t considered a place to split. I avoided a hash because it doesn’t
guarantee the order on the way back out (in Ruby 1.8; since Ruby 1.9 a
Hash preserves insertion order), but that means I’m assuming that all
the lines with a given serial number are together.

You have to initialize your filename and outputfilename, of course.
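
As an alternative sketch of the same idea (not the code above, but a swapped-in technique), Enumerable#chunk also groups consecutive lines that share a serial number. The helper name merge_consecutive and the sample array are hypothetical, and it makes the same assumption that lines with one serial number sit next to each other.

```ruby
# Hypothetical helper: merges consecutive lines that share a serial number.
def merge_consecutive(lines)
  lines
    .map { |line| line.chomp.split(";", 2) }             # [serial, text] pairs
    .reject { |serial, text| serial.nil? || text.nil? }  # drop malformed lines
    .chunk { |serial, _text| serial }                    # group adjacent pairs
    .map { |serial, pairs| "#{serial};#{pairs.map(&:last).join(' ')}" }
end

sample = [
  "00-04-00;Technische Daten:\n",
  "00-04-00;Elektrischer Anschluss: 230V / 50Hz\n",
  "00002274;Fabrikat: WILO\n"
]
puts merge_consecutive(sample)
```

Because File.foreach(filename) returns an enumerator when called without a block, it could be passed in place of the sample array.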
-Rob

Rob B. http://agileconsultingllc.com


#7

The desired output cannot be achieved by plain concatenation unless
your log preserves trailing spaces; otherwise the fragments run
together. Thus:

00002274;Temperaturschalter mit Einstellknopfeinschließlich 2
Stück Federspannbändernzum Anlegen an Rohre bis DN 100.Max.
Betriebsspannung: 250 VMax. Schaltleistung: 4 ASchutzart: IP
43Schaltbereich: 30 oC bis 90 oCFabrikat: WILOTyp: Temperaturschalter TF

My suggestion is to pad with a space unless the line ends in a hyphen.

Picayune perhaps, but aesthetics (readability) counts.
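
A minimal sketch of that padding rule might look like this. The helper name join_fragments is hypothetical, and it assumes the fragments have already been split off from their serial number and carry no trailing whitespace.

```ruby
# Hypothetical helper: joins fragments with a space, except after a fragment
# that ends in a hyphen (a word that was broken across two lines).
def join_fragments(fragments)
  fragments.each_with_object(+"") do |fragment, joined|
    joined << " " unless joined.empty? || joined.end_with?("-")
    joined << fragment
  end
end

puts join_fragments(["inklusive 3-", "poligen Wielandstecker/Gegenstecker", "Technische Daten:"])
# prints "inklusive 3-poligen Wielandstecker/Gegenstecker Technische Daten:"
```

The hyphen itself is kept, which matches "3-poligen" in the desired output from the first post.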

Bob Schaaf