Text parser / reformatting


#1

Hi everyone,

I expect this is a rather trivial problem, but I just started using Ruby
and am a bit stuck right now.
Here is what I want to do:

I have a text file that contains information in the following format:

KOG0003
 At2g36170
 At3g52590
 CE15495
 7295730
KOG0004
 Hs20476120
 YIL148w
 YKR094c
 SPAC11G7.04

Now, this has to go into a relational database. But right now this is
not really a table. The desired output would look something like this:

KOG0003 At2g36170
KOG0003 At3g52590
KOG0003 CE15495
KOG0003 7295730
KOG0004 Hs20476120
KOG0004 YIL148w
KOG0004 YKR094c

Well, you get the picture. What I tried to do is read the text file,
then look for lines that start with a blank and replace that blank with
the first word of the previous line, given that this line does in fact
start with a word (it could also be selected by using KOG[0-9]*). I
thought of storing the KOG[0-9]* in a variable, but overall I can't make
it work and have no real idea how to solve this. Any help would be
greatly appreciated. I guess for an experienced user this is a three-liner
._.

Cheers,

Marc


#2

2007/7/9, Marc H. removed_email_address@domain.invalid:

Hm… Maybe something like this:

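key = nil
ARGF.each do |line|
  line.chomp!
  case line
  when /^(\S+)/
    # Unindented line: remember it as the current key
    key = line.strip
  when /^\s+(\S+)/
    # Indented line: print it prefixed with the current key
    print key, " ", $1, "\n" if key
  else
    # ignore
  end
end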

Kind regards

robert


#3

Thanks, you two, worked like a charm!

Cheers,

Marc


#4

On 9 Jul 2007, at 16:42, Marc H. wrote:

Posted via http://www.ruby-forum.com/.

Not a very fancy solution, but it seems to work for the data you
posted. Also uses the pattern you suggested, storing the KOG*
identifier in a variable (field1):

[alexg@powerbook]/Users/alexg/Desktop(7): cat test.rb
field1 = nil
IO.foreach(ARGV[0]) do |l|
  if l.match(/^(\S+)/)
    field1 = $1
  else
    puts "#{field1} #{l.strip}"
  end
end
[alexg@powerbook]/Users/alexg/Desktop(8): cat data.dat
KOG0003
 At2g36170
 At3g52590
 CE15495
 7295730
KOG0004
 Hs20476120
 YIL148w
 YKR094c
 SPAC11G7.04
[alexg@powerbook]/Users/alexg/Desktop(9): ruby test.rb data.dat
KOG0003 At2g36170
KOG0003 At3g52590
KOG0003 CE15495
KOG0003 7295730
KOG0004 Hs20476120
KOG0004 YIL148w
KOG0004 YKR094c
KOG0004 SPAC11G7.04

Alex G.

Bioinformatics Center
Kyoto University
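
Both scripts above depend on the member lines being indented. If the leading whitespace is unreliable, the keys could instead be picked out by the KOG[0-9]* pattern mentioned in the first post. The following is only a minimal sketch under that assumption: the method name pair_with_keys and the default key_pattern are made up for illustration, and the tab separator is just one choice that tends to be convenient for a later database import.

```ruby
# Hypothetical helper: pairs every member line with the most recent key line.
# key_pattern is an assumption based on the KOG[0-9]* idea from the first post.
def pair_with_keys(lines, key_pattern = /\AKOG\d+\z/)
  key = nil
  pairs = []
  lines.each do |raw|
    entry = raw.strip
    next if entry.empty?          # skip blank lines
    if entry.match?(key_pattern)  # String#match? needs Ruby 2.4 or later
      key = entry                 # a new group starts here
    elsif key
      pairs << [key, entry]       # member of the current group
    end                           # members seen before any key are dropped
  end
  pairs
end

# Small inline sample taken from the data in the thread
sample = ["KOG0003", " At2g36170", " At3g52590", "KOG0004", " Hs20476120"]
pair_with_keys(sample).each { |key, member| puts "#{key}\t#{member}" }
```

To run it on a file instead of the inline sample, pass an enumerator of lines, for example pair_with_keys(File.foreach("data.dat")). Returning the pairs as an array rather than printing directly keeps the grouping logic separate from the output format.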