Text parser / reformatting

Hi everyone,

I expect this is a rather trivial problem, but I just started using Ruby
and am a bit stuck right now.
Here is what I want to do:

I have a text file that contains information in the following format:

KOG0003
  At2g36170
  At3g52590
  CE15495
  7295730
KOG0004
  Hs20476120
  YIL148w
  YKR094c
  SPAC11G7.04

Now, this has to go into a relational database. But right now this is
not really a table. The desired output would look something like this:

KOG0003 At2g36170
KOG0003 At3g52590
KOG0003 CE15495
KOG0003 7295730
KOG0004 Hs20476120
KOG0004 YIL148w
KOG0004 YKR094c

Well, you get the picture. What I tried to do is read the text file,
then look for lines that start with a blank and replace that blank with
the first word of the previous line, given that this line does in fact
start with a word (it could also be selected using KOG[0-9]*). I
thought of storing the KOG[0-9]* identifier in a variable, but overall
I can't make it work and have no real idea how to solve this. Any help
would be greatly appreciated. I guess for an experienced user this is a
three-liner ._.

Cheers,

Marc

2007/7/9, Marc H. [email protected]:

Hm… Maybe something like this:

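key = nil
ARGF.each do |line|
  line.chomp!
  case line
  when /^(\S+)/
    # unindented line: remember it as the current key
    key = line.strip
  when /^\s+(\S+)/
    # indented line: print it prefixed with the current key
    print key, " ", $1, "\n" if key
  else
    # ignore
  end
end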
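
To try the same idea without a file on disk, the loop could be wrapped in a method that reads from any IO-like object. This is only a sketch: the method name pair_lines and the sample string are made up for illustration, and it assumes that member lines are indented while key lines are not.

```ruby
require 'stringio'

# Pairs each indented line with the most recent unindented line.
# Returns an array of "KEY member" strings.
def pair_lines(io)
  key = nil
  pairs = []
  io.each_line do |line|
    line = line.chomp
    case line
    when /^\S/
      # unindented line: a new key
      key = line.strip
    when /^\s+(\S+)/
      # indented line: a member of the current key
      pairs << "#{key} #{$1}" if key
    end
    # blank lines match neither pattern and are skipped
  end
  pairs
end

# Hypothetical sample data, shortened from the original post
sample = "KOG0003\n At2g36170\n CE15495\nKOG0004\n YIL148w\n"
puts pair_lines(StringIO.new(sample))
```

Since File objects respond to each_line as well, something like File.open("data.dat") { |f| puts pair_lines(f) } should work the same way on a real file.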

Kind regards

robert

Thanks, you two, that worked like a charm!

Cheers,

Marc

On 9 Jul 2007, at 16:42, Marc H. wrote:

Not a very fancy solution, but it seems to work for the data you
posted. Also uses the pattern you suggested, storing the KOG*
identifier in a variable (field1):

[[email protected]]/Users/alexg/Desktop(7): cat test.rb
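field1 = nil
IO.foreach(ARGV[0]) do |l|
  if l.match(/^(\S+)/)
    # unindented line: remember the KOG identifier
    field1 = $1
  else
    # indented line: prefix it with the stored identifier
    puts "#{field1} #{l.strip}"
  end
end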
[[email protected]]/Users/alexg/Desktop(8): cat data.dat
KOG0003
  At2g36170
  At3g52590
  CE15495
  7295730
KOG0004
  Hs20476120
  YIL148w
  YKR094c
  SPAC11G7.04
[[email protected]]/Users/alexg/Desktop(9): ruby test.rb data.dat
KOG0003 At2g36170
KOG0003 At3g52590
KOG0003 CE15495
KOG0003 7295730
KOG0004 Hs20476120
KOG0004 YIL148w
KOG0004 YKR094c
KOG0004 SPAC11G7.04
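
If the indentation cannot be relied on (for example after copy and paste), the KOG[0-9]* pattern you mentioned could decide what counts as a key instead. A rough sketch, assuming every group identifier is "KOG" followed only by digits and that tab-separated output suits the database import; to_rows is just a name I made up:

```ruby
# Turns the grouped list into [key, member] rows.
# A line counts as a key when it is "KOG" followed only by digits;
# every other non-blank line is treated as a member of the last key.
def to_rows(lines)
  key = nil
  rows = []
  lines.each do |l|
    entry = l.strip
    next if entry.empty?          # skip blank lines
    if entry =~ /\AKOG\d+\z/
      key = entry
    elsif key
      rows << [key, entry]        # members seen before any key are dropped
    end
  end
  rows
end

# Hypothetical sample, shortened from data.dat
sample = ["KOG0003\n", "At2g36170\n", "CE15495\n", "KOG0004\n", "YIL148w\n"]
to_rows(sample).each { |row| puts row.join("\t") }
```

For a real file, passing File.readlines("data.dat") instead of the sample array should give one tab-separated row per member.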

Alex G.

Bioinformatics Center
Kyoto University
