Forum: Ruby Text parser / reformatting

Marc Hoeppner (lasastard)
on 2007-07-09 09:42
Hi everyone,

I expect this is a rather trivial problem, but I just started using Ruby
and am a bit stuck right now.
Here is what I want to do:

I have a text file that contains information in the following format:

KOG0003
    At2g36170
    At3g52590
    CE15495
    7295730
KOG0004
    Hs20476120
    YIL148w
    YKR094c
    SPAC11G7.04

Now, this has to go into a relational database. But right now this is
not really a table. The desired output would look something like this:

KOG0003 At2g36170
KOG0003 At3g52590
KOG0003 CE15495
KOG0003 7295730
KOG0004 Hs20476120
KOG0004 YIL148w
KOG0004 YKR094c

Well, you get the picture. What I tried to do is to read the text file,
then look for lines that start with a blank and replace that blank with
the first word of the previous line, given that this line does in fact
start with a word (could also be selected by using KOG[0-9]*). I
thought of storing the KOG[0-9]* in a variable, but overall I can't make
it work and have no real idea how to solve this. Any help would be
greatly appreciated. I guess for an experienced user this is a three-liner
._.

Cheers,

Marc
Robert Klemme (Guest)
on 2007-07-09 10:04
(Received via mailing list)

Hm...  Maybe something like this:

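key = nil                         # most recent group identifier, e.g. KOG0003
ARGF.each do |line|
  line.chomp!
  case line
    when /^(\S+)/                 # unindented line: remember its first word
      key = $1
    when /^\s+(\S+)/              # indented line: print it next to the key
      print key, " ", $1, "\n" if key
    else
      # ignore blank lines
  end
end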

Kind regards

robert
Alex Gutteridge (Guest)
on 2007-07-09 10:05
(Received via mailing list)

Not a very fancy solution, but it seems to work for the data you
posted. Also uses the pattern you suggested, storing the KOG*
identifier in a variable (field1):

[alexg@powerbook]/Users/alexg/Desktop(7): cat test.rb
field1 = nil
IO.foreach(ARGV[0]) do |l|
   if l.match(/^(\S+)/)
     field1 = $1
   else
     puts "#{field1} #{l.strip}"
   end
end
[alexg@powerbook]/Users/alexg/Desktop(8): cat data.dat
KOG0003
     At2g36170
     At3g52590
     CE15495
     7295730
KOG0004
     Hs20476120
     YIL148w
     YKR094c
     SPAC11G7.04
[alexg@powerbook]/Users/alexg/Desktop(9): ruby test.rb data.dat
KOG0003 At2g36170
KOG0003 At3g52590
KOG0003 CE15495
KOG0003 7295730
KOG0004 Hs20476120
KOG0004 YIL148w
KOG0004 YKR094c
KOG0004 SPAC11G7.04
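
Since the result is meant for a relational database, here is a minimal sketch of the same idea that returns the pairs as an array instead of printing them, and that skips blank lines rather than emitting a key with nothing after it. The method name flatten_groups is made up for illustration, and the tab-separated output in the usage note is only an assumption about what the database import expects.

```ruby
# Hypothetical helper (not from the posts above): turn the indented
# listing into [key, member] pairs.
def flatten_groups(lines)
  key = nil
  pairs = []
  lines.each do |line|
    line = line.chomp
    next if line.strip.empty?       # skip blank lines entirely
    if line =~ /\A(\S+)/            # unindented line: start of a new group
      key = $1
    elsif key                       # indented line: member of the current group
      pairs << [key, line.strip]
    end                             # indented lines before any key are dropped
  end
  pairs
end
```

It could be called as flatten_groups(File.foreach("data.dat")).each { |k, m| puts "#{k}\t#{m}" } to write one tab-separated row per pair.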

Alex Gutteridge

Bioinformatics Center
Kyoto University
Marc Hoeppner (lasastard)
on 2007-07-09 10:11
Thanks, you two, worked like a charm!


Cheers,

Marc