Forum: Ruby Describing degenerate DNA strings

George George (george_g)
on 2009-01-16 07:55
I am working with strings over the four-letter alphabet a, c, t, g that
describe biological DNA sequences. Sometimes a sequence can be described
as ac[ta]cct, meaning that position 3 can hold either a 't' or an 'a'
without changing the biological function of the sequence.

Given ac[ta]cct as input, I would like to generate the set of all
strings that the degenerate sequence can represent, e.g.
 1. actcct
 2. acacct

Both satisfy the above degeneracy.

Any ideas?
Thank you
Brian Candler (candlerb)
on 2009-01-16 12:24
> any ideas?

Here's a simple recursive expansion, with a block callback for each
sequence found.

def expand_seq(src, &blk)
  if src =~ /\A(.*?)\[(.*?)\](.*)\z/m
    prefix, chars, suffix = $1, $2, $3
    chars.split(//).each do |ch|
      expand_seq(prefix + ch + suffix, &blk)
    end
  else
    yield src
  end
end

expand_seq "ac[ta]cct[gt]c" do |seq|
  puts seq
end
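
The same expansion can also be written without recursion. What follows is a minimal sketch, not part of the original reply: it assumes Ruby 1.9 or later (for Array#product with several arguments), and the name expand_with_product is made up for illustration. It splits the input into per-position choices and takes their Cartesian product.

```ruby
# Hypothetical helper (not from the thread): expand a degenerate sequence
# by building the list of choices at each position and taking their
# Cartesian product.
def expand_with_product(src)
  # Each token is either a bracketed class such as "[ta]" or a single character.
  choices = src.scan(/\[([^\]]*)\]|(.)/m).map do |klass, single|
    klass ? klass.chars.to_a : [single]
  end
  return [""] if choices.empty?

  first, *rest = choices
  first.product(*rest).map { |letters| letters.join }
end

puts expand_with_product("ac[ta]cct[gt]c")
```

Unlike the block-based version, this builds the whole result array in memory, which may matter for sequences with many degenerate positions.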
George George (george_g)
on 2009-01-16 14:13
Thank you!


Jesús Gabriel y Galán (Guest)
on 2009-01-16 14:34
(Received via mailing list)
On Fri, Jan 16, 2009 at 7:54 AM, George George
<george.githinji@gmail.com> wrote:
>
> both satisfy the above degeneracy.
>
> any ideas?

Hi, this reminded me so much of a Ruby Quiz I solved that I wanted to
mention it :-)

http://rubyquiz.com/quiz143.html
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/... (my
solution)

This code generates all strings that match a regexp. So we are left
with the task of converting your strings to regexps:

irb(main):010:0> require 'quiz143'
=> true
irb(main):011:0> def expand a
irb(main):012:1> re = Regexp.new(a.gsub(/\[(.*?)\]/) {|m| "(#{$1.split(//).join("|")})"})
irb(main):013:1> re.generate
irb(main):014:1> end
=> nil
irb(main):015:0> expand "ac[ta]cct"
=> ["actcct", "acacct"]

It's probably overkill for your needs.

Jesus.
George George (george_g)
on 2009-01-16 14:57
> => nil
> irb(main):015:0> expand "ac[ta]cct"
> => ["actcct", "acacct"]
>
> It's probably overkill for your needs.
>
> Jesus.

Hi Jesus!
Thank you for pointing me to that quiz; it's nice to study the code.
It solves exactly one of the problems I had while looking for DNA
motifs, which are represented as regular expressions but need to be
expanded if you are going to use them as possible DNA primers, and then
search back to see which one gives the best predictive value ... blah blah ...
Sorry for the bio talk :)

Thank you so much!!

GG
Jesús Gabriel y Galán (Guest)
on 2009-01-16 15:12
(Received via mailing list)
On Fri, Jan 16, 2009 at 2:56 PM, George George
<george.githinji@gmail.com> wrote:
> Thank you for pointing me to that quiz; it's nice to study the code.
> It solves exactly one of the problems I had while looking for DNA
> motifs, which are represented as regular expressions but need to be
> expanded if you are going to use them as possible DNA primers, and then
> search back to see which one gives the best predictive value ... blah blah ...
> Sorry for the bio talk :)

You are welcome. Just a comment on the above: I have realized that if
each position of the sequence is just one character, then your
original string is already a valid regexp for the problem, so there is
no need to change [ta] to (t|a) as I was doing, because [ta] is a
character class with those two possibilities, and that works too:

irb(main):001:0> require 'quiz143'
=> true
irb(main):002:0> /#{"ac[ta]cc"}/.generate
=> ["actcc", "acacc"]

:-)

Jesus.
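
Since the degenerate notation is already a valid regexp when each position is a single character, it can also be used directly to check whether a candidate sequence satisfies it. A minimal sketch using only core Ruby (no quiz143): matches_degenerate? is a hypothetical helper name, and it assumes the pattern contains only the letters a, c, g, t and bracketed classes.

```ruby
# Hypothetical helper (not from the thread): test whether a concrete
# sequence is one of the strings described by a degenerate pattern.
# Assumes the pattern uses only a/c/g/t and [..] character classes.
def matches_degenerate?(pattern, seq)
  # Anchor the pattern so the whole sequence must match, not a substring.
  anchored = Regexp.new("\\A(?:#{pattern})\\z")
  !(anchored =~ seq).nil?
end

puts matches_degenerate?("ac[ta]cc", "actcc")  # true
puts matches_degenerate?("ac[ta]cc", "acgcc")  # false
```

This only validates a given sequence; generating all matching sequences still needs an expansion step such as the ones discussed in this thread.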
Rob Biedenharn (Guest)
on 2009-01-16 17:53
(Received via mailing list)
On Jan 16, 2009, at 9:10 AM, Jesús Gabriel y Galán wrote:
>>
>
> You are welcome. Just a comment on the above: I have realized that if
> each position of the sequence is just one character, then your
> original string is already a valid regexp for the problem, so no need
> to change [ta] to (t|a) as I was doing, cause [ta] is a character
> class with those two possibilities and those work too:
>
> irb(main):001:0> require 'quiz143'
> => true
> irb(main):002:0> /#{"ac[ta]cc"}/.generate
> => ["actcc", "acacc"]

No need to do the string interpolation there:
   /ac[ta]cc/.generate
Or if you have that in a string:
   x="ac[ta]cc"
   Regexp.new(x).generate

> :-)
>
> Jesus.


-Rob

Rob Biedenharn    http://agileconsultingllc.com
Rob@AgileConsultingLLC.com
Jesús Gabriel y Galán (Guest)
on 2009-01-16 23:49
(Received via mailing list)
On Fri, Jan 16, 2009 at 5:51 PM, Rob Biedenharn
<Rob@agileconsultingllc.com> wrote:
> On Jan 16, 2009, at 9:10 AM, Jesús Gabriel y Galán wrote:

>> irb(main):002:0> /#{"ac[ta]cc"}/.generate
>> => ["actcc", "acacc"]
>
> No need to do the string interpolation there:
>  /ac[ta]cc/.generate
> Or if you have that in a string:
>  x="ac[ta]cc"
>  Regexp.new(x).generate

Good catch !!
Thanks.

Jesus.
George George (george_g)
on 2009-01-17 09:12
Thank you so much for all the replies. Here is a simple benchmark of
Brian's and Jesus's approaches. I ran it on Ubuntu 8.04 with 1GB RAM and
2 CPUs at 3.40GHz.

.....
...
require 'benchmark'

Benchmark.bm do |bm|
  bm.report("Brian:") do
    expand_seq "t[ac][tc]aaattaag[ga]gaag[ac]ttggtgga" do |seq|
      #puts seq
    end
  end

  bm.report("Jesus:") do
    /t[ac][tc]aaattaag[ga]gaag[ac]ttggtgga/.generate
  end
end

            user     system      total        real
Brian:  0.000000   0.000000   0.000000 (  0.000642)
Jesus:  0.000000   0.000000   0.000000 (  0.003574)
Robert Klemme (Guest)
on 2009-01-17 11:10
(Received via mailing list)
On 17.01.2009 09:10, George George wrote:
>   expand_seq "t[ac][tc]aaattaag[ga]gaag[ac]ttggtgga" do |seq|
> Brian:  0.000000   0.000000   0.000000 (  0.000642)
> Jesus:  0.000000   0.000000   0.000000 (  0.003574)

You probably need to execute each variant in a loop multiple times to
get meaningful results.

Kind regards

  robert
George George (george_g)
on 2009-01-17 14:14
> You probably need to execute each variant in a loop multiple times to
> get meaningful results.
>
> Kind regards
>
>   robert

Thanks, Robert. Here are the results with each approach run 100000 times:

require 'benchmark'

iterations = 100000

Benchmark.bm do |bm|
  bm.report("Brian:") do
    iterations.times do
      expand_seq "t[ac][tc]aaattaag[ga]gaag[ac]ttggtgga" do |seq|
        #  puts seq
      end
    end
  end

  bm.report("Jesus:") do
    iterations.times do
      /t[ac][tc]aaattaag[ga]gaag[ac]ttggtgga/.generate
    end
  end
end

             user     system      total        real
Brian:  36.500000   2.080000  38.580000 ( 38.738666)
Jesus: 217.180000  30.710000 247.890000 (248.848401)
Jesús Gabriel y Galán (Guest)
on 2009-01-17 16:04
(Received via mailing list)
On Sat, Jan 17, 2009 at 2:13 PM, George George
<george.githinji@gmail.com> wrote:

> Thanks robert here are the results ran 100000 times for each approach
>
>  user   system        total     real
> Brian: 36.500000   2.080000  38.580000 ( 38.738666)
> Jesus: 217.180000   30.710000 247.890000 (248.848401)

It shows that a specialized solution could be more streamlined :-).
Anyway, my solution was never optimized for performance. Could be an
interesting project...

Jesus.