Forum: Ruby identify and extract positions from a string - how to?

Marc Hoeppner (lasastard)
on 2007-07-19 12:30
Hi,

I am not quite sure about how to approach the following problem:

I have a long (long long long) string of letters, a genomic sequence
(600k characters+).
Now, what I want to do is to extract certain parts of this string, based
on the position.
So for example, let's say I want all characters from position 2340 to
5436.

A quick pointer in the right direction would be much appreciated. I have
a vague idea that it could perhaps be done with count? Like "puts string
where string.count("actg")=2340 until string.count("actg")=5436"... ?
Not sure tho, and probably there are better ways.



Cheers,

Marc
Logan Capaldo (Guest)
on 2007-07-19 12:37
(Received via mailing list)
On 7/19/07, Marc Hoeppner <marc.hoeppner@molbio.su.se> wrote:
> 5436.
>
> A quick pointer in the right direction would be much appreciated. I have
> a vague idea that it could perhaps be done with count? Like "puts string
> where string.count("actg")=2340 until string.count("actg")=5436"... ?
> Not sure tho, and probably there are better ways.


string[2340..5436]

Cheers,
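
For reference, a minimal sketch of the range form. It uses a short made-up sequence instead of the real 600k-character one, and the helper name extract_1based is hypothetical. Ruby string indices start at 0 and a two-dot range includes both ends, so if the positions in your coordinates count from 1 (an assumption here, though common for sequence annotation), each one has to be reduced by one.

```ruby
# Short stand-in for the real genomic sequence (made-up data).
sequence = "acgtacgtacgtacgtacgt"

# Indices are 0-based and the two-dot range includes both ends,
# so 2..5 covers four characters: indices 2, 3, 4 and 5.
zero_based = sequence[2..5]

# Hypothetical helper: treats first and last as 1-based, inclusive positions.
def extract_1based(seq, first, last)
  seq[(first - 1)..(last - 1)]
end

one_based = extract_1based(sequence, 2, 5)

puts zero_based   # gtac
puts one_based    # cgta
```

Which of the two conventions applies depends on where the coordinates come from, so it is worth checking one known region by hand first.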
Thomas Worm (Guest)
on 2007-07-19 12:56
(Received via mailing list)
On Thu, 19 Jul 2007 19:31:12 +0900, Marc Hoeppner wrote:

> Hi,
>
> I am not quite sure about how to approach the following problem:
>
> I have a long (long long long) string of letters, a genomic sequence
> (600k characters+).
> Now, what I want to do is to extract certain parts of this string, based
> on the position.
> So for example, let's say I want all characters from position 2340 to
> 5436.

What about

puts "My String"[5..7]

Thomas
F. Senault (Guest)
on 2007-07-19 13:02
(Received via mailing list)
On 19 July at 12:31, Marc Hoeppner wrote:

> Hi,
>
> I am not quite sure about how to approach the following problem:
>
> I have a long (long long long) string of letters, a genomic sequence
> (600k characters+).
> Now, what I want to do is to extract certain parts of this string, based
> on the position.
> So for example, let's say I want all characters from position 2340 to
> 5436.

For example:

>> str = "abcdefghijklmnopqrstuvwxyz"
=> "abcdefghijklmnopqrstuvwxyz"

The simplest way to answer your question is:

>> str[5..11]
=> "fghijkl"

You may want to try the other variants:

>> str[5, 7]
=> "fghijkl"

>> str[/f.*l/]
=> "fghijkl"

>> str['fghijkl']
=> "fghijkl"

If you need to process it character by character, you can use any of
several methods:

>> str[5..10].each_byte { |b| puts b.chr }
f
g
h
i
j
k
=> "fghijk"

>> str[5..10].split(//)
=> ["f", "g", "h", "i", "j", "k"]

>> str[5..10].split(//).each { |c| puts c }
f
g
h
i
j
k
=> ["f", "g", "h", "i", "j", "k"]

Etc.

I haven't tried this with very long strings, but I don't see why the
range-based access methods wouldn't be acceptable.  (Of course, the
regular expression will be slower.)

Fred
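
The access forms above can be put side by side in a small sketch. The variable names are made up for illustration, and each_char is used in place of each_byte plus chr; it is available in newer Ruby versions than the one this thread was written against.

```ruby
str = "abcdefghijklmnopqrstuvwxyz"

by_range  = str[5..11]       # start index .. end index, both included
by_length = str[5, 7]        # start index, then number of characters
by_regex  = str[/f.*l/]      # first match of the pattern
by_substr = str["fghijkl"]   # the substring itself, or nil if it is absent

# each_char yields one-character strings, so no chr call is needed.
chars = []
str[5..10].each_char { |c| chars << c }

puts by_range
puts chars.join(" ")
```

All four lookups return the same seven characters here; only the first two depend purely on position, which is what the original question asks for.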
Marc Hoeppner (lasastard)
on 2007-07-19 13:15
Thanks a lot, I don't know how I missed that in the string chapter.

Anyhow, another thing came up:

While string[1..10] is pretty much what I was looking for, is there any
way that I can substitute the numbers (or the whole content of the
square brackets, for that matter) with variables?

As it is now I have a file that contains coordinates and a second file
that contains the string that I want to extract from.

So ideally the script would read each line of the coordinate file

45..78
90..120
etc

and use it in the extraction method

file.readlines each do |l|
  puts string[l]
end

That doesn't work, though. Any suggestions on how to pipe each line of
the coordinate file to the string method? I know, I know, probably
simple, but I am still learning ;)

Cheers,

Marc
Felix Windt (Guest)
on 2007-07-19 13:40
(Received via mailing list)
irb(main):001:0> (a,b) = "3..5".split("..").map {|x| x.to_i}
=> [3, 5]
irb(main):002:0> "test_string"[a..b]
=> "t_s"
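
The same split-and-convert idea can be wrapped in a small helper that returns a real Range object. This is only a sketch: the name parse_range is hypothetical, and Integer() with an explicit base of 10 is swapped in for to_i so that malformed input raises an error instead of silently becoming 0.

```ruby
# Hypothetical helper: turns text such as "45..78" into the Range 45..78.
# Integer(text, 10) raises on input that is not a decimal whole number,
# whereas to_i would quietly return 0 for text it cannot parse.
def parse_range(text)
  first, last = text.strip.split("..", 2)
  Integer(first, 10)..Integer(last, 10)
end

range = parse_range("3..5\n")   # trailing newline, as read from a file
puts "test_string"[range]       # t_s
```

Because the result is a Range, it can be passed straight into the square brackets, which is what the loop in the question was trying to do with the raw line.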
F. Senault (Guest)
on 2007-07-19 14:16
(Received via mailing list)
On 19 July at 13:16, Marc Hoeppner wrote:

> and uses it in the extraction method
>
> file.readlines each do |l|
>   puts string[l]
> end

The other solutions in the thread are the ones to use, but I feel the
need to suggest the very dirty / insecure / bad one:

File.readlines(filepath).each do |l|
  puts string[eval(l)]
end

Don't try this at home, etc...  :)

(But, in a controlled environment, it may be useful since it allows for
all the variations that can be evaluated in one line of ruby code...)

Fred
Robert Klemme (Guest)
on 2007-07-19 14:33
(Received via mailing list)
2007/7/19, F. Senault <fred@lacave.net>:
>
> File.readlines(filepath).each do |l|
>   puts string[eval(l)]
> end
>
> Don't try this at home, etc...  :)
>
> (But, in a controlled environment, it may be useful since it allows for
> all the variations that can be evaluated in one line of ruby code...)

A safer variant:

file.each do |line|
  if /^(\d+)\.\.(\d+)$/ =~ line
    puts string[ $1.to_i .. $2.to_i ]
  end
end

Note that file.each is more efficient than file.readlines.each
because it does not need to read the whole file into memory.

Kind regards

robert
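
Putting the pieces together, the whole task might be sketched as below. The helper name extract_regions and the sample data are made up, a temporary file stands in for the real coordinate file, and File.foreach is used as one way to get the line-at-a-time reading described above.

```ruby
require "tempfile"

# Hypothetical helper: returns one slice of sequence for every line of the
# coordinate file that looks like "start..end"; other lines are skipped.
def extract_regions(sequence, coords_path)
  regions = []
  # File.foreach reads one line at a time instead of loading the whole file.
  File.foreach(coords_path) do |line|
    if /\A(\d+)\.\.(\d+)\s*\z/ =~ line
      regions << sequence[$1.to_i..$2.to_i]
    end
  end
  regions
end

# Made-up data standing in for the two real files.
sequence = "abcdefghijklmnopqrstuvwxyz"
Tempfile.create("coords") do |f|
  f.write("0..3\n10..12\nnot a range\n")
  f.flush
  puts extract_regions(sequence, f.path)   # prints abcd, then klm
end
```

Note that a range starting past the end of the sequence yields nil rather than an error, so real input may need a check for that.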
Thomas Worm (Guest)
on 2007-07-19 17:09
(Received via mailing list)
On Thu, 19 Jul 2007 20:16:12 +0900, Marc Hoeppner wrote:

> As it is now I have a file that contains coordinates and a second file
> that contains the string that I want to extract from.
>
> So ideally the script would read each line of the coordinate file
>
> 45..78
> 90..120
> etc

Those ..-things are called ranges, which, unsurprisingly, are a class in
Ruby. Have a look at http://corelib.rubyonrails.org/ for the class
Range.

Another way to express str[45..78] is str[45,78] or str.slice(45,78) or
str.slice(45..78), where the numbers can be replaced by variables:
str[fr..to], str[fr,to], str.slice(fr,to), str.slice(fr..to)

This information can be found at the same webpage, just look for the
class String ;-)


> and uses it in the extraction method
>
> file.readlines each do |l|
>   puts string[l]
> end
>
> Doesnt work tho -any suggestions on how to pipe each line of the
> coordinate file to the string method? I know I know, probably simple,
> but I am still learning ;)

l is a String-object, not a Range-object.

file.readlines.each do |l|
  fr, to = l.split(/\.\./)
  puts string[fr,to]
end

should do the job.

The thingy with the slashes in the split-method is a regular expression.

Regards
Thomas
Thomas Worm (Guest)
on 2007-09-25 23:02
(Received via mailing list)
On Thu, 19 Jul 2007 11:54:29 +0000, Thomas Worm wrote:

>   puts string[fr,to]

should be

puts string[fr.to_i..to.to_i]

Thomas
Thomas Worm (Guest)
on 2007-09-25 23:04
(Received via mailing list)
On Thu, 19 Jul 2007 14:04:13 +0200, F. Senault wrote:

>>> str[45..78].length
> => 34
>>> str[45,78].length
> => 78
>
> (IOW start_position..end_position versus start_position,length.)
>

I guess you are right. I misinterpreted the documentation, which says in
a number of examples:

   a = "hello there"
   a[1,3]                 #=> "ell"
   a[1..3]                #=> "ell"

I should have taken the time to read the text instead.

Thomas
F. Senault (Guest)
on 2007-09-25 23:04
(Received via mailing list)
On 19 July at 13:54, Thomas Worm wrote:

> Those ..-things are called ranges, which, unsurprisingly, are a class in
> Ruby. Have a look at http://corelib.rubyonrails.org/ for the class Range.
>
> another way to express str[45..78] is str[45,78]

Nope:

>> str[45..78].length
=> 34
>> str[45,78].length
=> 78

(IOW start_position..end_position versus start_position,length.)

Fred
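
To make the difference between the two forms concrete, here is a small sketch. The test string is made up, and slice_inclusive is a hypothetical helper that converts an inclusive start/end pair into the start/length form.

```ruby
# Made-up string, long enough for both slices below.
str = "abcdefghij" * 20

by_range  = str[45..78]   # indices 45 through 78, both included
by_length = str[45, 78]   # 78 characters starting at index 45

# Hypothetical helper: an inclusive start/end pair expressed through the
# start/length form; the length is last - first + 1.
def slice_inclusive(s, first, last)
  s[first, last - first + 1]
end

puts by_range.length                             # 34
puts by_length.length                            # 78
puts slice_inclusive(str, 45, 78) == by_range    # true
```

The documentation example a[1,3] and a[1..3] happens to give the same result only because a start of 1 makes the length 3 and the end index 3 describe the same characters.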