Extracting numbers from a string


#1

I have filenames from various digital cameras: DSC_1234.jpg,
CRW1234.jpg, etc. What I really want is the numeric portion of that
filename. How would I extract just that portion?

I expect it to involve the regex /\d+/, but I’m unclear how to extract a
portion of a string matching a regex.

Thank you


#2

Matt J. wrote:

I have filenames from various digital cameras: DSC_1234.jpg,
CRW1234.jpg, etc. What I really want is the numeric portion of that
filename. How would I extract just that portion?

I expect it to involve the regex /\d+/, but I’m unclear how to extract a
portion of a string matching a regex.

Thank you

This may be the simplest (and arguably the most ruby-esque):
str = "DSC_1234.jpg"
num = str.scan(/\d+/)[0]

Other ways to do it:
num = str.match(/\d+/)[0]

OR
num = (/\d+/).match(str)[0]

OR
num = nil
str.scan(/\d+/) {|match| num ||= match}  # with a block, scan returns str itself

OR
num = str =~ /(\d+)/ ? $1 : nil

That is,
num = if str =~ /(\d+)/
        $1
      else
        nil
      end

OR
if str =~ /\d+/
  num = $~[0]
end
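
One difference between these variants is what happens when the filename contains no digits at all. The following is only a sketch of that case; the helper names and the "README.txt" sample are made up for illustration and are not from the original posts.

```ruby
# Hypothetical helpers wrapping three of the variants above.

# scan returns an array of matches, so [0] is simply nil when empty.
def first_number_scan(str)
  str.scan(/\d+/)[0]
end

# match returns nil when nothing matches, so guard before indexing;
# a bare str.match(/\d+/)[0] would raise NoMethodError here.
def first_number_match(str)
  m = str.match(/\d+/)
  m && m[0]
end

# =~ returns nil on failure, so the ternary falls through to nil.
def first_number_tilde(str)
  str =~ /(\d+)/ ? $1 : nil
end

with_digits    = "DSC_1234.jpg"
without_digits = "README.txt"   # assumed sample with no digits

found   = first_number_scan(with_digits)
missing = first_number_match(without_digits)
```

All three helpers return nil rather than raising when there is nothing to extract, which is probably what you want when looping over a directory of mixed files.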

Some proponents of Ruby have said that Perl’s “There is more than one
way to do it” is a curse. But the same is true of Ruby. However, it
seems to me that most people learn reasonable idioms and common sense
prevails.

Dan


#3

Matt J. wrote:

I have filenames from various digital cameras: DSC_1234.jpg,
CRW1234.jpg, etc. What I really want is the numeric portion of that
filename. How would I extract just that portion?

I expect it to involve the regex /\d+/, but I’m unclear how to extract a
portion of a string matching a regex.

Thank you

a = "DSC_1234.jpg"
b = a.gsub(/[^[:digit:]]/, '')
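
Note that deleting every non-digit joins all digit runs in the name into one string. A small sketch of that behaviour follows; the helper name digits_only and the extra sample filenames are assumptions for illustration.

```ruby
# Hypothetical helper mirroring the gsub approach: delete every non-digit.
def digits_only(name)
  name.gsub(/[^[:digit:]]/, '')
end

single_run = digits_only("DSC_1234.jpg")   # "1234"
two_runs   = digits_only("100_5142.jpg")   # "1005142", both runs are joined
ext_digit  = digits_only("IMG_0042.mp4")   # "00424", the 4 in "mp4" is kept too
```

So this works well when the name has exactly one group of digits, and needs care otherwise.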


#4

If you just want to extract one number from a string, you could write
something like this. Given

a = "DSC_1234.jpg"

a[/\d+/] will give you the first run of consecutive digits, so "1234".

If you want to be more precise, you could use parentheses to extract
the exact portion you want, like:

a[/DSC_(\d+)\.jpg/, 1]   (equivalent to a.match(/DSC_(\d+)\.jpg/)[1])

or even: a[/\ADSC_(\d+)\.jpg\Z/, 1]
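
To cover more than one camera prefix with the anchored form, something along these lines might work. It is only a sketch: the constant CAMERA_FILE, the helper camera_number, and the assumption that prefixes consist of letters and underscores are mine, not from the posts above.

```ruby
# Hypothetical pattern: optional letters/underscores, then the digits we
# want (capture group 1), then a literal ".jpg" at the very end.
CAMERA_FILE = /\A[A-Za-z_]*(\d+)\.jpg\z/i

# Returns the captured digits as a string, or nil if the name does not fit.
def camera_number(name)
  name[CAMERA_FILE, 1]
end

nikon  = camera_number("DSC_1234.jpg")      # "1234"
canon  = camera_number("CRW1234.jpg")       # "1234"
backup = camera_number("DSC_1234.jpg.bak")  # nil, the anchors reject it
```

The \A and \z anchors make the whole filename take part in the match, so stray files are skipped instead of yielding a misleading number.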


#5

On 12.06.2007 09:32, come wrote:

a[/DSC_(\d+)\.jpg/, 1]   (equivalent to a.match(/DSC_(\d+)\.jpg/)[1])

or even: a[/\ADSC_(\d+)\.jpg\Z/, 1]

Or even simpler

irb(main):001:0> "DSC_1234.jpg"[/\d+/]
=> "1234"
irb(main):002:0> Integer("DSC_1234.jpg"[/\d+/])
=> 1234
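
One caveat with Integer(): camera counters are often zero-padded, and Integer() treats a leading 0 as an octal prefix. The sketch below shows the difference; the helper name frame_number and the sample filenames are assumptions for illustration.

```ruby
digits = "DSC_0123.jpg"[/\d+/]   # "0123"

as_octal   = Integer(digits)       # 83, because "0123" is read as octal
as_decimal = Integer(digits, 10)   # 123, base given explicitly
lenient    = digits.to_i           # 123, to_i always assumes base 10

# Hypothetical helper: first run of digits as a base-10 Integer, or nil.
def frame_number(name)
  digits = name[/\d+/]
  digits && Integer(digits, 10)
end

# Without the base, a name like "DSC_0089.jpg" raises ArgumentError,
# since 8 and 9 are not octal digits.
begin
  Integer("0089")
  octal_error = false
rescue ArgumentError
  octal_error = true
end
```

Passing the base explicitly keeps the strictness of Integer() without the octal surprise.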

Kind regards

robert


#6

On Tue, Jun 12, 2007 at 03:45:04PM +0900, Matt J. wrote:

I have filenames from various digital cameras: DSC_1234.jpg,
CRW1234.jpg, etc. What I really want is the numeric portion of that
filename. How would I extract just that portion?

Some solutions have been posted already, but here’s mine:

irb(main):001:0> s="DSC_1234.jpg"
=> "DSC_1234.jpg"
irb(main):002:0> s.sub(/\D+(\d+).*/,'\1')
=> "1234"

Basically, the regexp looks for:

  • one or more non-digits
  • one or more digits; because this is in parentheses, you can refer
    to it with \1 later on
  • something more

The digits (safely stored in \1) are all you want to keep. This assumes
you are only interested in the first sequence of digits.
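
Because the pattern insists on at least one non-digit before the digits, names that start with a number behave differently. A short sketch of those cases follows; the helper name and the extra sample filenames are assumptions for illustration.

```ruby
# Hypothetical helper wrapping the sub call shown above.
def first_digits_via_sub(name)
  name.sub(/\D+(\d+).*/, '\1')
end

typical     = first_digits_via_sub("DSC_1234.jpg")  # "1234"
# The match starts at "_", so the leading "100" lies outside it and is kept.
leading_num = first_digits_via_sub("100_5142.jpg")  # "1005142"
# No non-digit precedes the digits, so nothing matches and sub changes nothing.
no_prefix   = first_digits_via_sub("1234.jpg")      # "1234.jpg"
```

For filenames that always begin with letters this is fine; otherwise one of the extraction forms such as s[/\d+/] may be a safer choice.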

Cheers

Bas


Bas van Gils, http://www.van-gils.org

Quod est inferius est sicut quod est superius, et quod est superius est
sicut quod est inferius, ad perpetranda miracula rei unius.


#7

On Jun 12, 2007, at 2:45 AM, Matt J. wrote:

I have filenames from various digital cameras: DSC_1234.jpg,
CRW1234.jpg, etc. What I really want is the numeric portion of that
filename. How would I extract just that portion?

I expect it to involve the regex /\d+/, but I’m unclear how to extract a
portion of a string matching a regex.

Thank you

Last November (2006), there was a series of postings to the Columbus
Ruby Brigade list beginning with:
http://groups.google.com/group/columbusrb/browse_frm/thread/9c2e682f9926bad0

This was the pattern that I used when responding to Bill’s code
because many of my pictures had names like “100_5142.jpg”,
“100_5143.jpg”, etc.

NUMBERED_FILE_PATTERN = %r{^(.*\D)?(\d+)(.+)$}

It became a constant since I used it in three places.
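
For anyone wondering how the three groups fall out, here is a rough sketch of the pattern in use. The helper split_numbered is hypothetical and is not the code from the Columbus thread.

```ruby
NUMBERED_FILE_PATTERN = %r{^(.*\D)?(\d+)(.+)$}

# Hypothetical helper: split a filename into [prefix, number, remainder],
# or return nil when the name does not fit the pattern.
def split_numbered(name)
  m = NUMBERED_FILE_PATTERN.match(name)
  m && m.captures
end

kodak  = split_numbered("100_5142.jpg")  # ["100_", "5142", ".jpg"]
nikon  = split_numbered("DSC_1234.jpg")  # ["DSC_", "1234", ".jpg"]
bare   = split_numbered("1234.jpg")      # [nil, "1234", ".jpg"]
no_num = split_numbered("notes.txt")     # nil
```

The greedy (.*\D)? prefix is what makes the pattern pick the last run of digits that still has something after it, so "100_5142.jpg" yields 5142 rather than 100.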

Rob B. http://agileconsultingllc.com


#8

A big thanks to everybody and all the creative solutions!