Help: Efficient regular expression

string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"

I need to fetch 14051 and /bin/bash from the string.

Can someone help me write an efficient regular expression for that?

I am a beginner. I wrote:
string =~ /(\d+)\s+(\d+)\s+\d+\s+\d+:\d+\s+.*\s+\d+:\d+:\d+\s+(.*)\s*/

I know this is not an efficient way of doing it.

Please help.

Divya B. wrote:

I mean I need the 2nd column and the last column.


On Jul 10, 2007, at 3:25 PM, Divya B. wrote:

i mean i need the 2nd column and the last column.

cols = string.split
sec, last = cols.values_at(1, -1)

Hope that helps.

James Edward G. II
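
A minimal sketch of the split-based approach above, using the sample line from the question. The variable names pid and cmd are only illustrative. It assumes the last column contains no spaces, which holds for this sample line.

```ruby
# Sample line taken from the original question.
string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"

# With no arguments, String#split breaks the line on runs of whitespace.
cols = string.split

# Array#values_at accepts several indices at once; -1 counts from the end.
pid, cmd = cols.values_at(1, -1)

puts pid  # prints 14051
puts cmd  # prints /bin/bash
```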

Hi Divya, use

string[/\s(\d+)/, 1]

See String#[].

Regards
Florian

Florian Aßmann wrote:

pid = string[/\s(\d+)/, 1]
cmd = string[/\s(\S+)$/, 1] # this line was missing from my first reply

Florian Aßmann wrote:

Hi Divya, use

string[/\s(\d+)/, 1]

see String.[]

Regards
Florian

string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
With this,
string =~ /(\d+)\s+(\d+)\s+\d+\s+\d+:\d+\s+.*\s+\d+:\d+:\d+\s+(.*)\s*/

$1 gives me 14051
and
$3 gives me /bin/bash

What I am trying to do is get $1 and $3 into a hash.

I love regex, so it hurts me to say it, but there are other ways of solving
this ;)

for instance:

string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
number = string.split[1]
program = string.split.last

now regexes!

string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
number = string[/[0-9]+/]
program = string[/[a-z\/]+$/]

You know you can get values out of an array with the [] operator.
Well, you can get substrings out of strings the same way, and it works
with regexes!

string[/[0-9]+/] will return the first match of 1 or more digits

Here’s the magic: use [ ] inside of a regular expression to create your
own character classes. Individual characters in there are included in the
class, and ranges may be included using the -, so a-z is
abcdefghijklmnopqrstuvwxyz.
The + afterwards means 1 or more times.
What if you want exactly 5 consecutive digits? Use the {}:
string[/[0-9]{5}/]
Ranges also work here, written with a comma:
string[/[0-9]{3,5}/] would match 3, 4 or 5 consecutive digits

and
string[/[a-z\/]+$/] will match a run of lowercase letters and forward
slashes at the end of the string. The $ is a special char that represents
the end of a line, and since / is a special char itself (it delimits the
regex literal), it needed to be escaped with a \.

BUT it could be even easier.
The [] classes can be negated!
/[^a]/ would match any single character that is not an a
/[^ ]/ would match any single character that is not a space… so
string[/[^ ]+$/] would be a good way to get the last bit.
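
The character classes, counts and negated classes described above can be tried out on the sample line. This is only a sketch; the variable names are made up for illustration.

```ruby
string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"

# First run of one or more digits anywhere in the string.
first_digits = string[/[0-9]+/]

# Exactly five consecutive digits.
five_digits = string[/[0-9]{5}/]

# Between three and five consecutive digits (note the comma).
three_to_five = string[/[0-9]{3,5}/]

# Lowercase letters and slashes, anchored at the end of the line.
path_at_end = string[/[a-z\/]+$/]

# Negated class: one or more non-space characters at the end of the line.
last_field = string[/[^ ]+$/]

puts first_digits, five_digits, three_to_five, path_at_end, last_field
```

On this input the first three all return "14051" and the last two both return "/bin/bash". The negated class is the more general choice, since it does not care which characters the command is made of.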

Ooooh fun.
So are you going to announce the winner? ;)

On 7/10/07, James Edward G. II wrote:

sec, last = cols.values_at(1, -1)
Very interesting, James. I seem to be rather extreme, and

sec, last = string.split.values_at(1, -1)
might be a tad too long for one line in your style; however, Ruby
supports this marvelous syntax :)

sec, last = string.split.
  values_at(1, -1)

Robert

Divya B. wrote:

Please help.

Talking about efficiency, I was just curious…

#!/usr/bin/env ruby -w
#
# Created by Florian Aßmann on 2007-07-10.
# Copyright © 2007. All rights reserved.

string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
require 'profiler'

puts <<-EOS

pid = string[/\s(\d+)/, 1]
cmd = string[/\s(\S+)$/, 1]

EOS
Profiler__::start_profile

10000.times do
  pid = string[/\s(\d+)/, 1]
  cmd = string[/\s(\S+)$/, 1]
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT

puts <<-EOS

cols = string.split
sec, last = cols.values_at(1, -1)

EOS
Profiler__::start_profile

10000.times do
  cols = string.split
  sec, last = cols.values_at(1, -1)
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT

puts <<-EOS

number = string.split[1]
program = string.split.last

EOS
Profiler__::start_profile

10000.times do
  number = string.split[1]
  program = string.split.last
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT

grin

Florian
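
For anyone who wants to repeat the comparison without the profiler library, here is a rough sketch using the standard benchmark library instead. This swaps in a different measuring tool than the script above uses; the labels and the iteration count are arbitrary choices, and timings will differ from machine to machine.

```ruby
require 'benchmark'

string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
n = 10_000

pid_rx = /\s(\d+)/
cmd_rx = /\s(\S+)$/

# Sanity check: both approaches should extract the same two fields.
regex_result = [string[pid_rx, 1], string[cmd_rx, 1]]
split_result = string.split.values_at(1, -1)

# Benchmark.bm prints user, system and real time for each labelled block.
Benchmark.bm(18) do |bm|
  bm.report("String#[] + regex") do
    n.times { [string[pid_rx, 1], string[cmd_rx, 1]] }
  end
  bm.report("split + values_at") do
    n.times { string.split.values_at(1, -1) }
  end
  bm.report("split twice") do
    n.times { [string.split[1], string.split.last] }
  end
end
```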

On 7/10/07, Kyle S. wrote:

Ooooh fun.
So are you going to announce the winner? ;)

It’s just profiler code, you can run it yourself… But on my machine:

pid = string[/ (d+)/, 1]
cmd = string[/ (S+)$/, 1]

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 58.26     0.67      0.67        1   670.00  1150.00  Integer#times
 41.74     1.15      0.48    20000     0.02     0.02  String#[]
  0.00     1.15      0.00        1     0.00  1150.00  #toplevel

cols = string.split
sec, last = cols.values_at(1, -1)

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 66.67     0.70      0.70        1   700.00  1050.00  Integer#times
 18.10     0.89      0.19    10000     0.02     0.02  String#split
 15.24     1.05      0.16    10000     0.02     0.02  Array#values_at
  0.00     1.05      0.00        1     0.00  1050.00  #toplevel

number = string.split[1]
program = string.split.last

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 61.70     1.16      1.16        1  1160.00  1880.00  Integer#times
 23.94     1.61      0.45    20000     0.02     0.02  String#split
  8.51     1.77      0.16    10000     0.02     0.02  Array#last
  5.85     1.88      0.11    10000     0.01     0.01  Array#[]
  0.00     1.88      0.00        1     0.00  1880.00  #toplevel

Ok, it was hard to beat Edward, but at least building the simplest
regular expression to do something like a String#split seems to be faster:

#!/usr/bin/env ruby -w
#
# Created by Florian Aßmann on 2007-07-10.
# Copyright © 2007. All rights reserved.

string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
require 'profiler'

puts <<-EOS

pid_rx = /\s(\d+)/
cmd_rx = /\s(\S+)$/
pid, cmd = string[pid_rx, 1], string[cmd_rx, 1]

EOS
Profiler__::start_profile

pid_rx = /\s(\d+)/
cmd_rx = /\s(\S+)$/
100000.times do
  pid, cmd = string[pid_rx, 1], string[cmd_rx, 1]
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT

puts <<-EOS

pid, cmd = string.split.values_at(1, -1)

EOS
Profiler__::start_profile

100000.times do
  pid, cmd = string.split.values_at(1, -1)
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT

puts <<-EOS

rx = Regexp.new('\S+\s(\d+).*\s(\S+$)')
pid, cmd = rx.match(string).values_at( 1, -1 )

EOS
Profiler__::start_profile

rx = Regexp.new('\S+\s(\d+).*\s(\S+$)')
100000.times do
  pid, cmd = rx.match(string).values_at( 1, -1 )
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT

puts <<-EOS

rx = Regexp.new('(\S+)')
pid, cmd = rx.match(string).values_at( 1, -1 )

EOS
Profiler__::start_profile

rx = Regexp.new('(\S+)')
100000.times do
  pid, cmd = rx.match(string).values_at( 1, -1 )
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT

Sincerely
Florian

except that the last regexp match sh**… lol

On 7/10/07, Robert D. wrote:

sec, last = string.split.
values_at(1, -1)

What is your terminal width, 30?

Sorry for being OT, since I’m not going to talk about Ruby or regexps.

If the string you’re parsing is output from the ps command, you can
simplify your life by using the -o option, which prints only the fields you
need.

E.g., on GNU/Linux,

ps -ao pid,command

outputs just the pid and command columns.
Be careful, since the command column can contain spaces.

Paolo
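
If that two-column output is available, parsing it might look like the sketch below. The sample text is a hypothetical stand-in for what ps -ao pid,command could print, not captured output, and the processes hash is just an illustrative name. Splitting with a limit of 2 keeps any spaces inside the command column intact.

```ruby
# Hypothetical sample; real output depends on the system and running processes.
sample = <<~OUT
    PID COMMAND
  14051 /bin/bash -x -s
  14060 ps -ao pid,command
OUT

processes = {}

# Skip the header line, then split each row into at most two fields:
# the pid and the rest of the line (the command with its arguments).
sample.each_line.drop(1).each do |line|
  pid, command = line.strip.split(' ', 2)
  processes[pid] = command
end

puts processes.inspect
```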

cmd = string[/\s(\S+)$/, 1]
doesn’t fetch me anything :)

program=string.split.last
What if
string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash -x -s"
It fetches only -s for me.
sec, last = string.split.values_at(1, -1)
doesn’t work for the same reason.
I need everything after 00:00:00 till the end,
i.e., /bin/bash -x -s

program=string[/[a-z\/]+$/]
The command column may start with any character. I don’t want to limit it in
my regexp; it has to be generic.

with all your comments, i tried
pid = run_process[/\s(\d+)/, 1]
cmd = run_process[/:\d+:\d+\s(\S.*)\s$/, 1]

is there any other way?

Paolo N. wrote:

sorry for being OT since I’m not going to talk about ruby or regexp

If the string you’re parsing is an output from the ps command you can
simplify your life using the -o option that prints only the fields you
need.

I.E. in gnu Linux

ps -ao pid,command

just outputs pid and command columns.
Be careful since the command column can contain spaces.

Paolo

I saw that too, but I cannot use all the ps options in the environment where
I am running it.
I am limited to using ps -aef.

I need to fetch the fields I need from that output.

On 7/10/07, Divya B. wrote:

                   values_at(1, -1)

it fetches only -s for me.
pid = run_process[/\s(\d+)/, 1]
cmd = run_process[/:\d+:\d+\s(\S.*)\s$/, 1]

is there any other way?

It’s not fancy, but I’ll throw it in:

s = 'root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash -x -s'
_, pid, _, cmd = *s.match(/(\d+)\s.*(:\d+){2}\s(.*?)$/)

so, if you’re using a hash like I think you might be:

s =
h = {}
s.each_line do |line|
  _, pid, _, cmd = *line.match(/(\d+)\s.*(:\d+){2}\s(.*?)$/)
  h[pid] = cmd
end

I think that should work.

Todd
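
A self-contained sketch of the hash idea, for anyone who wants to run it. The two-line ps_output below is a made-up stand-in for real ps -aef output, and line_rx and commands are illustrative names. The pattern is a variation that anchors on the HH:MM:SS TIME column, so it assumes that column is present and that the command follows it.

```ruby
# Hypothetical sample standing in for `ps -aef` output.
ps_output = <<~OUT
  root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash -x -s
  root 14060 14051 0 08:40 pts/2 00:00:00 ps -aef
OUT

# Capture 1: the second column (PID).
# Capture 2: everything after the first HH:MM:SS field, arguments included.
line_rx = /\A\S+\s+(\d+)\s.*?\s\d+:\d+:\d+\s+(.*)\z/

commands = {}
ps_output.each_line do |line|
  match = line_rx.match(line.chomp)
  next unless match # skip lines that do not fit the pattern, e.g. a header
  commands[match[1]] = match[2]
end

puts commands.inspect
```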

Just limit the split and you should go with command arguments…
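
A sketch of what limiting the split could look like. The limit of 8 is an assumption based on the eight columns in the sample line; if any earlier column ever contains a space, the count would be off.

```ruby
string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash -x -s"

# With a limit of 8, the eighth field holds the rest of the line,
# so the command keeps its arguments.
fields = string.split(' ', 8)
pid, cmd = fields.values_at(1, -1)

puts pid  # prints 14051
puts cmd  # prints /bin/bash -x -s
```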

On Jul 10, 2007, at 7:54 PM, Florian Aßmann wrote:

Just limit the split and you should go with command arguments…

…unless the process start time is one of the output columns and it
goes from ‘HH:MM’ to ‘Mon dd’ for a process that runs long enough.

If you really can’t change the ps options, suck it up, count columns,
forget the regexp, and be done.

-Rob

Rob B. http://agileconsultingllc.com