Regular expressions and long text

unknown · June 20, 2008, 7:58pm

Hello guys,

I started with Ruby a month ago and I am doing some work with strings
and regular expressions. I am trying to take a long text and store the
individual sentences in an array. I can split a sentence into words and
store them in an array, but I cannot manage to do it with sentences.

I have used the following assignment to work with the words:

str = "Ruby is great"
words = []
words = str.scan(/\w+/)

The result is words[0]="Ruby", words[1]="is" and words[2]="great".

I would like to do the following:

str = "Ruby is great. We all know that."

and get words[0]="Ruby is great" and words[1]="We all know that"

Any ideas on how to do it with a regular expression instead of looping
through the string looking for the “.”?

Thanks,

Guillermo

unknown · June 20, 2008, 9:03pm

I am not sure whether you want to split the string on the full stop:
str.split(".")
or divide the string into words and split them into two groups:

str = "Ruby is great. We all know that."
([(v=str.split(" "))[0...k=((l=(v.size))/2)]]+[v[k...l]]).map{|e|e.join(" ")}
=> ["Ruby is great.", "We all know that."]

unknown · June 21, 2008, 11:26am

You can split on a regex for a full-stop followed by (optional)
whitespace.

str.split(/\.\s?/)
=> ["Ruby is great", "We all know that"]
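As a minimal sketch of the same idea, the split can be wrapped in a small helper. The name `split_sentences` is hypothetical, and the extra `!` and `?` terminators are an assumption that goes beyond the original question; abbreviations such as "Dr." would still be split.

```ruby
# Hypothetical helper: split text into sentences on terminal punctuation.
# Assumes that ".", "!" or "?" always ends a sentence.
def split_sentences(text)
  # The punctuation and any whitespace after it are consumed by the split,
  # and String#split drops the trailing empty string.
  text.split(/[.!?]\s*/)
end

str = "Ruby is great. We all know that."
words = split_sentences(str)
p words[0]
p words[1]
```

Because the delimiter is consumed, the sentences come back without their final punctuation, which is what the original question asked for.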

unknown · June 24, 2008, 12:22pm

[email protected] wrote:

Hello guys,

[cut]

I would like to do the following:
str = "Ruby is great. We all know that."
and get words[0]="Ruby is great" and words[1]="We all know that"

Any ideas on how to do it with a regular expression instead of looping
through the string looking for the “.”?

Hi,
maybe you should try this: words = str.split(/\.\s*/)

It works for me:

irb(main):008:0> str = "Ruby is great. We all know that."
=> "Ruby is great. We all know that."
irb(main):009:0> words = str.split(/\.\s*/)
=> ["Ruby is great", "We all know that"]
irb(main):010:0> words[0]
=> "Ruby is great"
irb(main):011:0> words[1]
=> "We all know that"

greetings

unknown · June 24, 2008, 12:53pm

On Tue, Jun 24, 2008 at 2:18 PM, shaman [email protected] wrote:

and get words[0]="Ruby is great" and words[1]="We all know that"
=> "Ruby is great. We all know that."
irb(main):009:0> words = str.split(/\.\s*/)
=> ["Ruby is great", "We all know that"]
irb(main):010:0> words[0]
=> "Ruby is great"
irb(main):011:0> words[1]
=> "We all know that"

greetings

Even simpler:

irb(main):001:0> "Ruby is great. We all know that.".split(".")
=> ["Ruby is great", " We all know that"]
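Note that splitting on a plain "." keeps the space that followed the full stop, so the second element starts with a blank. A small sketch of one way to tidy that up, assuming leading and trailing whitespace is never wanted:

```ruby
str = "Ruby is great. We all know that."

# Split on the literal full stop, then trim the whitespace left on each piece.
sentences = str.split(".").map { |sentence| sentence.strip }

p sentences
```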

unknown · June 25, 2008, 7:27am

Hi,

I think you expect this output, so please try it:

str = "Ruby is great. We all know that."
a = str.split('.').join(' ')
words = []
words = a.scan(/\w+/)

=> words = ["Ruby", "is", "great", "We", "all", "know", "that"]

Regards,
P.Raveendran

unknown · July 4, 2008, 2:15am

If you want to stick to a regex-based solution:

str = "one one one. two. three."
=> "one one one. two. three."
str.scan(/\w[\s|\w]*\./)
=> ["one one one.", "two.", "three."]

And you could keep going, adding more sentences in the same pattern:

str = "one one one. two. three. four. five."
=> "one one one. two. three. four. five."
str.scan(/\w[\s|\w]*\./)
=> ["one one one.", "two.", "three.", "four.", "five."]

It may not be the best solution to this problem, but it is always good
to keep your regexp skills up to date.
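A possible variation on the scan approach, offered only as a sketch: inside a character class the `|` is a literal pipe rather than alternation, so the class can be written without it. The version below instead matches "anything that is not a terminator, followed by a terminator"; treating `!` and `?` as terminators is an assumption not made in the thread.

```ruby
str = "one one one. two. three."

# Each match is a run of non-terminator characters plus the terminator itself,
# so the punctuation is kept. strip removes the space left at the front.
sentences = str.scan(/[^.!?]+[.!?]/).map { |sentence| sentence.strip }

p sentences
```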

unknown · July 4, 2008, 4:31am

Very late to this thread, but…

On Sat, Jun 21, 2008 at 2:23 AM, Bryan JJ Buckley [email protected]
wrote:

You can split on a regex for a full-stop followed by (optional) whitespace.

str.split(/\.\s?/)
=> ["Ruby is great", "We all know that"]

str="Dr. Feelgood will meet you at the corner of Foo St. and Bar Dr.
tonight at 8:00; bring $2.98 – exact change – to resolve the 5.5%
interest you owe."
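To make the objection concrete, here is a rough sketch that runs the quoted split on this string (written on one logical line, with the dashes typed as "--"). The second pattern is only an illustrative heuristic of my own, not something proposed in the thread: it splits only where punctuation is followed by whitespace and a capital letter, and it needs a Ruby version with lookbehind support (1.9 or later).

```ruby
str = "Dr. Feelgood will meet you at the corner of Foo St. and Bar Dr. " \
      "tonight at 8:00; bring $2.98 -- exact change -- to resolve the 5.5% " \
      "interest you owe."

# Naive split: every full stop counts as a sentence boundary, including the
# ones in abbreviations and decimal numbers.
naive = str.split(/\.\s?/)
puts naive.size

# Heuristic split: keep the punctuation and only break before a capital letter.
# This still breaks after "Dr." when a name follows.
heuristic = str.split(/(?<=[.!?])\s+(?=[A-Z])/)
puts heuristic.size
p heuristic.first
```

Neither version recognises that the text is one sentence; telling abbreviations apart from sentence ends generally needs a list of known abbreviations or a dedicated library.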

unknown · December 11, 2008, 9:24am

2008/12/11 Jun Y. Kim [email protected]:

   end
end

Another class passes aPatten as the string "/[aeiou]/" and aReplace as
"*". Both of them are Strings.

And I know I get the expected result when I pass /[aeiou]/ instead of
"/[aeiou]/".

Any ideas on how to convert a string to a pattern?

How about looking at the documentation?

http://www.ruby-doc.org/core/classes/Regexp.html

Btw, I rather tend to make it a requirement that the argument has the
appropriate type. Since #gsub is capable of working with String and
Regexp as pattern, I would not change your method’s implementation but
the code invoking it.

Taking this one step further: I would choose a different abstraction:

def transform from_file, to_file
  repl = yield(File.read(from_file)) and
    File.open(to_file, "w") do |io|
      io.write(repl)
    end
end

Then you can do

transform "data", "result" do |content|
  content.gsub!(/[aeiou]/, "*")
  content
end
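For anyone who wants to try this without touching real files, here is a self-contained sketch of the same abstraction. The temporary directory and the sample text are assumptions for the demo. The block uses `gsub` rather than `gsub!` because `gsub!` returns nil when nothing was replaced, so the block's value can be returned directly.

```ruby
require "tmpdir"

# Read from_file, hand the content to the block, write the block's result.
# Nothing is written when the block returns nil or false.
def transform(from_file, to_file)
  result = yield(File.read(from_file))
  File.open(to_file, "w") { |io| io.write(result) } if result
end

Dir.mktmpdir do |dir|
  source = File.join(dir, "data")
  target = File.join(dir, "result")
  File.open(source, "w") { |io| io.write("regular expression") }

  transform(source, target) { |content| content.gsub(/[aeiou]/, "*") }

  puts File.read(target)
end
```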

Cheers

robert

unknown · December 11, 2008, 9:45am

Any ideas on how to convert a string to a pattern?

irb(main):001:0> Regexp.new("[aeiou]")
=> /[aeiou]/
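A short sketch of how the resulting object can be used: the string passed to `Regexp.new` is the pattern source without the surrounding slashes, and the result works with `gsub` like a literal. The sample text is made up for the example.

```ruby
# Build the pattern from a plain string (no surrounding slashes).
pattern = Regexp.new("[aeiou]")

# The Regexp object can be passed to gsub just like a literal.
puts "regular expression".gsub(pattern, "*")
```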

unknown · December 11, 2008, 9:13am

Hi,

I have a program that replaces a text's contents.

def replace(aPatten, aReplace)
  # I need some logic to translate the string into a pattern

  contents = File.read("data")
  contents.gsub!(aPatten, aReplace)
  File.open("result", "w") do |file|
    file << contents
  end
end

Another class passes aPatten as the string "/[aeiou]/" and aReplace
as "*". Both of them are Strings.

And I know I get the expected result when I pass /[aeiou]/ instead
of "/[aeiou]/".

Any ideas on how to convert a string to a pattern?

unknown · December 11, 2008, 10:13am

Thanks for your reply, Brian.

How about Regexp.new("/[aeiou]/") ?
=> /\/[aeiou]\//

On Dec 11, at 5:38 PM, Brian C. wrote:

unknown · December 12, 2008, 4:08am

I mean I have a regular expression as a string.

puts aPattern
=> "/[aeiou]/"

When I convert it to a Regexp instance, the result is
=> /\/[aeiou]\//

At this point, the given pattern no longer behaves as the regular
expression I wrote; the slashes are matched literally.

On Dec 11, at 6:06 PM, Jun Y. Kim wrote:

unknown · December 12, 2008, 6:04am

Hi all,

There is a Ruby parsing library, as you know, called “Treetop”.

Part of the logic in my program tries to parse a regular expression as a
single token.

Let me give an example to make this clearer:

translate /[aeiou]/ "*"

This means: translate every character matching /[aeiou]/ to *.

Any idea how to write a rule to parse it?
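This is not a Treetop grammar, but as a rough plain-Ruby sketch the command can be recognised with a single regular expression that treats the `/.../` part as one token. The constant `COMMAND`, the helper `parse_translate` and the escaping rules (a backslash escapes the next character) are all assumptions made for illustration; an unescaped slash inside a character class is not handled.

```ruby
# One regex token between slashes, then one double-quoted replacement.
# Inside each token, "\\." allows an escaped character such as \/ or \".
COMMAND = %r{\Atranslate\s+/((?:\\.|[^/\\])*)/\s+"((?:\\.|[^"\\])*)"\s*\z}

# Hypothetical helper: returns [Regexp, String] or nil if the line
# is not a translate command.
def parse_translate(line)
  match = COMMAND.match(line)
  return nil unless match

  [Regexp.new(match[1]), match[2]]
end

pattern, replacement = parse_translate('translate /[aeiou]/ "*"')
puts "regular expression".gsub(pattern, replacement)
```

The same two-token structure (a slash-delimited pattern followed by a quoted string) is what a grammar rule would need to express.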

unknown · December 12, 2008, 4:26am

From: Jun Y. Kim [mailto:[email protected]]

I mean I have a regular expression as a string.

puts aPattern
=> "/[aeiou]/"

When I convert it to a Regexp instance, the result is
=> /\/[aeiou]\//

At this point, the given pattern no longer behaves as the regular
expression I wrote; the slashes are matched literally.

It is still a regex, just not the regex that you expected.

You can either remove the surrounding slashes

s="/[aeiou]/"
Regexp.new s[1..-2]
#=> /[aeiou]/

or you can just eval it straight away

eval(s)
#=> /[aeiou]/
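Since `eval` runs whatever Ruby code the string contains, stripping the slashes is the safer route when the string comes from elsewhere. A sketch of that route which also accepts trailing `i`, `m` and `x` flags; the helper name `regexp_from_literal` and the flag handling are my own assumptions, not part of the thread.

```ruby
# Hypothetical helper: turn a string such as "/[aeiou]/i" into a Regexp
# without eval. A string without surrounding slashes is used as the
# pattern source as-is.
def regexp_from_literal(text)
  match = %r{\A/(.*)/([imx]*)\z}m.match(text)
  return Regexp.new(text) unless match

  source, flags = match[1], match[2]
  options = 0
  options |= Regexp::IGNORECASE if flags.include?("i")
  options |= Regexp::MULTILINE  if flags.include?("m")
  options |= Regexp::EXTENDED   if flags.include?("x")
  Regexp.new(source, options)
end

s = "/[aeiou]/"
pattern = regexp_from_literal(s)
puts "regular expression".gsub(pattern, "*")
```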