Splitting help needed

zoephoenix · May 31, 2008, 1:15am

I have a program that someone on this forum helped me fix before that
took a list of cities formatted like:

New York | Chicago | Boston |

and formatted them like this, along with a phrase added after each one:

New York
Chicago
Boston
etc.

The code looks like this:

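main = 0

full= File.open("state.txt")
phrase=[", New Jersey"]
count=0
inside = []
full.each do |line|
  first=[]
  first=line.split(/\|/)
  first.each do |single|
    sub=single.strip!
    main = (sub).to_s + (phrase).to_s
    inside << main

    newfile=File.new("state2.txt", "w")
    newfile.puts inside
    newfile.close

    count+=1
  end
end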

I tried to modify it so that it would separate not on the ‘|’ character,
but on a single space (such as in a list of cities like “New York
Chicago Boston”, etc. without the ‘|’ above) and then enter the phrase
after it.

I can’t seem to get it to put each city in the file on a new line and
then add the phrase that I want after it like it did before. The only
real difference in what I want now and what I had before was the ‘|’
character. Can someone help me fix this?

zoephoenix · May 31, 2008, 1:42am

Zoe P. wrote:

I have a program that someone on this forum helped me fix before that
took a list of cities formatted like:

New York | Chicago | Boston |

and formatted them like this, along with a phrase added after each one:

New York
Chicago
Boston
etc.

The code looks like this:

main = 0

full= File.open("state.txt")
phrase=[", New Jersey"]
count=0
inside = []
full.each do |line|
  first=[]
  first=line.split(/\|/)
  first.each do |single|
    sub=single.strip!
    main = (sub).to_s + (phrase).to_s
    inside << main

    newfile=File.new("state2.txt", "w")
    newfile.puts inside
    newfile.close

    count+=1
  end
end

The method split splits up a string and it will put the parts in an
array.
If you don’t specify what to split on, it will split on newlines. Not
what you want. How to make clear that you want to split on " "?
Just say split(" ").
(split will also work with a regular expression, like in your code. It’s
faster and far more powerful, but completely unreadable if you are not
familiar with it. In your code split("|") works.)

I have not tried, but it looks as if your code writes a new file for
each line it reads. Each time the same file. First time 1 line, second
time 2 lines, etc. You could consider taking this bit:

 newfile=File.new("state2.txt", "w")
 newfile.puts inside
 newfile.close

out of the loop.

hth,

Siep

zoephoenix · May 31, 2008, 1:56am

Zoe P. wrote:

I tried to modify it so that it would separate not on the ‘|’ character,
but on a single space (such as in a list of cities like “New York
Chicago Boston”, etc. without the ‘|’ above) and then enter the phrase
after it.

Oh. Well, you will end up with the cities New, York, Chicago, Boston.

Sorry about that.

Siep

zoephoenix · May 31, 2008, 2:02am

Siep K. wrote:

The method split splits up a string and it will put the parts in an
array.
If you don’t specify what to split on, it will split on newlines. Not
what you want. How to make clear that you want to split on " "?
Just say split(" ").
(split will also work with a regular expression, like in your code. It’s
faster and far more powerful, but completely unreadable if you are not
familiar with it. In your code split("|") works.)

I have not tried, but it looks as if your code writes a new file for
each line it reads. Each time the same file. First time 1 line, second
time 2 lines, etc. You could consider taking this bit:
 newfile=File.new("state2.txt", "w")
 newfile.puts inside
 newfile.close
out of the loop.

hth,

Siep

Well, it isn’t writing a new file for each line… but it’s not listing
the cities like I want either. All it’s doing is putting the phrase on a
new line once for each city. So I get, say, “, Alabama” a bunch of times
instead of “Montgomery, Alabama”, “Birmingham, Alabama”, etc.

I want it to take this:

Alabaster Albertville Alexander City Andalusia Anniston Arab Ardmore
Athens Atmore Attalla Auburn

And turn it into this:

Alabaster, Alabama
Albertville, Alabama
Alexander City, Alabama
Andalusia, Alabama
etc.

What I’m getting when I run the program is,

, Alabama
, Alabama
, Alabama
etc.

I know I’ll run into a problem with some of the cities having two words
in them, like Alexander City, but fixing those manually isn’t a problem.

zoephoenix · May 31, 2008, 3:56am

Siep K. wrote:

The method split splits up a string and it will put the parts in an
array.
If you don’t specify what to split on, it will split on newlines. Not
what you want. How to make clear that you want to split on " "?
Just say split(" ").

str = "hello world goodbye"
arr = str.split()
p arr

--output:--
["hello", "world", "goodbye"]

(split will also work with a regular expression, like in your code. It’s
faster

Wrong.

and far more powerfull, but completely unreadable if you are not
familiar with it. In your code split("|") works.)

Ahh, but the governing principle in the Ruby community is to make a Ruby
script look as much like a Perl script as possible--efficiency be
damned. So who in their right mind would pass up a chance to use a
hieroglyphic regex like: /\|/ in their code. That’s art.

zoephoenix · May 31, 2008, 3:07am

Hi –

On Sat, 31 May 2008, Zoe P. wrote:

Well, it isn’t writing a new file for each line…

It’s writing a new file for each element in the input. You’re writing
the same file over and over again, a little bigger each time, instead
of gathering all the input and writing it all at once (or writing it
incrementally to a file that you keep open).

, Alabama
, Alabama
, Alabama
etc.

I know I’ll run into a problem with some of the cities having two words
in them, like Alexander City, but fixing those manually isn’t a problem.

Try this:

phrase = ", Alabama"

David
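
A minimal sketch of the approach described above. This is not David's original code, which is cut off in this copy of the thread. It assumes the input is one whitespace-separated string and that the phrase is ", Alabama"; the method name tag_cities is made up for illustration. One likely cause of the bare ", Alabama" lines is that strip! returns nil when there is nothing to strip, so (sub).to_s turns into an empty string. The non-bang strip always returns a string.

```ruby
# Append a phrase to every whitespace-separated word in the input.
# `tag_cities` is a hypothetical helper name, not from the thread.
def tag_cities(text, phrase)
  # split with no argument splits on any run of whitespace,
  # so double spaces and trailing newlines are handled too.
  text.split.map do |city|
    # strip (no bang) always returns a String; strip! returns nil
    # when the string has no surrounding whitespace to remove.
    city.strip + phrase
  end
end

input = "Alabaster Albertville Andalusia Anniston"
lines = tag_cities(input, ", Alabama")

puts lines
```

To write the result once instead of once per city, something like File.open("state2.txt", "w") { |f| f.puts lines } can go after the list is built. Two-word names such as Alexander City still come out as two lines, as already noted in the thread.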

zoephoenix · May 31, 2008, 5:16am

Ohhh, I see… thank you so much!

zoephoenix · May 31, 2008, 12:11pm

Hi –

On Sat, 31 May 2008, 7stud – wrote:

p arr

--output:--
["hello", "world", "goodbye"]

(split will also work with a regular expression, like in your code. It’s
faster

Wrong.

My benchmarks suggest that it’s a little faster (about 10%).

David

zoephoenix · May 31, 2008, 5:57am

No need to waste space being facetious.


zoephoenix · May 31, 2008, 1:33pm

7stud – wrote:

Ahh, but the governing principle in the Ruby community is to make a Ruby
script look as much like a Perl script as possible--efficiency be
damned. So who in their right mind would pass up a chance to use a
hieroglyphic regex like: /\|/ in their code. That’s art.

Larry Wall (Perl supremo) has called this sort of thing LTS: leaning
toothpick syndrome. But Perl’s syntax allows you to write a regexp as
m(|), which is a bit clearer. Or you can use the quotemeta() function
to add a backslash for you.

But this is Ruby, not Perl!

Coming from 10 years of Perl coding, I wish Ruby were less Perl-like,
as it can get confusing, especially when you’re working in both
languages at the same time.
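
On the Ruby side, a short sketch of ways to split on a literal pipe without hand-writing the backslash. Regexp.escape is roughly Ruby's counterpart to Perl's quotemeta(); the variable names here are only illustrative.

```ruby
row = "New York | Chicago | Boston |"

# A plain string separator is taken literally, so no escaping is needed.
by_string = row.split("|").map(&:strip)

# Inside a regex, | means alternation, so it has to be escaped.
by_regex = row.split(/\|/).map(&:strip)

# Regexp.escape adds the backslash for you.
pipe = Regexp.new(Regexp.escape("|"))
by_escaped = row.split(pipe).map(&:strip)

# Unescaped, /|/ matches the empty string and splits between every character.
by_mistake = "a|b".split(/|/)

p by_string
p by_mistake
```

All three escaped or literal forms give the same list of cities; only the unescaped regex behaves differently.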

zoephoenix · May 31, 2008, 12:12pm

Hi –

On Sat, 31 May 2008, Siep K. wrote:

The method split splits up a string and it will put the parts in an
array.
If you don’t specify what to split on, it will split on newlines. Not
what you want. How to make clear that you want to split on " "?
Just say split(" ").

Actually without an argument it will split on any amount of
whitespace. I think you’re thinking of how strings enumerate, which is
(by default) as lines:

"abc\ndef".to_a # ["abc\n", "def"]

David
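
A small sketch of the difference described above, with a made-up sample string. In Ruby 1.9 and later String#to_a no longer exists, so this version uses each_line for the line-by-line case.

```ruby
text = "New York   Chicago\nBoston"

# No argument: split on any run of whitespace (spaces, tabs, newlines).
words = text.split

# A single-space string is a special case that behaves the same way.
words_space = text.split(" ")

# A regex for one space is literal: consecutive spaces yield empty strings.
words_regex = text.split(/ /)

# Enumerating by line is a separate thing; each_line keeps the "\n".
lines = text.each_line.to_a

p words
p words_regex
p lines
```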

zoephoenix · May 31, 2008, 2:12pm

Hi –

On Sat, 31 May 2008, Dave B. wrote:

7stud – wrote:

Ahh, but the governing principle in the Ruby community is to make a Ruby
script look as much like a Perl script as possible--efficiency be
damned. So who in their right mind would pass up a chance to use a
hieroglyphic regex like: /\|/ in their code. That’s art.

Larry Wall (Perl supremo) has called this sort of thing LTS: leaning
toothpick syndrome. But Perl’s syntax allows you to write a regexp as
m(|), which is a bit clearer. Or you can use the quotemeta() function
to add a backslash for you.

I always figured it’s easiest just to learn the regex stuff and get it
over with. As a result, I can read regexes fluently as long as they
don’t use /x or %r{}

But this is Ruby, not Perl!

Coming from 10 years of Perl coding, I wish Ruby were less Perl-like,
as it can get confusing, especially when you’re working in both
languages at the same time.

I remember someone (I’m too lazy to look it up) saying long ago that
while Ruby often strikes one as Perl-like initially, it actually is
much less so than it appears at first. I think that’s true. Perl also
has more of a tradition of deliberate code obfuscation, though of
course it’s generally done in a playful way. Obfuscated Ruby code
always looks kind of ridiculous to me, as Ruby really militates for
a certain clarity, and there’s such a tradition of love of clean code
in the community.

For the first RubyConf, we were going to have a “Code De-Obfuscation”
contest, since the idea of an obfuscation contest in Ruby seemed so
against the grain of what people loved about the language. We got as
far as getting some obfuscated contributions, ripe for de-obfuscation
(including one from Dave T.), but unfortunately the timing of that
conference – October 2001 – sapped some of our time and energy and
that contest was one of the things that fell by the wayside.

David

zoephoenix · May 31, 2008, 7:20pm

On 31.05.2008 16:45, Rick DeNatale wrote:

On Sat, May 31, 2008 at 8:11 AM, David A. Black wrote:

I wish that there were some kind of anti-code-golf contest, where the
objective was maximum clarity/expressiveness rather than compactness.
The problem is that it’s impossible to measure the former objectively,
compared to 32 vs. 33 character comparisons.

That’s true. I usually try to compact code in order to increase
readability and often also efficiency because I do not find shortness a
value in itself. As always it’s a question of balance.

Kind regards

robert

zoephoenix · May 31, 2008, 9:44pm

Hi –

On Sat, 31 May 2008, Rick DeNatale wrote:

That sounds like a great idea. I wish that there were some kind of
anti-code-golf contest, where the objective was maximum
clarity/expressiveness rather than compactness. The problem is that
it’s impossible to measure the former objectively, compared to 32 vs.
33 character comparisons.

I like code golf, as long as it’s clear that it’s code golf – that
is, a brain-teaser/exercise with the goal of coming up with the
minimum number of “strokes”. I’ve learned an awful lot about both Ruby
and Perl by doing that. I don’t consider it any more closely related
to real code production than, say, abdominal crunches are to baseball.
It’s just a mind-stretcher. I suspend aesthetic judgment on code in
code golf contests because I assume it isn’t really being held up as
anything other than what it is (maximally compressed).

Then again, there are certainly cases of quasi-golf-like code that
people just write, and that can be a problem…

The de-obfuscation contest idea was kind of a superset of what you’re
describing: looking to transform code that was opaque and badly
written in various ways into something more clear and expressive. The
de-obfuscations were going to be judged by a panel, as I recall, since
as you say there’s no automatic way to judge them.

David

zoephoenix · May 31, 2008, 4:46pm

On Sat, May 31, 2008 at 8:11 AM, David A. Black wrote:

I remember someone (I’m too lazy to look it up) saying long ago that
while Ruby often strikes one as Perl-like initially, it actually is
much less so than it appears at first. I think that’s true. Perl also
has more of a tradition of deliberate code obfuscation, though of
course it’s generally done in a playful way. Obfuscated Ruby code
always looks kind of ridiculous to me, as Ruby really militates for
a certain clarity, and there’s such a tradition of love of clean code
in the community.

Some of the features Ruby ‘stole’ from Perl are things I use fairly
regularly, for example the if/unless/while statement modifiers.

Others such as all the special and sometimes magical global variables
and pseudo-variables I don’t find useful at all, and I don’t see them
discussed much here. They’re probably quite helpful if you are using
Ruby in the same kind of swiss-army knife adjunct to shell commands
way that perl is often used by wrapping a ‘one-liner’ in one of the
various loops implied by different command line options like i, n,
and p, but I never use that.

Regular expressions are really a sub-language of their own. Ruby has
some extensions like %r{} which I think make them a little clearer.

For the first RubyConf, we were going to have a “Code De-Obfuscation”
contest, since the idea of an obfuscation contest in Ruby seemed so
against the grain of what people loved about the language. We got as
far as getting some obfuscated contributions, ripe for de-obfuscation
(including one from Dave T.), but unfortunately the timing of that
conference – October 2001 – sapped some of our time and energy and
that contest was one of the things that fell by the wayside.

That sounds like a great idea. I wish that there were some kind of
anti-code-golf contest, where the objective was maximum
clarity/expressiveness rather than compactness. The problem is that
it’s impossible to measure the former objectively, compared to 32 vs.
33 character comparisons.

–
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

zoephoenix · May 31, 2008, 11:11pm

David A. Black wrote:

I always figured it’s easiest just to learn the regex stuff and get it
over with.

I did–8 years ago.

My benchmarks suggest that it’s a little faster (about 10%).

Ok, I see where you’re coming from. The following test shows that
split() operates 45% faster without a regex:

require 'benchmark'
include Benchmark

L_COUNT = 1_000_000

bm(25) do |test|
  test.report("split:") do
    L_COUNT.times do |i|
      str = "hello world goodbye"
      arr = str.split()
    end
  end

  test.report("regex:") do
    L_COUNT.times do |i|
      str = "hello world goodbye"
      arr = str.split(/\s+/)
    end
  end
end

                               user     system      total        real
split:                     2.470000   0.010000   2.480000 (  2.494421)
regex:                     4.550000   0.030000   4.580000 (  4.576609)

And this test shows that split() is 13% faster with a regex:

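require 'benchmark'
include Benchmark

L_COUNT = 1_000_000

bm(25) do |test|
  test.report("split:") do
    L_COUNT.times do |i|
      str = "hello|world|goodbye"
      arr = str.split("|")
    end
  end

  test.report("regex:") do
    L_COUNT.times do |i|
      str = "hello|world|goodbye"
      arr = str.split(/\|/)
    end
  end
end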

That indicates to me that unless you use the default behavior of
split(), which splits on spaces, then split() has to spend time
converting its argument to a regex.

zoephoenix · May 31, 2008, 11:12pm

7stud – wrote:

And this test shows that split() is 13% faster with a regex:

require 'benchmark'
include Benchmark

L_COUNT = 1_000_000

bm(25) do |test|
  test.report("split:") do
    L_COUNT.times do |i|
      str = "hello|world|goodbye"
      arr = str.split("|")
    end
  end

  test.report("regex:") do
    L_COUNT.times do |i|
      str = "hello|world|goodbye"
      arr = str.split(/\|/)
    end
  end
end

Whoops. Here are the results:

                               user     system      total        real
split:                     4.620000   0.030000   4.650000 (  4.661699)
regex:                     4.000000   0.030000   4.030000 (  4.056688)

zoephoenix · June 1, 2008, 11:40am

On 31.05.2008 21:44, David A. Black wrote:

I do think that one of the great things about Ruby is that, to a
remarkable degree, code becomes clearer as it becomes smaller. Not,
of course, when it gets into real golf territory (see my last post) –
but over a pretty wide range.

Absolutely agree.

robert

zoephoenix · May 31, 2008, 9:45pm

Hi –

On Sun, 1 Jun 2008, Robert K. wrote:

That sounds like a great idea. I wish that there were some kind of
anti-code-golf contest, where the objective was maximum
clarity/expressiveness rather than compactness. The problem is that
it’s impossible to measure the former objectively, compared to 32 vs.
33 character comparisons.

That’s true. I usually try to compact code in order to increase readability
and often also efficiency because I do not find shortness a value in itself.
As always it’s a question of balance.

I do think that one of the great things about Ruby is that, to a
remarkable degree, code becomes clearer as it becomes smaller. Not,
of course, when it gets into real golf territory (see my last post) –
but over a pretty wide range.

David