Trying to strip characters from a line

aris · December 20, 2012, 9:36pm

I’m reading a table from a MySQL database and then processing it row by
row, stripping each line of certain characters ([, ], " and comma)
before writing it to a file. The code is running without errors, but
it’s not stripping any of the characters as I would expect.

Here’s the code:

#!/usr/bin/ruby

require 'mysql'

@bad_chars = '[],"'

begin
con = Mysql.new 'localhost', 'root', 'menagerie', 'haiku_archive'

rs = con.query("SELECT * FROM archive_2012")
n_rows = rs.num_rows

n_rows.times do
    begin
        file = File.open("archive.html", "a")
        line = rs.fetch_row.to_s
        line.gsub(/\[\]\,\"/,'')
        file.write(line)
        file.puts "<br>"
    end
end

end

Executing the code results in an “archive.html” file with all of the
“stripped” characters still intact. Am I invoking the gsub method
incorrectly? Thanks in advance for any help.

paul-au · December 20, 2012, 10:24pm

On 21/12/2012, at 9:36 AM, Paul M. wrote:

Executing the code results in an “archive.html” file with all of the
“stripped” characters still intact. Am I invoking the gsub method
incorrectly? Thanks in advance for any help.

There are two forms of gsub

line = "this is a line with ***** in it"
=> "this is a line with ***** in it"
line.gsub '*', ''
=> "this is a line with  in it"
line
=> "this is a line with ***** in it"
line.gsub! '*', ''
=> "this is a line with  in it"
line
=> "this is a line with  in it"

The first form returns a new string, the second modifies the receiver.
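
A minimal sketch of the same difference applied to the characters from the original question. The sample string is invented for illustration and is not data from the thread. One detail that can surprise: gsub! returns nil when nothing was replaced, so its return value should not be chained blindly.

```ruby
# Sample input, invented for illustration.
line = '["old sea port", "Sep-15-2012"]'

# gsub returns a new string and leaves the receiver untouched.
stripped = line.gsub(/[\[\]",]/, '')

# gsub! modifies the receiver in place.
copy = line.dup
copy.gsub!(/[\[\]",]/, '')

# gsub! returns nil when no substitution was made.
no_match = 'plain text'.dup.gsub!(/[\[\]",]/, '')

puts stripped
puts copy
p no_match
```

If the result of gsub is not assigned (or written out directly), it is simply discarded, which is what happens in the loop from the first post.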

Also your regexp in the gsub probably doesn’t do what you think it does.
You might want something like

line.gsub /[\[\],"]/, "**"

Henry

paul-au · December 20, 2012, 10:55pm

Hi,

Simply use String#delete or String#delete!

No need to fumble with regexes.
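
A short sketch of what that could look like, using an invented sample string. String#delete takes a set of characters rather than a pattern, so the brackets need no escaping here.

```ruby
# Sample input, invented for illustration.
line = '["dockside pub", "Sep-15-2012"]'

# delete returns a new string with every listed character removed.
cleaned = line.delete('[]",')

# delete! removes the characters from the receiver itself.
copy = line.dup
copy.delete!('[]",')

puts cleaned
puts copy
```

As with gsub and gsub!, only the bang version changes the original string; the non-bang version needs its return value captured.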

paul-au · December 20, 2012, 11:41pm

Thanks for the responses. Unfortunately neither is having the desired
result. Here is a “tail” of the outputted archive.html file:

pablo@cochituate=> tail archive.html
["only Wednesday –
I stop fast-forwarding
between
commercials
", "Sep-12-2012"]

["last days of summer –
one more ghost story
around the
fire
", "Sep-13-2012"]

["maple-colored moon –
remembering
Mom’s pancakes
",
"Sep-14-2012"]

["Harvard Square station –
a Mozart sonata
between buses
",
"Sep-14-2012"]

["between buses
a Mozart sonata
", "Sep-14-2012"]

["stiff sea breeze –
the drawbridge operator’s
bushy
beard
", "Sep-15-2012"]

["old sea port –
the tugboat’s
tattered flag
",
"Sep-15-2012"]

["dockside pub –
wondering where
the cormorant went
",
"Sep-15-2012"]

["dockside pub –
the bride-to-be’s
bright pink tiara
",
"Sep-15-2012"]

["after the break
somewhat less jumping
to the jump blues
band
", "Sep-15-2012"]

paul-au · December 21, 2012, 12:10pm

Paul M. wrote in post #1089769:

Thanks for the responses. Unfortunately neither is having the desired
result. Here is a “tail” of the outputted archive.html file:

The “tail” doesn’t tell us anything, we need your code.

I’m pretty sure you’ve again confused “gsub” and “gsub!” (or “delete”
and “delete!”) like in your first post.

paul-au · December 21, 2012, 2:13am

Paul M. wrote in post #1089769:

Thanks for the responses. Unfortunately neither is having the desired
result.

Here is how computer programming forums work:

You post 15 lines or less of code that demonstrates your problem.
You post the actual output.
You state your desired/expected output.

Your question has nothing to do with mysql, so your posted code should
have no mysql lines in it.

paul-au · December 21, 2012, 2:17pm

The code is the very first post in this thread, but I’ll repeat some of
it here:

n_rows.times do
    begin
        file = File.open("archive.html", "a")
        line = rs.fetch_row.to_s
        line.gsub /\[\]\"\,/, ''
        file.write(line)
        file.puts "<br>"
    end
end

The code is processing rows from a MySQL table as lines of text, one at
a time. The desired output would look something like this:

blah blah blah
meh
foo bar
Feb-12-2012

Instead it looks like this:

["blah blah blah
meh
foo bar
", "Feb-12-2012
"]
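
One possible reading of where the brackets, quotes and comma come from: in Ruby 1.9 and later, Array#to_s returns the same text as Array#inspect. The sketch below assumes that fetch_row returns an Array of column values, which the thread does not confirm, and uses an invented row in place of database data.

```ruby
# Stand-in for one fetched row; assumed shape, not taken from the database.
row = ['blah blah blah', 'Feb-12-2012']

# to_s on an Array gives the inspect-style text, brackets and quotes included.
as_string = row.to_s

# join concatenates the elements themselves, so there is nothing to strip.
joined = row.join(' ')

puts as_string
puts joined
```

If that assumption holds, joining the columns would be an alternative to stripping characters afterwards. It would also leave commas and quotes inside the text itself untouched.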

paul-au · December 21, 2012, 3:01pm

I’m talking about your new code where you used the above suggestions.
You said that neither of them worked, so I’m asking you for the exact
code.

Henry told you that you need to use “gsub!” if you want the method to
actually change the string (instead of returning a new string).

I suggested using “delete!” as an alternative.

So choose one of those two options, rewrite your code and try again.

paul-au · December 21, 2012, 3:15pm

This finally worked, although it’s certainly not elegant:

n_rows.times do
    begin
        file = File.open("archive.html", "a")
        line = rs.fetch_row.to_s
        line.gsub! '[', ''
        line.gsub! '"', ''
        line.gsub! ']', ''
        line.gsub! ',', ''
        file.write(line)
        file.puts "<br>"
    end
end

Thanks once again to everyone for their suggestions.

paul-au · December 21, 2012, 4:20pm

I did read and attempt to implement every suggestion made, but
admittedly I’m new to Ruby and will make newbie mistakes (like mixing up
gsub and gsub!). I started out as a FORTRAN developer back in the early
80s but have been a Sys Admin since the early 90s, and am still trying
to wrap my mind around object-oriented programming.

paul-au · December 21, 2012, 3:34pm

Paul M. wrote in post #1089856:

Thanks once again to everyone for their suggestions.

It would be even better if you’d actually read them. :-/

What’s the purpose of the “begin-end”, by the way? This is not Pascal. A
“begin-end” block only makes sense in combination with “rescue” or
“ensure”. Re-opening the file for every single row also doesn’t really
make sense. Either open the file before the loop or collect the row
strings and then write them all at once.

File.open("archive.html", "a") do |file|
  # I'm sure there's a better method for this, something like "each_row"
  n_rows.times do
    file << rs.fetch_row.to_s.delete('[]",')
    file.puts '<br>'
  end
end
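
A self-contained sketch of the same structure that runs without a database. The rows array and the temporary path are stand-ins invented for illustration; the thread reads its rows from MySQL and writes to archive.html in the working directory.

```ruby
require 'tmpdir'

# Stand-in rows, invented for illustration.
rows = [
  ['old sea port', 'Sep-15-2012'],
  ['dockside pub', 'Sep-15-2012']
]

# Write to a throwaway directory so the sketch leaves no file behind in the cwd.
path = File.join(Dir.mktmpdir, 'archive.html')

# The block form opens the file once and closes it when the block exits,
# even if an exception is raised inside the block.
File.open(path, 'a') do |file|
  rows.each do |row|
    file << row.to_s.delete('[]",')
    file.puts '<br>'
  end
end

written = File.read(path)
puts written
```

Opening the file once outside the loop also avoids leaving one unclosed file handle per row, which is what happens when File.open is called without a block inside the loop.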

paul-au · December 23, 2012, 12:38pm

If you are trying to strip characters from a line in an .html web page
file, why, may I ask, are you using the “a” file-opening mode? The “a”
mode appends to the end of a file.

paul-au · December 23, 2012, 5:52pm

Instead of all those individual gsubs, why not this:

irb(main):001:0> 'A["],a'.gsub(/\["\],/,'')
=> "Aa"

paul-au · December 23, 2012, 5:05pm

Alex,

I’m appending to the “archive.html” line by line in a loop - that’s why
I opened it with the “a” mode.

Cheers,

Paul
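
A small sketch of how the two modes differ, using a temporary file invented for illustration. One consequence of “a” worth keeping in mind: running the script a second time adds the rows to the file again, whereas “w” starts from an empty file.

```ruby
require 'tmpdir'

# Throwaway file for the demonstration.
path = File.join(Dir.mktmpdir, 'demo.html')

# "a" keeps existing content and writes at the end of the file.
File.open(path, 'a') { |f| f.puts 'first' }
File.open(path, 'a') { |f| f.puts 'second' }
after_append = File.read(path)

# "w" truncates the file before writing.
File.open(path, 'w') { |f| f.puts 'third' }
after_write = File.read(path)

p after_append
p after_write
```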

paul-au · December 23, 2012, 9:01pm

Strangely, this didn’t work for me:

irb(main):001:0> '["the frustrated musician’s
mountain
of
unsold CDs
", "Feb-12-2012"]
'.gsub(/\["\],/,'')

=> "["the frustrated musician’s
mountain
of unsold
CDs
", "Feb-12-2012"]
"

paul-au · December 24, 2012, 2:05am

Yes, my mistake, I was looking for that exact string instead of the
group. It should be this: /[\["\],]/

that is
/ Start Regex
[ Start group
\ Escape next character
[ look for open square bracket
" Look for double quotes
\ Escape next character
] Look for close square bracket
, Look for comma
] End group
/ End Regex

paul-au · December 24, 2012, 2:10pm

Joel / Calvin,

That worked brilliantly. Thank you so much for the help!

Paul

paul-au · December 24, 2012, 12:21am

Hello,

it did not work because his regular expression describes a different
pattern from the one you want: /\["\],/ looks for a [ (which must be
escaped in the regular expression, because it is a character with
special meaning), followed by a ", followed by a ] (escaped, as with [),
followed by a ,. His regular expression worked in his case because he
had ["], in his string. As far as I can tell, you want to remove all
occurrences of those characters.
The regular expression you want is similar: /[\[\]",]/
If you don’t know what this does, here’s an explanation: the outer [
denotes the beginning of a set, and the outer ] ends it. The set matches
any one of the characters inside it, but only a single character per
match (because I did not write a quantifier like * or + after it).

So, this works:
"A[Aaa[]aa, \"aa".gsub( /[\[\]",]/, '' ) # => "AAaaaa aa"

Alternatively, you could describe what you want with the regular
expression /\[|\]|"|,/ since the pipe in regular expressions can be read
as “or”.

Regards

(Tested in Ruby 1.9.3p286; I don’t know off the top of my head if the
behavior would be any different in 1.8)
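
The three kinds of pattern discussed in this thread can be compared side by side. This is a sketch on an invented sample string, not output from the poster's database.

```ruby
# Sample input, invented for illustration.
sample = '["stiff sea breeze", "Sep-15-2012"]'

sequence    = /\["\],/     # the exact four-character text ["],
char_class  = /[\[\]",]/   # any single one of [ ] " ,
alternation = /\[|\]|"|,/  # [ or ] or " or ,

# The sequence pattern finds nothing here, so the string comes back unchanged.
by_sequence = sample.gsub(sequence, '')

# The set and the alternation both remove every occurrence of each character.
by_class       = sample.gsub(char_class, '')
by_alternation = sample.gsub(alternation, '')

# The sequence pattern only helps when that exact text is present.
literal_case = 'A["],a'.gsub(sequence, '')

puts by_sequence
puts by_class
puts by_alternation
puts literal_case
```

The set form is usually the shorter one to read when every alternative is a single character; the alternation form becomes useful once an alternative is longer than one character.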