Grep a csv?

dreamwave · August 16, 2007, 8:55pm

On Aug 16, 11:23 am, Kaldrenon wrote:

[…] I’m a little confused about how a pattern
like [^,]+ gets an element, given that (unless I’m mistaken) in a
standard regexp, it would only match on a string that contained a
series of commas at the beginning of a line, like "," or "abc\n,".
What’s my mistake/confusion here?

/,+/ # matches one or more comma characters
/[z,]+/ # matches one or more z or comma characters intermixed
/[^,]+/ # matches one or more characters that aren’t a comma

Read up on character classes in regexps.
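A short illustration with made-up strings: inside brackets a leading ^ negates the class, while outside brackets ^ anchors the match at the start of a line, which is the reading the question assumed.

```ruby
# Inside [...], a leading ^ negates the class: "any character except a comma".
fields = "abc,def,,ghi".scan(/[^,]+/)
p fields    # => ["abc", "def", "ghi"]  (the empty field between ",," is skipped)

# Outside brackets, ^ is an anchor: it only matches commas at the start of a line.
start_none  = "abc,def".scan(/^,+/)
start_multi = ",,x\nabc\n,y".scan(/^,+/)
p start_none   # => []
p start_multi  # => [",,", ","]
```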

dreamwave · August 16, 2007, 9:15pm

On Aug 16, 2:04 am, Alex Y. wrote:

and more than 65536 rows and you’re on Windows.

res << File.readlines('filename.csv').grep(/Blah1/) # thanks chris

There’s a problem with using File.readlines that I don’t think anyone’s
mentioned yet. I don’t know if it’s relevant to your dataset, but CSV
fields are allowed to contain newlines if the field is quoted. For
example, this single CSV row will break your process:

1,2,"foo
Blah1",bar
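A small sketch of that failure, using StringIO in place of a real file (an assumption made only so the example runs without one): line-oriented reading splits the record at the embedded newline, so grep returns a fragment rather than the whole row.

```ruby
require 'stringio'

# The single CSV record from above; the third field is quoted and spans two lines.
data = %Q{1,2,"foo\nBlah1",bar\n}

# Reading line by line (StringIO stands in for the file) splits the record in two.
lines = StringIO.new(data).readlines
p lines   # => ["1,2,\"foo\n", "Blah1\",bar\n"]

# grep therefore returns only the second half of the record.
hits = lines.grep(/Blah1/)
p hits    # => ["Blah1\",bar\n"]
```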

I think that this can be handled easily by this approach:
to extract a record from the csv file, continue reading lines
until the number of double quotes in the record is even.
Something like

record = ""
begin
  record << gets.chomp
end until record.count( '"' ) % 2 == 0

dreamwave · August 16, 2007, 9:27pm

On Aug 16, 10:56 am, Alex Y. wrote:

The only way around that is to actually parse the file, so unless you
For example:
=> ["foo", "bar", "\"foo,bar\""]

That breaks at least for empty fields, fields with newlines, and fields
with '"' in them.
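For comparison, here is a sketch (not code from the thread) contrasting a plain String#split with Ruby's standard-library CSV.parse_line on a made-up record that contains an empty field, a quoted comma, doubled quotes, and a trailing empty field.

```ruby
require 'csv'

# A made-up record: empty field, quoted comma, doubled quotes, trailing empty field.
line = %q{a,,"b,c","say ""hi""",}

# Splitting on commas ignores quoting and drops the trailing empty field.
naive = line.split(',')
p naive   # => ["a", "", "\"b", "c\"", "\"say \"\"hi\"\"\""]

# The stdlib parser respects quoting; empty unquoted fields come back as nil.
parsed = CSV.parse_line(line)
p parsed  # => ["a", nil, "b,c", "say \"hi\"", nil]
```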

I think that this will work correctly with any complete csv record.

class String
  def csv
    if include? '"'
      ary =
        "#{chomp},".scan( /\G"([^"]*(?:""[^"]*)*)",|\G([^,"]*),/ )
      raise "Bad csv record:\n#{self}" if $' != ""
      ary.map{|a| a[1] || a[0].gsub(/""/,'"') }
    else
      ary = chomp.split( /,/, -1)
      ## "".csv ought to be [""], not [], just as
      ## ",".csv is ["",""].
      if [] == ary
        [""]
      else
        ary
      end
    end
  end
end
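To see what the scan step does, here is a sketch that applies the pattern by hand to a made-up record, assuming the pattern in String#csv is /\G"([^"]*(?:""[^"]*)*)",|\G([^,"]*),/ (FIELD is just a local name for illustration). \G pins each match to where the previous one ended, and the two alternatives capture either the body of a quoted field or an unquoted field.

```ruby
# Assumed form of the pattern used by String#csv.
FIELD = /\G"([^"]*(?:""[^"]*)*)",|\G([^,"]*),/

line = %q{1,,"a,b","say ""hi"""}

# Append a comma so every field, including the last, ends in ",".
pairs = "#{line},".scan(FIELD)
p pairs   # => [[nil, "1"], [nil, ""], ["a,b", nil], ["say \"\"hi\"\"", nil]]

# Exactly one capture per pair is non-nil: take the unquoted field as-is,
# or collapse doubled quotes inside a quoted one.
fields = pairs.map { |quoted, plain| plain || quoted.gsub(/""/, '"') }
p fields  # => ["1", "", "a,b", "say \"hi\""]
```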

dreamwave · August 16, 2007, 9:28pm

On Aug 16, 2:12 pm, William J. wrote:

OK … first of all, define “huge” and what are your restrictions? Let
for being able to write that in Ruby and get it debugged before someone


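The "chomp" in my earlier snippet is a mistake: it strips the newline that
belongs inside a quoted field, so 1,2,"foo / Blah1",bar would be rebuilt
as 1,2,"fooBlah1",bar.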

record = ""
begin
  record << gets
end until record.count( '"' ) % 2 == 0
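Wrapped up as a reusable method, the corrected loop might look like the sketch below. each_csv_record is a hypothetical name, StringIO stands in for the file, and the loop also checks gets for nil, since the bare snippet would raise at end of input (record << nil). Doubled quotes ("") inside a field add two to the count, so the even/odd test still holds for them.

```ruby
require 'stringio'

# Hypothetical helper: yields one complete CSV record at a time by appending
# physical lines until the record contains an even number of '"' characters.
# Unlike the bare snippet, it stops cleanly at end of input.
def each_csv_record(io)
  while (record = io.gets)
    while record.count('"').odd? && (more = io.gets)
      record << more
    end
    yield record
  end
end

data = %Q{h1,h2,h3,h4\n1,2,"foo\nBlah1",bar\n3,4,baz,qux\n}

matches = []
each_csv_record(StringIO.new(data)) { |rec| matches << rec if rec =~ /Blah1/ }
p matches   # => ["1,2,\"foo\nBlah1\",bar\n"]
```

Only one record is held in memory at a time; an unbalanced quote at end of file is yielded as a partial record rather than raising.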

dreamwave · August 16, 2007, 10:43pm

Peña, Botp wrote:

From: rio4ruby

require 'rio'

rio('filename.csv').chomp.lines(/Blah[^,]*/) do |line,m|
  rio(m) + '.csv' << line + $/
end

simply amazing. btw, how does rio handle big files, does it load them
whole in memory?

thanks for rio.
kind regards -botp

It seems things have been amped up a few levels of complication since my
first few posts, lol. The quote above might seem like the cleanest way to
do this; however, if I use this method, I'll still need the commas,
because when you take a CSV and put it in plain text, the commas are what
separate the columns. So maybe it should look something like this?

require 'rio'

rio('filename.csv').chomp.lines(/Blah1/) do |line,m|
  rio(m) + '.csv' << line + $/
end

dreamwave · August 17, 2007, 3:50am

On Aug 16, 1:43 pm, Michael L. wrote:

thanks for rio.
rio('filename.csv').chomp.lines(/Blah1/) do |line,m|
  rio(m) + '.csv' << line + $/
end


You are right, that was a poorly thought-out regular expression.
One could also use Rio’s csv mode (which uses the stdlib csv):

rio('filename.csv').chomp.csv.records(/Blah\w*/) do |rec,m|
  rio(m.to_s + '.csv').csv << rec
end

But this also is definitely NOT a robust solution.
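Since Rio's csv mode sits on top of the stdlib CSV library, the same job can be sketched with the stdlib alone. This is not Rio's implementation: split_by_key is a hypothetical name, and the pattern /Blah\w*/ and the "<match>.csv" naming are carried over from the example above. CSV.foreach reads one row at a time, so the file is never loaded whole, and its parser handles quoted fields that contain commas, doubled quotes, or newlines.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical stdlib-only version: stream rows with CSV.foreach and append
# each row whose text matches the pattern to "<match>.csv" in out_dir.
def split_by_key(src, out_dir, pattern = /Blah\w*/)
  # One CSV writer per distinct key, opened lazily in append mode.
  writers = Hash.new do |h, key|
    h[key] = CSV.open(File.join(out_dir, "#{key}.csv"), "a")
  end
  CSV.foreach(src) do |row|
    key = row.compact.join(",")[pattern]   # first match in the row, or nil
    writers[key] << row if key
  end
ensure
  writers&.each_value(&:close)
end

blah1 = blah2 = nil
Dir.mktmpdir do |dir|
  src = File.join(dir, "filename.csv")
  File.write(src, %Q{1,2,"foo\nBlah1",bar\n3,Blah2,x,y\n5,6,Blah1,z\n})
  split_by_key(src, dir)
  blah1 = CSV.read(File.join(dir, "Blah1.csv"))
  blah2 = CSV.read(File.join(dir, "Blah2.csv"))
end
p blah1   # => [["1", "2", "foo\nBlah1", "bar"], ["5", "6", "Blah1", "z"]]
```

One design caveat: a writer stays open for every distinct key, so input with very many distinct keys could hit the process's open-file limit; reopening the output file in append mode for each row avoids that at some cost in speed.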

dreamwave · August 17, 2007, 3:12am


> simply amazing. btw, how does rio handle big files, does it load them whole in memory?

Never. Examples that assume I have a file small enough to load into
memory irritate me.