Regexp for CSV header

azgar · June 17, 2009, 5:31pm

My script currently is processing various csv files. The top row/header
resembles this format:

Device ID,1) S31 Which best describes how you answered the online
reading comprehension quiz?,2) S32 Which best describes how you answered
the online timed retrieval quiz?,3) B19. If you want your product to be
easy to find in the supermarket then you should make its container,"4)
C19. So that he can shift attention between the radio and his
incessantly talking girl friend when she is in the car, Joe adjusts his
radio",5) B20. Early selection is most likely to occur for,6) C20.
Early selection for a red target is most likely to occur when there
is,"7) B21. In a lexical decision task, when the target is a bird name,
e.g. robin, it is usually preceded by the prime BODY but is sometimes
preceded by the prime BIRD."

Most of the headers begin with '1)', '5)', etc. I need to remove these
prefixes from the csv files. Another problem I've encountered while doing
this is that some of the headers are enclosed in double quotes, like '"4)
C19. So that he can shift attention between the radio and his incessantly
talking girl friend when she is in the car, Joe adjusts his radio",5)
B20'

I have tried converting the top row from an array to a string and then
calling gsub(/[\d]+\)/,''). This kinda works, but it is unable to deal
with the double quote problem. It also leaves whitespace behind, which I
don't want. Also, I can't figure out how to put the result back in the
array as it was and then write it back to the csv.

Help would be appreciated. Thanks.

azgar · June 18, 2009, 12:27am

On Jun 17, 2009, at 10:31 AM, Paul Shapiro wrote:

> My script currently is processing various csv files.

I recommend using a CSV parser so it can worry about all of those
little details for you. Here’s an example script to give you ideas:

#!/usr/bin/env ruby -wKU

require "rubygems"
require "faster_csv"

# read a line of CSV
fields = FCSV.parse_line(DATA.read)

# edit the fields
fields.each do |f|
  f.sub!(/\A\d+\)\s*/, "")
end

# show fields
puts fields

# write back out as CSV
puts FCSV.generate_line(fields)

__END__
Device ID,1) S31 Which best describes how you answered the online
reading comprehension quiz?,2) S32 Which best describes how you
answered the online timed retrieval quiz?,3) B19. If you want your
product to be easy to find in the supermarket then you should make its
container,"4) C19. So that he can shift attention between the radio
and his incessantly talking girl friend when she is in the car, Joe
adjusts his radio",5) B20. Early selection is most likely to occur
for,6) C20. Early selection for a red target is most likely to occur
when there is,"7) B21. In a lexical decision task, when the target is
a bird name, e.g. robin, it is usually preceded by the prime BODY but
is sometimes preceded by the prime BIRD."

Hope that helps.

James Edward G. II
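
For readers on Ruby 1.9 or later, where FasterCSV became the standard csv library, the same idea can be sketched without the gem. This is only an illustration: the helper name strip_question_numbers and the shortened sample line are made up for the example and do not come from the thread. The point it shows is that once the line is parsed, the quoted fields arrive without their quotes, so a regex anchored at the start of each field handles them like any other field, and generate_line puts the quotes back where they are needed.

```ruby
require "csv"

# Hypothetical helper (not from the thread): strip a leading "N) " question
# number from every field of one CSV header line and return it as CSV again.
def strip_question_numbers(header_line)
  fields = CSV.parse_line(header_line)
  # \A anchors at the start of the field, so digits later in the text are kept.
  cleaned = fields.map { |f| f.nil? ? f : f.sub(/\A\d+\)\s*/, "") }
  CSV.generate_line(cleaned)
end

# Shortened, made-up header in the same shape as the one in the question.
line = %q{Device ID,1) S31 Which quiz?,"4) C19. In the car, Joe adjusts his radio",5) B20. Early selection}
puts strip_question_numbers(line)
```

Fields that contain a comma are quoted again on output, so the result can be written straight back to a file.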

azgar · June 18, 2009, 7:44am

#!/usr/bin/env ruby

require 'rubygems'
require 'roo'
require 'csv'
require 'fileutils'
require 'rio'
require 'fastercsv'

FileUtils.mkdir_p "/Users/pshapiro/Desktop/Excel/xls"
FileUtils.mkdir_p "/Users/pshapiro/Desktop/Excel/tmp"
FileUtils.mkdir_p "/Users/pshapiro/Desktop/Excel/csv"

@filesxls = Dir["/Users/pshapiro/Desktop/Excel/*.xls"]
for file in @filesxls
  FileUtils.move(file,"/Users/pshapiro/Desktop/Excel/xls")
end

@filesxls = Dir["/Users/pshapiro/Desktop/Excel/xls/*.xls"]
@filetmp = Dir["/Users/pshapiro/Desktop/Excel/xls/*.xls_tmp"]

for file in @filesxls
  convert = Excel.new(file)
  convert.default_sheet = convert.sheets[0]
  convert.to_csv(file+"_tmp")
end

@filestmp = Dir["/Users/pshapiro/Desktop/Excel/xls/*.xls_tmp"]

for file in @filestmp
  FileUtils.move(file,"/Users/pshapiro/Desktop/Excel/tmp")
end

dir = "/Users/pshapiro/Desktop/Excel/tmp/"
files = Dir.entries(dir)
files.each do |f|
  next if f == "." or f == ".."
  oldFile = dir + "/" + f
  newFile = dir + "/" + File.basename(f, '.*')
  File.rename(oldFile, newFile)
end

files = Dir.entries(dir)
files.each do |f|
  next if f == "." or f == ".."
  oldFile = dir + "/" + f
  newFile = dir + "/" + f + ".csv"
  File.rename(oldFile, newFile)
end

@filescsv = Dir["/Users/pshapiro/Desktop/Excel/tmp/*.csv"]

for file in @filescsv
  FileUtils.move(file,"/Users/pshapiro/Desktop/Excel/csv")
end

FileUtils.rm_rf("/Users/pshapiro/Desktop/Excel/tmp")

@filescsv = Dir["/Users/pshapiro/Desktop/Excel/csv/*.csv"]

for file in @filescsv
  5.times {
    text=""
    File.open(file,"r"){|f|f.gets;text=f.read}
    File.open(file,"w+"){|f| f.write(text)}
  }
end

dir = "/Users/pshapiro/Desktop/Excel/csv/"
files = Dir.entries(dir)
files.each do |f|
  next if f == "." or f == ".."
  oldFile = dir + "/" + f
  newFile = dir + "/" + File.basename(f, '.*') + ".tmp"
  File.rename(oldFile, newFile)
end

@filescsv = Dir["/Users/pshapiro/Desktop/Excel/csv/*.tmp"]

for file in @filescsv
  csv = FasterCSV.read(file, :headers => true)
  lastc = csv.headers.length-1

  puts lastc

  rio(file).csv.skipcolumns(1..2,lastc) > rio(file+".csv").csv(',')
end

@filescsv = Dir["/Users/pshapiro/Desktop/Excel/csv/*.tmp"]

for file in @filescsv
  FileUtils.remove(file)
end

dir = "/Users/pshapiro/Desktop/Excel/csv"
files = Dir.entries(dir)
files.each do |f|
  next if f == "." or f == ".."
  oldFile = dir + "/" + f
  newFile = dir + "/" + File.basename(f, '.*')
  File.rename(oldFile, newFile)
end

2.times {
  files = Dir.entries(dir)
  files.each do |f|
    next if f == "." or f == ".."
    oldFile = dir + "/" + f
    newFile = dir + "/" + File.basename(f, '.*')
    File.rename(oldFile, newFile)
  end
}

files = Dir.entries(dir)
files.each do |f|
  next if f == "." or f == ".."
  oldFile = dir + "/" + f
  newFile = dir + "/" + f + ".csv"
  File.rename(oldFile, newFile)
end

#####################################

@filescsv = Dir["/Users/pshapiro/Desktop/Excel/csv/*.csv"]

for file in @filescsv
  csv = FasterCSV.read(file, :headers => true)
  csv = csv.to_s
  fields = FCSV.parse_line(csv)

  fields.each do |f|
    f.sub!(/[\d]+\)+[\s]/,'')
  end

  puts fields

  wline = FCSV.generate_line(fields)
  astring = rio(file).contents
  rio(file).csv.print(astring).close

  text=""
  File.open(file,"r"){|f|f.gets;text=f.read}
  File.open(file,"w+"){|f| f.write(text)}

  astring = rio(file).contents
  rio(file).csv.print(wline+astring).close
end

Again, Thanks!!!
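
The last section of the script above parses the header, rewrites it, and then splices it back in front of the remaining lines by hand. A possibly simpler route, sketched here with Ruby's standard csv library (the successor to FasterCSV since Ruby 1.9), is to read every row with the parser, edit only the first row, and write all rows back. The method name clean_header_in_place is hypothetical, and the sketch assumes each file is small enough to hold in memory.

```ruby
require "csv"

# Hypothetical helper (not from the thread): rewrite one CSV file so that the
# fields of its first row lose a leading "N) " question number.
# Assumption: the file fits comfortably in memory.
def clean_header_in_place(path)
  rows = CSV.read(path)            # array of arrays; rows[0] is the header row
  return if rows.empty?

  # Only the header row is edited; nil (empty) fields are left alone.
  rows[0] = rows[0].map { |f| f && f.sub(/\A\d+\)\s*/, "") }

  # Write every row back; the library re-quotes fields that need it.
  CSV.open(path, "w") do |csv|
    rows.each { |row| csv << row }
  end
end
```

With the directory layout used in the script, it could be applied as Dir["/Users/pshapiro/Desktop/Excel/csv/*.csv"].each { |file| clean_header_in_place(file) }.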