Complex CSV parsing


#1

I am trying to parse data from a file where the values are comma
separated. However, there is slightly more to the file than just commas,
see below:

for (;;);{"t":"msg","c":"p_114000000","ms":[{"type":"msg","msg":{"text":"you around"}]}

From reading around, FasterCSV seems the way forward, so I have
this code:

require 'rubygems'
require 'faster_csv'

# Single quotes keep the backslashes in the Windows path literal.
FasterCSV.foreach('C:\Documents and Settings\sjc\Desktop\p_1149549999=2[1].txt', :row_sep => ",") do |row|
  puts row[0]
  break
end

However, I am getting an error like this:

C:/Program Files/ruby/lib/ruby/gems/1.8/gems/fastercsv-1.4.0/lib/faster_csv.rb:1650:in `shift': Illegal quoting on line 1. (FasterCSV::MalformedCSVError)
from C:/Program Files/ruby/lib/ruby/gems/1.8/gems/fastercsv-1.4.0/lib/faster_csv.rb:1568:in `loop'
from C:/Program Files/ruby/lib/ruby/gems/1.8/gems/fastercsv-1.4.0/lib/faster_csv.rb:1568:in `shift'
from C:/Program Files/ruby/lib/ruby/gems/1.8/gems/fastercsv-1.4.0/lib/faster_csv.rb:1513:in `each'
from C:/Program Files/ruby/lib/ruby/gems/1.8/gems/fastercsv-1.4.0/lib/faster_csv.rb:1017:in `foreach'
from C:/Program Files/ruby/lib/ruby/gems/1.8/gems/fastercsv-1.4.0/lib/faster_csv.rb:1191:in `open'
from C:/Program Files/ruby/lib/ruby/gems/1.8/gems/fastercsv-1.4.0/lib/faster_csv.rb:1016:in `foreach'
from C:/Documents and Settings/sjc/Desktop/test.rb:4

Can anyone help me out with this?

Many thanks


#2

Stuart C. wrote:

I am trying to parse data from a file where the values are comma
separated. However, there is slightly more to the file than just commas,
see below:

for (;;);{"t":"msg","c":"p_114000000","ms":[{"type":"msg","msg":{"text":"you around"}]}

That is not valid CSV, so FasterCSV won’t help you.

I think you’ll need to describe more carefully how you want this input
line broken up, and give some more examples.

It looks to me like a nested structure. If every line has exactly the
same set of fields you may get away with a regexp. But if not, you may
have to write a full-blown parser for this language.
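To illustrate the regexp route, here is a minimal sketch. It assumes every line carries a "c" field followed later by a "text" field, and that neither value contains an escaped double quote. The sample line, the pattern, and the helper name extract_fields are all made up for illustration (the sample has an extra closing brace added so that the brackets balance).

```ruby
# Hypothetical sample line modelled on the one in the question.
LINE = 'for (;;);{"t":"msg","c":"p_114000000","ms":[{"type":"msg","msg":{"text":"you around"}}]}'

# Capture the value of "c" and the value of "text".
# Assumes the values never contain an escaped double quote.
PATTERN = /"c":"([^"]*)".*?"text":"([^"]*)"/

# Returns a hash of the captured fields, or nil if the line does not match.
def extract_fields(line)
  match = PATTERN.match(line)
  match && { :c => match[1], :text => match[2] }
end

fields = extract_fields(LINE)
puts fields[:c]     # prints p_114000000
puts fields[:text]  # prints you around
```

This breaks down as soon as the set or order of fields varies between lines, which is the point where a real parser becomes the safer choice.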

However, this may be sufficiently close to JSON that you could use an
existing JSON parser. http://www.json.org/

(But in that case, I don’t know what the “for(;;);” is doing on the
front)
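If the rest of each line really is JSON, one way to try this is to strip the leading "for (;;);" and hand the remainder to a JSON parser. The sketch below assumes the prefix always has that form and that the payload is well-formed JSON. The sample here has one extra closing brace compared with the line as posted, because the posted line does not balance and a JSON parser would reject it. The names PREFIX, SAMPLE and parse_line are only for illustration. On Ruby 1.8 the json library comes from the json gem; later versions ship it with the standard library.

```ruby
require 'json'

# Leading "for (;;);" (with or without the space) that is not part of the JSON.
PREFIX = /\Afor\s*\(;;\);/

# Balanced variant of the sample line; an assumption, not the posted data.
SAMPLE = 'for (;;);{"t":"msg","c":"p_114000000","ms":[{"type":"msg","msg":{"text":"you around"}}]}'

# Strip the prefix and parse what is left as JSON.
# Raises JSON::ParserError if the remainder is not valid JSON.
def parse_line(line)
  JSON.parse(line.sub(PREFIX, ''))
end

data = parse_line(SAMPLE)
puts data['c']                       # prints p_114000000
puts data['ms'][0]['msg']['text']    # prints you around
```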


#3

On Jan 13, 8:42 am, Stuart C. removed_email_address@domain.invalid wrote:

I am trying to parse data from a file where the values are comma
separated. However, there is slightly more to the file than just commas,
see below:

for (;;);{"t":"msg","c":"p_114000000","ms":[{"type":"msg","msg":{"text":"you around"}]}

That’s not a CSV file. It looks like some sort of serialized data
structure. If you know how it was serialized, you should be able to
easily restore the structure.

If not, you can use Treetop to specify the grammar including the
balanced delimiters {}, (), and [], which apparently take precedence
over the commas, and perform the parsing.
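To show what "delimiters take precedence over commas" means in practice, here is a rough hand-rolled sketch in plain Ruby. It is not Treetop and not a grammar, just a scanner that splits on commas only when it is outside every {}, () and [] pair and outside double-quoted strings. It assumes the delimiters are balanced and that strings contain no escaped quotes. The name split_top_level is made up for illustration.

```ruby
OPENERS = ['{', '(', '[']
CLOSERS = ['}', ')', ']']

# Split text on commas that are not nested inside brackets or quoted strings.
def split_top_level(text)
  parts = [String.new]
  depth = 0
  in_string = false
  text.each_char do |ch|
    if ch == '"'
      in_string = !in_string          # toggle on every double quote
    elsif !in_string
      depth += 1 if OPENERS.include?(ch)
      depth -= 1 if CLOSERS.include?(ch)
      if ch == ',' && depth.zero?
        parts << String.new           # top-level comma: start a new field
        next
      end
    end
    parts.last << ch
  end
  parts
end

p split_top_level('a,{"b":1,"c":[2,3]},(d,e),"f,g"')
# prints ["a", "{\"b\":1,\"c\":[2,3]}", "(d,e)", "\"f,g\""]
```

A grammar-based parser would also validate that each closer matches its opener, which this sketch does not attempt.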


#4

In article removed_email_address@domain.invalid,
Stuart C. removed_email_address@domain.invalid wrote:

I am trying to parse data from a file where the values are comma
separated. However, there is slightly more to the file than just commas,
see below:

for (;;);{"t":"msg","c":"p_114000000","ms":[{"type":"msg","msg":{"text":"you around"}]}

It looks more like JSON than CSV; try using a JSON parser instead.