Forum: Ruby Complex CSV parsing

Stuart C. (Guest)
on 2009-01-13 15:43
I am trying to parse data from a file where the values are comma
separated; however, there is slightly more to the file than just commas.
See below:

for (;;);{"t":"msg","c":"p_114000000","ms":[{"type":"msg","msg":{"text":"you around"}]}

From reading around, FasterCSV seems the way forward, so I have
this code:

require 'rubygems'
require 'faster_csv'

FasterCSV.foreach("C:\\Documents and Settings\\sjc\\Desktop\\p_1149549999=2[1].txt", :row_sep => ",") do |row|
  puts row[0]
  break
end

However, I am getting an error like this:

C:/Program Files/ruby/lib/ruby/gems/1.8/gems/fastercsv-1.4.0/lib/faster_csv.rb:1650:in `shift': Illegal quoting on line 1. (FasterCSV::MalformedCSVError)
        from C:/Program Files/ruby/lib/ruby/gems/1.8/gems/fastercsv-1.4.0/lib/faster_csv.rb:1568:in `loop'
        from C:/Program Files/ruby/lib/ruby/gems/1.8/gems/fastercsv-1.4.0/lib/faster_csv.rb:1568:in `shift'
        from C:/Program Files/ruby/lib/ruby/gems/1.8/gems/fastercsv-1.4.0/lib/faster_csv.rb:1513:in `each'
        from C:/Program Files/ruby/lib/ruby/gems/1.8/gems/fastercsv-1.4.0/lib/faster_csv.rb:1017:in `foreach'
        from C:/Program Files/ruby/lib/ruby/gems/1.8/gems/fastercsv-1.4.0/lib/faster_csv.rb:1191:in `open'
        from C:/Program Files/ruby/lib/ruby/gems/1.8/gems/fastercsv-1.4.0/lib/faster_csv.rb:1016:in `foreach'
        from C:/Documents and Settings/sjc/Desktop/test.rb:4

Can anyone help me out with this?

Many thanks
Brian C. (Guest)
on 2009-01-13 16:10
Stuart C. wrote:
> I am trying to parse data from a file where the values are common
> seperated, however there is slightly more to the file than just commas,
> see below
>
> for
> (;;);{"t":"msg","c":"p_114000000","ms":[{"type":"msg","msg":{"text":"you
> around"}]}

That is not valid CSV, so FasterCSV won't help you.

I think you'll need to describe more carefully how you want this input
line broken up, and give some more examples.

It looks to me like a nested structure. If every line has exactly the
same set of fields you may get away with a regexp. But if not, you may
have to write a full-blown parser for this language.
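
As a rough sketch of the regexp route, assuming every line carries one
"text" field whose value contains no escaped quotes. The sample is the
line from the question with the wrap removed, and extract_text is a
made-up helper name:

```ruby
# Sample line from the question, joined back onto one line.
LINE = 'for (;;);{"t":"msg","c":"p_114000000","ms":[{"type":"msg","msg":{"text":"you around"}]}'

# Return the value of the first "text" field, or nil if there is none.
# Assumption: the value never contains an escaped double quote.
def extract_text(line)
  match = line.match(/"text":"([^"]*)"/)
  match && match[1]
end

puts extract_text(LINE)
```

This only holds up while the fields stay this regular; anything nested
or escaped inside the value would need a real parser.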

However, this may be sufficiently close to JSON that you could use an
existing JSON parser. http://www.json.org/

(But in that case, I don't know what the "for(;;);" is doing on the
front)
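
If it does turn out to be JSON, something along these lines might work.
It is only a sketch: it assumes the "for (;;);" prefix can simply be
stripped, and it uses a hand-balanced variant of the sample, since the
line as posted has unbalanced braces and would be rejected by a JSON
parser. parse_guarded_json is a made-up helper name.

```ruby
require 'json'

# Hypothetical, well-formed variant of the sample line (one closing
# brace added); the line as posted in the question is not balanced.
RAW = 'for (;;);{"t":"msg","c":"p_114000000","ms":[{"type":"msg","msg":{"text":"you around"}}]}'

# Strip the leading "for (;;);" (assumed to be a fixed prefix), then
# hand the remainder to the JSON parser.
def parse_guarded_json(raw)
  JSON.parse(raw.sub(/\Afor\s*\(;;\);/, ''))
end

data = parse_guarded_json(RAW)
puts data['t']
puts data['ms'][0]['msg']['text']
```

The result is an ordinary Hash of Arrays and Hashes, so the nested
fields can be reached by key and index without any CSV-style splitting.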
Mark T. (Guest)
on 2009-01-13 16:36
(Received via mailing list)
On Jan 13, 8:42 am, Stuart C. <removed_email_address@domain.invalid> wrote:
> I am trying to parse data from a file where the values are common
> seperated, however there is slightly more to the file than just commas,
> see below
>
> for
> (;;);{"t":"msg","c":"p_114000000","ms":[{"type":"msg","msg":{"text":"you
> around"}]}

That's not a CSV file. It looks like some sort of serialized data
structure. If you know how it was serialized, you should be able to
easily restore the structure.

If not, you can use Treetop to specify the grammar including the
balanced delimiters {}, (), and [], which apparently take precedence
over the commas, and perform the parsing.
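
To illustrate what "balanced delimiters take precedence over the commas"
means in practice, here is a small hand-rolled scanner. It is not
Treetop and not a grammar, just a sketch under the assumption that
delimiters are balanced and strings use plain double quotes without
backslash escapes. split_top_level is a made-up name.

```ruby
OPENERS = ['{', '[', '(']
CLOSERS = ['}', ']', ')']

# Split a line on commas that sit at nesting depth zero. Commas inside
# {}, [], () or inside double-quoted strings stay within their field.
# Backslash escapes inside strings are not handled.
def split_top_level(line)
  fields = []
  current = String.new
  depth = 0
  in_quotes = false
  line.each_char do |ch|
    if ch == '"'
      in_quotes = !in_quotes
    elsif !in_quotes && OPENERS.include?(ch)
      depth += 1
    elsif !in_quotes && CLOSERS.include?(ch)
      depth -= 1
    end
    if ch == ',' && depth.zero? && !in_quotes
      fields << current
      current = String.new
    else
      current << ch
    end
  end
  fields << current
end

p split_top_level('a,{"b":[1,2]},"c,d"')
```

A grammar-based parser would additionally reject unbalanced input and
give you a tree rather than flat strings.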
Ollivier R. (Guest)
on 2009-01-14 14:35
(Received via mailing list)
In article <removed_email_address@domain.invalid>,
Stuart C.  <removed_email_address@domain.invalid> wrote:
>I am trying to parse data from a file where the values are common
>seperated, however there is slightly more to the file than just commas,
>see below
>
>for
>(;;);{"t":"msg","c":"p_114000000","ms":[{"type":"msg","msg":{"text":"you
>around"}]}

It looks more like JSON than CSV; try using a JSON parser instead.