File manipulation


#1

Hi All,

I have one more question about file manipulation.

Suppose I have the following structure in a file :

Instance J1 (
net n1()
net n2 ()
net n3()
)
Instance J2 (
net n1()
net n2()
net n3()
)
Instance J3 (
net n1()
net n2()
net n3()
)

As an example, I want to read J3/net n3.
(the files are huge …in GB)
I can grep for n3 but it will return 3 instances of n3.

How can I ensure that I'm reading the n3 values that belong to instance
J3?

Thank you for your time. I really appreciate your help.

Thanks
Vandana.


#2

On 2008.10.27., at 21:39, Vandana wrote:

As an example, I want to read J3/net n3.
(the files are huge …in GB)
I can grep for n3 but it will return 3 instances of n3.

How can I ensure that I'm reading the n3 values that belong to instance
J3?

If I understood correctly, something like this might work, assuming
data holds the file contents as a string:

data[/Instance J3\s*\((.+?)^\)/m, 1]

Cheers,
Peter


#3

Vandana wrote:

Suppose I have the following structure in a file :

Instance J1 (
net n1()
net n2 ()
net n3()
)
Instance J2 (
net n1()
net n2()
net n3()
)
Instance J3 (
net n1()
net n2()
net n3()
)

As an example, I want to read J3/net n3.
(the files are huge …in GB)
I can grep for n3 but it will return 3 instances of n3.

How can I ensure that I'm reading the n3 values that belong to instance
J3?

Using (Unix shell command) grep, or using Ruby?

In Ruby you could just set a variable whenever you see a line matching
/Instance \S+/, so when you see a line matching /n3/ you can check what
the preceding Instance was.
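
A minimal sketch of that idea, reading one line at a time so the whole
file never has to sit in memory. The method name nets_in_instance and
the file name in the usage comment are made up for illustration, and the
patterns assume each Instance header and each net sits on its own line,
as in the sample above.

```ruby
# Collect the lines for one net inside one instance, reading line by line.
# `io` is any object that responds to #each_line (File, StringIO, ...).
def nets_in_instance(io, instance, net)
  current = nil   # name from the most recently seen "Instance <name>" line
  matches = []
  io.each_line do |line|
    if line =~ /^\s*Instance\s+(\S+)/
      current = $1
    elsif current == instance && line =~ /^\s*net\s+#{Regexp.escape(net)}\b/
      matches << line.chomp
    end
  end
  matches
end

# Hypothetical usage:
#   File.open("design.txt") { |f| puts nets_in_instance(f, "J3", "n3") }
```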

Scanning multiple gigabytes this way is never going to be efficient,
unless you have enough RAM to keep the whole dataset in memory. If not,
then consider indexing the data, perhaps with something like cdb. This
would let you jump immediately to the data for instance J3 without
scanning through the whole file.
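
To illustrate the indexing idea without depending on cdb, here is a
rough sketch that uses a plain Ruby Hash of byte offsets instead. It is
not cdb, just the same principle: scan once, remember where each
Instance starts, then seek straight there. The names build_offset_index
and read_instance are invented for this example, and it assumes each
block ends with a line containing only ")".

```ruby
require 'stringio'

# Scan once and remember the byte offset at which each Instance block starts.
def build_offset_index(io)
  index = {}
  io.rewind
  pos = io.pos
  while (line = io.gets)
    index[$1] = pos if line =~ /^\s*Instance\s+(\S+)/
    pos = io.pos   # offset of the next line
  end
  index
end

# Seek straight to one instance and return the lines of its block,
# stopping at the closing ")" line. Returns nil for an unknown name.
def read_instance(io, index, name)
  offset = index[name]
  return nil if offset.nil?
  io.seek(offset)
  block = []
  while (line = io.gets)
    block << line.chomp
    break if line.strip == ")"
  end
  block
end

# Usage with an in-memory stand-in for the real file:
data  = StringIO.new("Instance J1 (\nnet n3()\n)\nInstance J3 (\nnet n3()\n)\n")
index = build_offset_index(data)
j3    = read_instance(data, index, "J3")
```

Building the index still costs one full pass, so this only pays off if
the index is kept between runs (for example saved with Marshal) and the
file is queried more than once.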


#4

On Mon, 27 Oct 2008 13:35:28 -0700, Vandana wrote:

As an example, I want to read J3/net n3. (the files are huge …in GB)
I can grep for n3 but it will return 3 instances of n3.

How can I ensure that I'm reading the n3 values that belong to instance J3?

If your file were in XML, you could use REXML::Parsers::SAX2Parser (or
some other SAX parser) to create a solution that turns on an
@inInstanceJ3 variable when it encounters the right piece of data, turns
off @inInstanceJ3 when it encounters the matching close tag, and records
n3 entries only while @inInstanceJ3 is true.

For your format, I suggest you find a parser generator that lets you
create actions for different grammar elements, to implement a similar
solution.
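
As a rough illustration of the flag-on, flag-off approach applied to the
plain-text format, here is a hand-written stand-in for what a parser
generator would produce. It is not generated by any tool: the class
InstanceScanner and its rule table are hypothetical, and the patterns
assume one Instance header, one net, or one closing ")" per line.

```ruby
# Hypothetical mini-scanner: each rule pairs a line pattern with an action,
# loosely imitating the grammar actions of a parser generator.
class InstanceScanner
  attr_reader :found

  def initialize(instance, net)
    @instance  = instance
    @net       = net
    @in_target = false   # true only while inside the wanted Instance block
    @found     = []
    @rules = [
      # "Instance <name> (" switches the flag on for the wanted instance only
      [/^\s*Instance\s+(\S+)\s*\(/, ->(m) { @in_target = (m[1] == @instance) }],
      # a lone ")" closes the current block, so the flag goes off
      [/^\s*\)\s*$/, ->(_m) { @in_target = false }],
      # "net <name>(...)" is recorded only while the flag is on
      [/^\s*net\s+(\S+?)\s*\(.*\)/,
       ->(m) { @found << m[0].strip if @in_target && m[1] == @net }]
    ]
  end

  # Feed every line to the first rule that matches it.
  def scan(io)
    io.each_line do |line|
      @rules.each do |pattern, action|
        m = pattern.match(line)
        next unless m
        action.call(m)
        break
      end
    end
    self
  end
end

# Hypothetical usage:
#   File.open("design.txt") { |f| puts InstanceScanner.new("J3", "n3").scan(f).found }
```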