I have many sync blocks that work with large fixed-size vectors, e.g.
one that converts a vector of size 12659 to another of size 18353. I
have simply multiplied sizeof(gr_complex) by 12659 and 18353 in the
signature. However, when the flow graph is running, I get a warning
about paging: the circular buffer implementation allocates very large
buffers (e.g. 4096 items) to meet the paging requirement. I do not
want really large buffers. I have implemented the whole thing with
padding, but that also becomes really inefficient, since when you want
to switch between vectors and streams, you have to jump through extra
hoops because of the padding. In a previous version I had streams
everywhere, but then there is absolutely no verification of whether I
messed up the sizes of my “virtual vectors”.
So is there a way to work with large, odd-length vectors that does not
have this buffer allocation problem and does not require padding? It
seems to me that it could be supported: regular streams, but the
vector size would be verified separately at connection time and would
not be used to multiply the item size. Any advice is appreciated…
The best technique here is to round up your itemsize to the next
integer multiple of the machine page size, typically 4K. You can still
operate a vector at a time, but you’ll have to do a little math to
identify the start of each vector in the input and output buffers, as
they will no longer be contiguous. It sounds like you might have
already tried something like this.
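To make the round-up concrete, here is a small pure-Python sketch of the arithmetic involved. The sizes (12659-sample complex vectors, 4096-byte pages, 8-byte gr_complex) come from the thread; the helper names are illustrative, not GNU Radio API.

```python
# Round an odd vector itemsize up to the next page-size multiple,
# and locate the k-th vector inside the padded buffer.

PAGE_SIZE = 4096
GR_COMPLEX_SIZE = 8  # sizeof(gr_complex): two 32-bit floats

def padded_itemsize(vlen, scalar_size=GR_COMPLEX_SIZE, page=PAGE_SIZE):
    """Round vlen * scalar_size up to the next multiple of page."""
    raw = vlen * scalar_size
    return ((raw + page - 1) // page) * page

def vector_offset(k, vlen, scalar_size=GR_COMPLEX_SIZE, page=PAGE_SIZE):
    """Byte offset of the k-th vector when each item is padded."""
    return k * padded_itemsize(vlen, scalar_size, page)

# A 12659-sample complex vector occupies 101272 bytes and pads up to
# 102400 bytes (25 pages), so each item wastes 1128 bytes of padding.
```

This is the overhead being complained about: every vector carries a tail of padding that downstream stream-level blocks must skip over.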
Yes, this is what I am doing, but it is not very nice, and you cannot
easily mix in blocks that want to work at the stream level. What
really bugs me is that I think the scheduler could figure this all
out and treat my vectors as a stream, allocating nice buffers (who
cares whether the vector fits into the buffer an integer number of
times?). Am I wrong about this? I think this would be a nice further
development… Miklos
The aligned-to-page-size buffer management is due to the way that mmap()
is used to multiply-map these buffers into the address space.
That only “works” if the sizes are multiples of the native page size.
–
Marcus L.
Principal Investigator
Shirleys Bay Radio Astronomy Consortium
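The page-size constraint Marcus describes can be illustrated with a bit of arithmetic: a double-mapped circular buffer must be both a whole number of pages and a whole number of items, i.e. a multiple of lcm(itemsize, page_size). The numbers below are taken from the thread; this is plain arithmetic, not GNU Radio code.

```python
from math import gcd

def min_buffer_bytes(itemsize, page=4096):
    """Smallest buffer that is a multiple of both itemsize and page
    (i.e. lcm(itemsize, page))."""
    return itemsize * page // gcd(itemsize, page)

# itemsize = sizeof(gr_complex) * 12659 = 101272 bytes.
# gcd(101272, 4096) = 8, so the smallest legal buffer is
# 101272 * 4096 / 8 = 51,851,264 bytes (~49 MiB) -- hence the
# "really large buffers" warning for odd vector lengths.
```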
On Wed, Aug 21, 2013 at 07:59:37PM +0200, Miklos M. wrote:
So is there a way to work with large, odd-length vectors that does not
have this buffer allocation problem and does not require padding? It
seems to me that it could be supported: regular streams, but the
vector size would be verified separately at connection time and would
not be used to multiply the item size. Any advice is appreciated…
Miklos,
if Johnathan’s tips aren’t helping, you might be able to use tags to
delimit vectors and then pass them as streams of scalars. You then
have to keep track of vector boundaries internally in your block.
Depending on what your application is, this could be a solution, or it
could make things even more inefficient. But it’s worth a try!
MB
–
Karlsruhe Institute of Technology (KIT)
Communications Engineering Lab (CEL)
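Martin's tag-based idea can be sketched in plain Python. In real GNU Radio this would use gr.tag_t with add_item_tag()/get_tags_in_range(); here the tag stream is modeled as a list of (offset, key) pairs, and the key name "vec_start" is made up for illustration.

```python
def tag_vectors(n_vectors, vlen):
    """Tags the sender would attach: one at each vector boundary."""
    return [(k * vlen, "vec_start") for k in range(n_vectors)]

def regroup(samples, tags, vlen):
    """Receiver side: slice the scalar stream back into vectors,
    sanity-checking each tagged boundary."""
    vectors = []
    for offset, key in tags:
        assert key == "vec_start" and offset % vlen == 0
        vectors.append(samples[offset:offset + vlen])
    return vectors

stream = list(range(9))
vecs = regroup(stream, tag_vectors(3, 3), 3)
# vecs == [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

The check at each tagged boundary is what restores the size verification Miklos misses when using bare streams.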
Hi Martin, Yes, I know of stream tags, but they would just make the
blocks complicated: right now I can rely on the fact that data comes
in as a multiple of the vector length. For now, padding solves my
immediate needs, but it is not an ideal solution. Miklos
Just to add my two cents:
Depending on your actual application, your large vectors might not
actually quite fit the idea of “streams”; they might, for example, be a
valid, decoded network packet or something of the like. If they don’t
need sample-synchronous handling, using messages to pass them around
might work well.
Downside of that is of course that you can’t use your favourite GR block
on messages. You break the sample synchronous architecture of a
flowgraph with multiple paths from source(s) to sink(s); and if you
convert from message to stream and back, you basically lose the vector
attribute of your data (or run into the same problems as before).
I can’t really tell you much about computational performance of passing
around large messages, however.
On the other hand, you can reduce your per-block coding overhead for
Martin’s suggested tag-based solution:
Write a base class that implements an input and an output buffer and a
minimal state machine based on stream tag evaluation. Let your blocks
inherit from it. Always copy as many items from your general_work’s
input vector to your input buffer as you can, and copy as many samples
from the output buffer to your general_work’s output vector as
possible. Execute your computation when your input buffer is full and
your output buffer is empty. That way, you’ll get a quasi-fixed
relative rate, but keep all the freedom, and the scheduling
disadvantages, of a general_work block with an itemsize of gr_complex
(or whatever your data type is).
I know from experience that this might be hard to debug. However, once
your state machine is watertight, you’re not very likely to run into
issues later.
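A minimal sketch of the buffering base class Marcus describes, in plain Python rather than as a gr.basic_block subclass: scalars accumulate in an input buffer, the vector computation runs once it is full, and the output buffer is then drained. All class and method names are illustrative, and a real block would cap its output at noutput_items.

```python
class VectorBlockBase:
    """Accumulates scalar items into whole vectors and drains the
    result; subclasses only implement process_vector()."""

    def __init__(self, in_vlen):
        self.in_vlen = in_vlen
        self.in_buf = []
        self.out_buf = []

    def process_vector(self, vec):
        raise NotImplementedError

    def general_work(self, input_items):
        # Copy as many input items into the internal buffer as we can;
        # run the vector computation whenever the buffer fills up and
        # the previous output has been drained.
        for x in input_items:
            self.in_buf.append(x)
            if len(self.in_buf) == self.in_vlen and not self.out_buf:
                self.out_buf = list(self.process_vector(self.in_buf))
                self.in_buf = []
        # Drain the output buffer (a real block would respect the
        # scheduler's noutput_items limit here).
        out, self.out_buf = self.out_buf, []
        return out

class Doubler(VectorBlockBase):
    def process_vector(self, vec):
        return [2 * x for x in vec]

blk = Doubler(in_vlen=4)
out = blk.general_work([1, 2, 3]) + blk.general_work([4, 5])
# out == [2, 4, 6, 8]; the trailing 5 waits in blk.in_buf
```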
Yes, I understand the page-size limitation. However, if your vector is
1234 bytes, then you can happily allocate a 4096-byte buffer, but to
the block you always hand out a multiple of 1234 bytes (i.e. 1, 2 or
3 vectors). The address-space wrapping would work fine, so the start
of the vectors would not always be at the same place. I think it could
be done; the question is whether it is worth doing.
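Miklos's observation can be shown with a line of arithmetic: in a double-mapped 4096-byte buffer, vector k would start at byte (k * 1234) % 4096, so start positions drift around the buffer while the mmap trick keeps each vector contiguous in virtual address space. Pure arithmetic using the thread's numbers, not GNU Radio code.

```python
VLEN_BYTES = 1234   # odd vector size from the example
BUF_BYTES = 4096    # one page, the proposed small buffer

# Start offset of each successive vector in the circular buffer.
starts = [(k * VLEN_BYTES) % BUF_BYTES for k in range(5)]
# starts == [0, 1234, 2468, 3702, 840]
# Vector 3 (start 3702, end 4936) crosses the physical end of the
# buffer; the double mapping would make that crossing invisible.
```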
With sync blocks and fixed-rate decimators/interpolators, the
scheduler inherently knows how many buffers to allocate, etc., down
the signal processing line to always keep all blocks busy.
With general blocks, this is not possible; calls to forecast are
necessary to determine how much data needs to be supplied to keep the
signal processing chain running.
I’m not quite sure whether there is a performance penalty for blocks
that simply forecast a need for as many samples as they’re asked to
produce, or whether they are scheduled identically to sync blocks.
Documentation on the original GR scheduler is really, really sparse,
and the code itself is your primary source of help… I really can’t
give you any hints, as I’ve (most of the time) tried to get along
without going too deep into the GR scheduling framework (while hoping
for something more readable to come along in the next major release
of GR).
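For reference, the "forecast as many samples as asked to produce" case Marcus mentions can be sketched as a pure function. In real GNU Radio this logic lives in the forecast method of a general (gr::block / gr.basic_block) block, which fills in a per-input-stream requirements list; the standalone function below is only a stand-in.

```python
def forecast(noutput_items, n_input_streams):
    """1:1 forecast: require one input item per requested output
    item, on every input stream."""
    return [noutput_items] * n_input_streams
```

A block whose forecast looks like this behaves rate-wise like a sync block, but the scheduler still cannot assume a fixed relative rate and must call forecast each iteration.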
On Thu, Aug 22, 2013 at 12:00 PM, Marcus Müller [email protected]
wrote:
Thank you for the excellent advice. I had not thought of a generic
base class, which might help me. There is one block that produces one
vector of bytes (packet data) and one vector of ints (the number of
errors corrected and uncorrected), which would be impossible to solve
with a stream since the output rates are not the same. Other than
that, I think your suggestion would work.
You say that there are scheduling disadvantages to a general_work
block. What are they? Sometimes I run into issues with the scheduler,
but it is not clear how it really works, what I should try to do and
what I should try to avoid. Can you describe it in a few words, or
give a pointer where I can read up on the technical (!!) details?
Best,
Miklos