Why does the alignment feature work only when output_multiple is not set?

Hi,
After going through block_executor.cc, I found that the alignment feature
works only when output_multiple is not set. Why can't the two work at the
same time?


Thanks
Tiankun

Hi Tom,
Thanks for your reply. I have another question: in the function
“min_available_space”, why is buffer_size/2 best?

On 2015-06-16 21:31, Tom R. wrote:

On Tue, Jun 16, 2015 at 8:57 AM, Tiankun Hu [email protected]
wrote:

Hi,
After going through block_executor.cc, I found that the alignment feature
works only when output_multiple is not set. Why can't the two work at the
same time?


Thanks
Tiankun

Because they are competing objectives. The alignment tries to keep buffers
aligned, and therefore the number of items will be based on a multiple of
the alignment requirement. If you need an output multiple that’s different
from that, which one should the scheduler choose?

Note that the buffers always begin on a page and so are inherently
aligned.
If your output multiple is also a multiple of the alignment for the data
type, then you’ll always be aligned.
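To illustrate that arithmetic with a small sketch (my own helpers with assumed numbers, not GNU Radio API): with 16-byte SIMD alignment and 8-byte items such as gr_complex, the alignment requirement is 2 items, so any output multiple that is itself a multiple of 2 keeps a page-aligned buffer aligned:

```cpp
#include <cstddef>

// Sketch only; these helper names are hypothetical, not GNU Radio API.
std::size_t alignment_in_items(std::size_t alignment_bytes,
                               std::size_t item_size)
{
    // e.g. 16-byte SIMD alignment with 8-byte items -> 2 items
    return alignment_bytes / item_size;
}

bool stays_aligned(std::size_t output_multiple,
                   std::size_t alignment_items)
{
    // A buffer that starts page-aligned stays aligned if every write
    // advances the write pointer by a multiple of the alignment
    // requirement (in items).
    return output_multiple % alignment_items == 0;
}
```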

Tom

Hi Tiankun, Hi Tom,

I vaguely remember wondering about that line. It comes from the
single-threaded scheduler; back then, wondering, I came to the
conclusion that for the STS it had probably been considered useful to
halve the “buffer usage ripple” that could occur when blocks started
to always use the full buffer. That sounds helpful in a scheduler where,
per iteration, you either do something or you don’t, so a buffer without
any space left to write to might be worse than the overhead that you get
by only being allowed to use buffer/2. Basically, back then, I had
another problem, and just shrugged.

Now, for the thread-per-block scheduler, there’s no monolithic iteration
over all blocks, so I guess that problem wouldn’t occur. I would have to
try; however, it’s hard to find a single proper benchmark for this (if
it doesn’t break anything). Tiankun, maybe you have an idea?

Best regards,
Marcus

On Thu, Jun 18, 2015 at 8:29 AM, Tiankun Hu [email protected]
wrote:

Hi Tom,
Thanks for your reply. I have another question: in the function
“min_available_space”, why is buffer_size/2 best?

I’m not really sure. That’s a question for Eric.

What happens when you change it? How does it affect performance?

Tom

On Fri, Jun 19, 2015 at 6:09 AM, Tom R. [email protected] wrote:

On Thu, Jun 18, 2015 at 8:29 AM, Tiankun Hu [email protected]
wrote:

Hi Tom,
Thanks for your reply. I have another question: in the function
“min_available_space”, why is buffer_size/2 best?

I’m not really sure. That’s a question for Eric.

The scheduler aims to keep the buffer no more than half full, so that in
the steady state the producing block can write into the free half and the
downstream consuming block(s) read from the filled half.

But I don’t know why this increases parallelism.

Assume the following situation

file_source-> multiply_const -> file_sink

Assume the complete file fits into the buffer, and n_best weren’t
buffer/2. Then file_source would read the whole file and write it to
the buffer. Only then would multiply_const have something to work on. It
would also consume the whole input buffer at once, and produce the full
output buffer. Only then could file_sink start to work.

So, at no time would more than one block be active. You could run that
on a single-core computer and wouldn’t see any difference from a
multi-core computer.

Even if the file is larger than the buffer, multiply_const couldn’t
do anything, because it must wait for the whole time that file_sink
needs to write away a complete buffer.

If you restrict the maximum number of produced samples to half the
output buffer size, the multiply_const block would much more often be
able to execute while file_sink is still writing away half of the
buffer.
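As a toy model of that effect (my own sketch, not the real scheduler): a ring of `cap` items sits between a producer and a consumer, and each turn a block moves at most `chunk` items. With chunk equal to the full capacity, the blocks strictly alternate; with chunk equal to half the capacity, after the first fill the producer always has free space while the consumer still has data, so both have work at the same time:

```cpp
#include <algorithm>

// Toy model, not the real GNU Radio scheduler: a ring of `cap` items
// between one producer and one consumer.
struct Ring {
    int cap;
    int filled = 0;

    // Write up to `chunk` items; returns how many actually fit.
    int produce(int chunk) {
        int n = std::min(chunk, cap - filled);
        filled += n;
        return n;
    }
    // Read up to `chunk` items; returns how many were available.
    int consume(int chunk) {
        int n = std::min(chunk, filled);
        filled -= n;
        return n;
    }
};
```

With full-buffer chunks, `produce` returns 0 until the consumer has drained everything; with half-buffer chunks, the buffer settles at half full and both sides can make progress on every turn.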

Greetings,
Marcus

Hi Tom, Johnathan,
Thanks for your reply.

Hi Marcus,
Your conclusion seems to make sense.
I think it might relate to the buffer size allocated in
“flat_flowgraph.cc / allocate_buffer”:

 // *2 because we're now only filling them 1/2 way in order to
 // increase the available parallelism when using the TPB scheduler.
 // (We're double buffering, where we used to single buffer)
 int nitems = s_fixed_buffer_size * 2 / item_size;
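As a worked example of that line (the 32768-byte s_fixed_buffer_size and the 8-byte item are assumed values for illustration; gr_complex is 8 bytes):

```cpp
// Worked example of the allocate_buffer arithmetic quoted above,
// with assumed values for illustration.
constexpr int s_fixed_buffer_size = 32768; // bytes, assumed default
constexpr int item_size = 8;               // sizeof(gr_complex)

// *2 because blocks are only allowed to fill the buffer half way:
constexpr int nitems = s_fixed_buffer_size * 2 / item_size;

// Half of those items still span one full "single buffer" of bytes:
constexpr int half_in_bytes = nitems / 2 * item_size;
```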

But I don’t know why this increases parallelism.

On 2015-06-19 22:48, Johnathan C. wrote:

Hi Marcus,
Sorry for the late reply. I got it, thanks for your reply.

On 2015-06-20 23:05, Marcus Müller wrote: