Benchmark_* not working correctly

Dev_R · October 2, 2007, 1:11pm

Eric,

See reply embedded

On 10/1/07, Eric B. wrote:

I believe that there is an additional requirement that the data passed
Hmmm. Does it ever use the 0RCR case? I would expect only the first
two. It may be reusing the fff simd code which generates all 4
alignments for the taps, but I wouldn’t expect to see the 0RCR or 000R
input cases.

Yes, I do see the 0RCR or 000R case. For example, when I change the QA
code to use stack allocation for the input (uncommenting a piece of code
that was originally there, lines 110 and 111 in the QA code from trunk),
the check will fail.
Input is at address 0xbfcd87d4; rounded down to a 16-byte boundary, this
becomes 0xbfcd87d0. This illustrates the 0RCR case.
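To make the naming concrete, here is a small sketch that maps a buffer address to the layout of the 16-byte block containing it. It reads '0' as a padding float before the data, 'R' as a real part and 'C' as an imaginary part of a gr_complex sample; that reading of the thread's notation is our assumption, and align_down16 and alignment_case are hypothetical helpers, not GNU Radio functions.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: round an address down to the 16-byte boundary
// at or below it.
std::uintptr_t align_down16(std::uintptr_t addr) {
  return addr & ~static_cast<std::uintptr_t>(15);
}

// Hypothetical helper: describe the four floats in the 16-byte block that
// contains addr, assuming a gr_complex buffer starts at addr.
// '0' = padding before the data, 'R' = real part, 'C' = imaginary part.
std::string alignment_case(std::uintptr_t addr) {
  switch (addr % 16) {
    case 0:  return "RCRC";  // fully aligned
    case 4:  return "0RCR";  // first real part at mod 8 == 4
    case 8:  return "00RC";  // first real part at mod 8 == 0
    case 12: return "000R";  // first real part at mod 8 == 4
    default: return "????";  // not even 4-byte (float) aligned
  }
}
```

For an address such as 0xbfcd87d4, align_down16 gives 0xbfcd87d0 and alignment_case gives "0RCR"; note that the same address is also on a mod 8 == 4 boundary.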

real on a mod 8 == 4 boundary instead of a mod 8 == 0 boundary?
Yes, see the example below.

If so, (1) where’s the input data coming from, (2) what version of the
compiler are you using?

In the example above, the data was allocated on the stack in the QA code
with

i_type input[INPUT_LEN]; // i_type is gr_complex

which causes the QA test to fail, instead of

i_type *input = (i_type *) malloc16Align(INPUT_LEN * sizeof(i_type));

which is what the QA code currently uses and which makes it pass.
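The two declarations differ only in where the buffer lands. The sketch below shows one way to get a 16-byte-aligned heap buffer using POSIX posix_memalign(); alloc16 and is_16_aligned are hypothetical names, and this is not a claim about how malloc16Align itself is implemented.

```cpp
#include <complex>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign() and free() (POSIX)

typedef std::complex<float> gr_complex;  // how GNU Radio defines gr_complex

// Hypothetical stand-in for the QA code's malloc16Align(): returns a block
// whose address is a multiple of 16, or nullptr on failure.
// Release it with free().
void *alloc16(std::size_t nbytes) {
  void *p = nullptr;
  if (posix_memalign(&p, 16, nbytes) != 0)
    return nullptr;
  return p;
}

// True if p sits on a 16-byte boundary (the RCRC case).
bool is_16_aligned(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

Usage would mirror the QA code: `gr_complex *input = static_cast<gr_complex *>(alloc16(INPUT_LEN * sizeof(gr_complex)));`, followed by `free(input)` when done. A plain `gr_complex input[INPUT_LEN];` on the stack only guarantees the 4-byte alignment of a float, which is how the 0RCR case arises.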

I am using three different compilers/versions of gcc on two different
machines and get the same results:

gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
gcc (GCC) 4.2.1 (Debian 4.2.1-3)
gcc (GCC) 4.1.2.20070502 (Red Hat 4.1.2-12)

However, back to your first point: if we are using the 0RCR case, then
the code is completely wrong, and I don’t see how it could ever pass
the QA tests (which it seems to). On the other hand, there could be
some problem with how the float taps are mapped across the complex
input (it’s been a long time since I looked at the code…)

The QA tests are passing because they force the 16-byte alignment.

Thanks for looking at this!

Dev_R · October 2, 2007, 5:26pm

On Tue, Oct 02, 2007 at 04:10:07AM -0700, Tim M. wrote:

Eric,

See reply embedded

Thanks.

Hmmm. Does it ever use the 0RCR case? I would expect only the first
two. It may be reusing the fff simd code which generates all 4
alignments for the taps, but I wouldn’t expect to see the 0RCR or 000R
input cases.

Yes, I do see the 0RCR or 000R case. For example, when I change the QA code
to use stack allocation for the input (uncommenting a piece of code that
was originally there, lines 110 and 111 in the QA code from trunk), the check
will fail.

OK, I’m not surprised by that. I wouldn’t consider that a problem
unless we get in the habit of calling this with stack-allocated
input. It could of course be worked around with an __attribute__((…))
on the array definition.
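The exact attribute is elided above. Assuming it means GCC's aligned attribute, a minimal sketch of applying it to a stack array like the QA code's looks like this; stack_buffer_mod16 and the fixed length of 32 are hypothetical, for illustration only.

```cpp
#include <complex>
#include <cstdint>

typedef std::complex<float> gr_complex;

// Fills a stack buffer and returns its address modulo 16.
// The GCC/Clang aligned(16) attribute should make the result 0.
unsigned stack_buffer_mod16() {
  gr_complex input[32] __attribute__((aligned(16)));
  for (int i = 0; i < 32; i++)
    input[i] = gr_complex(static_cast<float>(i), 0.0f);
  return static_cast<unsigned>(reinterpret_cast<std::uintptr_t>(input) % 16);
}
```

In C++11 and later, `alignas(16) gr_complex input[INPUT_LEN];` is the standard spelling of the same request; whether a given compiler honors it for stack variables is worth checking on the compilers you target.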

Input is at address 0xbfcd87d4; rounded down to a 16-byte boundary, this
becomes 0xbfcd87d0. This illustrates the 0RCR case.

real on a mod 8 == 4 boundary instead of a mod 8 == 0 boundary?

the QA tests (which it seems to). On the other hand, there could be
some problem with how the float taps are mapped across the complex
input (it’s been a long time since I looked at the code…)

The QA tests are passing because they force the 16-byte alignment.

OK. This is as expected.

In the production code, i.e., where you are seeing problems (was it
around gri_mmse_fir_interpolator?), does the alignment problem
occur?

If so, I think we should fix the caller. If the calling site is using
stack or heap allocated data, we should fix it there. If it’s using
input passed to it by “work” or “general_work”, they are already
aligned. In any case, we should add a check at the site of the call
to the SSE code that checks the alignment and raises an exception in
the bad cases. Of course the SSE code could be modified to handle the
other two alignment cases, but I’d like to know the performance cost
of doing it that way before committing to that path.
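A sketch of the proposed call-site check, assuming the 16-byte requirement discussed above. check_16_aligned and its message are hypothetical, and std::invalid_argument is simply our choice of exception type; whatever check ends up in GNU Radio may look different.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical guard for the call site of the SSE filter code: throw instead
// of silently computing with a misaligned (0RCR, 00RC or 000R) input buffer.
inline void check_16_aligned(const void *p, const std::string &caller) {
  std::uintptr_t offset = reinterpret_cast<std::uintptr_t>(p) % 16;
  if (offset != 0)
    throw std::invalid_argument(caller + ": input is " + std::to_string(offset) +
                                " bytes past a 16-byte boundary");
}
```

Checking at the call site keeps the SSE kernel itself unchanged; the alternative mentioned above, teaching the kernel to handle the other alignment cases, would trade this check for extra work inside the inner loop.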

Summary question: is there an alignment problem when called from the
non-QA code? If so, where?

Eric

Dev_R · October 2, 2007, 6:55pm

On Tue, Oct 02, 2007 at 08:52:06AM -0700, Tim M. wrote:

http://gnuradio.org/trac/browser/gnuradio/trunk/gnuradio-core/src/lib/general/gr_mpsk_receiver_cc.h

Great troubleshooting!

I’ll fix it a bit later today. IIRC, there’s an align
attribute that should do the trick.

I agree. A check in the SSE code will be required if addressed by a caller
fix, else someone in the future will repeat the same effort we are going through
today.

I’ll add that too.

Thanks again for your efforts!

Eric

Dev_R · October 2, 2007, 5:52pm

On 10/2/07, Eric B. wrote:

OK, I’m not surprised by that. I wouldn’t consider that a problem
unless we get in the habit of calling this with stack-allocated
input. It could of course be worked around with an __attribute__((…))
on the array definition.

Agreed. But I am not sure placing a restriction (no stack allocation)
makes sense.

In the production code, i.e., where you are seeing problems (was it
around gri_mmse_fir_interpolator?), does the alignment problem
occur?

In the production code, the “input” is declared in gr_mpsk_receiver_cc.h,
line 300:

gr_complex d_dl[2*DLLEN];

http://gnuradio.org/trac/browser/gnuradio/trunk/gnuradio-core/src/lib/general/gr_mpsk_receiver_cc.h
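One way to apply an align attribute to a class member like d_dl, sketched under assumptions: receiver_sketch is a hypothetical, cut-down stand-in for the real class, the DLLEN value is made up, and this is not necessarily what the eventual fix does. The attribute aligns d_dl within the object and raises the object's own alignment to 16, so stack objects get it automatically; for heap objects created with new, it also depends on the allocator honoring over-alignment, which C++17's aligned operator new guarantees.

```cpp
#include <complex>
#include <cstddef>
#include <cstdint>

typedef std::complex<float> gr_complex;

// Hypothetical value, for illustration only.
static const int DLLEN = 8;

// Hypothetical, cut-down stand-in for a block class with a delay line.
struct receiver_sketch {
  float d_other_state;  // a preceding member that could push d_dl off a 16-byte boundary
  gr_complex d_dl[2 * DLLEN] __attribute__((aligned(16)));
};

// True if this object's delay line starts on a 16-byte boundary.
bool delay_line_aligned(const receiver_sketch &r) {
  return reinterpret_cast<std::uintptr_t>(r.d_dl) % 16 == 0;
}
```

Without the attribute, d_dl would follow d_other_state at offset 4 inside the object, which is the 0RCR layout again whenever the object itself starts on a 16-byte boundary.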

If so, I think we should fix the caller. If the calling site is using
stack or heap allocated data, we should fix it there. If it’s using
input passed to it by “work” or “general_work”, they are already
aligned. In any case, we should add a check at the site of the call
to the SSE code that checks the alignment and raises an exception in
the bad cases. Of course the SSE code could be modified to handle the
other two alignment cases, but I’d like to know the performance cost
of doing it that way before committing to that path.

I agree. A check in the SSE code will be required if addressed by a
caller fix, else someone in the future will repeat the same effort we
are going through today.

Summary question: is there an alignment problem when called from the
non-QA code? If so, where?

Yes: line 300 of gr_mpsk_receiver_cc.h.

Tim

Dev_R · October 3, 2007, 12:43am

The update seems to work. I re-ran configure and verified that the SSE
code was being used. make check passes, and benchmark_loopback.py, the
script Dev originally had a problem with, works correctly.

Tim

Dev_R · October 2, 2007, 11:36pm

Tim,

I’ve checked a trial fix into the trunk as of r6575.
Can you please update and let me know if it fixes your problem?

Thanks,
Eric

Dev_R · October 3, 2007, 9:42pm

Just reconfigured and tested, looks like benchmark_loopback is working
fine now on all my systems. Thanks everyone!

Dev

Dev_R · October 11, 2007, 1:01pm

Does ./benchmark_loopback.py work?
If it does, then generic vs. SIMD code is not an issue for you.

If the loopback does not work, I would try an svn update and rebuild.
Eric fixed the issue with the SIMD code last week.

The original poster (Dev) had some issues with the _tx and _rx scripts,
but I have been unable to test them because I am without a transmit
USRP module at the moment.

There is probably an easier way, but to determine whether you are using
the SIMD or the generic code, look at the MD_CPU line in the Makefile in
the filter directory. If it says x86, you are using the SIMD code.

Tim

Dev_R · October 3, 2007, 1:18am

On Tue, Oct 02, 2007 at 03:42:43PM -0700, Tim M. wrote:

The update seems to work. I re-ran configure and verified that the SSE code
was being used. make check passes and the original code Dev had a problem
with “benchmark_loopback.py” works correctly.

Tim

Thanks!

Eric