FIFO latency

I evaluated latency of a FIFO (actually an ordinary pipe, but the kernel
mechanisms are identical), and measured 30usecs average on my
1.2GHz AMD Phenom system with plenty 'o memory.

I sent timestamps across the FIFO (struct timeval), and the reader
grabbed the local time of day, and computed the difference. There’s
a fair amount of uncertainty on the reader due to gettimeofday() call
overhead. But 30usec on a wimpy CPU is certainly comfortably
below 1msec.


Marcus L.
Principal Investigator
Shirleys Bay Radio Astronomy Consortium

On Sat, May 28, 2011 at 22:06, Marcus D. Leech [email protected]
wrote:

I evaluated latency of a FIFO (actually an ordinary pipe, but the kernel
mechanisms are identical), and measured 30usecs average on my
1.2GHz AMD Phenom system with plenty 'o memory.

I sent timestamps across the FIFO (struct timeval), and the reader grabbed
the local time of day, and computed the difference. There’s
a fair amount of uncertainty on the reader due to gettimeofday() call
overhead. But 30usec on a wimpy CPU is certainly comfortably
below 1msec.

gettimeofday() is a fast function. But if you want really high fidelity:

  • read the CPU clock counter. Just make sure your app runs on one
    selected core (see the sketch below).
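
A minimal sketch of that approach, assuming an x86 CPU and the GCC/clang
builtin __rdtsc() from <x86intrin.h>; the choice of core 0 is arbitrary:

#define _GNU_SOURCE
#include <sched.h>      /* sched_setaffinity() */
#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc() */

int main (void)
{
    /* pin to one core so both TSC readings come from the same counter */
    cpu_set_t set;
    CPU_ZERO (&set);
    CPU_SET (0, &set);
    if (sched_setaffinity (0, sizeof(set), &set) != 0)
        perror ("sched_setaffinity");

    uint64_t t0 = __rdtsc ();
    /* ... code under test ... */
    uint64_t t1 = __rdtsc ();

    printf ("elapsed cycles: %llu\n", (unsigned long long) (t1 - t0));
    return 0;
}

Converting cycles to time requires knowing the TSC frequency, and on older
CPUs the TSC can vary with frequency scaling, which is another reason to
stay on one selected core.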

Could you post your app and raw results? I’m interested in the
min/mean/max values and distribution graphs, because max values do
play a role when dealing with real-time.


Regards,
Alexander C…

On Sat, May 28, 2011 at 22:50, Marcus D. Leech [email protected]
wrote:

overhead. But 30usec on a wimpy CPU is certainly comfortably
<skip…>
I just run it like:

./latency_writer | ./latency_reader

Thank you for your tests. I slightly updated them to make them
less dependent on printf() timing, which is very non-real-time.
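
One way to take printf() out of the timed path is to buffer the deltas in
memory and only print them after the sampling loop ends; a minimal sketch
(the sample count is an arbitrary assumption, not Alexander's actual value):

#include <stdio.h>
#include <sys/time.h>

#define NSAMPLES 1000   /* arbitrary */

int main (void)
{
    static long long delta[NSAMPLES];
    struct timeval sender, now;
    int n = 0;

    /* measurement loop: no formatted I/O in here */
    while (n < NSAMPLES && fread (&sender, sizeof(sender), 1, stdin) == 1)
    {
        gettimeofday (&now, NULL);
        delta[n++] = ((long long) now.tv_sec    * 1000000 + now.tv_usec)
                   - ((long long) sender.tv_sec * 1000000 + sender.tv_usec);
    }

    /* all printing happens after sampling is finished */
    for (int i = 0; i < n; i++)
        printf ("%lld\n", delta[i]);
    return 0;
}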

I ran the test with “chrt 80 ./run_test.sh”. The mean delay is much
better than yours (mean 8.2 us), but latency jumps up and down even
under RT priority. This can probably be improved, or maybe not.

  1. If I run 10 tests with a short sampling time, I get the picture
    data.png (raw data is in data.txt)

  2. If I do a single run with a long sampling time, I get the picture
    data2.png (raw data is in data2.txt)

You can see that the delay is far from predictable. It may be possible
to improve this, or maybe not - file operations in stock Linux are not
meant for real-time operation.

So, while this method is simple and good for non-realtime
applications, it doesn’t fit our needs. It may be usable for PHY<->MAC
interaction, but even here I’m not sure it would work well.

PS: I tested on a Core 2 Duo 1.6 GHz with all the GUI stuff running.

gettimeofday() is a fast function. But if you want really high fidelity:

  • read the CPU clock counter. Just make sure your app runs on one
    selected core.

Could you post your app and raw results? I’m interested in the
min/mean/max values and distribution graphs, because max values do
play a role when dealing with real-time.

====== latency_writer.c ========
#include <stdio.h>
#include <sys/time.h>   /* gettimeofday(), struct timeval */
#include <unistd.h>     /* usleep() */

int main (void)
{
    struct timeval tv;
    while (1)
    {
        /* timestamp the moment of sending and push it down the pipe */
        gettimeofday (&tv, NULL);
        fwrite (&tv, sizeof(tv), 1, stdout);
        fflush (stdout);
        usleep (250000);   /* one sample every 250 ms */
    }
    return 0;
}

============ latency_reader.c ==============
#include <stdio.h>
#include <sys/time.h>   /* gettimeofday(), struct timeval */

int main (void)
{
    struct timeval now;
    struct timeval sender;
    long long int t1, t2;

    /* each record on stdin is the sender's timestamp */
    while (fread (&sender, sizeof(sender), 1, stdin) == 1)
    {
        gettimeofday (&now, NULL);

        /* convert both timestamps to microseconds (64-bit to avoid overflow) */
        t1 = (long long) sender.tv_sec * 1000000 + sender.tv_usec;
        t2 = (long long) now.tv_sec    * 1000000 + now.tv_usec;

        fprintf (stderr, "%lld\n", t2 - t1);
    }
    return 0;
}

I just run it like:

./latency_writer | ./latency_reader


Marcus L.
Principal Investigator
Shirleys Bay Radio Astronomy Consortium

On Sun, May 29, 2011 at 00:14, Alexander C.
[email protected] wrote:

a fair amount of uncertainty on the reader due to gettimeofday() call

better than yours (mean 8.2 us), but latency jumps up and down even
for real-time operation.

So, while this method is simple and good for non-realtime
applications, it doesn’t fit our needs. It may be usable for PHY<->MAC
interaction, but even here I’m not sure it would work well.

PS: I tested on a Core 2 Duo 1.6 GHz with all the GUI stuff running.

OK, setting CPU affinity and cutting off start-up artifacts definitely
helps. Results are in the attachment.
Still, you can see quite some uncertainty.
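
A minimal sketch of doing that from inside the test itself rather than via
chrt/taskset; the core number, priority, and warm-up count are arbitrary
assumptions:

#define _GNU_SOURCE
#include <sched.h>      /* sched_setaffinity(), sched_setscheduler() */
#include <stdio.h>

#define WARMUP 10       /* samples to discard as start-up artifacts */

int main (void)
{
    /* pin to core 0 */
    cpu_set_t set;
    CPU_ZERO (&set);
    CPU_SET (0, &set);
    if (sched_setaffinity (0, sizeof(set), &set) != 0)
        perror ("sched_setaffinity");

    /* request real-time priority, like "chrt 80 ..." */
    struct sched_param sp = { .sched_priority = 80 };
    if (sched_setscheduler (0, SCHED_FIFO, &sp) != 0)
        perror ("sched_setscheduler (needs root or CAP_SYS_NICE)");

    /* read one latency value per line and drop the first WARMUP samples */
    long long delta;
    int n = 0;
    while (scanf ("%lld", &delta) == 1)
        if (n++ >= WARMUP)
            printf ("%lld\n", delta);
    return 0;
}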

Just want to throw this out there because it seems relevant:
http://gnuradio.org/cgit/jblum.git/tree/gruel/src/include/gruel/high_res_timer.h?h=wip/high_res_timer&id=71b911d28a391ad0c67540e3658a6680d7449e1f

On 05/28/2011 06:28 PM, Josh B. wrote:

Just want to throw this out there because it seems relevant:

http://gnuradio.org/cgit/jblum.git/tree/gruel/src/include/gruel/high_res_timer.h?h=wip/high_res_timer&id=71b911d28a391ad0c67540e3658a6680d7449e1f

Yup, I know about clock_gettime().

But really, that would only give us a bit finer resolution on an answer
that is already much coarser than the timer itself.

Knowing whether the average latency is 10.2 usec or 10.157 usec is not
that interesting :-)
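
For reference, a minimal clock_gettime() sketch; CLOCK_MONOTONIC is an
arbitrary choice here, and older glibc needs -lrt at link time:

#include <stdio.h>
#include <time.h>

int main (void)
{
    /* nanosecond-resolution, monotonic timestamp */
    struct timespec ts;
    clock_gettime (CLOCK_MONOTONIC, &ts);
    printf ("%lld.%09ld\n", (long long) ts.tv_sec, ts.tv_nsec);
    return 0;
}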


Marcus L.
Principal Investigator
Shirleys Bay Radio Astronomy Consortium

On 05/28/2011 04:28 PM, Alexander C. wrote:

So, while this method is simple and good for non-realtime
applications, it doesn’t fit our needs. It may be usable for PHY<->MAC
interaction, but even here I’m not sure it would work well.

PS: I tested on a Core 2 Duo 1.6 GHz with all the GUI stuff running.
OK, setting CPU affinity and cutting off start-up artifacts definitely helps.
Results are in the attachment.
Still, you can see quite some uncertainty.

OK, so a roughly 3:1 improvement in peak latency, and somewhat better
predictability.

But I’d still counter-assert that latencies in the tens of usec are
entirely acceptable for a wide range of “real-time” applications, even
with occasional latency excursions that increase the variability
by 50:1 or so.

I can well imagine that they aren’t acceptable for your application.
I mean, if all applications were the same, it would
be a very boring world, with most of us working at fast-food
restaurants :-)

But I’ll stand by my original suggestion that the use of FIFOs is an
acceptable technique for a wide variety of applications, including
“real-time” applications, depending on constraints and requirements.


Marcus L.
Principal Investigator
Shirleys Bay Radio Astronomy Consortium

On Sun, May 29, 2011 at 1:22 AM, Alexander C.
[email protected] wrote:

helps.
by 50:1 or so.
Sure, I’m not saying that no one should use queues :-)


Would it be possible (legally) for the closed-source and open-source
threads to share the same “memory space”, and use interrupts (or
semaphores) to trigger the arrival and departure of information? The
only aspect the two systems share is how to package and format the
information. This should work similarly to DMA for software-hardware
interfaces.

–Colby
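
A hedged sketch of the mechanism Colby describes (not a statement on the
licensing question), using POSIX shared memory plus a process-shared
semaphore; the segment name and payload size are invented for illustration,
and older glibc needs -lrt -pthread at link time:

#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

struct shared_region {
    sem_t data_ready;       /* writer posts, reader waits */
    char  payload[4096];    /* agreed packaging/format goes here */
};

int main (void)
{
    /* producer side; a consumer would shm_open() the same name,
       mmap() it, and sem_wait(&r->data_ready) before reading */
    int fd = shm_open ("/phy_mac_demo", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror ("shm_open"); return 1; }
    ftruncate (fd, sizeof(struct shared_region));

    struct shared_region *r = mmap (NULL, sizeof(*r),
                                    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (r == MAP_FAILED) { perror ("mmap"); return 1; }

    sem_init (&r->data_ready, 1 /* process-shared */, 0);
    strcpy (r->payload, "hello");
    sem_post (&r->data_ready);  /* signal the consumer */
    return 0;
}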

On 05/29/2011 10:22 AM, Alexander C. wrote:

helps.
by 50:1 or so.
Sure, I’m not saying that no one should use queues :-)
I just want to say that it may not be suitable for applications with
tighter requirements - i.e. some alternative may be needed.

But to tell the truth, I’m surprised by their performance; I thought it
would be much worse. So it may be a good starting point from which we
could refine later.

Linux’s pipe implementation is known to be quite slow. I would suggest
using UNIX sockets instead. They should perform much better in terms of
latency and performance.

Cheers,
Andre

On Sun, May 29, 2011 at 23:57, Colby B. [email protected]
wrote:

a wide-range of “real-time” applications, even with occasional latency
“real-time” applications, depending on constraints and requirements.
Regards,
semaphores) to trigger arrivals and departure of information. The only
aspect the two systems share is how to package and format the
information. This should work similar to DMA for software-hardware
interfaces.

IANAL, but AFAIK this was possible with GPLv2 if this usage was
considered “an intended usage”, like running applications is an
intended usage of a kernel. I admit I’m not sure whether this works
with GPLv3 or not.


Regards,
Alexander C…

On Sun, May 29, 2011 at 03:05, Marcus D. Leech [email protected]
wrote:

Results are in attachment.

I can well imagine that they aren’t acceptable for your application. I
mean, if all applications were the same, it would
be a very boring world, with most of us working at fast-food restaurants
:-)

But I’ll stand by my original suggestion that the use of FIFOs is an acceptable
technique for a wide variety of applications, including
“real-time” applications, depending on constraints and requirements.

Sure, I’m not saying that no one should use queues :-)
I just want to say that it may not be suitable for applications with
tighter requirements - i.e. some alternative may be needed.

But to tell the truth, I’m surprised by their performance; I thought it
would be much worse. So it may be a good starting point from which we
could refine later.


Regards,
Alexander C…

On Mon, May 30, 2011 at 12:54, Andre P.
[email protected] wrote:

Ok, setting CPU affinity and cutting off startup artifacts definitely
excursions that increase the variability

use UNIX sockets instead. They should perform much better in terms of
latency and performance.

Good idea.


Regards,
Alexander C…

On 30/05/2011 9:51 AM, Alexander C. wrote:

Linux’s pipe implementation is known to be quite slow. I would suggest
using UNIX sockets instead. They should perform much better in terms of
latency and performance.
Good idea.

I’m dubious of such a claim: the core mechanisms behind Unix-domain
sockets and FIFOs are very similar.

While it’s true that pipes/FIFOs used to be handled as disk files,
that’s no longer the case; they are just ring-buffer objects within the
kernel. Unix-domain sockets are quite similar, and in fact they are
likely higher overhead, because they have to go through the
labyrinthine socket stack, which FIFOs don’t.

I did my part to put together a FIFO test, so if someone wants to do a
Unix-domain socket benchmark we could settle that question.
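
A minimal sketch of such a benchmark, using socketpair(AF_UNIX, SOCK_STREAM)
and the same struct timeval scheme as the FIFO test; the sample count and
pacing are arbitrary:

#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

int main (void)
{
    int sv[2];
    if (socketpair (AF_UNIX, SOCK_STREAM, 0, sv) != 0)
    {
        perror ("socketpair");
        return 1;
    }

    if (fork () == 0)           /* child: writer */
    {
        close (sv[1]);
        struct timeval tv;
        for (int i = 0; i < 100; i++)
        {
            gettimeofday (&tv, NULL);
            write (sv[0], &tv, sizeof(tv));
            usleep (250000);    /* same pacing as the FIFO test */
        }
        exit (0);
    }

    close (sv[0]);              /* parent: reader */
    struct timeval sender, now;
    while (read (sv[1], &sender, sizeof(sender)) == (ssize_t) sizeof(sender))
    {
        gettimeofday (&now, NULL);
        fprintf (stderr, "%lld\n",
                 ((long long) now.tv_sec    * 1000000 + now.tv_usec) -
                 ((long long) sender.tv_sec * 1000000 + sender.tv_usec));
    }
    return 0;
}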

On Mon, May 30, 2011 at 18:30, Andre P.
[email protected] wrote:

There are various papers out there dealing with IPC mechanisms in Linux.
There is at least one [1] that indicates that IPC performs quite
well. On the other hand, I’ve seen others claiming the opposite.
Unfortunately, I don’t have any recent performance measurements
available personally. But I agree, it would be interesting to see some
up-to-date benchmark results.

It would be even more interesting to have a set of tests which one
could run on one’s own hardware. GNU Radio may be used on anything from
high-end x86 to all flavors of ARM, with a variety of kernel versions
and options, so it’s hard to settle this once and for all.

[1] http://osnet.cs.binghamton.edu/publications/TR-20070820.pdf


Regards,
Alexander C…

On 05/30/2011 03:55 PM, Marcus D. Leech wrote:

While it’s true that pipes/FIFOs used to be handled as disk files,
that’s no longer the case; they are just ring-buffer objects within the
kernel. Unix-domain sockets are quite similar, and in fact they are
likely higher overhead, because they have to go through the
labyrinthine socket stack, which FIFOs don’t.

I did my part to put together a FIFO test, so if someone wants to do a
Unix-domain socket benchmark we could settle that question.

There are various papers out there dealing with IPC mechanisms in Linux.
There is at least one [1] that indicates that IPC performs quite
well. On the other hand, I’ve seen others claiming the opposite.
Unfortunately, I don’t have any recent performance measurements
available personally. But I agree, it would be interesting to see some
up-to-date benchmark results.

Cheers,
Andre

[1] http://osnet.cs.binghamton.edu/publications/TR-20070820.pdf