Forum: GNU Radio proposed change to ugen to enable USRP to work well on NetBSD

Greg Troxel (Guest)
on 2006-05-03 20:03
(Received via mailing list)
At BBN we are working on a project involving teams of
cognitively-controlled software radios, funded by the US government.
As part of this, we will be using GNU Radio on NetBSD.  Due to the
current implementation of ugen(4) (generic USB devices), reads from
the USRP are not pipelined and transfer rates top out at around 4
MB/s.

Joanne has written a proposal to modify NetBSD's ugen(4) to get good
performance for the USRP.  We'd like feedback about the technical
approach.  (Once we have it working, the changes will be committed to
NetBSD-current.)

http://acert.ir.bbn.com/downloads/adroit/NetBSD-US...

--
        Greg Troxel <gdt@ir.bbn.com>
Daniel O'Connor (Guest)
on 2006-05-04 04:33
(Received via mailing list)
On Thursday 04 May 2006 03:30, Greg Troxel wrote:
> Joanne has written a proposal to modify NetBSD's ugen(4) to get good
> performance for the USRP.  We'd like feedback about the technical
> approach.  (Once we have it working, the changes will be committed to
> NetBSD-current.)
>
> http://acert.ir.bbn.com/downloads/adroit/NetBSD-US...

I think the ioctl would be the cleanest approach - it would not break any
existing software.

I wonder if it may be possible to have the ioctl specify a packet size, so
that the kernel keeps reading data of that packet size into the buffer as
long as it isn't full. I think that would give you something that looks a
lot closer to a normal device (normal being "what the unix IO model
expects" :)
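
The packet-size idea above can be modelled in a few lines. This is only a
toy sketch in Python, not kernel code and not the design in the BBN
proposal: ReadAheadBuffer, fake_device and the 8192-byte capacity are all
made-up names and values. It shows the intended behaviour: the "driver"
keeps requesting fixed-size packets while a whole packet still fits, and
read() only hands over data that is already buffered.

```python
from collections import deque


class ReadAheadBuffer:
    """Toy model of a driver that keeps fixed-size reads queued (hypothetical)."""

    def __init__(self, packet_size, capacity):
        if capacity % packet_size:
            raise ValueError("capacity must be a whole number of packets")
        self.packet_size = packet_size
        self.capacity = capacity
        self.packets = deque()

    def buffered(self):
        """Bytes currently held in the buffer."""
        return len(self.packets) * self.packet_size

    def fill(self, device):
        """Keep requesting packets while a whole packet still fits."""
        while self.buffered() + self.packet_size <= self.capacity:
            packet = device(self.packet_size)
            if not packet:
                break
            self.packets.append(packet)

    def read(self, nbytes):
        """Return whole buffered packets, up to nbytes; starts no transfer."""
        out = bytearray()
        while self.packets and len(out) + len(self.packets[0]) <= nbytes:
            out += self.packets.popleft()
        return bytes(out)


def fake_device(size):
    # Stand-in for the USB device: always returns a full packet of zeros.
    return bytes(size)


buf = ReadAheadBuffer(packet_size=512, capacity=8192)
buf.fill(fake_device)    # the "kernel" fills all 16 packet slots
data = buf.read(1024)    # the application drains two packets
buf.fill(fake_device)    # the two freed slots are requested again
```

In a real driver the refill would be triggered by transfer completion
rather than by an explicit fill() call; the explicit call just keeps the
model easy to follow.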

It would be nice if you could do a readv() and then
poll/kqueue/select/signal to see when an iovec has been filled, however I
suspect that would require severe modification of the kernel internals.
Daniel O'Connor (Guest)
on 2006-05-05 02:38
(Received via mailing list)
On Thursday 04 May 2006 11:58, Daniel O'Connor wrote:
> It would be nice if you could do a readv() and then
> poll/kqueue/select/signal to see when an iovec has been filled, however I
> suspect that would require severe modification of the kernel internals.

Ah now I think about it, this is called "aio_read" :)

I don't know how widely supported it is - in FreeBSD it's optional (via a
kernel option or KLD).

It does allow you to enqueue read requests and then later check if they
have been completed. IMO this is the best match for the USB IO model, but
my USB fu is fairly weak.

(And it's not much use if it isn't widely supported..)
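
The "enqueue reads now, check for completion later" pattern can be
illustrated without POSIX AIO. The sketch below is an emulation only: it
uses a Python thread pool and os.pread (Unix only) on a temporary file that
stands in for the device node, so it shows the shape of the calling code,
not how aio_read is implemented. BLOCK and the four-deep queue are
arbitrary choices.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 512  # bytes per queued read (arbitrary for this sketch)

# A temporary file stands in for the device node.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 8)   # 2048 bytes of test data
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
try:
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Enqueue four reads up front; submit() returns immediately.
        pending = [pool.submit(os.pread, fd, BLOCK, i * BLOCK)
                   for i in range(4)]
        # ... the application could do other work here ...
        # Later, collect the results in the order they were queued.
        blocks = [f.result() for f in pending]
finally:
    os.close(fd)
    os.unlink(path)

total = sum(len(b) for b in blocks)
```

Even in this small form the caller has to track a list of outstanding
requests, which is the bookkeeping cost weighed against a simpler
"keep reading" mode.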
Greg Troxel (Guest)
on 2006-05-05 19:51
(Received via mailing list)
"Daniel O'Connor" <darius@dons.net.au> writes:

> On Thursday 04 May 2006 11:58, Daniel O'Connor wrote:
>> It would be nice if you could do a readv() and then
>> poll/kqueue/select/signal to see when an iovec has been filled, however I
>> suspect that would require severe modification of the kernel internals.
>
> Ah now I think about it, this is called "aio_read" :)
>
> I don't know how widely supported it is - in FreeBSD it's optional (via a
> kernel option or KLD).

This seems to be a POSIX 1003.1b feature.  It doesn't exist in NetBSD.

This is essentially what the Linux USB driver does, but apparently
without using a standard interface.

> It does allow you to enqueue read requests and then later check if they have
> been completed. IMO this is the best match for the USB IO model, but my USB
> fu is fairly weak.

It's a reasonable match, but it's far more general than what the USRP
needs.  The problem arises because we're trying to use the USRP like a
traditional data input device, where data arrives and is buffered.
The current NetBSD ugen(4) code, and it seems the Linux code without
fusb, overload the read system call to both move data to userspace and
to initiate a read operation over the USB.  aio (like Linux fusb)
allows one to avoid the temporal coupling of this overloading.

I don't see any reason why continuous read mode and aio are mutually
exclusive - one could still do an async read.

I think it comes down to two issues:

1) Do we want the user-space program to explicitly schedule all the
   hardware reads?

2) Is continuous read mode easier to implement than AIO?

I find user-space code managing lots of pending reads to be more
complex than it ought to be when we really want to just say "start
reading and keep going".

I didn't find a clear explanation of how AIO works with multiple
outstanding reads on the same file descriptor.

I'm not sure how hard AIO is to implement - the kernel already is
doing things async and the read system call is in tsleep.  But it's
harder to do that than continuous read mode.


Thanks for pointing out aio, though - I had forgotten about that.


--
        Greg Troxel <gdt@ir.bbn.com>
Daniel O'Connor (Guest)
on 2006-05-05 23:56
(Received via mailing list)
On Saturday 06 May 2006 03:19, Greg Troxel wrote:
> > I don't know how widely supported it is - in FreeBSD it's optional (via a
> > kernel option or KLD).
>
> This seems to be a POSIX 1003.1b feature.  It doesn't exist in NetBSD.

OK, a pity.

> This is essentially what the Linux USB driver does, but apparently
> without using a standard interface.

Right.

> It's a reasonable match, but it's far more general than what the USRP
> needs.  The problem arises because we're trying to use the USRP like a
> traditional data input device, where data arrives and is buffered.
> The current NetBSD ugen(4) code, and it seems the Linux code without
> fusb, overload the read system call to both move data to userspace and
> to initiate a read operation over the USB.  aio (like Linux fusb)
> allows one to avoid the temporal coupling of this overloading.

OK.

> I don't see any reason why continuous read mode and aio are mutually
> exclusive - one could still do an async read.

Indeed, I just wanted to make sure you didn't implement a hack when the
real thing would be easier :)
(i.e. if NetBSD did AIO then no kernel mods would be necessary)

> I think it comes down to two issues:
>
> 1) Do we want the user-space program to explicitly schedule all the
>    hardware reads?

I am not sure, but I think the answer is no (for the USRP case especially)

> 2) Is continuous read mode easier to implement than AIO?

Yes :)
[since AIO has hooks deep in the kernel]

> I find user-space code managing lots of pending reads to be more
> complex than it ought to be when we really want to just say "start
> reading and keep going".
>
> I didn't find a clear explanation of how AIO works with multiple
> outstanding reads on the same file descriptor.

I'm not sure to be honest, but I believe it's just a queue of reads.

> I'm not sure how hard AIO is to implement - the kernel already is
> doing things async and the read system call is in tsleep.  But it's
> harder to do that than continuous read mode.

Absolutely, it would also be difficult to ensure standards compliance,
etc. IMO not worth it for this one application. I remember there were
plenty of subtle bugs fixed in the FreeBSD implementation and even then it
is still an optional API due to potential bugs.

> Thanks for pointing out aio, though - I had forgotten about that.

I should try using it in FreeBSD and see if it works in practice :)

Good luck with your implementation!
LRK (Guest)
on 2006-05-06 16:08
(Received via mailing list)
On Wed, May 03, 2006 at 02:00:23PM -0400, Greg Troxel wrote:
> approach.  (Once we have it working, the changes will be committed to
> NetBSD-current.)

I would like to run GR under FreeBSD so I am looking forward to your
progress on this.


Under FreeBSD (and probably NetBSD) the ugen(4) device is generic and gets
assigned to any unknown device. I would suggest you look into getting a
number assigned to the USRP and a name assigned for it, say usrp(4). Any
application more serious than experimental would seem to justify it.


The ehci(4) driver is for USB 2.0 and the FreeBSD version contains
comments which indicate it may be the problem. I don't quite understand
all the intricacies here but the code seems much too complicated for the
job. Ah for the good ol' days when I/O was less than a hundred lines of
assembler code and two buffers were enough. :)



--
LRK
gr@ovillatx.sytes.net
Daniel O'Connor (Guest)
on 2006-05-07 05:02
(Received via mailing list)
On Saturday 06 May 2006 23:35, LRK wrote:
> I would like to run GR under FreeBSD so I am looking forward to your
> progress on this.
>
>
> Under FreeBSD (and probably NetBSD) the ugen(4) device is generic and gets
> assigned to any unknown device. I would suggest you look into getting a
> number assigned to the USRP and a name assigned for it, say usrp(4). Any
> application more serious than experimental would seem to justify it.

Creating a kernel driver won't magically make things go faster, it is also
more complicated as you have to write a driver and the code to talk to it.

IMO the best solution would be AIO but that isn't very portable, next best
is hacking ugen to not have so many restrictions.

This is because of the USB IO model - the problem is that ugen does not
know how big a block of data to request from the device, and unlike more
traditional devices it does make a difference (since USB allows you to
request certain size transfers as well as allow short transfers). The fix
for ugen is to supply it with a hint to say what size transfer it should
queue up after the current one has been done. This gives you more
transfers in flight and greatly improves throughput.

There have been attempts to fix ugen in FreeBSD to do this but it didn't
work very well because the API was not extended to provide the hint so it
was backed out.

A kernel driver could issue the multiple reads but it is a fair amount of
work to write one, and a bit of a waste if ugen could be extended instead
(hence benefiting other applications)
LRK (Guest)
on 2006-05-07 17:51
(Received via mailing list)
On Sun, May 07, 2006 at 12:29:15PM +0930, Daniel O'Connor wrote:
> Creating a kernel driver won't magically make things go faster, it is also
> after the current one has been done. This gives you more transfers in flight
> and greatly improves throughput.
>
> There have been attempts to fix ugen in FreeBSD to do this but it didn't work
> very well because the API was not extended to provide the hint so it was
> backed out.
>
> A kernel driver could issue the multiple reads but it is a fair amount of work
> to write one, and a bit of a waste if ugen could be extended instead (hence
> benefiting other applications)

Obviously it would be neat to extend ugen if the fixes were generic but if
there need to be USRP-specific fixes they would best be done in a
different module. Maybe I'm not understanding this but it looks to me like
ugen just responds with a code saying it will take a device if no other
driver wants it. A copy of ugen named usrp could respond only to being
offered a USRP but the USRP should have a unique number assigned rather
than the general one used now. If there was a driver unique to the USRP,
it would not need to work with other USB devices, thus my suggesting that
direction.

It also seems that the USRP tx and rx paths normally use the same block
size after each open. If that is right, the driver could simply use that
block size until the stream is closed, reading ahead on rx and providing
flow control on tx.

It appears the attempts to read the USRP at more than 4 MB/s just lock
and transfer no data. Changing the 'read' in libusb to just return as
though it had finished results in the 'test_usrp_standard_rx' giving
similar results at all speeds including the pattern of overrun errors.
The transfer rate calculated is very fast so the overrun error count
seems to be a function of the USRP code rather than actual overruns.

I guess this is getting much too complicated for the old guys like me
to comprehend so I'll offer encouragement and await a solution, sooner
the better.


--
LRK
gr@ovillatx.sytes.net
Greg Troxel (Guest)
on 2006-05-07 18:09
(Received via mailing list)
LRK <gr@ovillatx.sytes.net> writes:

> Obviously it would be neat to extend ugen if the fixes were generic
> but if there need to be USRP-specific fixes they would best be done
> in a different module.

Agreed in general, but not that any USRP-specific changes are needed.

> Maybe I'm not understanding this but it looks to me like ugen just
> responds with a code saying it will take a device if no other driver
> wants it. A copy of ugen named usrp could respond only to being
> offered a USRP but the USRP should have a unique number assigned
> rather than the general one used now. If there was a driver unique
> to the USRP, it would not need to work with other USB devices, thus
> my suggesting that direction.

True, but then there's more forked code to maintain, which is a big
minus.

> It also seems that the USRP tx and rx paths normally use the same
> block size after each open. If that is right, the driver could
> simply use that block size until the stream is closed, reading ahead
> on rx and providing flow control on tx.

That's a good point: write transactions need to happen at some speed,
controllable by the user.

> It appears the attempts to read the USRP at more than 4 MB/s just
> lock and transfer no data.

What system?  Could you be more precise?  On NetBSD one gets missing
data according to the test program (presumably due to overruns in the
on-USRP buffer because USB transactions don't happen fast enough).
But nothing else bad happens.

> Changing the 'read' in libusb to just return as though it had
> finished results in the 'test_usrp_standard_rx' giving similar
> results at all speeds including the pattern of overrun errors.

You mean if you change the code to just skip the reads?  I don't see
what you are trying to find out from this experiment.

>  The transfer rate calculated is very fast so the overrun error
> count seems to be a function of the USRP code rather than actual
> overruns.

I don't see how this follows.

--
        Greg Troxel <gdt@ir.bbn.com>
Greg Troxel (Guest)
on 2006-05-07 18:18
(Received via mailing list)
[We're working on making USRP work well on NetBSD.]

  http://acert.ir.bbn.com/downloads/adroit/NetBSD-US...

Regarding BSD ugen(4) support for the USRP, I have a few questions
about desired transfer sizes.  With ugen now, and with Linux URBs, the
size of each USB transaction is controlled from user space.

1) Is the USB transaction size fixed by the USRP firmware?  Is it
   happy with a range of sizes?

2) Does the USRP ever send a short transaction on read?  Are there
   start/stop issues?

3) Has an optimal transaction size been determined?  This seems to be
   a latency/efficiency tradeoff, but the amount of on-board buffering
   seems key.

4) Is the write size tightly linked to the read size?  Do they have to
   be the same?


The initial proposal didn't discuss how to set the read transaction
size, and a way to do that is clearly needed.  This could easily be
part of setting the read buffer size.  For write, one also needs to
set the write transaction size, and limit the amount of buffered write
data, so perhaps a symmetric ioctl would be good; this would allow
different read and write sizes, and totally decouple reading and
writing.
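
As a rough illustration of what "symmetric" per-direction settings might
carry, here is a small Python sketch. Everything in it is hypothetical: the
EndpointBuffering name, its two fields, the validation rules and the
example buffer sizes are assumptions made for illustration, not an existing
ugen(4) interface and not the contents of the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointBuffering:
    """Hypothetical per-direction settings; not an existing ugen(4) API."""
    transaction_bytes: int   # size of each USB transfer the kernel queues
    buffer_bytes: int        # cap on data held in the kernel, this direction

    def __post_init__(self):
        if self.transaction_bytes <= 0:
            raise ValueError("transaction size must be positive")
        if self.buffer_bytes < self.transaction_bytes:
            raise ValueError("buffer must hold at least one transaction")
        if self.buffer_bytes % self.transaction_bytes:
            raise ValueError("buffer must be a whole number of transactions")

    @property
    def transactions_in_flight(self):
        """How many transactions fit in the buffer at once."""
        return self.buffer_bytes // self.transaction_bytes


# Read and write are configured independently (sizes are made up).
read_cfg = EndpointBuffering(transaction_bytes=512, buffer_bytes=16384)
write_cfg = EndpointBuffering(transaction_bytes=512, buffer_bytes=4096)
```

Keeping one object per direction mirrors the point about decoupling: the
read side can buffer deeply while the write side stays shallow to limit
queued data.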

--
        Greg Troxel <gdt@ir.bbn.com>
Matt Ettus (Guest)
on 2006-05-07 21:27
(Received via mailing list)
Some notes on the data pipeline and buffering:

Max length USB2 packets are 512 bytes, and that is all that we use.

The FIFOs in the FPGA are each 8192 bytes (16 packets), one for TX and
one for RX.  This is limited by the RAM in the FPGA.  At full data speed
of 32 MB/s over the USB, this is 256 µs worth of buffer.

The buffering in the USB controller chip is 2K bytes each for in and
out, or 4 packets worth.

The interface between the FPGA and the FX2 will only transfer full
512-byte chunks.
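
The buffering figures above can be checked with a few lines of arithmetic.
The sketch uses only the numbers given in this message; adding the FPGA and
FX2 buffers together is an assumption that they sit in series in the data
path.

```python
PACKET_BYTES = 512        # max-length USB 2.0 packet, per the message
FPGA_FIFO_BYTES = 8192    # per direction
FX2_BUFFER_BYTES = 2048   # per direction
FULL_RATE = 32e6          # bytes per second at full data speed


def buffer_time_us(nbytes, rate=FULL_RATE):
    """Microseconds of data that nbytes represents at the given byte rate."""
    return nbytes * 1e6 / rate


fpga_packets = FPGA_FIFO_BYTES // PACKET_BYTES
fx2_packets = FX2_BUFFER_BYTES // PACKET_BYTES
fpga_us = buffer_time_us(FPGA_FIFO_BYTES)
fx2_us = buffer_time_us(FX2_BUFFER_BYTES)
combined_us = fpga_us + fx2_us   # assumes the two buffers are in series

print(fpga_packets, "packets,", fpga_us, "us in the FPGA FIFO")
print(fx2_packets, "packets,", fx2_us, "us in the FX2")
print("combined:", combined_us, "us")
```

On these numbers the host has a few hundred microseconds to get the next
transfer going before data is lost at the full rate.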


On Sun, 2006-05-07 at 12:15 -0400, Greg Troxel wrote:
> 1) Is the USB transaction size fixed by the USRP firmware?  Is it
>    happy with a range of sizes?

Fixed by several factors.
>
> 2) Does the USRP ever send a short transaction on read?  Are there
>    start/stop issues?

No.

> 3) Has an optimal transaction size been determined?  This seems to be
>    a latency/efficiency tradeoff, but the amount of on-board buffering
>    seems key.

No, we haven't looked into short packets for better latency.

Matt
--
Matt Ettus <matt@ettus.com>
LRK (Guest)
on 2006-05-08 00:53
(Received via mailing list)
On Sun, May 07, 2006 at 12:06:31PM -0400, Greg Troxel wrote:
>
> > It appears the attempts to read the USRP at more than 4 MB/s just
> > lock and transfer no data.
>
> What system?  Could you be more precise?  On NetBSD one gets missing
> data according to the test program (presumably due to overruns in the
> on-USRP buffer because USB transactions don't happen fast enough).
> But nothing else bad happens.

This test used FreeBSD 6.1-RC and GR updated from CVS a couple of weeks
ago. 1.8 GHz 686-class CPU, VIA VT6202 USB 2.0 controller on motherboard.


This was discussed before and there seemed to be agreement that it does
not look right:

  Running test_usrp_standard_rx with different decimation ( -D ) values.

xfered 1.34e+08 bytes in 68.1 seconds.  1.97e+06 bytes/sec.  cpu time = 0.3424
noverruns = 3
xfered 1.34e+08 bytes in 33.6 seconds.  3.998e+06 bytes/sec.  cpu time = 0.3356
noverruns = 1
xfered 1.34e+08 bytes in 32.8 seconds.  4.088e+06 bytes/sec.  cpu time = 0.3381
noverruns = 167
xfered 1.34e+08 bytes in 32.8 seconds.  4.091e+06 bytes/sec.  cpu time = 0.334
noverruns = 83
xfered 1.34e+08 bytes in 32.8 seconds.  4.092e+06 bytes/sec.  cpu time = 0.3337
noverruns = 41

Note that the overrun counts go down as the speed should be going up.
The USRP is queried for overruns and answers some with the error code.
How many seems only to depend on the decimation rate. The first two tests
may have different overrun counts but the last three are always the same
as seen above (maybe +/- 1 count).


> > Changing the 'read' in libusb to just return as though it had
> > finished results in the 'test_usrp_standard_rx' giving similar
> > results at all speeds including the pattern of overrun errors.
>
> You mean if you change the code to just skip the reads?  I don't see
> what you are trying to find out from this experiment.

I changed libusb to skip the actual reads. Since this happens fast, one
would expect the USRP to return every call as an error since no read was
actually done. The test doesn't actually check the data so it thinks it
got the data in the time it took to run the test and calculates the high
transfer rate:

xfered 1.34e+08 bytes in 0.157 seconds.  8.523e+08 bytes/sec.  cpu time = 0.01688
noverruns = 614
xfered 1.34e+08 bytes in 0.0817 seconds.  1.642e+09 bytes/sec.  cpu time = 0.01491
noverruns = 319
xfered 1.34e+08 bytes in 0.0422 seconds.  3.178e+09 bytes/sec.  cpu time = 0.01432
noverruns = 164
xfered 1.34e+08 bytes in 0.0211 seconds.  6.372e+09 bytes/sec.  cpu time = 0.01363
noverruns = 82
xfered 1.34e+08 bytes in 0.0207 seconds.  6.475e+09 bytes/sec.  cpu time = 0.01324
noverruns = 41

> >  The transfer rate calculated is very fast so the overrun error
> > count seems to be a function of the USRP code rather than actual
> > overruns.
>
> I don't see how this follows.

Since the modified test doesn't actually transfer data and now gets
similar results for the first two tests, it appears the last three
probably never actually transfer the data.

It also seems that if I command the USRP to send data but never issue a
read, it would return an error on every call to check overrun and the
numbers from the test are not powers of two.


Doesn't make sense to me so I mentioned it as a possible clue.


--
LRK
gr@ovillatx.sytes.net
Daniel O'Connor (Guest)
on 2006-05-08 02:45
(Received via mailing list)
On Monday 08 May 2006 01:18, LRK wrote:
> > A kernel driver could issue the multiple reads but it is a fair amount of
> > work to write one, and a bit of a waste if ugen could be extended instead
> > (hence benefiting other applications)
>
> Obviously it would be neat to extend ugen if the fixes were generic but if
> there need to be USRP-specific fixes they would best be done in a different
> module. Maybe I'm not understanding this but it looks to me like ugen just

That's where you are wrong. Writing a new module means writing (or
copying) new code. If you copy ugen that means that any fixes for it need
to get duplicated. Also, since the USRP isn't a "main stream" device it is
likely OS changes will break it unless the maintainer is very active.

If you write a driver from scratch (which would be pretty silly given that
ugen is almost exactly what you need) then you will take longer and have
to fix more bugs.

> It also seems that the USRP tx and rx paths normally use the same block
> size after each open. If that is right, the driver could simply use that
> block size until the stream is closed, reading ahead on rx and providing
> flow control on tx.

Yes, but I think extending ugen to allow the user program to supply a hint
to say "here is the block size, and keep reads queued" (via ioctl for
example) is probably a lot simpler than creating a new driver.

> It appears the attempts to read the USRP at more than 4 MB/s just lock
> and transfer no data. Changing the 'read' in libusb to just return as
> though it had finished results in the 'test_usrp_standard_rx' giving
> similar results at all speeds including the pattern of overrun errors.
> The transfer rate calculated is very fast so the overrun error count
> seems to be a function of the USRP code rather than actual overruns.

The problem is that with no back-to-back transfers you are wasting slots
(USB divides time into slots it will transfer data in). The limit of
4 Mbytes/sec is because of the number of slots per second (divided by 2 I
guess) multiplied by the maximum block size for a bulk transfer.
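
The slot arithmetic can be worked through directly. USB 2.0 high speed
divides time into 125 µs microframes (8000 per second), and the thread
gives 512 bytes as the packet size in use. The sketch below assumes,
without having verified it against the driver, that an unpipelined read
completes exactly one 512-byte packet per microframe. Under that assumption
the figure comes out at 4,096,000 bytes/sec with no halving needed, which
is close to the roughly 4.09e+06 bytes/sec that LRK measured.

```python
import math

MICROFRAMES_PER_SECOND = 8000   # USB 2.0 high speed: one per 125 us
PACKET_BYTES = 512              # packet size used by the USRP
FULL_RATE = 32e6                # bytes per second the USRP can produce


def throughput(packets_per_microframe):
    """Bytes/sec if this many full packets move in every microframe."""
    return packets_per_microframe * PACKET_BYTES * MICROFRAMES_PER_SECOND


# Assumption: without pipelining, only one packet completes per microframe.
one_at_a_time = throughput(1)

# Packets per microframe needed to keep up with the full USRP rate.
needed = math.ceil(FULL_RATE / throughput(1))

print(one_at_a_time, "bytes/sec with one packet per microframe")
print(needed, "packets per microframe needed for the full rate")
```

If the assumption holds, reaching the full rate means keeping several
packets moving in each microframe, which is what queueing reads ahead is
meant to allow.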

> I guess this is getting much too complicated for the old guys like me
> to comprehend so I'll offer encouragement and await a solution, sooner
> the better.

It's OK, USB is complicated for younger minds too 8-)
Daniel O'Connor (Guest)
on 2006-05-08 02:45
(Received via mailing list)
On Monday 08 May 2006 04:54, Matt Ettus wrote:
> Some notes on the data pipeline and buffering:
[snip]

Seems like a hint to tell ugen what size block to read ahead with should
work very well.

I wonder what bugs you'll find in the EHCI code though ;)
Eric Blossom (Guest)
on 2006-05-11 06:08
(Received via mailing list)
On Sun, May 07, 2006 at 12:15:07PM -0400, Greg Troxel wrote:
>   [We're working on making USRP work well on NetBSD.]
>
>   http://acert.ir.bbn.com/downloads/adroit/NetBSD-US...
>
> Regarding BSD ugen(4) support for the USRP, I have a few questions
> about desired transfer sizes.  With ugen now, and with Linux URBs, the
> size of each USB transaction is controlled from user space.
>
> 1) Is the USB transaction size fixed by the USRP firmware?  Is it
>    happy with a range of sizes?

We currently always use 512 bytes on the Tx and Rx end points.  This
of course only works with USB 2.0.  We may want to vary this when we
start adding fixed headers to the packets, but it wouldn't be the end
of the world if the packets stayed at 512 bytes.

> 2) Does the USRP ever send a short transaction on read?  Are there
>    start/stop issues?

We currently don't ever send a short read.

> 3) Has an optimal transaction size been determined?  This seems to be
>    a latency/efficiency tradeoff, but the amount of on-board buffering
>    seems key.

I haven't run the experiment.

> 4) Is the write size tightly linked to the read size?  Do they have to
>    be the same?

No.  No.

> The initial proposal didn't discuss how to set the read transaction
> size, and a way to do that is clearly needed.  This could easily be
> part of setting the read buffer size.  For write, one also needs to
> set the write transaction size, and limit the amount of buffered write
> data, so perhaps a symmetric ioctl would be good; this would allow
> different read and write sizes, and totally decouple reading and
> writing.

Decoupling them seems like a good idea.

>         Greg Troxel <gdt@ir.bbn.com>

Eric
Eric Blossom (Guest)
on 2006-05-11 06:15
(Received via mailing list)
On Sun, May 07, 2006 at 05:50:48PM -0500, LRK wrote:
> This test used FreeBSD 6.1-RC and GR updated from CVS a couple of weeks
> xfered 1.34e+08 bytes in 33.6 seconds.  3.998e+06 bytes/sec.  cpu time = 0.3356
> How many seems only to depend on the decimation rate. The first two tests
> may have different overrun counts but the last three are always the same
> as seen above (maybe +/- 1 count).

As currently implemented, overrun/underrun detection is done by the host
library polling the USRP at approximately 10 Hz, based on sample counting.
Thus the absolute number of overruns returned does not mean what you might
expect.  You should think of it as more of a binary value:  overruns <= 1
implies things are "good".

Eric
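
One way to see why the counts look the way they do is to estimate how many
polls fit into each test run. This is a sketch under stated assumptions,
not a description of the host library: the transfer is taken to be 2**27
bytes (consistent with the reported "1.34e+08 bytes"), polling is taken to
be exactly 10 per second of nominal sample time, and the five tests are
assumed to run at nominal rates of 2, 4, 8, 16 and 32 MB/s.

```python
import math

TRANSFER_BYTES = 2 ** 27   # 134217728; assumed from "1.34e+08 bytes"
POLL_HZ = 10               # approximate polling rate from the message


def polls_during_transfer(nominal_rate):
    """Overrun polls made while TRANSFER_BYTES arrive at nominal_rate B/s."""
    return math.floor(TRANSFER_BYTES / nominal_rate * POLL_HZ)


# Assumed nominal rates: each test doubles the previous one.
nominal_rates = [2e6, 4e6, 8e6, 16e6, 32e6]
polls = [polls_during_transfer(r) for r in nominal_rates]

# Treat the count as a binary health flag, as suggested above.
def looks_good(noverruns):
    return noverruns <= 1

print(polls)
```

Under these assumptions the last three estimates are 167, 83 and 41, the
same as the noverruns values LRK reported for the three fastest tests,
which would be consistent with every poll in those runs reporting an
overrun.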