Software stopping of runaway realtime processes?

danhobbs · February 9, 2007, 8:18pm

Hey,

I’m hoping someone has run into this problem before… We have a lab
that we access remotely, and sometimes the students get a little
overzealous in e.g. their rate selection, and the machines become
unresponsive. I seem to be having a hard time getting them to stop using
realtime mode.

2 Questions:

1. Can I somehow disable their ability to use realtime mode? The
software fails gracefully in that case, I just can’t find anything
online about how to make the enable_realtime() call fail.

2. Can I set up some sort of watchdog process to auto-kill a process
that is clearly going to cause the machine to become unresponsive? I’m
not even sure how to identify such a process…

Thanks,

Dan

danhobbs · February 9, 2007, 9:40pm

On Fri, Feb 09, 2007 at 11:17:12AM -0800, Dan H. wrote:

Hey,

I’m hoping someone has run into this problem before… We have a lab
that we access remotely, and sometimes the students get a little
overzealous in e.g. their rate selection, and the machines become
unresponsive. I seem to be having a hard time getting them to stop using
realtime mode.

Yep, you can blow your foot off

2 Questions:

Can I somehow disable their ability to use realtime mode? The
software fails gracefully in that case, I just can’t find anything
online about how to make the enable_realtime() call fail.

If they have access to root, it’s kind of hard to keep them from
hosing themselves:

rm -fr /

In general, the only GNU Radio application I run as root is
tunnel.py. That’s because it opens the tun interface, and because
I’ve been too lazy to code up the suggestion below. Sounds like you
may have a higher motivation level than I do

In order to run tunnel.py (or other programs using tap/tun) without
being root, you could implement a small setuid-root wrapper that would
acquire the CAP_NET_ADMIN capability, drop root, then exec python
or whatever.

You’ll probably also need a udev rule to set the permissions on
/dev/net/tun to something that they can open without being root,
perhaps making it rw for group usrp. This is similar to the rule that
we use to enable the USRP to be opened without being root.

http://gnuradio.org/trac/wiki/UdevConfig

If they’re not running as root (or holding CAP_SYS_NICE), the call
sched_setscheduler (the system call that enables realtime) will fail.
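
A minimal Python sketch of that failure, assuming a Linux host. Python's os.sched_setscheduler wraps the same system call, so a process without the needed privilege should get PermissionError. The helper name try_enable_realtime and its set_scheduler parameter are hypothetical and not part of GNU Radio.

```python
import os


def try_enable_realtime(priority, set_scheduler=None):
    """Try to switch the calling process to SCHED_FIFO at `priority`.

    Returns True on success and False if the kernel refuses with EPERM.
    `set_scheduler` is a hypothetical hook so the logic can be exercised
    without actually changing the scheduling policy.
    """
    if set_scheduler is None:
        set_scheduler = os.sched_setscheduler  # Linux-only call
    try:
        # pid 0 means "the calling process"
        set_scheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except PermissionError:
        # Not root, no CAP_SYS_NICE, and no other grant: fall back gracefully.
        return False
    return True
```

An application could call try_enable_realtime(50) once at startup and continue with normal scheduling when it returns False.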

To reduce the likelihood of people wedging the machine accidentally,
probably the easiest thing to do is to add a command line parameter
that is required to be set to enable realtime mode. I suspect that
would reduce wedging of the machine through the choice of bad
parameters. Once they’ve got a set of parameters that do work, they
could, if they thought it useful, pass the “--enable-realtime” flag.
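
One way the opt-in flag could look, sketched with Python's argparse. The --rate option, the parser description, and the function names are assumptions for illustration. Only the --enable-realtime flag comes from the suggestion above.

```python
import argparse


def build_parser():
    """Build a parser where realtime scheduling is off unless requested."""
    parser = argparse.ArgumentParser(description="hypothetical experiment script")
    parser.add_argument("--rate", type=float, default=1e6,
                        help="sample rate in samples/s (illustrative option)")
    parser.add_argument("--enable-realtime", action="store_true",
                        help="request realtime scheduling (off by default)")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.enable_realtime:
        # The application's own realtime setup call would go here.
        print("realtime scheduling requested")
    else:
        print("running with normal scheduling")
    return args
```

With this default, a bad rate choice only slows down the offending process, and realtime mode is added once the parameters are known to work.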

[I wedged a couple of machines in the ORBIT testbed this way a while back. It took me a while to figure out what happened. Why are these machines crashing??? The machines were less powerful than the machines I normally ran on, and they couldn’t keep up. Ooops…]

Can I set up some sort of watchdog process to auto-kill a process
that is clearly going to cause the machine to become unresponsive? I’m
not even sure how to identify such a process…

The traditional way to handle this problem is with a shell session,
connected via a serial link, running at higher priority than any of
your experiments. The default real-time priority set by
gr_enable_realtime_scheduling is in the middle of the RT priorities.

You might be able to cook up some kind of “UDP packet of death” that
would kill all python processes belonging to a particular user.
The question is whether the networking stack will get enough cycles to
deliver the packet. The serial port trick is known to work.
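
A rough Python sketch of the “UDP packet of death” idea, assuming Linux with /proc mounted. The token, port, packet format, and function names are all hypothetical. It has no real authentication, and as noted above the packet may never arrive if the network stack is starved.

```python
import os
import pwd
import signal
import socket

MAGIC = b"KILL-RUNAWAY"  # hypothetical shared token, not a real protocol
PORT = 9999              # hypothetical port


def parse_request(data):
    """Return the target username for a valid kill packet, else None."""
    token, _, user = data.partition(b" ")
    if token != MAGIC:
        return None
    name = user.decode("ascii", errors="replace").strip()
    return name or None


def python_pids_of(username):
    """List PIDs owned by `username` whose command name starts with 'python'."""
    uid = pwd.getpwnam(username).pw_uid
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            if os.stat(f"/proc/{entry}").st_uid != uid:
                continue
            with open(f"/proc/{entry}/comm") as f:
                name = f.read().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if name.startswith("python"):
            pids.append(int(entry))
    return pids


def serve(port=PORT):
    """Wait for kill packets; must be started by root to be of any use."""
    try:
        # Run above the mid-range priority that the experiments use.
        top = os.sched_get_priority_max(os.SCHED_FIFO)
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(top))
    except OSError:
        pass  # keep going at normal priority
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    while True:
        data, _addr = sock.recvfrom(512)
        user = parse_request(data)
        if user is None:
            continue
        try:
            pids = python_pids_of(user)
        except KeyError:
            continue  # no such user
        for pid in pids:
            if pid == os.getpid():
                continue  # never kill the watchdog itself
            try:
                os.kill(pid, signal.SIGKILL)
            except OSError:
                pass
```

Calling serve() from a root-owned process starts the listener. Sending the datagram b"KILL-RUNAWAY alice" to the chosen port would then kill the Python processes of the user alice.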

FWIW, the code that enables real time scheduling is contained in
gnuradio-core/src/lib/runtime/gr_realtime.cc

Eric

danhobbs · February 9, 2007, 10:50pm

Eric B. wrote:

If they’re not running as root (or holding CAP_SYS_NICE), the call
sched_setscheduler (the system call that enables realtime) will fail.

They’re not running as root; I just added the Ubuntu udev rules on the
website. I never explicitly enabled CAP_SYS_NICE, it just happened,
and I can’t figure out how to remove it.

-Dan

danhobbs · February 10, 2007, 3:57am

Dan H. wrote:

I can’t think of what else to try…

/etc/security/limits.conf?

Frank

danhobbs · February 9, 2007, 11:52pm

Dan H. wrote:

Eric B. wrote:

If they’re not running as root (or holding CAP_SYS_NICE), the call
sched_setscheduler (the system call that enables realtime) will fail.

They’re not running as root; I just added the Ubuntu udev rules on the
website. I never explicitly enabled CAP_SYS_NICE, it just happened,
and I can’t figure out how to remove it.

Update: I installed lcap, and ran lcap 23, which disables CAP_SYS_NICE:

root@tanami:~# lcap
Current capabilities: 0xFF7FFEFF
   0) *CAP_CHOWN                   1) *CAP_DAC_OVERRIDE
   2) *CAP_DAC_READ_SEARCH         3) *CAP_FOWNER
   4) *CAP_FSETID                  5) *CAP_KILL
   6) *CAP_SETGID                  7) *CAP_SETUID
   8)  CAP_SETPCAP                 9) *CAP_LINUX_IMMUTABLE
  10) *CAP_NET_BIND_SERVICE       11) *CAP_NET_BROADCAST
  12) *CAP_NET_ADMIN              13) *CAP_NET_RAW
  14) *CAP_IPC_LOCK               15) *CAP_IPC_OWNER
  16) *CAP_SYS_MODULE             17) *CAP_SYS_RAWIO
  18) *CAP_SYS_CHROOT             19) *CAP_SYS_PTRACE
  20) *CAP_SYS_PACCT              21) *CAP_SYS_ADMIN
  22) *CAP_SYS_BOOT               23)  CAP_SYS_NICE
  24) *CAP_SYS_RESOURCE           25) *CAP_SYS_TIME
  26) *CAP_SYS_TTY_CONFIG         27) *CAP_MKNOD
  28) *CAP_LEASE                  29) *CAP_AUDIT_WRITE
  30) *CAP_AUDIT_CONTROL
    * = Capabilities currently allowed

If I as root attempt to set the priority of a process via nice, it
fails:

root@tanami:~# nice -n -50 top
nice: cannot set niceness: Permission denied

The same thing happens to user processes.

However, if I run an application that calls the GNU Radio enable_realtime
function, that call succeeds and Python gets priority -50, even though I
have disabled CAP_SYS_NICE system-wide. This happens even if I log out
all users and log back in, do so remotely via ssh, use a user
that’s not even in the wheel group, etc.

I can’t think of what else to try…

-Dan

danhobbs · February 10, 2007, 5:59pm

On Fri, Feb 09, 2007 at 09:56:24PM -0500, Frank B. wrote:

Dan H. wrote:

I can’t think of what else to try…

/etc/security/limits.conf?

Frank

Never noticed that before. Is it part of SELinux?

Eric
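
For context, /etc/security/limits.conf is read by PAM's pam_limits module rather than by SELinux. On Linux 2.6.12 and later, its rtprio item sets the RLIMIT_RTPRIO resource limit, and an unprivileged process with a nonzero RLIMIT_RTPRIO may select a realtime policy up to that priority without holding CAP_SYS_NICE. That might explain the behaviour Dan describes, though the thread does not confirm it. A small Python sketch for checking the limit from a user's login session follows. The function names are hypothetical.

```python
import resource


def rtprio_limit():
    """Return the (soft, hard) RLIMIT_RTPRIO pair for the calling process."""
    return resource.getrlimit(resource.RLIMIT_RTPRIO)


def realtime_allowed_without_privilege(limit=None):
    """True if the soft limit alone permits a realtime scheduling policy.

    `limit` can be passed explicitly as (soft, hard); by default the
    current process's limit is used.
    """
    soft, _hard = rtprio_limit() if limit is None else limit
    return soft == resource.RLIM_INFINITY or soft > 0
```

If realtime_allowed_without_privilege() returns True in a student's session, an rtprio entry in limits.conf, or in a file under /etc/security/limits.d/, would be the first place to look.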