Inline Assembly / Inline C

aris · September 17, 2012, 6:49pm

Hello,

Is it possible to use inline C or Assembly in Ruby?

Thanks,
Alex

neomex · September 17, 2012, 6:54pm

neomex wrote on 17.09.2012 20:48:

Hello,

Is it possible to use inline C or Assembly in Ruby?

Thanks,
Alex

Using inline assembly on platforms where Ruby runs doesn’t
make sense these days, and you can write inline C with e.g.
RubyInline (seattlerb/rubyinline on GitHub).

You can (obviously) use inline assembly in inline C, but I’d
repeat that you aren’t going to need it.
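
To illustrate what inline assembly inside C looks like, here is a minimal sketch, assuming GCC or Clang on x86. The function name add_ints is hypothetical, and every other platform gets the plain C fallback. It also hints at the portability cost: the assembly branch exists for one architecture and one compiler syntax only.

```c
/* Hypothetical helper: adds two ints, using GCC-style extended inline
 * assembly on x86 and plain C everywhere else. */
int add_ints(int a, int b) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    int result = a;
    /* AT&T syntax: addl source, destination. "+r" marks result as a
     * read-write register operand; "cc" says the flags are clobbered. */
    __asm__("addl %1, %0" : "+r"(result) : "r"(b) : "cc");
    return result;
#else
    return a + b; /* portable fallback */
#endif
}
```

Calling add_ints(2, 3) returns 5 on either path.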

neomex · September 17, 2012, 7:06pm

On 2012-09-17 18:52, Peter Z. wrote:

RubyInline (seattlerb/rubyinline on GitHub).

You can (obviously) use inline assembly in inline C, but I’d
repeat that you aren’t going to need it.

Wouldn’t it improve performance in tasks such as image editing, where
you have to loop through large amounts of data? Or give access to
features normally impossible to accomplish in C or Ruby?

neomex · September 17, 2012, 7:53pm

2012/9/17 Aleksander C.:

Wouldn’t it improve performance in tasks such as image editing, where you
have to loop through large amounts of data?

Not really. Compilers are smarter than us these days; they can
interchange loops and vectorize the instructions to speed them up (on
platforms which support it). Writing such code by hand would be
painful, platform-dependent and error-prone.
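
As a rough illustration of loop interchange, the two hypothetical functions below sum the same matrix. The first walks it column by column, which strides through memory; the second is the interchanged form that an optimising compiler may produce on its own, reading each row contiguously. The names and the 64x64 size are assumptions made for this sketch.

```c
#define ROWS 64
#define COLS 64

/* Column-by-column traversal of a row-major array: each step jumps
 * COLS ints ahead in memory. */
long sum_by_columns(int m[ROWS][COLS]) {
    long total = 0;
    int r, c;
    for (c = 0; c < COLS; ++c) {
        for (r = 0; r < ROWS; ++r) {
            total += m[r][c];
        }
    }
    return total;
}

/* Same computation with the two loops interchanged: consecutive
 * iterations now touch adjacent memory. */
long sum_by_rows(int m[ROWS][COLS]) {
    long total = 0;
    int r, c;
    for (r = 0; r < ROWS; ++r) {
        for (c = 0; c < COLS; ++c) {
            total += m[r][c];
        }
    }
    return total;
}
```

Both functions return the same total; only the memory access order differs.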

Or give access to features normally
impossible to accomplish in C or Ruby?

Do you have any in mind? Me neither.

– Matma R.

neomex · September 17, 2012, 7:52pm

Aleksander C. wrote on 17.09.2012 21:05:

make sense these days, and you can write inline C with e.g.
RubyInline (seattlerb/rubyinline on GitHub).

You can (obviously) use inline assembly in inline C, but I’d
repeat that you aren’t going to need it.

Wouldn’t it improve performance in tasks such as image editing,
where you have to loop through large amounts of data? Or give access to
features normally impossible to accomplish in C or Ruby?

Basically there are two kinds of operations which require stepping
down to assembly: vectorized instructions and ring-0 system management
instructions. You can’t use the latter anyway, and a good modern C
compiler can do vectorizing better than humans do.

The number of downsides to using assembly is enormous. It depends
on your architecture and on the precise type and version of the
toolchain, to say the least; you can forget about portability.

neomex · September 18, 2012, 1:59am

Bartosz Dziewoński wrote in post #1076335:

Not really. Compilers are smarter than us these days; they can
interchange loops and vectorize the instructions to speed them up (on
platforms which support it). Writing such code by hand would be
painful, platform-dependent and error-prone.
– Matma R.

Compilers can optimise better, but only if the code is written in such a
way as to let them know it’s safe. For example:

    void add_numbers(int* a, int* b, int* results, unsigned count) {
        unsigned i;
        for (i = 0; i < count; ++i) {
            results[i] = a[i] + b[i];
        }
    }

The compiler can unroll that loop a bit, but it cannot simply
vectorise the arithmetic. Why? Because the pointers a, b and results
could overlap. Vectorising can change the result in that case, so the
compiler has to either skip it or guard the vector code with a runtime
overlap check.
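
To make the overlap problem concrete, here is a small sketch; the helper name overlap_demo and the sample values are hypothetical. It calls the function with results pointing one element into a, so each store feeds the next load.

```c
void add_numbers(int* a, int* b, int* results, unsigned count) {
    unsigned i;
    for (i = 0; i < count; ++i) {
        results[i] = a[i] + b[i];
    }
}

/* Hypothetical demo: results aliases a + 1, so writing results[i]
 * changes a[i + 1] before it is read on the next iteration. */
void overlap_demo(int out[5]) {
    int a[5] = {1, 1, 1, 1, 1};
    int b[4] = {1, 1, 1, 1};
    unsigned i;

    add_numbers(a, b, a + 1, 4);

    /* Copy the final contents of a out for inspection. */
    for (i = 0; i < 5; ++i) {
        out[i] = a[i];
    }
}
```

Evaluated one element at a time, out ends up as 1, 2, 3, 4, 5. Code that loaded all four inputs before storing anything would give 1, 2, 2, 2, 2 instead, which is the change in behaviour the compiler has to rule out.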

You could use restrict to tell the compiler to assume these don’t
overlap:

    void add_numbers(int* restrict a, int* restrict b,
                     int* restrict results, unsigned count) {
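
For reference, a complete version might look like the sketch below, assuming a C99 compiler. The qualifier goes after the asterisk, because it applies to the pointer rather than to the int it points at. The const on the two inputs is my own addition and not part of the original signature.

```c
/* C99: each restrict-qualified pointer promises the compiler that the
 * memory it refers to is not accessed through the other pointers
 * during the call. */
void add_numbers(const int* restrict a, const int* restrict b,
                 int* restrict results, unsigned count) {
    unsigned i;
    for (i = 0; i < count; ++i) {
        results[i] = a[i] + b[i];
    }
}
```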

That can lead to unexpected results if you pass overlapping ranges,
though: doing so is undefined behaviour, so restrict is quite
dangerous. A lot of high performance code works by explicitly
unrolling:

    void add_numbers(int* a, int* b, int* results, unsigned count) {
        unsigned i;

        /* Process in blocks of 4 */
        int r1, r2, r3, r4;
        for (i = 0; i + 3 < count; i += 4) {
            /* Compute first */
            r1 = a[i] + b[i];
            r2 = a[i + 1] + b[i + 1];
            r3 = a[i + 2] + b[i + 2];
            r4 = a[i + 3] + b[i + 3];

            /* Save second */
            results[i] = r1;
            results[i + 1] = r2;
            results[i + 2] = r3;
            results[i + 3] = r4;
        }

        /* Finish portion not divisible by 4 */
        for (; i < count; ++i) {
            results[i] = a[i] + b[i];
        }
    }

The second is logically equivalent to a vectorised loop, even if the
ranges overlap, so the compiler is entitled to vectorise it if that’s
worthwhile. Of course, it is now stuck with the unrolled loop. When I
actually tested this case, the unrolled version turned out slower for
me.
Compilers are pretty smart, but they can’t change the behaviour of your
code. Nobody should be writing in assembly any more, but to squeeze
performance out of those really tight loops you still have to understand
what’s going on down there.

Cheers,

Tim