Forum: Ruby Inline Assembly / Inline C

Posted by neomex (Guest)
on 2012-09-17 18:49
(Received via mailing list)
Hello,

Is it possible to use inline C or Assembly in Ruby?

Thanks,
Alex
Posted by Peter Zotov (Guest)
on 2012-09-17 18:54
(Received via mailing list)
neomex wrote on 17.09.2012 20:48:
> Hello,
>
> Is it possible to use inline C or Assembly in Ruby?
>
> Thanks,
> Alex

Using inline assembly on platforms where Ruby runs doesn't
make sense these days, and you can write inline C with e.g.
RubyInline: https://github.com/seattlerb/rubyinline

You can (obviously) use inline assembly in inline C, but I'd
repeat that you aren't going to need it.
Posted by Aleksander Ciesielski (Guest)
on 2012-09-17 19:06
(Received via mailing list)
On 2012-09-17 18:52, Peter Zotov wrote:
> RubyInline: https://github.com/seattlerb/rubyinline
>
> You can (obviously) use inline assembly in inline C, but I'd
> repeat that you aren't going to need it.
>
>
Wouldn't it improve performance in tasks such as image editing, where
you have to loop through large amounts of data? Or give access to features
normally impossible to accomplish in C or Ruby?
Posted by Peter Zotov (Guest)
on 2012-09-17 19:52
(Received via mailing list)
Aleksander Ciesielski wrote on 17.09.2012 21:05:
>> make sense these days, and you can write inline C with e.g.
>> RubyInline: https://github.com/seattlerb/rubyinline
>>
>> You can (obviously) use inline assembly in inline C, but I'd
>> repeat that you aren't going to need it.
>>
>>
> Wouldn't it improve performance in tasks such as image editing,
> where you have to loop through large amounts of data? Or give access to
> features normally impossible to accomplish in C or Ruby?

Basically there are two kinds of operations which require stepping
down to assembly: vectorized instructions and ring-0 system management
instructions. You can't use the latter anyway, and a good modern C
compiler can do vectorizing better than humans do.

The number of downsides to using assembly is enormous. It depends on
your architecture and on the precise type and version of the toolchain,
to say the least; you can forget about portability.
Posted by Bartosz Dziewoński (matmarex)
on 2012-09-17 19:53
(Received via mailing list)
2012/9/17 Aleksander Ciesielski <neomex@onet.eu>:
> Wouldn't it improve performance in tasks such as image editing, where you
> have to loop through large amounts of data?

Not really. Compilers are smarter than us these days; they can
interchange loops and vectorize the instructions to speed them up (on
platforms which support it). Writing such code by hand would be
painful, platform-dependent and error-prone.
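
To make "interchange loops" concrete, here is a minimal sketch in C. The function names and the row-by-row greyscale layout are assumptions made up for illustration, not taken from any library. Both functions compute the same thing; only the loop order differs, and whether a given compiler performs this swap by itself depends on the compiler and its flags.

```c
#include <stddef.h>

/* Hypothetical example: brighten a width x height greyscale image that is
   stored row by row, clamping each pixel at 255. */

/* Column-first order: the inner loop jumps `width` bytes on every step. */
void brighten_by_columns(unsigned char *pixels, size_t width, size_t height,
                         unsigned char amount) {
  size_t x, y;
  for (x = 0; x < width; ++x) {
    for (y = 0; y < height; ++y) {
      unsigned sum = (unsigned)pixels[y * width + x] + amount;
      pixels[y * width + x] = (unsigned char)(sum > 255 ? 255 : sum);
    }
  }
}

/* Interchanged loops: the inner loop now walks contiguous memory, which is
   friendlier to the cache and easier to vectorise. */
void brighten_by_rows(unsigned char *pixels, size_t width, size_t height,
                      unsigned char amount) {
  size_t x, y;
  for (y = 0; y < height; ++y) {
    for (x = 0; x < width; ++x) {
      unsigned sum = (unsigned)pixels[y * width + x] + amount;
      pixels[y * width + x] = (unsigned char)(sum > 255 ? 255 : sum);
    }
  }
}
```

Each pixel is independent of the others, so swapping the loops cannot change the result here; that independence is what makes the interchange legal.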


> Or give access to features normally
> impossible to accomplish in C or Ruby?

Do you have any in mind? Me neither.


-- Matma Rex
Posted by Timothy G. (timothy_g60)
on 2012-09-18 01:59
Bartosz Dziewoński wrote in post #1076335:
> Not really. Compilers are smarter than us these days; they can
> interchange loops and vectorize the instructions to speed them up (on
> platforms which support it). Writing such code by hand would be
> painful, platform-dependent and error-prone.
> -- Matma Rex

Compilers can optimise better, but only if the code is written in such a 
way as to let them know it's safe. For example:

void add_numbers(int* a, int* b, int* results, unsigned count) {
  unsigned i;
  for (i = 0; i < count; ++i) {
    results[i] = a[i] + b[i];
  }
}

The compiler can unroll that loop a bit, but it can't blindly vectorise 
the arithmetic. Why? Because the pointers a, b and results could 
overlap. Vectorising could then change the result, so the compiler won't 
do it unless it can prove the ranges are separate or guard the fast path 
with a runtime overlap check.
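
A small sketch of how overlap changes the answer. Both function names are made up for illustration; add_four_at_once is only a stand-in for what a 4-wide vector add would do (all loads before any store), not real SIMD code, and it assumes count is a multiple of 4.

```c
/* The plain loop: every store lands before the next iteration's loads. */
void add_scalar(int *a, int *b, int *results, unsigned count) {
  unsigned i;
  for (i = 0; i < count; ++i) {
    results[i] = a[i] + b[i];
  }
}

/* Hypothetical stand-in for a 4-wide vector add: load and add four
   elements first, then store all four. Assumes count % 4 == 0. */
void add_four_at_once(int *a, int *b, int *results, unsigned count) {
  unsigned i, k;
  int tmp[4];
  for (i = 0; i + 3 < count; i += 4) {
    for (k = 0; k < 4; ++k) {
      tmp[k] = a[i + k] + b[i + k];
    }
    for (k = 0; k < 4; ++k) {
      results[i + k] = tmp[k];
    }
  }
}
```

Take a five-element array a filled with 1, a four-element b filled with 1, and pass a + 1 as results with count 4. add_scalar leaves a as 1, 2, 3, 4, 5, because each sum feeds the next iteration; add_four_at_once leaves it as 1, 2, 2, 2, 2.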

You could use restrict to tell the compiler to assume these don't 
overlap:

void add_numbers(int* restrict a, int* restrict b, int* restrict 
results, unsigned count) {

That is undefined behaviour if you pass overlapping ranges, though, so 
restrict is quite dangerous. A lot of high performance code works by 
explicitly unrolling:

void add_numbers(int* a, int* b, int* results, unsigned count) {
  unsigned i;

  /* Process in blocks of 4 */
  int r1, r2, r3, r4;
  for (i = 0; i + 3 < count; i += 4) {
    /* Compute first */
    r1 = a[i] + b[i];
    r2 = a[i + 1] + b[i + 1];
    r3 = a[i + 2] + b[i + 2];
    r4 = a[i + 3] + b[i + 3];

    /* Save second */
    results[i] = r1;
    results[i + 1] = r2;
    results[i + 2] = r3;
    results[i + 3] = r4;
  }

  /* Finish portion not divisible by 4 */
  for (; i < count; ++i) {
    results[i] = a[i] + b[i];
  }
}

The unrolled version is logically equivalent to a loop vectorised four 
elements at a time, even if the ranges overlap, so the compiler is 
entitled to vectorise if it's worthwhile. Of course, the compiler no 
longer has the option of not unrolling the loop. Actually testing this 
case shows the unrolled version as being slower for me :D
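
Going back to the restrict variant: one way to take some of the danger out of it is to check for overlap yourself and only take the restrict path when it is safe. This is just a sketch; every name below is invented for illustration, and comparing addresses through uintptr_t assumes a flat address space. Some compilers can generate a similar check on their own, but that depends on the compiler and flags.

```c
#include <stdint.h>

/* Fast path: the restrict qualifiers promise that results does not
   overlap a or b, so the compiler is free to vectorise. */
static void add_no_overlap(const int *restrict a, const int *restrict b,
                           int *restrict results, unsigned count) {
  unsigned i;
  for (i = 0; i < count; ++i) {
    results[i] = a[i] + b[i];
  }
}

/* Safe path: same loop with no promise, so it keeps the plain
   element-by-element behaviour. */
static void add_may_overlap(const int *a, const int *b, int *results,
                            unsigned count) {
  unsigned i;
  for (i = 0; i < count; ++i) {
    results[i] = a[i] + b[i];
  }
}

/* Nonzero if the count-element ranges starting at p and q share memory.
   Assumes a flat address space where uintptr_t comparisons are meaningful. */
static int ranges_overlap(const int *p, const int *q, unsigned count) {
  uintptr_t ps = (uintptr_t)p;
  uintptr_t qs = (uintptr_t)q;
  uintptr_t len = (uintptr_t)count * sizeof(int);
  return ps < qs + len && qs < ps + len;
}

/* Hypothetical wrapper: pick the restrict version only when the output
   range is separate from both inputs. */
void add_numbers_checked(const int *a, const int *b, int *results,
                         unsigned count) {
  if (ranges_overlap(results, a, count) || ranges_overlap(results, b, count)) {
    add_may_overlap(a, b, results, count);
  } else {
    add_no_overlap(a, b, results, count);
  }
}
```

Only the output range is checked against the inputs. a and b may overlap each other freely, since neither is written to.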

Compilers are pretty smart, but they can't change the behaviour of your 
code. Nobody should be writing in assembly any more, but to squeeze 
performance out of those really tight loops you still have to understand 
what's going on down there.

Cheers,

Tim