Gem for basic code protection?

Hi,

I’m looking for a ready-made tool to mangle the source code of a Ruby
program. By mangling I mean renaming all identifiers (classes, methods,
variables, constants) to autogenerated alphanumeric ones, stripping
all code indentation, etc. Something that would prevent easy
understanding of the code and copy & paste into another project.

I know there are some limitations, like code evaled from a string or
invoking methods indirectly by sending their name to an object, but
I can live with those and take care to avoid them.
Ideally it would be something like a bytecode compiler for YARV-based Rubies.

Does anybody know of such a code mangler, or a similar one?

You could use JRuby and compile to Java bytecode. But as for an
obfuscator/munger, I don’t know. Sorry.

On Tue, Oct 1, 2013 at 8:40 AM, David U. wrote:

Does anybody know such or similar code mangler ?

I wonder whether that would be possible at all. For example, for a
library all your public methods are part of the interface and you must
not change them. Then, if you have code in your program which uses
send() (probably even constructing the method name dynamically) or
accesses instance variables by name, I am not sure whether obfuscating
would be possible. If it were, you might end up with loads of readable
strings in the “obfuscated” code, rendering the obfuscation useless.

Kind regards

robert
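
Robert's concern about send() can be made concrete with a minimal sketch. The class names and the mangled identifiers (A1, a2, a3) are invented for illustration and are not the output of any real tool. A purely textual rename changes the method definitions but cannot see the name that is assembled inside a string at runtime:

```ruby
# Original class: the method name is assembled at runtime and passed to send.
class Report
  def export_csv
    "csv data"
  end

  def export(format)
    send("export_#{format}")
  end
end

# What a naive mangler might emit (hypothetical output): the definitions are
# renamed, but the string fragment "export_" is left untouched.
class A1
  def a2
    "csv data"
  end

  def a3(format)
    send("export_#{format}")
  end
end

original = Report.new.export(:csv)

# The mangled version looks up a method that no longer exists.
mangled_error =
  begin
    A1.new.a3(:csv)
    nil
  rescue NoMethodError => e
    e
  end

puts original
puts mangled_error.class
```

The string "export_" also stays readable in the mangled source, which is the second half of the objection.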

On 10/02/2013 06:58 PM, Stu wrote:

You can make your own, ultimately you’ll grok reverse engineering as well.

There is also RubyEncoder:

It’s not free nor have I personally used it, so I can’t vouch for it.
It claims to do what you want and more though.

BTW, this is exactly like DRM. You’re giving people something of value
locked in a safe (your encoded software) as well as the instructions for
how to get it out (the decoder and interpreter). This will really only
prevent casual inspection of your code. Don’t trust it with any code of
significant value. If game companies and movie studios can’t truly
solve this problem, what hope do you have? :wink:

-Jeremy

Clever, amazing how much is up for sale nowadays. Dealt with encryption
for years on my BSD boxes. I wouldn’t pay a dime for that. Trivial enough to
implement your own policy.

It looks like I’ve been misunderstood. I don’t intend to annoy customers
or to protect the code with DRM, serials or other obtrusive means.
I’m trying to protect the algorithms our code contains, to prevent misuse by
another party that would simply copy & paste from our sources and, with
slight modifications, sell the result as its own. You can raise the
objection that there are laws & lawyers that can protect us; however, in our
country that is a very long, complicated and expensive process with an
unclear result, and during such a period we would either lose income or even
go bankrupt.

You may say we chose the wrong tool for our task (an offline, locally
installed Ruby application), but our team is most experienced with
Rails and Ruby and their standard libraries.

If there really is no such tool besides RubyEncoder, which does something
more than and different from our goal, can anybody recommend some gems
that would help with our attempt to write our own source mangler? Like
ruby_parser?
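
As a starting point for such a home-grown mangler, here is a minimal sketch that uses Ripper from the standard library in place of the ruby_parser gem mentioned above (a deliberate swap, so the example runs without installing anything). The helper name `mangle` and the rename table are assumptions made for illustration. It works on the token stream only and renames just the identifiers listed in the table:

```ruby
require 'ripper'

# Hypothetical helper: replace identifier tokens found in +table+ and leave
# every other token (keywords, whitespace, literals) exactly as it was.
def mangle(source, table)
  Ripper.lex(source).map do |(_position, type, token, _state)|
    type == :on_ident && table.key?(token) ? table[token] : token
  end.join
end

source = <<~RUBY
  def area(width, height)
    width * height
  end
  puts area(3, 4)
RUBY

# Illustrative rename table; a real tool would have to generate this itself.
table = { 'area' => 'a1', 'width' => 'a2', 'height' => 'a3' }

mangled = mangle(source, table)
puts mangled
```

A token stream cannot tell a local variable from a method call on some other object with the same name, so anything beyond a toy like this would need real scope analysis on a syntax tree.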

You can make your own, ultimately you’ll grok reverse engineering as
well.

On 10/04/2013 07:28 AM, David U. wrote:

It looks like I’ve been misunderstood. I don’t intend to annoy customers
or to protect the code with DRM, serials or other obtrusive means.
I’m trying to protect the algorithms our code contains, to prevent misuse by
another party that would simply copy & paste from our sources and, with
slight modifications, sell the result as its own.

I don’t think you were misunderstood at all. According to the
documentation, RubyEncoder will do exactly what you want. It should be
invisible to your users, and will prevent trivial access to your
sources. It should also be more robust against reversal than a simple
obfuscation tool is likely to be.

It /is/ effectively DRM though, just like DVDs and Blu-rays, but that’s
what you want given your needs. Things just work for regular users
accessing the work via an authorized means (the RubyEncoder runtime in
this case), while unauthorized access is made more difficult.

Personally, I wouldn’t put much stock in such means of protection for
long. Someone can find a way to do something similar even without
access to your sources, especially once your product proves that such
features are possible and valuable. If /how/ your code works is the key
to the value proposition of using your software vs. an alternative,
you’ll be in trouble no matter how your competition arises. Plan to
offer something more than just code, support or installation services
for the product for example. :wink:

You can raise the
objection that there are laws & lawyers that can protect us; however, in our
country that is a very long, complicated and expensive process with an
unclear result, and during such a period we would either lose income or even
go bankrupt.

This is a valid point in just about any country. Copyright enforcement
can be expensive especially if you have to go to court, so you’re right
to not depend on it if you don’t have deep pockets.

You may say we chose the wrong tool for our task (an offline, locally
installed Ruby application), but our team is most experienced with
Rails and Ruby and their standard libraries.

Honestly, this is a risk with any software you distribute, regardless of
the implementation language. Debuggers and decompilers exist even for
fully compiled languages, and if there is value in getting an
understanding of at least your algorithms, you can rest assured that
someone will use such tools on your product. Ruby makes the job of the
would-be hacker easier, but if your team is more productive using Ruby
than, say, C, it’s probably a reasonable trade off.

Something else to consider is writing your ultra secret stuff in C as a
Ruby extension. The bulk of your code is probably nothing special but
would be tedious to write in C for your team. There’s little need to
hide that stuff. The secret logic though is probably small enough to
write and maintain in C, and there are potential efficiency/performance
advantages to this approach as well. Again, it’s not a guarantee of
protection, but it’s probably the least involved method of hiding your
code and making reverse engineering of your algorithms difficult.

-Jeremy

On Sat, Oct 5, 2013 at 3:01 PM, David U. wrote:

not mangle object identifiers so besides some other kind of protection,
original code including original identifiers can be retrieved back. This
is a big no-no for me.

Apparently you did not reply to my post in
“Gem for basic code protection?” (Ruby-Forum) yet. Maybe you
overlooked it. I am still curious how you think anything could fulfill
your requirements.

Cheers

robert

I’ve seen your post, and it just stated what I am already aware of or have
already said (invoking methods indirectly with the send method, code evaled
from a string, handling of missing methods, etc.).

I think there is a way to do code mangling if some limitations are kept, like
accessing imported module objects only by fully qualified name and avoiding
the metaprogramming ‘tricks’ mentioned above. It seems relatively
simple for a single source file, and I can imagine it would also work
similarly for our own modules if they were marked at the top with some “magic”
string in a comment. The mangler would then translate identifiers required
from such modules and would also check for cyclic dependencies.

On Sat, Oct 5, 2013 at 5:34 PM, David U. wrote:

As you are aware, I am far less optimistic. Let’s look at a simple
example without any metaprogramming “tricks” (whether Ruby without
those is still as productive or the same language is another
interesting question):

class Foo
  include Enumerable
  def each; yield 1; self; end
end

You define this class in your code and use Enumerable from the core.
Currently there is no formal way to specify that Enumerable relies on
classes including it to provide a method #each. So the obfuscator
would either change the name of #each to #89sd867d4as4343a3sd and any
call on an Enumerable method will fail OR it uses knowledge about core
modules to prevent this rename operation.

Now, assume Enumerable is another module, for example one in a gem or
other library you are using. The situation is much more difficult:
either the provider of the obfuscator must keep track of all these
libraries and maintain a database of invalid rename operations, or it
will rename and the code will fail.

Even runtime analysis won’t help since there is no guarantee that all
code paths will be used. So how do you imagine any code mangler to
rectify these issues?

Kind regards

robert
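
The Enumerable case above can be reproduced in a few lines. This is a minimal sketch: MangledFoo and the replacement name x1 are invented to stand in for what a renaming tool without knowledge of core modules might produce.

```ruby
# Original: Enumerable finds #each under the name it expects.
class Foo
  include Enumerable

  def each
    yield 1
    self
  end
end

# Hypothetical mangler output: #each has been renamed to #x1, so the
# contract that Enumerable silently relies on is broken.
class MangledFoo
  include Enumerable

  def x1
    yield 1
    self
  end
end

works = Foo.new.map { |n| n * 2 }

failure =
  begin
    MangledFoo.new.map { |n| n * 2 }
    nil
  rescue NoMethodError => e
    e
  end

p works
puts failure.class
```

Nothing in the source of MangledFoo tells the tool that the rename was invalid; the failure only shows up when an Enumerable method is actually called.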

Jeremy B. wrote in post #1123504:

I don’t think you were misunderstood at all. According to the
documentation, RubyEncoder will do exactly what you want. It should be
invisible to your users, and will prevent trivial access to your
sources. It should also be more robust against reversal than a simple
obfuscation tool is likely to be.

I’m sorry, but RubyEncoder does not do what I want. According to the
post “I found way to protect Source Code! :)” on Ruby-Forum, it does
not mangle object identifiers, so despite its other kinds of protection,
the original code including the original identifiers can be retrieved. This
is a big no-no for me.

Robert K. wrote in post #1123621:

On Sat, Oct 5, 2013 at 5:34 PM, David U. wrote:

As you are aware, I am far less optimistic. Let’s look at a simple
example without any metaprogramming “tricks” (whether Ruby without
those is still as productive or the same language is another
interesting question):

class Foo
  include Enumerable
  def each; yield 1; self; end
end

You define this class in your code and use Enumerable from the core.
Currently there is no formal way to specify that Enumerable relies on
classes including it to provide a method #each. So the obfuscator
would either change the name of #each to #89sd867d4as4343a3sd and any
call on an Enumerable method will fail OR it uses knowledge about core
modules to prevent this rename operation.

No eval, no sending of method names, no monkey-patching, no mixins. I can
live with single inheritance if that helps. I’m aware there are some
standard Ruby modules written as mixins, but in most cases they can be
avoided. For instance, your example with Enumerable can be rewritten as a
descendant of Enumerator. The initial idea is to limit the permitted Ruby
constructs (to a subset of the language) to make identifiers
unambiguous, and to write in the most explicit style possible.
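
One way the Enumerator rewrite might look is sketched below. This is only an illustration of the idea, not necessarily what the poster had in mind. Enumerator#each lives in core, so the class no longer defines a method whose name a mixin depends on.

```ruby
# Foo inherits iteration from Enumerator instead of defining #each itself.
class Foo < Enumerator
  def initialize
    # The block receives a yielder; pushing onto it produces the elements.
    super() { |yielder| yielder << 1 }
  end
end

doubled = Foo.new.map { |n| n * 2 }
p doubled
```

Note that `initialize` is itself a name fixed by the language, so even this restricted style still needs a list of identifiers the mangler must never rename.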

On 10/05/2013 08:01 AM, David U. wrote:

not mangle object identifiers so besides some other kind of protection,
original code including original identifiers can be retrieved back. This
is a big no-no for me.

You’ve done more research than I have on RubyEncoder. Still, that post
is from 2008. Have you tested to see if the current version is just as
easily circumvented?

I get it now that you have a very particular solution you’re looking to
use. Don’t let your preconceptions blind you to other possible
solutions though.

Did you consider my suggestion to convert the essential core of your
code into a C-based Ruby extension? Again, it’s clear that this
solution isn’t what you originally asked for. Just curious.

Robert has posted some pretty good points about why it would be
extremely challenging to do what you’re proposing. That’s not to say
you couldn’t find a way, but it seems that it’s a project you would have
to bake for yourself. If you succeed in finding or building such a
thing, I’m certain there would be interest. At least the company I work
for would likely take a look if you publish it.

Please do post again with details if you’re successful in solving this
the way you originally planned.

-Jeremy

David U. wrote in post #1123650:

No eval, no sending of method names, no monkey-patching, no mixins. I can
live with single inheritance if that helps. I’m aware there are some
standard Ruby modules written as mixins, but in most cases they can be
avoided. For instance, your example with Enumerable can be rewritten as a
descendant of Enumerator. The initial idea is to limit the permitted Ruby
constructs (to a subset of the language) to make identifiers
unambiguous, and to write in the most explicit style possible.

You might as well write in C. There’s plenty of stuff out there on the
web (and in the ruby source itself) for writing C extensions for Ruby.
As far as I’m aware, JRuby has pretty decent support for C extensions,
especially ones that aren’t doing anything too fancy, as it sounds like
what you’re going for here (correct me if I’m wrong).

Matthew K. wrote in post #1123653:

You might as well write in C. There’s plenty of stuff out there on the
web (and in the ruby source itself) for writing C extensions for Ruby.
As far as I’m aware, JRuby has pretty decent support for C extensions,
especially ones that aren’t doing anything too fancy, as it sounds like
what you’re going for here (correct me if I’m wrong).

You are right, and I am still considering rewriting more parts in C; however,
it would take a fair amount of time, including testing, and the
complications of maintaining such code are also a concern.
YARV Ruby also has decent support for FFI, which got even more polished in
version 2.0: the Fiddle module (formerly DL).
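
For readers who have not used Fiddle, here is a minimal sketch of calling a C function from Ruby through it. It assumes a platform where the C library's symbols are reachable from the running process (typical on Linux and macOS) and uses `strlen` purely as a stand-in for a compiled routine holding the sensitive logic.

```ruby
require 'fiddle'

# Open the symbols already loaded into the current process. That this
# includes libc is an assumption about the platform.
libc = Fiddle.dlopen(nil)

# Describe the C signature: size_t strlen(const char *s)
strlen = Fiddle::Function.new(
  libc['strlen'],
  [Fiddle::TYPE_VOIDP],
  Fiddle::TYPE_SIZE_T
)

length = strlen.call("secret sauce")
puts length
```

With this approach the Ruby side stays readable, while the part worth hiding would live in a compiled shared library opened with `Fiddle.dlopen` and its path.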

Jeremy B. wrote in post #1123654:

You’ve done more research than I have on RubyEncoder. Still, that post
is from 2008. Have you tested to see if the current version is just as
easily circumvented?

Not yet. I will give it another try, but I doubt it has changed
fundamentally.

I get it now that you have a very particular solution you’re looking to
use. Don’t let your preconceptions blind you to other possible
solutions though.

Yep, I’m looking for a solution for our use case. If a more generic way
were discovered along the way, I’d be more satisfied, and a wider
benefit could probably be expected.

Did you consider my suggestion to convert the essential core of your
code into a C-based Ruby extension? Again, it’s clear that this
solution isn’t what you originally asked for. Just curious.

Currently some functionality is handled by external C libs, but it is
quite minimal (about 1-2% of total SLOC). I’m afraid it would be a bit
cumbersome to write & maintain a large part in C.

Robert has posted some pretty good points about why it would be
extremely challenging to do what you’re proposing. That’s not to say
you couldn’t find a way, but it seems that it’s a project you would have
to bake for yourself. If you succeed in finding or building such a
thing, I’m certain there would be interest. At least the company I work
for would likely take a look if you publish it.

I’d agree it looks like quite a challenging task. I’m right at the
beginning; I was just looking for a ready-made solution, or at least
some attempt worth investigating.

Please do post again with details if you’re successful in solving this
the way you originally planned.

Sure. On the other hand, I have to admit we may also give up once all
the specifications and limits have been declared and pinpointed.

-Jeremy

On Wed, Oct 16, 2013 at 11:07 AM, Ezra Z. wrote:

What about taking the secret-sauce algorithm out of the client-side
software you give out and putting it behind a hosted web service API
on a server you fully control? The client would then no longer run the
computation locally and therefore would not need the secret code
locally either, so there is no need to obfuscate. A small single-purpose
API server is cheap to host and easy to secure fully, so you do not leak
your secret code at all, no matter what. Any code mangling is reversible
with enough effort, but a properly configured server with an HTTPS
API endpoint that hides the secret code is an easy way forward.

Well, but that changes runtime behavior dramatically: you depend on
network connectivity and access which introduces a whole lot of issues
on its own. Plus there is of course latency which is not acceptable
for all kinds of applications. Ah, and release handling also becomes
more difficult.

Kind regards

robert

What about taking the secret-sauce algorithm out of the client-side
software you give out and putting it behind a hosted web service API
on a server you fully control? The client would then no longer run the
computation locally and therefore would not need the secret code
locally either, so there is no need to obfuscate. A small single-purpose
API server is cheap to host and easy to secure fully, so you do not leak
your secret code at all, no matter what. Any code mangling is reversible
with enough effort, but a properly configured server with an HTTPS
API endpoint that hides the secret code is an easy way forward.

-Ezra
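
On the client side, Ezra's suggestion might look roughly like the sketch below. The endpoint URL, the JSON payload shape and the method names `build_request` and `remote_compute` are all hypothetical; only the Net::HTTP and JSON calls are real standard-library API. The last lines only build a request and do not touch the network.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Hypothetical endpoint; replace with a server you control.
API_URI = URI('https://api.example.com/v1/compute')

# Build the POST request carrying the input for the secret computation.
def build_request(input)
  request = Net::HTTP::Post.new(API_URI)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate({ 'input' => input })
  request
end

# Send the request over HTTPS and decode the JSON reply. This replaces the
# local call into the code that would otherwise need obfuscating.
def remote_compute(input)
  response = Net::HTTP.start(API_URI.host, API_URI.port, use_ssl: true) do |http|
    http.request(build_request(input))
  end
  JSON.parse(response.body)
end

request = build_request([1, 2, 3])
puts request.body
```

As Robert points out in his reply, every call now depends on connectivity and adds latency, so a real client would also need timeouts and error handling around `remote_compute`.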