Ruby Drops

Jake_McArthur · April 28, 2006, 6:31pm

Must keep you busy, checking all those submissions…

Matt

On 28 Apr, 2006, at 12:09 PM, Matthew M. wrote:

Quote: “To date, the main application of Moss has been in detecting
plagiarism in programming classes.”

Really… I’ve been known to do other things.

–
Matt Long
University of South Florida, CRASAR
GnuPG public key: http://www.csee.usf.edu/~mtlong/public_key.html

The wars of the future will not be fought on the battlefield or at
sea. They will be fought in space, or possibly on top of a very tall
mountain. In either case, most of the actual fighting will be done by
small robots. And as you go forth today remember always your duty is
clear: To build and maintain those robots. Thank you.

-The Simpsons

Jake_McArthur · April 28, 2006, 8:06pm

You have a point. This would not be caught. I see only a few
potential solutions, but they are kind of a hassle:

a) No world editing. The creator of a branch can give access to other
people at will, but only explicitly.
b) No automatic running of tests. Updating to recent revisions of
dependencies prompts the developer to review the changes to the code
before running tests, and then gives an option to run the tests
and/or revert to an older version that works.
c) Both of the above.
d) A sort of karma system in which a dependency update will update
and test code automatically that is edited by “trusted” developers,
but uses method B for any code that has been edited (“tainted”) by
untrusted developers. A developer gains karma in the network whenever
somebody decides to keep their revisions in their own projects. A
developer loses karma whenever somebody updates a dependency,
examines their code, and decides not to use it. You can define the
karma threshold at which an update is automatically tested and
integrated. A dependency itself may also be marked as editable only
to those with a high enough karma. This is my favorite idea here, but
also the most difficult to implement.
e) Some programmatic way of checking for or eliminating the
possibility of malicious code. (???)
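
To make option (d) a bit more concrete, here is a minimal sketch in Ruby. Everything in it is an assumption made for illustration: the `KarmaLedger` class, its method names, the one-point karma steps, and the `:auto_test` / `:manual_review` symbols are hypothetical, not part of any existing tool.

```ruby
# Hypothetical sketch of the karma idea in option (d).
# All names and the +1/-1 scoring are illustrative assumptions.
class KarmaLedger
  def initialize
    @karma = Hash.new(0) # developer name => karma score, default 0
  end

  def karma_of(developer)
    @karma[developer]
  end

  # Somebody decided to keep this developer's revision in their project.
  def revision_kept(developer)
    @karma[developer] += 1
  end

  # Somebody examined this developer's revision and decided not to use it.
  def revision_rejected(developer)
    @karma[developer] -= 1
  end

  # An update is tested automatically only if every developer who edited
  # it meets the threshold; otherwise it falls back to manual review,
  # as in option (b).
  def update_policy(editors, threshold)
    if editors.all? { |dev| karma_of(dev) >= threshold }
      :auto_test
    else
      :manual_review
    end
  end
end

ledger = KarmaLedger.new
3.times { ledger.revision_kept("alice") }
ledger.revision_rejected("mallory")

ledger.update_policy(["alice"], 2)            # => :auto_test
ledger.update_policy(["alice", "mallory"], 2) # => :manual_review
```

A single low-karma editor is enough to "taint" an update here, which matches the description above; how much karma each event should be worth is left open.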

Jake McArthur

Jake_McArthur · April 28, 2006, 8:18pm

I’m working on something similar also, although it’s hardly even
started, and it isn’t optimized for code sharing.

Also, somebody mentioned they were working on techniques to identify
similar strings. I think what you’re looking for may be Levenshtein
distance.
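
For reference, a minimal sketch of Levenshtein (edit) distance in Ruby, using the standard two-row dynamic programming approach. The function name `levenshtein` is just an illustrative choice, and this compares plain characters rather than code tokens.

```ruby
# Levenshtein (edit) distance: the minimum number of single-character
# insertions, deletions, and substitutions needed to turn a into b.
def levenshtein(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  # Row for the empty prefix of a: turning "" into the first j
  # characters of b takes j insertions.
  previous = (0..b.length).to_a

  a.each_char.with_index(1) do |char_a, i|
    # Turning the first i characters of a into "" takes i deletions.
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,        # deletion
        current[j - 1] + 1,     # insertion
        previous[j - 1] + cost  # substitution (or match)
      ].min
    end
    previous = current
  end

  previous.last
end

levenshtein("kitten", "sitting") # => 3
```

It runs in time proportional to the product of the two lengths, so it suits short strings better than whole files.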

–
Giles B.
http://www.gilesgoatboy.org

Jake_McArthur · April 29, 2006, 12:24am

[skip]

accordingly if they desire to do so.
I am afraid the problem here is: what would I do even if I found that
some of my code repeats code in the repository? I had already written
the code, so where is my benefit? To understand “Oops, I’m an idiot”?
I already knew :))

A much more useful repository would help me find the code I only
intend to write, and here code comparison isn’t necessary, because I
have nothing to compare yet.

What do you think about this?

Jake McArthur

Victor.

Jake_McArthur · April 28, 2006, 8:40pm

You’re right. It is very similar. We do, however, have slightly
different ideas. Yours seems to be a repository based around nuggets
of code which is searched by comparing the nuggets’ “intent” with
your own. I really like it too, but it’s not the same.

Mine is to compare code directly, even code that normally wouldn’t be
classified as a stand-alone “nugget,” like inline code inside large
projects, code that is a bit interspersed with other code, etc. In
this way, similarities within individual projects can be located and
factored out. This approach seems to focus less on explicitly sharing
everything (and trying to make code work for everybody) and more
on getting your own project done, with the improvement of the
collective code base for everybody coming almost as a side-effect.
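
As a rough illustration of comparing code directly, here is a minimal Ruby sketch that scores two fragments by the runs of tokens they share. It is a simple stand-in, not the technique any particular tool uses; the function names, the crude tokenizer, and the default run length of 4 are all assumptions.

```ruby
require "set"

# Split source text into crude tokens: words/numbers, or single
# punctuation characters. Whitespace is discarded, so formatting
# differences do not affect the result.
def tokens(source)
  source.scan(/\w+|[^\w\s]/)
end

# The set of all runs of k consecutive tokens in the source.
def shingles(source, k = 4)
  tokens(source).each_cons(k).to_set
end

# Jaccard similarity of the two sets of token runs:
# 0.0 means nothing shared, 1.0 means the sets are identical.
# Fragments shorter than k tokens have no runs and score 0.0.
def similarity(a, b, k = 4)
  runs_a = shingles(a, k)
  runs_b = shingles(b, k)
  union = runs_a | runs_b
  return 0.0 if union.empty?

  (runs_a & runs_b).size.to_f / union.size
end

similarity("x = a + b * c", "x=a+b*c") # => 1.0
```

Because it works on short runs of tokens rather than whole files, a fragment buried inside a larger method can still share runs with code elsewhere. Renamed variables would defeat it, so a real tool would need to normalize identifiers first.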

Jake McArthur

Jake_McArthur · May 3, 2006, 7:08pm

Jake McArthur wrote:

There are many benefits:

a) You help others to not repeat themselves (obvious).
b) You open parts of your code up so that others have reason to find and
fix your bugs.
c) It creates a much more useful repository of code than usual,
because this is code that people are actually using and maintaining, not
just things people figured might be useful later.

d) You find not only the bit you’ve already written, but also the bit
that goes with it that you were about to write.

Jake_McArthur · April 29, 2006, 3:47am

There are many benefits:

a) You help others to not repeat themselves (obvious).
b) You open parts of your code up so that others have reason to find
and fix your bugs.
c) It creates a much more useful repository of code than usual,
because this is code that people are actually using and maintaining,
not just things people figured might be useful later.

The point isn’t to see that you repeated somebody when you shouldn’t
have. The point is to see the repeat and save everybody the trouble
and factor it into one central location. It’s not too difficult; like
you said, you already wrote the code. It’s just a matter of branching
it off and making it shiny. Code that has been factored out would, of
course, be tagged and searchable so that people (or you) can look
for the code they “intend” to write.

Jake McArthur