Rspec + github == !submodules

Hey all,

We ran into some problems last night with the repos up at github. We
initially thought it was a git-submodules bug. We learned that it was
something else, but in the process of trying to find the source of the
bug, we learned a few things about git-submodules and have decided
that it creates an unfortunate set of complications and dependencies
for a development effort like ours.

To that end, we have simplified. No more submodules. To update, do the
following:

    cd rspec-dev
    git pull
    rake git:update

See http://github.com/dchelimsky/rspec-dev/wikis/contributingpatches
for more info.

Cheers,
David

On Apr 17, 2008, at 5:20 AM, Jonathan L. [email protected] wrote:

Hi David,

Is there any chance you could elaborate on the problems please, so that
other projects can make an informed decision whether to use submodules
or not?

There is always a chance :)

I’ll be at a computer slightly bigger than my phone in a little while
and will follow up.

On Apr 17, 2008, at 7:11 AM, David C. wrote:

There is always a chance :)

What I learned was that submodules are great for things like consuming
plugins in your rails projects, but not so great for a development
effort in which multiple people are pushing and pulling to multiple
repositories with dependencies and no multi-project transaction support.

The parent repository depends on specific versions of the subs. As
you’re making changes in your local repos, the last thing you do is
commit the parent with the references to the new versions of the subs.
Every time you make a change to any of the subs, you have to commit a
change to the parent. These changes are useful as documentation if
you’re updating a plugin to the latest release, but not that useful
when the log is polluted with one for every commit to every submodule.

When you pull, you pull the parent first, and then use git-submodule
to pull the correct versions of the subs. The parent is in control of
the situation and it somewhat guarantees that you’re getting all the
right stuff. This is GREAT for consumers, but problematic for
contributors. And even then, if consumers are pulling from a
development branch while developers are pushing to it, then consumers
might run into problems.
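The consumer flow described above can be sketched end-to-end with throwaway local repos. This is a minimal sketch, not the actual rspec setup: all repo and file names are hypothetical stand-ins, with "parent" playing the role of rspec-dev, and the `protocol.file.allow` setting is only needed because the submodule lives on a local path.

```shell
#!/bin/sh
# Toy version of the consumer flow: the parent pins an exact sub commit,
# and a consumer pulls the parent first, then the subs it references.
set -e
work=$(mktemp -d); cd "$work"; export HOME="$work"
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master
git config --global protocol.file.allow always  # permit local-path submodules

# A "sub" repo standing in for e.g. the rspec repo.
git init -q sub
(cd sub && echo v1 > lib.rb && git add lib.rb && git commit -qm 'sub v1')

# A "parent" repo standing in for rspec-dev; it records an exact sub commit.
git init -q parent
(cd parent && git submodule --quiet add "$work/sub" sub \
  && git commit -qm 'parent: pin sub at v1')

# The consumer pulls the parent first; the parent then dictates exactly
# which submodule commits get checked out.
git clone -q parent consumer
(cd consumer && git submodule update --init)
cat consumer/sub/lib.rb   # the exact version the parent pinned
```

This is why it's "GREAT for consumers": `git submodule update` always checks out precisely the commits the parent references, no more and no less.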

Let’s say you’re doing a pull while I’m doing a push. If I push the
parent first and you pull before I’ve pushed the subs, there is a
chance that the versions of the subs the parent references won’t be
there yet. Conversely, if I push the subs first and you grab an old
parent, you’ll be pulling old subs. No problem when you’re pulling, but
it’s going to create problems when you go to push, because you’re that
much further down the history.

Of course, these problems exist even when you’re dealing with a single
repository on a team that believes in frequent commits, continuous
integration, etc. And the fact that we have several repos with
dependencies means that we’re going to run into conflicts now and then.
It just seems that the explicit references from parent to children add
a layer of complexity to this for both consumers and developers.

This all make sense?

Great information, David.

Sounds like a useful blog post!

On Thu, Apr 17, 2008 at 7:49 AM, David C. [email protected] wrote:


Bryan R.
http://www.bryanray.net

“Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs, and the Universe trying
to produce bigger and better idiots. So far, the Universe is winning.”

Hi David,

Is there any chance you could elaborate on the problems please, so that
other projects can make an informed decision whether to use submodules
or not?

Cheers,
Jon

On Wed, 2008-04-16 at 22:39 -0400, David C. wrote:

Jonathan L.
http://jonathanleighton.com/

On Thu, Apr 17, 2008 at 9:01 AM, Jonathan L. [email protected] wrote:

one single project into a number of pieces and manage that project
through submodules? However, you do consider submodules to be a good
idea if you are using and wish to track third-party upstream code, for
example plugins in a Rails project?

RSpec was split into four repos…and it still is actually. But
originally the rspec-dev project was a superproject that included the
other three as submodules.

The problem with submodules is if two people are making changes to the
submodules at the same time.

Let’s say I work on the rspec submodule, and my final commit is
abc123. You work on the rspec submodule as well, and your final commit
is def456. The superproject tracks a specific commit of each
submodule, meaning we each need to commit a reference to our new head
of rspec. At some point you pull from my…and the incoming commit says
that the head is abc123, but yours says def456. Merge conflict. Not a
big deal, since you have all the latest code, so you can safely point
it at def456. But it’s a bit of a hassle because you have to do that
every single time. I don’t actually know what all the potential
problems are, but beyond just the hassle, it seems very easy for
someone to make a mistake, causing a lot of headaches.

We still have stuff split up, but we realized there’s no reason for
the rspec-dev repo to track the others as submodules. We wrote a rake
task to check out all the other repos beneath the rspec-dev dir. It’s
basically the exact same setup, but without the submodule tracking.
And it avoids any problems with submodules, because it’s all just
standard git push/pull/merge stuff.
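What such a rake task boils down to might look like the following in plain shell. This is a sketch under assumptions: the repo names are hypothetical, and the origins are simulated locally rather than pulled from github.

```shell
#!/bin/sh
# Sketch of the no-submodule setup: a "git:update"-style task just clones
# or pulls each sibling repo under rspec-dev with ordinary git commands.
set -e
work=$(mktemp -d); cd "$work"; export HOME="$work"
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master

# Stand-ins for the upstream repos (normally these would live on github).
mkdir origins
for repo in rspec rspec-rails; do
  git init -q "origins/$repo"
  (cd "origins/$repo" && echo "$repo" > README && git add README \
    && git commit -qm "init $repo")
done

# The update task: clone on the first run, plain pull thereafter.
mkdir rspec-dev && cd rspec-dev
for repo in rspec rspec-rails; do
  if [ -d "$repo/.git" ]; then
    (cd "$repo" && git pull -q)         # already checked out: just pull
  else
    git clone -q "$work/origins/$repo"  # first run: just clone
  fi
done
ls
```

Because each checkout is an ordinary clone, there is no parent commit recording pointers, and no pointer conflicts to resolve.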

Pat

With mercurial I nearly did a similar thing, working on my own but
committing from two different machines. Luckily mercurial gave me a
warning that allowed me to make sense of what I was doing. Not sure
how this works with git but here goes.

  1. I push from laptop1 to my central server. All is good.
  2. I push from laptop2 to my central server. Mercurial doesn’t allow
     this and warns me that the remote repo will have two heads (which
     is allowed but probably not what I want). I can override this with
     --force.
  3. Oh silly sod - of course I committed from the other machine.
  4. I pull from the central server, merge locally and commit, creating
     a new single head representing the merge.
  5. I then push the result, meaning there is only ever a single
     head/tip/edge/whatever in the repository.
  6. I realise that this is what I always do with subversion anyway -
     update, merge, [run tests], commit.

It seems git doesn’t protect you from yourself like hg does - which is
understandable, it’s designed for and used by scarier people!

Could a pull-merge-commit before pushing have avoided this, and should
we make that our endorsed way of working? Or am I missing something
else about how dscm works?

Cheers,
Dan

On Fri, Apr 18, 2008 at 12:43 AM, Dan N. [email protected] wrote:

Could a pull-merge-commit before pushing have avoided this, and should we
make that our endorsed way of working? Or am I missing something else about
how dscm works?

I’m still fuzzy on the details of exactly what happened. I believe it
was the result of a forced push, which made the remote repository
rewrite its history when there were branched histories that needed
resolving.

I believe that pull-merge-commit would work fine; I experimented
locally to understand the effects of handling submodule reference
merge conflicts. As I mentioned before, it is just a bit of a hassle
to have to do. David also pointed out that even without the
conflicts, you still have to commit the reference, leading to lots of
“updated rspec-rails” type commits in rspec-dev.

pull-merge-commit is probably a good workflow (and indeed the only
one, because otherwise it’s push-REJECTED-pull-merge-commit). The
main advantage of not using submodules is that you’ll only have to
merge when git can’t intelligently merge the repos, rather than
every time two repositories have different HEADs.
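The push-REJECTED-pull-merge-commit cycle is easy to see with two clones of a shared bare repo (illustrative names only; "central" stands in for the github repo):

```shell
#!/bin/sh
# Two contributors race to push; the loser must pull, merge, and re-push.
set -e
work=$(mktemp -d); cd "$work"; export HOME="$work"
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master

git init -q --bare central
git clone -q central seed
(cd seed && echo base > base.txt && git add base.txt \
  && git commit -qm base && git push -q origin master)
git clone -q central alice
git clone -q central bob

# Alice pushes first.
(cd alice && echo a > a.txt && git add a.txt && git commit -qm alice \
  && git push -q origin master)
# Bob's push is rejected: it would not be a fast-forward.
(cd bob && echo b > b.txt && git add b.txt && git commit -qm bob \
  && git push -q origin master 2>/dev/null) || echo 'REJECTED: pull first'
# So bob pulls (merging), then pushes the merge result.
(cd bob && git pull -q --no-rebase origin master \
  && git push -q origin master)
```

After the final push, central holds four commits: base, alice, bob, and bob's merge commit.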

Pat

On Thu, 2008-04-17 at 08:49 -0400, David C. wrote:

This all make sense?

Ok, I have to confess I haven’t been paying that much attention to the
way things are or were set out on github, so let me see if I’m fully
understanding what you’re saying…

Was “rspec” previously split up into several repositories, with a
“parent” repository which contained the other repositories as
submodules? So you are essentially saying that it is a bad idea to split
one single project into a number of pieces and manage that project
through submodules? However, you do consider submodules to be a good
idea if you are using and wish to track third-party upstream code, for
example plugins in a Rails project?

Cheers


Jonathan L.
http://jonathanleighton.com/

On Apr 18, 2008, at 3:43 AM, Dan N. wrote:

  3. Oh silly sod - of course I committed from the other machine.

This is actually what happened. Two people were doing work at the same
time and one got the warning from the repo and did a “push --force.”
We’ve all learned a lesson from this and it won’t happen again.

In my opinion, even if you are allowed to force a push, the repo
should maintain some reachable history somewhere of the commits that
you are “hiding.” So the public “view” removes those commits but they
can be retrieved.

  4. I pull from the central server, merge locally and commit, creating
     a new single head representing the merge.
  5. I then push the result, meaning there is only ever a single
     head/tip/edge/whatever in the repository.
  6. I realise that this is what I always do with subversion anyway -
     update, merge, [run tests], commit.

It seems git doesn’t protect you from yourself like hg does - which
is understandable, it’s designed for and used by scarier people!

It actually does in much the same way. You get a warning, but you can
still force the push.

Could a pull-merge-commit before pushing have avoided this, and
should we make that our endorsed way of working? Or am I missing
something else about how dscm works?

I do think this should be the way we do things. We have some rake
tasks that manage these bits one step at a time. I’ll add one that
combines them. You’ll still be able to do them one at a time, and
you’ll still need to pull/merge again if central repo warns you on
commit.

Cheers,
David

On 18/4/2008, at 14:16, David C. [email protected] wrote:

On Apr 18, 2008, at 3:43 AM, Dan N. wrote:

  3. Oh silly sod - of course I committed from the other machine.

This is actually what happened. Two people were doing work at the same
time and one got the warning from the repo and did a “push --force.”
We’ve all learned a lesson from this and it won’t happen again.

In my opinion, even if you are allowed to force a push, the repo
should maintain some reachable history somewhere of the commits that
you are “hiding.” So the public “view” removes those commits but they
can be retrieved.

This is a potentially painful lesson that I think we all learn once we
start using Git. If you’re lucky, you learn it on an unimportant repo
or one which you fully control and can dig around in.

By way of counter-opinion, the ability to force a push like that may
be considered a feature. Example case: you accidentally include
confidential company files when you push to a remote repo; if you fix
the mistake by altering the history in your local repo and then try
pushing again, Git will correctly warn you that this won’t be a
fast-forward merge, but the ability to force the push anyway allows
you to “remove” the unwanted history from the remote repo.

Note that the history should still be there, it’s just that it won’t
be “reachable” when someone clones the public repo. Does GitHub
provide shell access so that you can get into the remote repo and run
stuff like “git fsck” on it? The missing commits should still be in
the object database, unless the GitHub crew are doing overly
aggressive automatic repacking and pruning on the repos (or unless
there is something about the way bare repos work which I don’t know;
quite possible!). In the case that GitHub doesn’t provide shell
access, it’s really like pushing into a black-hole drop box, with no
way to do any “archaeology” on the object database.
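The point about the object database can be checked locally (a toy sketch with made-up file names). One assumption worth flagging: a local repo needs `--no-reflogs` because its reflog still keeps the dropped commit reachable, whereas hosted bare repos often have no reflogs at all.

```shell
#!/bin/sh
# A commit dropped from a branch (as by a forced push or hard reset)
# still lives in the object database and can be dug up with git fsck.
set -e
work=$(mktemp -d); cd "$work"; export HOME="$work"
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master

git init -q repo && cd repo
echo one > f.txt && git add f.txt && git commit -qm first
echo oops > secret.txt && git add secret.txt && git commit -qm secret
sha=$(git rev-parse HEAD)

git reset -q --hard HEAD~1   # rewrite the branch to "hide" the commit
git cat-file -e "$sha"       # ...but the object is still in .git/objects
git fsck --no-reflogs --unreachable | grep "$sha"
```

Until `git gc`/pruning actually expires those objects, the "removed" history remains recoverable by anyone with access to the repository directory.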

Could a pull-merge-commit before pushing have avoided this, and
should we make that our endorsed way of working? Or am I missing
something else about how dscm works?

I do think this should be the way we do things. We have some rake
tasks that manage these bits one step at a time. I’ll add one that
combines them. You’ll still be able to do them one at a time, and
you’ll still need to pull/merge again if central repo warns you on
commit.

If you get into the habit of rebasing before committing rather than
merging you’ll get a much nicer history. It will basically look linear:

A--B--C--D--E--F--G--etc

Rather than full of crisscrosses from multiple small merges:

        E'--F'
       /      \
A--B--C--D--E--F--G--H--I--J
    \   /       \      /
     B'-C'       E''-F''-G'

Sure, Git can easily do the merges but the resulting history is harder
to analyse and less “bisectable” (with “git bisect”).
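A toy demonstration (hypothetical names, "central" simulating the shared repo) of how rebasing instead of merging keeps the shared history linear:

```shell
#!/bin/sh
# Pulling with --rebase replays local commits on top of the remote's,
# so the pushed history stays linear, with no merge commits.
set -e
work=$(mktemp -d); cd "$work"; export HOME="$work"
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master

git init -q --bare central
git clone -q central seed
(cd seed && echo base > base.txt && git add base.txt \
  && git commit -qm base && git push -q origin master)
git clone -q central alice
git clone -q central bob

(cd alice && echo a > a.txt && git add a.txt && git commit -qm alice \
  && git push -q origin master)
(cd bob && echo b > b.txt && git add b.txt && git commit -qm bob)

# Instead of merging, bob replays his commit on top of alice's,
# and his push is then a plain fast-forward.
(cd bob && git pull -q --rebase origin master \
  && git push -q origin master)
(cd central && git rev-list --count --merges master)  # no merge commits
```

The resulting history is base, alice, bob': three commits in a straight line, which is exactly the shape `git bisect` handles best.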

Cheers,
Wincent
