Proving ownership of code with mercurial?

Matt Mackall mpm at selenic.com
Wed Jan 2 10:57:12 CST 2008


On Wed, 2008-01-02 at 10:44 +0100, Francesc Esplugas wrote:
> I've been working on a project during three months and as I work  
> freelance they say I only worked the first 6 weeks. Everything is  
> under Mercurial with daily changesets (and these changesets contain  
> data). The repo distributed in a few machines, but I keep the master.
> 
> I know all the work that I've made, and I've printed the list of  
> changesets, so I can explain my client all the work, but they might  
> say I've changed the repos to reflect this "log history".
> 
> The question should be ... should I trust the repos? In a Git  
> conference Linus Torvalds talked about the security of Git, about  
> trust. He can know if someone has hacked the repository ... bla bla  
> bla ... so the idea would be to show to my client, that the repos  
> hasn't changed and that he can trust it ...
> 
> Francesc
> 
> On Jan 2, 2008, at 10:32 AM, Peter Arrenbrecht wrote:
> 
> > On Jan 2, 2008 9:34 AM, Dustin Sallings <dustin at spy.net> wrote:
> >>
> >> On Jan 1, 2008, at 23:59, Francesc Esplugas wrote:
> >>
> >>> I want to prove all the work I've done. I understand some of my code
> >>> can be a copy and paste from other projects or other developers, but
> >>> the idea is to prove I've been working on the project.
> >>>
> >>> Proving the integrity (hg verify) of the repository, could prove
> >>> nobody hacked it?
> >>
> >>
> >>        If you can update to a version whose ancestry includes your  
> >> changes,
> >> then those changes are there.  Anyone who's cloned the tree would
> >> notice if you tried to rewrite history somewhere and pass it off as
> >> their upstream.
> >>
> >>        Sounds like you might be fighting an uphill battle, though.   
> >> Good luck.
> >
> > Key point: If no one ever cloned your repo, then the repo's integrity
> > won't help you. It is easy to create a brand new repo with backdated
> > history. And if the repo contains only your changes (not interleaved
> > with changes by others), you would need the other party to have
> > demonstrably old clones which show your changes really are as old as
> > you claim.
> >
> > But, as someone noted before, this is about who has to prove what. So:
> > what exactly do you want to prove? Time spent? Goals achieved? Lines
> > of code contributed? What?
> >
> > And why do you feel you need to prove this? If they claim you
> > reattributed changes done by others to yourself, where are those
> > others? What grounds have they for making the claim? If they claim the
> > result is not worth 3 months' time, then the repo should show the
> > evolution of the result, which might show otherwise. But this does not
> > hinge on the repo's integrity.

Here's what we can know about a repository:

If you give me a changeset ID, and I compare it to a changeset ID in my
repository, I can know that the following things have not changed if the
IDs match and the repo passes verify:

 - commit description (date/author/branch/etc.)
 - list of files in the changeset, including their permissions
 - contents of each of those files[1]
 - the parent commit IDs (and therefore, their contents and history!)

Any of these things may have been tampered with, plagiarized, etc. before you
gave me the commit ID, but I know they haven't changed in between.
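
To illustrate why a matching ID also vouches for the whole history behind
it, here is a minimal sketch of a hash chain. It is a simplification, not
Mercurial's actual storage code: each "commit" is reduced to a single
byte string, and node_id(), chain() and the sample histories are
hypothetical names made up for this example. The hashing scheme (SHA1
over the sorted parent IDs followed by the content) follows my
understanding of how Mercurial derives node IDs.

```python
import hashlib

NULL_ID = b"\0" * 20  # stand-in for "no parent"


def node_id(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    """Hash the content together with the two parent IDs (sorted)."""
    first, second = sorted([p1, p2])
    h = hashlib.sha1(first)
    h.update(second)
    h.update(text)
    return h.digest()


def chain(commits) -> bytes:
    """Return the ID of the last commit in a linear history."""
    parent = NULL_ID
    for text in commits:
        parent = node_id(text, parent)
    return parent


# Two histories that differ only in their first commit.
honest = [b"day 1: parser", b"day 2: tests", b"day 3: docs"]
forged = [b"day 1: parser (backdated)", b"day 2: tests", b"day 3: docs"]

print(chain(honest).hex())
print(chain(forged).hex())

# Changing any ancestor changes the ID of every descendant.
assert chain(honest) != chain(forged)
```

Because each ID is computed over its parents' IDs, rewriting an early
commit yields a different tip ID, which is what someone holding an older
clone would notice.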

It's also possible to GPG-sign commits. This demonstrates that the
person who signed a particular commit had access to the private key
corresponding to the signature.

The question then becomes what can we show about authorship and date of
creation from the above?

Logically, there is very little we can show about authorship. Anything
can be cut and pasted from other unknown sources. Legally, if no one
steps forward to claim authorship, then there's nothing to say you
weren't the author. But once you've published, someone can always claim
to have authored it and then it falls to what documentation each side
has about the creation process. Having documentation (i.e., your repo
history) and earlier publication greatly helps. One nice thing about
Mercurial here is you can publicly post just your changeset IDs to
create a verifiable record of when your not-yet-published work was
actually created.

As for dates, it's not possible to prove that you didn't complete all the
work on the first day and then check it in over the course of months. The
best you can do is document it with a version control system, and
correlate that with other activities (related email, phone calls, login
records, etc.). Also, there are a number of industry-standard techniques
for estimating developer hours from lines of code[2]. If you're close to
these, your employer really has no grounds for complaint.
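
As a rough illustration of that kind of estimate, here is a sketch of the
Basic COCOMO formulas (effort = a * KLOC^b person-months, schedule =
c * effort^d calendar months) using the commonly published coefficients
for the three project classes. The function name basic_cocomo() and the
choice of project class are assumptions for the example; which class fits
a given project is a judgment call, and the result is a ballpark figure,
not a measurement.

```python
# Basic COCOMO coefficients (a, b, c, d) for each project class.
COEFFS = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(kloc: float, mode: str = "organic"):
    """Return (effort in person-months, schedule in calendar months)."""
    a, b, c, d = COEFFS[mode]
    effort = a * kloc ** b
    schedule = c * effort ** d
    return effort, schedule


# Example: a hypothetical project of 8,000 delivered source lines.
effort, schedule = basic_cocomo(8.0, "organic")
print(f"effort:   {effort:.1f} person-months")
print(f"schedule: {schedule:.1f} months")
```

The line count itself has to come from somewhere defensible, for example
the size of the tree at the last changeset you are claiming.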

[1] There are attacks based on SHA1 collisions that allow you to change
the contents of specially-designed files without detection, but these
won't work on typical source code. If you need stronger protection, you
can always publish stronger hashes of your source tree (and you can
check these in).
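
One way to produce such a stronger hash, sketched under a few
assumptions: SHA-256 is used as the stronger function, the .hg directory
is skipped so that only the working files are covered, and tree_digest()
is a hypothetical helper written for this example rather than anything
shipped with Mercurial. Files are read whole, so this is only suitable
for ordinary source trees.

```python
import hashlib
import os


def tree_digest(root: str) -> str:
    """Return one SHA-256 hex digest covering every file under root."""
    outer = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        # Sort in place so the walk order is the same on every machine,
        # and skip the repository metadata.
        dirnames[:] = sorted(d for d in dirnames if d != ".hg")
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, "rb") as f:
                file_hash = hashlib.sha256(f.read()).hexdigest()
            # Feed "hash  relative/path" lines into the outer hash.
            outer.update(f"{file_hash}  {rel}\n".encode("utf-8"))
    return outer.hexdigest()
```

Calling tree_digest(".") from the top of a checkout gives a single value
that can be published alongside the changeset ID or committed in a later
changeset.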

[2] http://en.wikipedia.org/wiki/Cocomo

-- 
Mathematics is the supreme nostalgia of our time.


