Re: [darcs-users] darcs conflicts/dependencies -- is patch theory the place to start?

AntC Thu, 20 Sep 2012 23:01:53 -0700

Stephen J. Turnbull <stephen <at> xemacs.org> writes:

> 
> AntC writes:
> 
>  > This is exactly the sort of example I'm trying to work through. So
>  > my approach is (trying to) separate out what applies for the
>  > container (file) vs the contents (lines).
> 
> Well, I think you should try to sort this in conceptual terms first,


Thanks Stephen. You're possibly not reading this in the context of my "very 
speculative approach":
 http://lists.osuosl.org/pipermail/darcs-users/2012-September/026698.html

Conceptually, this is trying to achieve the same thing as L/S/L's approach for 
a line-id, but piggy-backing on darcs' current approach for precisely 
observing hunk changes. (From what you're saying, it would be even better to 
piggy-back on git's ability to spot hunk moves. Is there some reason darcs 
doesn't/can't look for those?)


> ... most importantly, what are the use cases where the programmer might
> care about the difference between a file's container (which might have
> different names either sequentially (renames) or concurrently (links)),
> and the file's contents?

Essentially I am agreeing with git that tracking the contents is far more 
important than tracking the container. But in complex code bases (with 
programs and scripting and install routines, etc in a variety of languages) 
there are semantic connections between content in different files, and 
between file directories/names and content.

> 
> ...  git's find-copy/move-harder features
> defines content movement by equality of lines as strings; it is 100%
> reliable at finding those moves.

Given that there are typically many lines in a repo with exactly the same 
content (blanks, a single opening brace or single closing brace, single open 
comment or single closing comment, horizontal line separators between 
sections, standard program initiation sequences/shutdown sequences/error 
handlers/template calls), how can it be so sure? Is git looking only at the 
content the programmer can see, or does it look 'under the covers' at disk 
address, etc?
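
To make the worry concrete, here is a minimal sketch of matching lines purely 
by string equality. It is not git's actual algorithm; the function name 
candidate_sources and the sample lines are invented for illustration. It only 
shows that a line such as a lone closing brace can have several equally 
plausible origins.

```python
from collections import defaultdict

def candidate_sources(old_lines, new_lines):
    """For each line in new_lines, list every index in old_lines holding identical text."""
    index = defaultdict(list)
    for i, text in enumerate(old_lines):
        index[text].append(i)
    # .get avoids creating empty entries for lines that never appeared before
    return [index.get(text, []) for text in new_lines]

old = ["int f() {", "    return 1;", "}", "", "int g() {", "    return 2;", "}"]
moved = ["int g() {", "    return 2;", "}"]

print(candidate_sources(old, moved))
# [[4], [5], [2, 6]]
```

The first two lines have a single possible origin, but the closing brace 
matches two places. Any tool working only from visible content needs extra 
context (neighbouring lines, block length) to choose between them.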

Suppose the programmer moves some text, then edits it before recording? (Yes, 
bad practice I know -- record early! record often!)


> 
>  > I'm envisaging a move-file command (as per darcs), and a move-lines
>  > command, so that the programmer can be explicit about their intent:
>  > - are these two completely new files?
>  > - or one with continuing identity, one new?
>  > - (whether or not one of the files has the same name as before
>  >    is an orthogonal issue)
>  > - for each file, is this completely new content?
>  > - or continuing content (from where)?
>  > - or (more likely) a mix of new and continuing?
> 
> This is a rather large burden to place on the programmer.  Will they
> really bother to learn to do this correctly?

With move-file we expect the programmer to instruct darcs, so that it can both 
record the patch and make the move. For move-lines, I agree this is less 
convenient. Perhaps the best of both worlds is:
- at record points use git-like methods to detect moved lines
- confirm with the programmer that this is a move (rather than new text)
- make sure it's capturing all and only the moved lines
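
The three steps above can be sketched roughly as follows. This is only an 
illustration using Python's standard difflib, not how darcs or git detect 
moves; candidate_moves, confirmed_moves, the min_lines threshold and the 
confirm callback are all hypothetical names.

```python
import difflib

def candidate_moves(old, new, min_lines=2):
    """Return (old_start, new_start, length) for deleted blocks that reappear verbatim as inserted blocks."""
    sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    deleted, inserted = [], []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.append((i1, old[i1:i2]))
        if tag in ("insert", "replace"):
            inserted.append((j1, new[j1:j2]))
    moves = []
    for i, gone in deleted:
        for j, added in inserted:
            # Only whole blocks with identical text count as a candidate move
            if len(gone) >= min_lines and gone == added:
                moves.append((i, j, len(gone)))
    return moves

def confirmed_moves(moves, confirm):
    """Keep only the candidates the programmer confirms (confirm is a yes/no callback)."""
    return [m for m in moves if confirm(m)]

old = ["header", "helper one", "helper two", "main start", "main body", "main end"]
new = ["header", "main start", "main body", "main end", "helper one", "helper two"]

print(candidate_moves(old, new))
# [(1, 4, 2)]
```

A block that was edited after being moved would not match here, which is the 
case where the programmer's confirmation (or an explicit move-lines) is still 
needed.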


> 
>  > The critical issue is determining how to apply patches pulled from
>  > other repos where the file splitting hasn't occurred (perhaps a
>  > bugfix on the pre-refactored code).
> 
> Simple.  You apply it to the same lines. ...

Exactly what I'm aiming for. Where "same" means same line-id, as tracked 
through move-lines, to wherever the lines are now. (The target lines might be 
in a different file in this repo compared to where we've pulled the patch 
from.)
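
As a rough sketch of what "same line-id" could mean in practice, assume each 
line is stored as an (id, text) pair and a repo is a dict from file name to a 
list of such pairs. This representation and the names move_lines and 
apply_by_id are assumptions made for illustration, not an existing darcs 
feature.

```python
def move_lines(repo, src, start, count, dst, at):
    """Move `count` lines from file `src` (from index `start`) into `dst` at index `at`, keeping their ids.

    If src and dst are the same file, `at` is counted after the block has been removed.
    """
    block = repo[src][start:start + count]
    del repo[src][start:start + count]
    repo.setdefault(dst, [])[at:at] = block

def apply_by_id(repo, line_id, new_text):
    """Rewrite the line with the given id, whichever file it now lives in."""
    for name, lines in repo.items():
        for k, (lid, _) in enumerate(lines):
            if lid == line_id:
                lines[k] = (lid, new_text)
                return name, k
    raise KeyError(line_id)

repo = {"util.c": [(1, "int f() {"), (2, "  return 1;"), (3, "}"),
                   (4, "int g() {"), (5, "  return 2;"), (6, "}")]}

# Refactoring in this repo: split g() out into its own file
move_lines(repo, "util.c", 3, 3, "g.c", 0)

# A bugfix pulled from a repo that never did the split still targets line-id 5
print(apply_by_id(repo, 5, "  return 3;"))
# ('g.c', 1)
```

The pulled patch names the line, not the file or position, so it lands in 
g.c even though the originating repo only knows about util.c.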

> ... In git, if you've changed
> the content of the lines (eg, variable rename) you won't be able to
> find them, but you won't be able to apply the patch anyway because git
> doesn't know how to commute patches.

So I'm aiming to cope with variable renames. I'm representing patches in a 
context-independent way, so that you can apply patches in a different sequence 
(or omit some patches), but I'm not using a commute-like mechanism.

> 
>  > 
>  > I'd prefer to handle that as a move-lines for one (or both) of the
>  > ranges.
> 
> Theoretically, yes.  But will users properly discriminate between
> those commands?
> 

Don't worry, the VCS is going to validate that the move-lines does exactly 
capture the change in content. The risk is that the programmer will move lines 
(through edit/copy/paste) and then 'shuffle' the sequence and then change some 
stuff before they remember to record. So now it's too difficult to trace the 
movements by algorithm, and the programmer's forgotten exactly what they did.

So at worst that ends up being unconnected hunk deletes and hunk inserts, and 
the VCS has lost track of the line identities. But is that any worse than 
darcs or git?
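
The validation mentioned above could be as simple as replaying the declared 
move against the old content and comparing with what is actually on disk. A 
minimal single-file sketch, where validate_move and its parameters are 
hypothetical:

```python
def validate_move(old, new, start, count, dest):
    """True if moving old[start:start+count] to index `dest` (counted after removal) yields exactly `new`."""
    block = old[start:start + count]
    rest = old[:start] + old[start + count:]
    return rest[:dest] + block + rest[dest:] == new

old = ["a", "b", "c", "d"]

# A clean move of the first two lines to the end is accepted
print(validate_move(old, ["c", "d", "a", "b"], 0, 2, 2))
# True

# The same move followed by an unrecorded edit is rejected
print(validate_move(old, ["c", "d", "a", "B"], 0, 2, 2))
# False
```

When the check fails, the fallback is the worst case described above: record 
the change as unconnected hunk deletes and inserts.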


AntC


_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
