Stephen J. Turnbull <stephen <at> xemacs.org> writes:
>
> AntC writes:
>
> > This is exactly the sort of example I'm trying to work through. So
> > my approach is (trying to) separate out what applies for the
> > container (file) vs the contents (lines).
>
> Well, I think you should try to sort this in conceptual terms first,

Thanks Stephen. You're possibly not reading this in the context of my
"very speculative approach":
http://lists.osuosl.org/pipermail/darcs-users/2012-September/026698.html

Conceptually, this is trying to achieve the same thing as L/S/L's approach
for a line-id, but piggy-backing on darcs' current approach for precisely
observing hunk changes. (From what you're saying, it would be even better
to piggy-back on git's ability to spot hunk moves. Is there some reason
darcs doesn't/can't look for those?)

> ... most importantly, what are the use cases where the programmer might
> care about the difference between a file's container (which might have
> different names either sequentially (renames) or concurrently (links)),
> and the file's contents?

Essentially I am agreeing with git that tracking the contents is far more
important than tracking the container. But in complex code bases (with
programs and scripting and install routines, etc. in a variety of
languages) there are semantic connections between content in different
files, and connections between file dirs/names and content.

> ... git's find-copy/move-harder features
> defines content movement by equality of lines as strings; it is 100%
> reliable at finding those moves.

Given that there are typically many lines in a repo with exactly the same
content (blanks, a single opening brace or single closing brace, single
open comment or single closing comment, horizontal line separators between
sections, standard program initiation sequences/shutdown sequences/error
handlers/template calls), how can it be so sure? Is git looking only at the
content the programmer can see, or does it look 'under the covers' at disk
address, etc.?

Suppose the programmer moves some text, then edits it before recording?
(Yes, bad practice I know -- record early! record often!)
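
To make the worry about identical lines concrete, here is a minimal sketch
in Python. It is an assumption-laden illustration, not how git actually
implements its copy/move detection: `candidate_sources` is a hypothetical
helper that, for each line of the new text, lists every position in the old
text holding exactly the same string.

```python
from collections import defaultdict


def candidate_sources(old_lines, new_lines):
    """For each line in new_lines, list the old_lines indexes with equal text.

    Hypothetical helper for illustration only; NOT git's algorithm.
    """
    positions = defaultdict(list)
    for index, text in enumerate(old_lines):
        positions[text].append(index)
    return [(text, positions.get(text, [])) for text in new_lines]


# Two small functions swap places between the old and the new version.
old = ["int f() {", "  return 1;", "}", "", "int g() {", "  return 2;", "}"]
new = ["int g() {", "  return 2;", "}", "", "int f() {", "  return 1;", "}"]

matches = candidate_sources(old, new)

# Lines whose text occurs more than once in the old version: string
# equality alone cannot say which old line each one "is".
ambiguous = [text for text, sources in matches if len(sources) > 1]

# Lines with exactly one possible origin.
unique = [text for text, sources in matches if len(sources) == 1]

# A line that was moved and then edited before recording has no origin.
edited = candidate_sources(old, ["  return 3;"])
```

In this toy case the distinctive lines each have one candidate origin, both
closing braces have two, and the edited line has none. A real tool would
presumably use the surrounding lines to disambiguate, which is why I'm
asking what git does.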

> > I'm envisaging a move-file command (as per darcs), and a move-lines
> > command, so that the programmer can be explicit about their intent:
> > - are these two completely new files?
> > - or one with continuing identity, one new?
> > - (whether or not one of the files has the same name as before
> >   is an orthogonal issue)
> > - for each file, is this completely new content?
> > - or continuing content (from where)?
> > - or (more likely) a mix of new and continuing?
>
> This is a rather large burden to place on the programmer. Will they
> really bother to learn to do this correctly?

With move-file we expect the programmer to instruct darcs, so that it can
both record the patch and make the move. For move-lines, I agree this is
less convenient. Perhaps the best of both worlds is:
- at record points use git-like methods to detect moved lines
- confirm with the programmer that this is a move (rather than new text)
- make sure it's capturing all and only the moved lines

> > The critical issue is determining how to apply patches pulled from
> > other repos where the file splitting hasn't occurred (perhaps a
> > bugfix on the pre-refactored code).
>
> Simple. You apply it to the same lines. ...

Exactly what I'm aiming for. Where "same" means same line-id, as tracked
through move-lines, to wherever the lines are now. (The target lines might
be in a different file in this repo compared to where we've pulled the
patch from.)

> ... In git, if you've changed
> the content of the lines (eg, variable rename) you won't be able to
> find them, but you won't be able to apply the patch anyway because git
> doesn't know how to commute patches.

So I'm aiming to cope with variable renames. I'm representing patches in a
context-independent way, so that you can apply patches in a different
sequence (or omit some patches), but I'm not using a commute-like
mechanism.

> > I'd prefer to handle that as a move-lines for one (or both) of the
> > ranges.
>
> Theoretically, yes. But will users properly discriminate between
> those commands?

Don't worry, the VCS is going to validate that the move-lines does exactly
capture the change in content.

The risk is that the programmer will move lines (through edit/copy/paste)
and then 'shuffle' the sequence and then change some stuff before they
remember to record. So now it's too difficult to trace the movements by
algorithm, and the programmer's forgotten exactly what they did. So at
worst that ends up being unconnected hunk deletes and hunk inserts, and the
VCS has lost track of the line identities. But is that any worse than darcs
or git?

AntC
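
The line-id idea above (apply a pulled patch to the "same" lines, wherever
move-lines has taken them) can be sketched roughly as follows. This is only
an illustration under assumptions of mine: a repo is modelled as a dict
from file name to a list of [line_id, text] pairs, and `make_file`,
`move_lines` and `apply_edit` are hypothetical names, not darcs or git
commands.

```python
import itertools

_next_id = itertools.count(1)  # assumed source of unique line-ids


def make_file(texts):
    """Give every line a fresh line-id; a file is a list of [line_id, text]."""
    return [[next(_next_id), text] for text in texts]


def move_lines(repo, src, start, end, dst, at):
    """Move repo[src][start:end] into repo[dst] at index `at`.

    The line-ids travel with the lines. `at` is interpreted after the
    block has been removed from the source file.
    """
    block = repo[src][start:end]
    del repo[src][start:end]
    repo.setdefault(dst, [])[at:at] = block


def apply_edit(repo, line_id, new_text):
    """Apply an edit addressed by line-id; return the file it landed in."""
    for name, lines in repo.items():
        for line in lines:
            if line[0] == line_id:
                line[1] = new_text
                return name
    raise KeyError(line_id)  # identity lost, e.g. unrecorded delete + insert


# Both repos start from the same file.
repo = {"util.c": make_file(["int total(int n) {", "  return n + 1;", "}"])}
bug_id = repo["util.c"][1][0]  # id of the buggy line

# Local refactoring: the whole function moves to a new file.
move_lines(repo, "util.c", 0, 3, "sum.c", 0)

# A bugfix recorded elsewhere against the pre-refactored util.c still
# finds its target, because it names the line-id rather than file + offset.
landed_in = apply_edit(repo, bug_id, "  return n;")
```

Here the fix lands in sum.c even though it was recorded against util.c. The
`KeyError` branch corresponds to the worst case described above, where the
VCS has lost track of the line identities.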
