On Sun, Sep 8, 2013 at 10:55 AM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote: > On 13-09-06 11:07 PM, Karl Millar wrote: >> >> On Fri, Sep 6, 2013 at 7:03 PM, Duncan Murdoch <murdoch.dun...@gmail.com> >> wrote: >>> >>> On 13-09-06 9:21 PM, Karl Millar wrote: >>>> >>>> >>>> Hi Duncan, >>>> >>>> I like the interface of this version a lot better, but there's still a >>>> bunch of implementation details that need fixing: >>>> >>>> * As previously mentioned, there are important cases where the mtime >>>> values change in ways that this code doesn't detect. >>>> * If the timestamp file (which is usually in the temp directory) gets >>>> deleted (which can happen after a moderate amount of time of >>>> inactivity on some systems), then the file_test('-nt', ...) will >>>> always return false, even if the file has changed. >>> >>> >>> >>> If that happened without user intervention, I think it would break other >>> things in R -- the temp directory is supposed to last for the whole >>> session. >>> But I should be checking anyway. >> >> >> Yes, it does break other things in R -- my experience has been that >> the help system seems to be the one that is impacted the most by this. >> FWIW, I've never seen the entire R temp directory deleted, just >> individual files and subdirectories in it, but even that probably >> depends on how the machine is configured. I suspect only a few users >> ever notice this, but my R use is probably somewhat anomalous and I >> think it only happens to R sessions that I haven't used for a few >> days. > > > I use Windows and never see this; deleting temp files is up to me, not to > the system. But my understanding was the *nix systems should only clean up > /tmp on restart, and I don't think an R session will survive a restart. > > However, you have convinced me that the use of the timestamp file is not > beneficial enough to be the default. I'll leave it as an option, but add > warnings that it might be unreliable. > > >> >>>> * If files get added or deleted between the two calls to list.files in >>>> fileSnapshot, it will fail with an error. >>> >>> >>> >>> Yours won't work if path contains more than one directory. This is >>> probably >>> a reasonable restriction, but it's inconsistent with list.files, so I'd >>> like >>> to avoid it if I can find a way. >> >> >> I'm currently unsure what the behaviour when comparing snapshots with >> multiple directories should be. >> >> Presumably we should have the property that (horribly abusing notation >> for succinctness): >> compareSnapshots(c(a1, a2), c(a1, a2)) >> is the same as concatenating (in some form) >> compareSnapshots(a1, a1) and compareSnapshots(a2, a2) >> and there's a bunch of ways we could concatenate -- we could return a >> list of results, or a single result where each of the 'added, deleted, >> modified' fields are a list, or where we concatenate the 'added, >> deleted, modified' fields together into three simple vectors. >> Concatenating the vectors together like this is appealing, but unless >> you're using the full names, it doesn't include the information of >> which directory the changes are in, and using the full names doesn't >> work in the case where you're comparing different sets of directories, >> e.g. compareSnapshots(c(a1, a2), c(b1, b2)), where there is no >> sensible choice for a full name. The list options don't have this >> problem, but are harder to work with, particularly for the common case >> where there's only a single directory. You'd also have to be somewhat >> careful with filenames that occur in both directories. >> >> Maybe I'm just being dense, but I don't see a way to do this thats >> clear, easy to use and wouldn't confuse users at the moment. > > > The way I've done this is to require full.names when multiple dirs are on > the path. I've reduced it to one list.files() call per dir, by iterating > over the path variable and using your approach of calling it with full.names > = FALSE, then adding the dir if necessary. > > I haven't adopted your change that forces comparison of only size and mtime > from file.info. I don't see a big cost in storing whatever file.info > returns (which is system dependent; on Windows I don't see the user and > group related columns; on Unix I don't see the exe column). > Users might want to detect changes to anything there, and I shouldn't make > it harder for them. > > I've also kept the special-casing of md5sum; it really needs to be wrapped > in suppressWarnings() (on Unix only). And I've kept the options to specify > what changedFiles checks among the file.info columns; I can see that you > might want a snapshot with everything, but sometimes only want to be told > about changes in a subset of the attributes. > > I've uploaded > <http://www.stats.uwo.ca/faculty/murdoch/temp/testpkg_1.1.tar.gz> if anyone > is interested.
Works well. Scott -- Scott Kostyshak Economics PhD Candidate Princeton University ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel