On 3/20/10 2:03 PM, Seth Falcon wrote:
On 3/20/10 1:36 PM, Alistair Gee wrote:
I fixed my build problems. I also noticed that my patch wasn't
correct, so I have attached a new version.
This fix still grows the vector by doubling it until it is big enough,
but the length is reset to the correct size at the end once it is
known.
This fix differs from the existing fix in subversion in the following scenario:
1.Create file Z in directory with 1 other file named Y
2. Call dir() to retrieve list of files.
3. dir() counts 2 files.
4. While dir() is executing, some other process creates file X in the directory.
5. dir() retrieves the list of files, stopping after 2 files. But by
chance, it retrieves files X and Y (but not Z).
6. dir() returns files X and Y, which could be misinterpreted to mean
that file Z does not exist.
In contrast, with the attached fix, dir() would return all 3 files.
I think the scenario you describe could happen with either version.
Once you've read the files in a directory, all bets are off. Anything
could happen between the time you readdir() and return results back to
the user. I agree, though, that avoiding two calls to readdir narrows
the window.
Also, the existing fix in subversion doesn't seem to handle the case
where readdir() returns fewer files than was originally counted as it
doesn't decrease the length of the vector.
Yes, that's a limitation of the current fix.
Have you run 'make check-devel' with your patch applied? Have you run
any simple tests for using dir() or list.files() with recursive=TRUE on
a reasonably large directory and compared times and memory use reported
by gc()?
It is often the case that writing the patch is the easy/quick part and
making sure that one hasn't introduced new infelicities or unintended
behavior is the hard part.
I will try to take another look at your latest patch.
I've applied a modified version of your patch. In the testing that I
did, avoiding the counting step resulted in almost 2x faster times for
large directory listings with recursive=TRUE at the cost of a bit more
memory.
The code also now includes a check for user interrupt, so that you can
C-c out of dir/list.files call more quickly.
Thanks for putting together the patch.
+ seth
--
Seth Falcon | @sfalcon | http://userprimary.net/
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel