On Thu, Dec 2, 2010 at 4:10 AM, Peter Sturge <peter.stu...@gmail.com> wrote:
> The Win7 crashes aren't from disk drivers - they come from, in this
> case, a Broadcom wireless adapter driver.
> The corruption comes as a result of the 'hard stop' of Windows.
>
> I would imagine this same problem could/would occur on any OS if the
> plug was pulled from the machine.

Actually, Lucene should be robust to this -- losing power, OS crash,
hardware failure (as long as the failure doesn't flip bits), etc.
This is because we do not delete files associated with an old commit
point until all files referenced by the new commit point are
successfully fsync'd.

However, it sounds like something is wrong, at least on Windows 7.

I suspect it may be how we do the fsync -- if you look in
FSDirectory.fsync, you'll see that we take a String fileName in.  We
then open a new read/write RandomAccessFile, and call its
.getFD().sync().
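
To make the pattern concrete, here is a minimal sketch of the
reopen-then-sync approach described above.  It is not the actual
FSDirectory code; the class name ReopenSync and the method syncByName
are hypothetical and exist only for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper, NOT Lucene's real implementation: it only
// illustrates syncing a file through a freshly opened descriptor.
final class ReopenSync {
    static void syncByName(File directory, String fileName) throws IOException {
        File fullFile = new File(directory, fileName);
        // This is a brand-new descriptor, unrelated to the one that was
        // used (and already closed) when the data was originally written.
        try (RandomAccessFile file = new RandomAccessFile(fullFile, "rw")) {
            // Ask the OS to flush its buffers for this file to the device.
            file.getFD().sync();
        }
    }
}
```

The open question is whether syncing through this second descriptor
reliably flushes data that was written through the first one.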

I think this is potentially risky, i.e., it would be better if we called
.sync() on the original file we had opened for writing and written
lots of data to, before closing it, instead of closing it, opening a
new FileDescriptor, and calling sync on that.  We could conceivably take
this approach, entirely in the Directory impl, by keeping a pool of
the file handles opened for writing, even after .close() was called.
When a file is deleted we'd remove it from that pool, and when sync is
finally requested for it we'd sync that handle and remove it from the
pool.
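
A rough sketch of what such a pool might look like, purely as an
illustration.  The class PendingSyncPool and its methods release,
delete, sync and pendingCount are all hypothetical names, not existing
Lucene APIs, and I'm assuming the real file close can be deferred
until sync or delete.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not an existing Lucene class: keeps write handles
// open after a logical close so sync can use the original descriptor.
final class PendingSyncPool {
    private final File directory;
    private final Map<String, RandomAccessFile> pending = new HashMap<>();

    PendingSyncPool(File directory) {
        this.directory = directory;
    }

    // Called in place of a real close(): park the still-open write handle.
    synchronized void release(String name, RandomAccessFile handle) {
        pending.put(name, handle);
    }

    // Deleting a file drops its handle from the pool (closing it first,
    // since Windows will not delete a file that is still open).
    synchronized boolean delete(String name) throws IOException {
        RandomAccessFile handle = pending.remove(name);
        if (handle != null) {
            handle.close();
        }
        return new File(directory, name).delete();
    }

    // Sync through the original descriptor, then really close it.
    // Returns false if the file has no parked handle.
    synchronized boolean sync(String name) throws IOException {
        RandomAccessFile handle = pending.remove(name);
        if (handle == null) {
            return false;
        }
        try {
            handle.getFD().sync();
        } finally {
            handle.close();
        }
        return true;
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

The obvious cost is that every written-but-not-yet-synced file holds an
open descriptor until the next commit or delete, so the pool size
would need watching.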

Could it be that on Windows 7 the way we fsync (opening a new
FileDescriptor long after the first one was closed) doesn't in fact
work?

Mike
