On Thu, Dec 2, 2010 at 4:10 AM, Peter Sturge <peter.stu...@gmail.com> wrote:
> The Win7 crashes aren't from disk drivers - they come from, in this
> case, a Broadcom wireless adapter driver.
> The corruption comes as a result of the 'hard stop' of Windows.
>
> I would imagine this same problem could/would occur on any OS if the
> plug was pulled from the machine.

Actually, Lucene should be robust to this -- losing power, OS crash, hardware failure (as long as the failure doesn't flip bits), etc. This is because we do not delete files associated with an old commit point until all files referenced by the new commit point are successfully fsync'd.

However, it sounds like something is wrong, at least on Windows 7.

I suspect it may be how we do the fsync -- if you look in FSDirectory.fsync, you'll see that we take a String fileName in. We then open a new read/write RandomAccessFile, and call its .getFD().sync().

I think this is potentially risky, i.e., it would be better if we called .sync() on the original file we had opened for writing and written lots of data to, before closing it, instead of closing it, opening a new FileDescriptor, and calling sync on it.

We could conceivably take this approach, entirely in the Directory impl, by keeping the pool of file handles for write open even after .close() was called. When a file is deleted we'd remove it from that pool, and when a sync is finally requested for a file we'd sync its original handle and then remove it from the pool.

Could it be that on Windows 7 the way we fsync (opening a new FileDescriptor long after the first one was closed) doesn't in fact work?

Mike
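
P.S. For illustration, the two ways of syncing discussed above might look roughly like this in plain Java. This is only a minimal sketch, not the actual FSDirectory code: the class and method names (SyncSketch, syncByReopening, writeAndSync) are made up for the example, and it leaves out retries and the handle pool entirely.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical sketch; names are illustrative and not taken from Lucene.
public class SyncSketch {

    // Approximates the approach described above: the file has already been
    // written and closed, so we reopen it by name and sync the new descriptor.
    static void syncByReopening(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.getFD().sync();
        }
    }

    // The suggested alternative: sync the same descriptor that was used for
    // writing, before it is closed.
    static void writeAndSync(File file, byte[] data) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(data);      // FileOutputStream is unbuffered, no flush needed
            out.getFD().sync();   // ask the OS to push the data to the device
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("sync-sketch", ".bin");
        f.deleteOnExit();
        byte[] data = "segment data".getBytes(StandardCharsets.UTF_8);

        writeAndSync(f, data);    // sync on the original handle
        syncByReopening(f);       // sync on a freshly opened handle

        System.out.println("bytes on disk: " + Files.readAllBytes(f.toPath()).length);
    }
}
```

Both methods leave the same bytes in the file on a healthy system; the open question is only whether the second form gives the same durability guarantee after a hard stop on every OS.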