On Thu, Dec 2, 2010 at 4:53 AM, Peter Sturge <peter.stu...@gmail.com> wrote:
> As I'm not familiar with the syncing in Lucene, I couldn't say whether
> there's a specific problem with regards Win7/2008 server etc.
>
> Windows has long had the somewhat odd behaviour of deliberately
> caching file handles after an explicit close(). This has been part of
> NTFS since NT 4 days, but there may be some new behaviour introduced
> in Windows 6.x (and there is a lot of new behaviour) that causes an
> issue. I have also seen this problem in Windows Server 2008 (server
> version of Win7 - same file system).
>
> I'll try some further testing on previous Windows versions, but I've
> not previously come across a single segment corruption on Win 2k3/XP
> after hard failures. In fact, it was when I first encountered this
> problem on Server 2008 that I even discovered CheckIndex existed!
>
> I guess a good question for the community is: Has anyone else
> seen/reproduced this problem on Windows 6.x (i.e. Server 2008 or
> Win7)?
>
> Mike, are there any diagnostics/config etc. that I could try to help
> isolate the problem?

Actually it might be easiest to make a standalone Java test, maybe
using Lucene's FSDir, that opens files in sequence (0.bin, 1.bin,
2.bin...), writes verifiable data to them (eg random bytes from a
fixed seed) and then closes & syncs each one.  Then, crash the box
while this is running.  Finally, run a verify step that checks that
the data is "correct", ie that our attempt to fsync "worked".
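
Something along these lines might do as a starting point.  It is only
a rough sketch and uses plain JDK file APIs rather than Lucene's FSDir.
The class name, method names, file size, file count and seed are all
made up for illustration.  It also assumes that "closes & syncs" should
mean: close the file first, then sync it through a newly opened
descriptor, since that is the case in question here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class FsyncCrashTest {

    // Arbitrary illustration values, not taken from the thread.
    static final int SIZE = 64 * 1024;
    static final int COUNT = 10_000;
    static final long SEED = 42L;

    // Deterministic contents for file number i, so verify can regenerate them.
    static byte[] expected(long seed, int i, int size) {
        byte[] data = new byte[size];
        new Random(seed + i).nextBytes(data);
        return data;
    }

    // Writes 0.bin, 1.bin, ...; each file is closed, then synced via a
    // freshly opened descriptor (assumption: this is the case to exercise).
    static void writeFiles(Path dir, int count, int size, long seed) throws IOException {
        Files.createDirectories(dir);
        for (int i = 0; i < count; i++) {
            Path file = dir.resolve(i + ".bin");
            try (OutputStream out = Files.newOutputStream(file)) {
                out.write(expected(seed, i, size));
            }
            try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
                raf.getFD().sync();
            }
            System.out.println("synced " + file.getFileName());
        }
    }

    // Checks 0.bin, 1.bin, ... until the first missing file and returns
    // the numbers of the files whose contents do not match.
    static List<Integer> verifyFiles(Path dir, int size, long seed) throws IOException {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; Files.exists(dir.resolve(i + ".bin")); i++) {
            byte[] actual = Files.readAllBytes(dir.resolve(i + ".bin"));
            if (!Arrays.equals(actual, expected(seed, i, size))) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: FsyncCrashTest write|verify <dir>");
            return;
        }
        Path dir = Paths.get(args[1]);
        if (args[0].equals("write")) {
            writeFiles(dir, COUNT, SIZE, SEED);
        } else {
            System.out.println("bad files: " + verifyFiles(dir, SIZE, SEED));
        }
    }
}
```

Run it with "write <dir>", crash the box partway through, then run it
with "verify <dir>" after reboot.  The highest-numbered file may
legitimately be bad, since the crash can land between its close and
its sync; any other bad file would suggest the sync did not hold.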

It could very well be that Windows 6.x is now "smarter" about fsync in
that it only syncs bytes actually written with the currently open file
descriptor, and not bytes written against the same file by past file
descriptors (ie via a global buffer cache, like Linux).

Mike
