viliam-durina commented on issue #14334: URL: https://github.com/apache/lucene/issues/14334#issuecomment-2771963385
> Please stop arguing here about problems that don't exist. Issue https://github.com/apache/lucene/issues/10906 has nothing to do with temporary files.

This issue is not only about temporary files, but about the way Lucene does or doesn't fsync. Issue #10906, I think, was about a lost IOException caused by closing a file without fsyncing it. I used it to illustrate the kind of errors we get from corrupted files if we don't check IO errors correctly.

> This is not possible because Lucene solely uses MMap for reading files, and for writing it uses OutputStreams, which can't be reused to read again.

It's possible to continue using the `OutputStream` API for writing and the mmap API for reading, with a single underlying read-write file descriptor:

```java
import static java.lang.Math.toIntExact;
import static java.nio.channels.FileChannel.MapMode.READ_ONLY;
import static java.nio.file.StandardOpenOption.CREATE_NEW;
import static java.nio.file.StandardOpenOption.READ;
import static java.nio.file.StandardOpenOption.WRITE;

import java.io.IOException;
import java.io.OutputStream;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadWriteChannelExample {
    public static void main(String[] args) throws IOException {
        Path tempFolder = Files.createTempDirectory("foo");
        Path filePath = tempFolder.resolve("test_file");
        try (FileChannel ch = FileChannel.open(filePath, READ, WRITE, CREATE_NEW)) {
            // writing
            OutputStream os = Channels.newOutputStream(ch);
            os.write("hello world".getBytes(StandardCharsets.UTF_8));
            // When done writing, we must not close the output stream, because it will close the channel. But
            // we must flush it so that all in-flight changes are sent to the underlying channel (e.g. if the output
            // stream was wrapped in some buffering output stream).
            os.flush();

            // reading (FileChannel.map with an Arena requires Java 22+)
            try (Arena arena = Arena.ofConfined()) {
                MemorySegment segment = ch.map(READ_ONLY, 0, ch.size(), arena);
                byte[] buf = new byte[toIntExact(segment.byteSize())];
                segment.asByteBuffer().get(buf);
                System.out.println("Data read: " + new String(buf, StandardCharsets.UTF_8));
            }
        } // channel closed now

        // we can delete the file now
        Files.delete(filePath);
        Files.delete(tempFolder);
    }
}
```

For example, `Directory.createTempOutput()` can open such a read-write channel, create an `OutputStream` from it, and use that to create an `IndexOutput`, so that existing code doesn't have to be modified.
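The code comment above says the stream must be flushed but not closed, because closing it would close the channel. A minimal sketch of one way to make that harder to get wrong, assuming the `IndexOutput` sits on a buffered stream: `NonClosingChannelOutputStream` is a hypothetical helper (not an existing Lucene or JDK class) whose `close()` only flushes. To keep it short, it reads the data back with a plain positional `FileChannel.read` rather than mmap.

```java
import java.io.BufferedOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not an existing Lucene or JDK class: a buffered OutputStream over a
// FileChannel whose close() flushes the buffer but leaves the channel open for later reads.
public class NonClosingChannelOutputStream extends FilterOutputStream {

    public NonClosingChannelOutputStream(FileChannel channel, int bufferSize) {
        // The BufferedOutputStream stands in for whatever buffering an IndexOutput would add.
        super(new BufferedOutputStream(Channels.newOutputStream(channel), bufferSize));
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // FilterOutputStream writes byte-by-byte by default; forward bulk writes instead.
        out.write(b, off, len);
    }

    @Override
    public void close() throws IOException {
        // Push buffered bytes down to the channel, but do NOT close `out`:
        // closing the Channels.newOutputStream wrapper would close the channel too.
        out.flush();
    }

    // Writes `text` through the stream, closes the stream, then reads the bytes back
    // from the same (still open) channel with positional reads.
    public static String roundTrip(String text) throws IOException {
        Path dir = Files.createTempDirectory("rw-demo");
        Path file = dir.resolve("tmp");
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ,
                StandardOpenOption.WRITE, StandardOpenOption.CREATE_NEW)) {
            try (OutputStream os = new NonClosingChannelOutputStream(ch, 8192)) {
                os.write(text.getBytes(StandardCharsets.UTF_8));
            } // buffer flushed to the channel here; the channel itself stays open

            ByteBuffer buf = ByteBuffer.allocate(Math.toIntExact(ch.size()));
            while (buf.hasRemaining() && ch.read(buf, buf.position()) > 0) {
                // keep reading until the buffer is full or EOF is reached
            }
            return new String(buf.array(), 0, buf.position(), StandardCharsets.UTF_8);
        } finally {
            // try-with-resources closes the channel before this block runs
            Files.deleteIfExists(file);
            Files.delete(dir);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("hello world")); // prints "hello world"
    }
}
```

With this shape, existing code can keep closing its output in try-with-resources; if the wrapper closed the channel instead, the read after the inner `try` would fail with a `ClosedChannelException`.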
Then the `IndexOutput` should not be closed; instead, it can be converted to an `IndexInput`, e.g. with a new method `IndexOutput.convertToInput()` that reuses the file channel. We still need to figure out how to handle the `IOContext` with this. When the `IndexInput` is closed, the channel is closed. There are probably many ways to implement this; the above is just one example. I have checked a few of the `createTempOutput` usages, and they might be fairly straightforward to update.
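The hand-over from writer to reader could look roughly like the sketch below. `TempFileHandle`, `MappedInput` and `convertToInput()` are hypothetical names standing in for what `Directory.createTempOutput()`, `IndexInput` and the proposed `IndexOutput.convertToInput()` might do; they are not Lucene APIs, and the sketch ignores `IOContext` entirely. It assumes Java 22+ for `FileChannel.map(..., Arena)`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not Lucene API: one read-write channel that is first written through an
// OutputStream and then handed over to a memory-mapped reader. Requires Java 22+.
public class TempFileHandle implements AutoCloseable {
    private final FileChannel channel;
    private final OutputStream out;
    private boolean converted;

    public TempFileHandle(Path path) throws IOException {
        channel = FileChannel.open(path, StandardOpenOption.READ,
                StandardOpenOption.WRITE, StandardOpenOption.CREATE_NEW);
        out = Channels.newOutputStream(channel);
    }

    // Write phase: callers must not close this stream, since that would close the channel.
    public OutputStream output() {
        return out;
    }

    // Stand-in for the proposed IndexOutput.convertToInput(): on success, ownership of the
    // channel moves to the returned reader, which closes it when it is closed.
    public MappedInput convertToInput() throws IOException {
        if (converted) {
            throw new IllegalStateException("already converted");
        }
        out.flush(); // no-op here, but needed if the stream is ever wrapped in a buffer
        Arena arena = Arena.ofConfined(); // confined: must be used and closed by this thread
        try {
            MemorySegment segment =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size(), arena);
            converted = true;
            return new MappedInput(channel, arena, segment);
        } catch (IOException | RuntimeException e) {
            arena.close(); // the handle still owns the channel and closes it in close()
            throw e;
        }
    }

    // If the handle was never converted, it still owns the channel and must close it.
    @Override
    public void close() throws IOException {
        if (!converted) {
            channel.close();
        }
    }

    public static final class MappedInput implements AutoCloseable {
        private final FileChannel channel;
        private final Arena arena;
        private final MemorySegment segment;

        MappedInput(FileChannel channel, Arena arena, MemorySegment segment) {
            this.channel = channel;
            this.arena = arena;
            this.segment = segment;
        }

        public String readUtf8() {
            return new String(segment.toArray(ValueLayout.JAVA_BYTE), StandardCharsets.UTF_8);
        }

        // Closing the reader unmaps the segment and then closes the shared channel.
        @Override
        public void close() throws IOException {
            try {
                arena.close();
            } finally {
                channel.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("convert-demo");
        Path file = dir.resolve("tmp");
        try (TempFileHandle handle = new TempFileHandle(file)) {
            handle.output().write("hello world".getBytes(StandardCharsets.UTF_8));
            try (MappedInput in = handle.convertToInput()) {
                System.out.println(in.readUtf8()); // prints "hello world"
            }
        } finally {
            Files.deleteIfExists(file);
            Files.delete(dir);
        }
    }
}
```

The design choice here is explicit ownership transfer: until `convertToInput()` succeeds, the handle owns the channel; afterwards only the reader closes it, so wrapping both in try-with-resources does not close the channel twice or leak it when mapping fails.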