viliam-durina commented on issue #14334:
URL: https://github.com/apache/lucene/issues/14334#issuecomment-2771963385

   >Please stop arguing here about problems that don't exist. Issue 
https://github.com/apache/lucene/issues/10906 has nothing to do with temporary 
files.
   
   This issue is not only about temporary files, but about how Lucene does or 
does not fsync. Issue #10906, I think, was about an IOException that was lost 
by closing without fsyncing. I used it to illustrate the kind of errors we get 
from corrupted files if we don't check I/O errors correctly.
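   
   To make the fsync point concrete, here is a minimal sketch using only JDK 
classes. It assumes the concern is that a write-back error may never reach the 
caller unless the file is explicitly synced before it is closed. The class and 
method names (`DurableWrite`, `writeAndSync`, `demo`) are hypothetical and not 
Lucene code:
   
   ```java
   import java.io.IOException;
   import java.io.UncheckedIOException;
   import java.nio.ByteBuffer;
   import java.nio.channels.FileChannel;
   import java.nio.charset.StandardCharsets;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.StandardOpenOption;

   // Hypothetical helper for illustration only; not part of Lucene.
   final class DurableWrite {

     /** Writes all bytes, then syncs the file before the channel is closed. */
     static void writeAndSync(Path path, byte[] data) throws IOException {
       try (FileChannel ch = FileChannel.open(path,
           StandardOpenOption.CREATE,
           StandardOpenOption.WRITE,
           StandardOpenOption.TRUNCATE_EXISTING)) {
         ByteBuffer buf = ByteBuffer.wrap(data);
         while (buf.hasRemaining()) {
           ch.write(buf); // a single write() call may be partial
         }
         // force(true) asks the OS to persist data and metadata. If that
         // fails, the IOException is thrown here, where the caller can see it.
         ch.force(true);
       }
     }

     /** Writes a small temp file, returns its size in bytes, and deletes it. */
     static long demo() {
       try {
         Path file = Files.createTempFile("fsync-sketch", ".bin");
         try {
           writeAndSync(file, "hello world".getBytes(StandardCharsets.UTF_8));
           return Files.size(file);
         } finally {
           Files.delete(file);
         }
       } catch (IOException e) {
         throw new UncheckedIOException(e);
       }
     }
   }
   ```
   
   Whether an unsynced close actually drops an error depends on the OS and 
file system, so this shows the checking pattern only and is not a reproduction 
of #10906.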
   
   >This is not possible because Lucene solely uses MMap for reading files, and 
for writing it uses OutputStreams, which can't be reused to read again.
   
   It's possible to continue using the `OutputStream` API for writing and the 
mmap API for reading, with a single underlying read-write file descriptor:
   
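   ```java
   import static java.lang.Math.toIntExact;
   import static java.nio.channels.FileChannel.MapMode.READ_ONLY;
   import static java.nio.file.StandardOpenOption.CREATE_NEW;
   import static java.nio.file.StandardOpenOption.READ;
   import static java.nio.file.StandardOpenOption.WRITE;
   
   import java.io.IOException;
   import java.io.OutputStream;
   import java.lang.foreign.Arena;
   import java.lang.foreign.MemorySegment;
   import java.nio.channels.Channels;
   import java.nio.channels.FileChannel;
   import java.nio.file.Files;
   import java.nio.file.Path;
   
   public class ReadWriteChannelExample {
     public static void main(String[] args) throws IOException {
       Path tempFolder = Files.createTempDirectory("foo");
       Path filePath = tempFolder.resolve("test_file");
       try (FileChannel ch = FileChannel.open(filePath, READ, WRITE, CREATE_NEW)) {
         // writing
         OutputStream os = Channels.newOutputStream(ch);
         os.write("hello world".getBytes());
         // When done writing, we must not close the output stream, because
         // that would close the channel. But we must flush it so that all
         // in-flight changes are sent to the underlying channel (e.g. if the
         // output stream was wrapped in some buffering output stream).
         os.flush();
   
         // reading
         try (Arena arena = Arena.ofConfined()) {
           MemorySegment segment = ch.map(READ_ONLY, 0, ch.size(), arena);
           byte[] buf = new byte[toIntExact(segment.byteSize())];
           segment.asByteBuffer().get(buf);
           System.out.println("Data read: " + new String(buf));
         }
       } // channel closed now
   
       // we can delete the file now
       Files.delete(filePath);
       Files.delete(tempFolder);
     }
   }
   ```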
   
   For example, `Directory.createTempOutput()` can open such a read-write 
channel, create an `OutputStream` from it, and use that to create an 
`IndexOutput`, so that existing code doesn't have to be modified. The 
`IndexOutput` should then not be closed; instead it can be converted to an 
`IndexInput`, e.g. with a new method `IndexOutput.convertToInput()`, reusing 
the file channel. We need to figure out how to handle the `IOContext` with 
this. When the `IndexInput` is closed, the channel is closed.
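   
   A rough sketch of that hand-off, using plain JDK classes instead of the 
real Lucene types. `TempOutput`, `TempInput`, `TempFileDemo` and their methods 
are hypothetical stand-ins for `IndexOutput`, `IndexInput` and the proposed 
`convertToInput()`. For brevity it maps the file with the older 
`MappedByteBuffer` API rather than `MemorySegment`, and it ignores `IOContext`:
   
   ```java
   import java.io.Closeable;
   import java.io.IOException;
   import java.io.OutputStream;
   import java.io.UncheckedIOException;
   import java.nio.ByteBuffer;
   import java.nio.channels.Channels;
   import java.nio.channels.FileChannel;
   import java.nio.charset.StandardCharsets;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.StandardOpenOption;

   // Hypothetical stand-in for an IndexOutput created by createTempOutput().
   final class TempOutput {
     private final FileChannel channel;
     private final OutputStream stream;

     TempOutput(Path path) throws IOException {
       // One read-write channel backs both the writer and the later reader.
       channel = FileChannel.open(path,
           StandardOpenOption.READ,
           StandardOpenOption.WRITE,
           StandardOpenOption.CREATE_NEW);
       stream = Channels.newOutputStream(channel);
     }

     void writeBytes(byte[] bytes) throws IOException {
       stream.write(bytes);
     }

     // Flushes pending writes and passes the still-open channel to a reader.
     // The output must not be closed or written to after this call.
     TempInput convertToInput() throws IOException {
       stream.flush();
       return new TempInput(channel);
     }
   }

   // Hypothetical stand-in for an IndexInput; closing it closes the channel.
   final class TempInput implements Closeable {
     private final FileChannel channel;
     private final ByteBuffer mapped;

     TempInput(FileChannel channel) throws IOException {
       this.channel = channel;
       this.mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
     }

     byte[] readAll() {
       ByteBuffer view = mapped.duplicate(); // leave the shared position alone
       byte[] buf = new byte[view.remaining()];
       view.get(buf);
       return buf;
     }

     @Override
     public void close() throws IOException {
       channel.close();
     }
   }

   final class TempFileDemo {
     // Writes text through TempOutput and reads it back through TempInput.
     static String roundTrip(String text) {
       try {
         Path dir = Files.createTempDirectory("sketch");
         Path file = dir.resolve("temp_file");
         try {
           TempOutput out = new TempOutput(file);
           out.writeBytes(text.getBytes(StandardCharsets.UTF_8));
           try (TempInput in = out.convertToInput()) {
             return new String(in.readAll(), StandardCharsets.UTF_8);
           }
         } finally {
           Files.deleteIfExists(file);
           Files.deleteIfExists(dir);
         }
       } catch (IOException e) {
         throw new UncheckedIOException(e);
       }
     }
   }
   ```
   
   Calling `TempFileDemo.roundTrip("hello world")` writes and reads the same 
file through one channel. A real implementation would also have to close the 
channel on failure paths, and a `MappedByteBuffer` cannot be unmapped 
explicitly, which can block deleting the file on some platforms. That is one 
reason the `Arena`-based mapping may be the better fit.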
   
   There are probably many ways to implement this; the above is just one 
example. I have checked a few of the `createTempOutput` usages, and they might 
be pretty straightforward to update.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

