gbbafna commented on issue #15054: URL: https://github.com/apache/lucene/issues/15054#issuecomment-3194931981
Thanks @uschindler, @mikemccand. Your inputs helped us narrow down the root cause, described below.

> It is wrong if you do this on the same IndexInput without really sequential access. If you want to slice your IndexInput and use multiple threads then it is no longer (strictly speaking) "read once". Because the kernel can't do read-ahead correctly. IndexInput is also not the correct class to do this. If you want to do that really effectively, use asynchronous file channels and so on. This is something that Lucene's Directory abstractions are not made for.

Noted. I have created a backlog item to rearchitect our upload flow using the right abstractions. We were not slicing the IndexInput for multiple threads; we were naively creating a new IndexInput for every part upload, which caused the number of maps to explode. We are changing that to use clones instead of creating a new IndexInput for each part. PR: https://github.com/opensearch-project/OpenSearch/pull/19072

> For copying files over the network, MMapDirectory is a bad choice, sorry. And read-once only works correctly if you have a single stream and copy it.

Ack.

> There is a lot of information missing to us. We can't figure out what you are doing without a reproducer. Do you have index files open on the same directory at the same time?

> If you have heap dumps: Are the shared arenas still alive, is the refcounter of the grouping arena 0 or not 0? We need more input. The current code works well for Lucene and passes all tests, so there's no refcounting bug on how the code is used in Lucene itself.

We noticed that a very high number of maps were open for the same file, because we were creating an IndexInput repeatedly for the multipart upload.

> If you have a dump of all mapped files /proc/pid/maps can you check if you see many *.si files? If this is the case the problem you are seeing here is just a side effect because you consume already many mappings during normal indexing/searching and the remote directory just consumes the rest and then fails. It is still strange why the maxPermits sysprop does not help!

We are seeing files of all types, not specifically `*.si`. That also explains why the maxPermits sysprop is not helping in our case.
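
To illustrate the difference between opening a new mapped input for every part and sharing one mapping across parts, here is a minimal sketch using only `java.nio`. It is an analogy, not the code from the PR above and not Lucene's `IndexInput` API: the class name `SharedMappingSketch`, the helper `partViews`, and the part size are all hypothetical. The file is mapped once, and each part gets a cheap view over that single mapping, so the number of entries in `/proc/pid/maps` does not grow with the number of parts.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class SharedMappingSketch {

    /** Maps the file once and returns one independent read-only view per part. */
    static List<ByteBuffer> partViews(Path file, int partSize) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single mapping for the whole file (assumes the file is smaller than 2 GiB).
            // The mapping stays valid after the channel is closed.
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            List<ByteBuffer> parts = new ArrayList<>();
            for (int offset = 0; offset < mapped.capacity(); offset += partSize) {
                int length = Math.min(partSize, mapped.capacity() - offset);
                // duplicate() shares the mapped memory but has its own position and limit,
                // so every part can be read independently without creating a new mapping.
                ByteBuffer view = mapped.duplicate();
                view.position(offset);
                view.limit(offset + length);
                parts.add(view.slice());
            }
            return parts;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[10_000]);

        List<ByteBuffer> parts = partViews(file, 4_096);
        System.out.println(parts.size() + " parts, last part "
                + parts.get(parts.size() - 1).remaining() + " bytes");
    }
}
```

Calling `channel.map(...)` inside the loop instead would create one mapping per part, which is roughly the pattern that exhausted the map count in our case.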