Re: [PR] Make FSTPostingFormat to build FST off-heap [lucene]
dungba88 commented on PR #12980: URL: https://github.com/apache/lucene/pull/12980#issuecomment-1869385378 I found another tricky issue: if we write the FST directly to the IndexOutput, the FST may end up accepting no terms, yet we still write the padding 0 byte. This padding byte ensures that no node has address 0: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java#L172-L174 However, since we write the FSTs for each field consecutively, appending to the same file, we can end up with an extra padding byte that belongs to no field: [ FST_Field_1 ] [ 0 ] [ FST_Field_2 ] -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] lazily write the FST padding byte [lucene]
dungba88 opened a new pull request, #12981: URL: https://github.com/apache/lucene/pull/12981 ### Description Lazily write the FST padding byte, so that nothing is written when the FST is empty (no accepted nodes). This is important for off-heap writing, as we don't want to add that extra byte when the FST would be thrown away. Found while working on https://github.com/apache/lucene/pull/12980
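The idea of deferring the padding byte can be illustrated with a minimal, self-contained Java sketch. This is not the FSTCompiler code from the PR: `LazyPaddingSketch`, `LazyPaddedOutput`, and `writeNode` are hypothetical names, and a `ByteArrayOutputStream` stands in for the `IndexOutput`. It only shows the general pattern of writing the padding byte on the first real write, so an empty structure contributes zero bytes to a shared file.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; not Lucene's FSTCompiler.
class LazyPaddingSketch {

  /** Writes a single 0 padding byte only when the first node is written. */
  static final class LazyPaddedOutput {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private boolean paddingWritten = false;

    /** Appends the node bytes and returns the address at which they start. */
    int writeNode(byte[] nodeBytes) {
      if (!paddingWritten) {
        // Reserve address 0 so that no real node can ever live there.
        out.write(0);
        paddingWritten = true;
      }
      int address = out.size();
      out.write(nodeBytes, 0, nodeBytes.length);
      return address;
    }

    /** Total number of bytes written so far, padding included. */
    int size() {
      return out.size();
    }
  }

  public static void main(String[] args) {
    LazyPaddedOutput empty = new LazyPaddedOutput();
    System.out.println("empty size = " + empty.size());

    LazyPaddedOutput nonEmpty = new LazyPaddedOutput();
    int address = nonEmpty.writeNode(new byte[] {42});
    System.out.println("first node address = " + address + ", size = " + nonEmpty.size());
  }
}
```

With this shape, an output that never receives a node stays at size 0, while the first node of a non-empty output still starts at address 1.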
Re: [I] [BUG] FSDirectory stuck at open(Path path) method when ran from .jar file [lucene]
uschindler closed issue #12968: [BUG] FSDirectory stuck at open(Path path) method when ran from .jar file URL: https://github.com/apache/lucene/issues/12968
Re: [I] [BUG] FSDirectory stuck at open(Path path) method when ran from .jar file [lucene]
uschindler commented on issue #12968: URL: https://github.com/apache/lucene/issues/12968#issuecomment-1869432712 The problem is that your JAR packaging does not preserve all required files. See here for instructions: #12307 It looks like you suppress some exceptions. The problem is that it does not find all classes needed to support Java 21 and/or the index codecs are not found. Make sure that all META-INF files and the flag "Multi-Release: true" are part of your manifest.
Re: [I] [BUG] FSDirectory stuck at open(Path path) method when ran from .jar file [lucene]
uschindler commented on issue #12968: URL: https://github.com/apache/lucene/issues/12968#issuecomment-1869469388 Most likely you see this issue due to broken Maven tooling: https://issues.apache.org/jira/browse/MSHADE-385
Re: [I] [BUG] FSDirectory stuck at open(Path path) method when ran from .jar file [lucene]
setokk commented on issue #12968: URL: https://github.com/apache/lucene/issues/12968#issuecomment-1869471695 Thanks! Adding the line "Multi-Release: true" to the MANIFEST.MF file and creating a "org.apache.lucene.codecs.PostingsFormat" file into the "META-INF/services" directory solved the issue. Happy Holidays!
Re: [I] FSDirectory stuck at open(Path path) method when ran from .jar file [lucene]
uschindler commented on issue #12968: URL: https://github.com/apache/lucene/issues/12968#issuecomment-1869530344 > creating a "org.apache.lucene.codecs.PostingsFormat" file into the "META-INF/services" directory solved the issue. You shouldn't manually create those files. Use https://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer for that. Happy holidays.
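To check whether a repackaged JAR kept the pieces discussed in this thread, a small diagnostic along the following lines may help. It is a sketch using only the JDK's `java.util.jar` API; `FatJarCheck` and its method names are hypothetical, it is not part of Lucene or Maven, and it only inspects the archive (it does not fix the packaging).

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical diagnostic helper; inspects a JAR, does not modify it.
class FatJarCheck {

  /** Returns true if the JAR manifest declares "Multi-Release: true". */
  static boolean isMultiRelease(Path jar) throws IOException {
    try (JarFile jarFile = new JarFile(jar.toFile())) {
      Manifest manifest = jarFile.getManifest();
      if (manifest == null) {
        return false; // no manifest at all
      }
      String value = manifest.getMainAttributes().getValue(Attributes.Name.MULTI_RELEASE);
      return value != null && Boolean.parseBoolean(value.trim());
    }
  }

  /** Returns true if META-INF/services/<serviceInterface> exists in the JAR. */
  static boolean hasServiceFile(Path jar, String serviceInterface) throws IOException {
    try (JarFile jarFile = new JarFile(jar.toFile())) {
      return jarFile.getEntry("META-INF/services/" + serviceInterface) != null;
    }
  }

  public static void main(String[] args) throws IOException {
    Path jar = Paths.get(args[0]); // path to the packaged application JAR
    System.out.println("Multi-Release: " + isMultiRelease(jar));
    System.out.println(
        "PostingsFormat services file present: "
            + hasServiceFile(jar, "org.apache.lucene.codecs.PostingsFormat"));
  }
}
```

If either check prints `false`, the packaging step most likely dropped or overwrote the corresponding file when the JARs were merged.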
Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]
easyice commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1869575788 There seems to be a speedup on [prefix queries](http://people.apache.org/~mikemccand/lucenebench/Prefix3.html) in nightly benchmarks. For reference, here is the benchmark on branch_9x with this PR: the performance of `ByteBuffersIndexInput` looks better than on java17, `NIOFSDirectoryInputs` is similar to java17, but `MMapDirectoryInputs` has a performance regression (we don't change the MMapDirectory impl).

java11:
```
Benchmark                                                            (size)   Mode  Cnt  Score    Error   Units
GroupVIntBenchmark.benchByteBuffersIndexInput_readGroupVInt              64  thrpt    5  5.064 ±  0.069  ops/us
GroupVIntBenchmark.benchByteBuffersIndexInput_readGroupVIntBaseline      64  thrpt    5  1.612 ±  0.078  ops/us
GroupVIntBenchmark.benchMMapDirectoryInputs_readGroupVInt                64  thrpt    5  6.334 ±  0.914  ops/us
GroupVIntBenchmark.benchMMapDirectoryInputs_readGroupVIntBaseline        64  thrpt    5  7.362 ±  0.505  ops/us
GroupVIntBenchmark.benchMMapDirectoryInputs_readVInt                     64  thrpt    5  4.885 ±  0.545  ops/us
GroupVIntBenchmark.benchNIOFSDirectoryInputs_readGroupVInt               64  thrpt    5  2.977 ±  0.346  ops/us
GroupVIntBenchmark.benchNIOFSDirectoryInputs_readGroupVIntBaseline       64  thrpt    5  2.660 ±  1.700  ops/us
```

I ran the `benchMMapDirectoryInputs_readGroupVInt` task with the `-XX:+PrintInlining` parameter; the output is similar to java17 (the nesting indentation was lost in this archive, so entries are listed one per line):
```
702 799 3 org.apache.lucene.store.DataInput::readGroupVInts (41 bytes) made not entrant
@ 12 org.apache.lucene.store.DataInput::readGroupVInt (7 bytes) inline (hot)
\-> TypeProfile (26795/26795 counts) = org/apache/lucene/store/ByteBufferIndexInput$SingleBufferImpl
@ 3 org.apache.lucene.util.GroupVIntUtil::readGroupVInt (77 bytes) inline (hot)
! @ 1 org.apache.lucene.store.ByteBufferIndexInput::readByte (100 bytes) inline (hot)
@ 8 org.apache.lucene.store.ByteBufferGuard::getByte (9 bytes) inline (hot)
@ 1 org.apache.lucene.store.ByteBufferGuard::ensureValid (16 bytes) inline (hot)
! @ 5 java.nio.DirectByteBuffer::get (28 bytes) inline (hot)
\-> TypeProfile (13760/13760 counts) = java/nio/DirectByteBufferR
@ 5 java.nio.Buffer::nextGetIndex (31 bytes) inline (hot)
@ 8 java.nio.DirectByteBuffer::ix (10 bytes) inline (hot)
@ 11 jdk.internal.misc.Unsafe::getByte (7 bytes) force inline by annotation
@ 3 jdk.internal.misc.Unsafe::getByte (0 bytes) (intrinsic)
@ 16 java.lang.ref.Reference::reachabilityFence (1 bytes) force inline by annotation
@ 39 org.apache.lucene.util.GroupVIntUtil::readLongInGroup (81 bytes) inline (hot)
! @ 49 org.apache.lucene.store.ByteBufferIndexInput::readShort (25 bytes) inline (hot)
@ 8 org.apache.lucene.store.ByteBufferGuard::getShort (9 bytes) inline (hot)
@ 1 org.apache.lucene.store.ByteBufferGuard::ensureValid (16 bytes) inline (hot)
! @ 5 java.nio.DirectByteBuffer::getShort (27 bytes) inline (hot)
\-> TypeProfile (59062/59062 counts) = java/nio/DirectByteBufferR
@ 4 java.nio.Buffer::nextGetIndex (38 bytes) inline (hot)
@ 7 java.nio.DirectByteBuffer::ix (10 bytes) inline (hot)
! @ 10 java.nio.DirectByteBuffer::getShort (32 bytes) inline (hot)
@ 9 jdk.internal.misc.Unsafe::getShortUnaligned (12 bytes) inline (hot)
@ 5 jdk.internal.misc.Unsafe::getShortUnaligned (33 bytes) (intrinsic)
@ 8 jdk.internal.misc.Unsafe::convEndian (16 bytes) inline (hot)
@ 17 java.lang.ref.Reference::reachabilityFence (1 bytes) force inline by annotation
@ 15 java.lang.ref.Reference::reachabilityFence (1 bytes) force inline by annotation
```
Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]
uschindler commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1869642731 Hi, Thanks for the measurements, @easyice. So basically, we can backport the commit without any modifications. I can do this and move change entries afterwards. No PR is needed for that. It's easy. Uwe
Re: [PR] Make FSTPostingFormat to build FST off-heap [lucene]
dungba88 commented on PR #12980: URL: https://github.com/apache/lucene/pull/12980#issuecomment-1869867635 There is only one failing test left: TestFSTPostingFormat.testRandomException
```
> Caused by:
> java.lang.RuntimeException: unclosed IndexOutput: _s_FST50_0.tfp.meta
>     at org.apache.lucene.tests.store.MockDirectoryWrapper.addFileHandle(MockDirectoryWrapper.java:783)
```
It seems that some file is not closed correctly when an exception is thrown.
Re: [PR] Make FSTPostingFormat to build FST off-heap [lucene]
dungba88 commented on PR #12980: URL: https://github.com/apache/lucene/pull/12980#issuecomment-1869917203 Fixed the unclosed-file issue above by moving `openInput` and `createOutput` into the try-catch block
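The general pattern behind this kind of fix can be sketched in plain Java. The example below is an illustration under assumptions, not the change made in the PR: `CloseOnFailureSketch`, `Opener`, `Tracked`, and `openAll` are hypothetical names, and generic `Closeable` objects stand in for Lucene's `IndexInput` and `IndexOutput`. The point is that every open call happens inside the guarded block, so a failure while opening a later resource still closes the ones opened earlier.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the code from the PR.
class CloseOnFailureSketch {

  /** Something that opens a resource and may fail while doing so. */
  interface Opener {
    Closeable open() throws IOException;
  }

  /** A trivial resource that remembers whether it was closed. */
  static final class Tracked implements Closeable {
    boolean closed = false;

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Opens all resources; if any open fails, closes those already opened. */
  static List<Closeable> openAll(List<Opener> openers) throws IOException {
    List<Closeable> opened = new ArrayList<>();
    boolean success = false;
    try {
      for (Opener opener : openers) {
        opened.add(opener.open()); // opening happens inside the guarded block
      }
      success = true;
      return opened;
    } finally {
      if (!success) {
        for (Closeable resource : opened) {
          try {
            resource.close();
          } catch (IOException ignored) {
            // Keep propagating the original failure, not the close failure.
          }
        }
      }
    }
  }

  public static void main(String[] args) {
    Tracked first = new Tracked();
    List<Opener> openers = new ArrayList<>();
    openers.add(() -> first);
    openers.add(() -> {
      throw new IOException("simulated failure while opening the second file");
    });
    try {
      openAll(openers);
    } catch (IOException e) {
      System.out.println("open failed: " + e.getMessage());
    }
    System.out.println("first resource closed: " + first.closed);
  }
}
```

If the first open call sat outside the guarded block, a failure in the second one would leave the first handle open, which is the kind of leak a test directory wrapper reports as an unclosed file.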