Re: [PR] Make FSTPostingFormat to build FST off-heap [lucene]

2023-12-26 Thread via GitHub


dungba88 commented on PR #12980:
URL: https://github.com/apache/lucene/pull/12980#issuecomment-1869385378

   I found another quite tricky issue:
   
   If we write the FST directly to the IndexOutput, the FST may end up 
accepting no terms at all, yet we still write the padding 0 byte. This padding 
byte ensures that no node has address 0: 
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java#L172-L174
   
   However, since we write the FSTs for all fields consecutively, appending to 
the same file, an empty FST can leave behind an extra padding byte that 
belongs to no field:
   [ FST_Field_1 ] [ 0 ] [ FST_Field_2 ]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] lazily write the FST padding byte [lucene]

2023-12-26 Thread via GitHub


dungba88 opened a new pull request, #12981:
URL: https://github.com/apache/lucene/pull/12981

   ### Description
   
   Lazily write the FST padding byte, so that nothing is written when the FST 
is empty (no accepted nodes). This matters for off-heap writing, where we 
don't want to add that extra byte for an FST that will be thrown away. Found 
while working on https://github.com/apache/lucene/pull/12980
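
To make the idea concrete, here is a minimal sketch of lazy padding. It is not the actual FSTCompiler change: the class `LazyPaddedWriter` and its `writeNode` method are hypothetical, and a `ByteArrayOutputStream` stands in for the shared `IndexOutput`.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-in for an FST writer that appends to a shared output.
// Not Lucene's FSTCompiler; it only illustrates deferring the padding byte.
class LazyPaddedWriter {
  private final ByteArrayOutputStream out; // stand-in for the shared IndexOutput
  private boolean paddingWritten = false;

  LazyPaddedWriter(ByteArrayOutputStream out) {
    this.out = out;
  }

  // Writes one node's bytes. The padding 0 byte is emitted only before the
  // first node, so no node of this writer starts at relative address 0.
  void writeNode(byte[] nodeBytes) {
    if (!paddingWritten) {
      out.write(0);
      paddingWritten = true;
    }
    out.write(nodeBytes, 0, nodeBytes.length);
  }

  public static void main(String[] args) {
    ByteArrayOutputStream shared = new ByteArrayOutputStream();

    LazyPaddedWriter field1 = new LazyPaddedWriter(shared);
    field1.writeNode(new byte[] {1, 2, 3});

    // Empty FST: no node is written, so no stray padding byte either.
    new LazyPaddedWriter(shared);

    LazyPaddedWriter field2 = new LazyPaddedWriter(shared);
    field2.writeNode(new byte[] {4, 5});

    // 7 bytes in total: [0, 1, 2, 3] [0, 4, 5]
    System.out.println(shared.size());
  }
}
```

With eager padding, the empty writer in the middle would have contributed one more byte that no field owns.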





Re: [I] [BUG] FSDirectory stuck at open(Path path) method when ran from .jar file [lucene]

2023-12-26 Thread via GitHub


uschindler closed issue #12968: [BUG] FSDirectory stuck at open(Path path) 
method when ran from .jar file
URL: https://github.com/apache/lucene/issues/12968





Re: [I] [BUG] FSDirectory stuck at open(Path path) method when ran from .jar file [lucene]

2023-12-26 Thread via GitHub


uschindler commented on issue #12968:
URL: https://github.com/apache/lucene/issues/12968#issuecomment-1869432712

   The problem is that your JAR packaging does not preserve all required 
files.
   
   See here for instructions: #12307
   
   It looks like you are suppressing some exceptions. The problem is that the 
classes needed to support Java 21 are not all found and/or the index codecs 
are not found.
   
   Make sure that all META-INF files are preserved and that the flag 
"Multi-Release: true" is part of your manifest.
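
A quick way to check the manifest part is sketched below, using only the JDK's `java.util.jar` classes. The class name `ManifestCheck` is hypothetical, and the manifest text in `main` is a made-up example rather than the reporter's actual file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper: reports whether a manifest declares "Multi-Release: true".
class ManifestCheck {
  // Looks up the Multi-Release attribute in the manifest's main section.
  static boolean isMultiRelease(Manifest manifest) {
    String value = manifest.getMainAttributes().getValue(Attributes.Name.MULTI_RELEASE);
    return value != null && Boolean.parseBoolean(value.trim());
  }

  // Convenience overload that parses manifest text first.
  static boolean isMultiRelease(String manifestText) {
    try {
      byte[] bytes = manifestText.getBytes(StandardCharsets.UTF_8);
      return isMultiRelease(new Manifest(new ByteArrayInputStream(bytes)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Made-up manifest content for illustration.
    String text = "Manifest-Version: 1.0\r\nMulti-Release: true\r\n\r\n";
    System.out.println(isMultiRelease(text));
  }
}
```

For a real JAR, `java.util.jar.JarFile#getManifest()` returns a `Manifest` that can be passed to the first method.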





Re: [I] [BUG] FSDirectory stuck at open(Path path) method when ran from .jar file [lucene]

2023-12-26 Thread via GitHub


uschindler commented on issue #12968:
URL: https://github.com/apache/lucene/issues/12968#issuecomment-1869469388

   Most likely you see this issue due to broken Maven tooling: 
https://issues.apache.org/jira/browse/MSHADE-385





Re: [I] [BUG] FSDirectory stuck at open(Path path) method when ran from .jar file [lucene]

2023-12-26 Thread via GitHub


setokk commented on issue #12968:
URL: https://github.com/apache/lucene/issues/12968#issuecomment-1869471695

   Thanks! Adding the line "Multi-Release: true" to the MANIFEST.MF file and 
creating an "org.apache.lucene.codecs.PostingsFormat" file in the 
"META-INF/services" directory solved the issue. Happy Holidays!





Re: [I] FSDirectory stuck at open(Path path) method when ran from .jar file [lucene]

2023-12-26 Thread via GitHub


uschindler commented on issue #12968:
URL: https://github.com/apache/lucene/issues/12968#issuecomment-1869530344

   > creating an "org.apache.lucene.codecs.PostingsFormat" file in the 
"META-INF/services" directory solved the issue.
   
   You shouldn't create those files manually. Use the 
ServicesResourceTransformer of the Maven Shade Plugin for that: 
https://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer
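
As a rough illustration of why a transformer is needed: several JARs can each ship a `META-INF/services` file with the same name, and a shaded JAR has to combine their entries rather than keep just one file. The sketch below shows that merge conceptually. It is not the plugin's code; the class `ServiceFileMergeSketch`, the provider names, and the choice to drop duplicates are all assumptions made for the example.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Conceptual sketch only: combines the provider entries of same-named
// META-INF/services files coming from different JARs.
class ServiceFileMergeSketch {
  // Returns the provider class names in first-seen order, skipping blank
  // lines, '#' comments, and repeated entries.
  static List<String> merge(List<String> serviceFileContents) {
    Set<String> providers = new LinkedHashSet<>();
    for (String content : serviceFileContents) {
      for (String line : content.split("\\R")) {
        int hash = line.indexOf('#');
        String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
        if (!name.isEmpty()) {
          providers.add(name);
        }
      }
    }
    return List.copyOf(providers);
  }

  public static void main(String[] args) {
    // Hypothetical provider names standing in for real codec classes.
    String fromJarA = "# providers from jar A\ncom.example.codecs.AlphaPostingsFormat\n";
    String fromJarB = "com.example.codecs.BetaPostingsFormat\n";
    System.out.println(merge(List.of(fromJarA, fromJarB)));
  }
}
```

If only one of the two files survived packaging, the providers listed in the other would not be discoverable at runtime.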
   
   Happy holidays.





Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-26 Thread via GitHub


easyice commented on PR #12841:
URL: https://github.com/apache/lucene/pull/12841#issuecomment-1869575788

   There seems to be a speedup on [prefix 
queries](http://people.apache.org/~mikemccand/lucenebench/Prefix3.html) in the 
nightly benchmarks.
   
   For reference, here is the benchmark on branch_9x with this PR. The 
performance of `ByteBuffersIndexInput` looks better than on Java 17, 
`NIOFSDirectoryInputs` is similar to Java 17, but `MMapDirectoryInputs` has a 
performance regression (we didn't change the MMapDirectory implementation).
   
   java11:
   
   ```
   Benchmark                                                            (size)   Mode  Cnt  Score   Error  Units
   GroupVIntBenchmark.benchByteBuffersIndexInput_readGroupVInt              64  thrpt    5  5.064 ± 0.069  ops/us
   GroupVIntBenchmark.benchByteBuffersIndexInput_readGroupVIntBaseline      64  thrpt    5  1.612 ± 0.078  ops/us
   GroupVIntBenchmark.benchMMapDirectoryInputs_readGroupVInt                64  thrpt    5  6.334 ± 0.914  ops/us
   GroupVIntBenchmark.benchMMapDirectoryInputs_readGroupVIntBaseline        64  thrpt    5  7.362 ± 0.505  ops/us
   GroupVIntBenchmark.benchMMapDirectoryInputs_readVInt                     64  thrpt    5  4.885 ± 0.545  ops/us
   GroupVIntBenchmark.benchNIOFSDirectoryInputs_readGroupVInt               64  thrpt    5  2.977 ± 0.346  ops/us
   GroupVIntBenchmark.benchNIOFSDirectoryInputs_readGroupVIntBaseline       64  thrpt    5  2.660 ± 1.700  ops/us
   ```
   
   I ran the `benchMMapDirectoryInputs_readGroupVInt` task with the 
`-XX:+PrintInlining` flag; the output is similar to that on Java 17:
   ```
   702  799   3   org.apache.lucene.store.DataInput::readGroupVInts (41 bytes)   made not entrant
        @ 12   org.apache.lucene.store.DataInput::readGroupVInt (7 bytes)   inline (hot)
        \-> TypeProfile (26795/26795 counts) = org/apache/lucene/store/ByteBufferIndexInput$SingleBufferImpl
        @ 3   org.apache.lucene.util.GroupVIntUtil::readGroupVInt (77 bytes)   inline (hot)
     !  @ 1   org.apache.lucene.store.ByteBufferIndexInput::readByte (100 bytes)   inline (hot)
        @ 8   org.apache.lucene.store.ByteBufferGuard::getByte (9 bytes)   inline (hot)
        @ 1   org.apache.lucene.store.ByteBufferGuard::ensureValid (16 bytes)   inline (hot)
     !  @ 5   java.nio.DirectByteBuffer::get (28 bytes)   inline (hot)
        \-> TypeProfile (13760/13760 counts) = java/nio/DirectByteBufferR
        @ 5   java.nio.Buffer::nextGetIndex (31 bytes)   inline (hot)
        @ 8   java.nio.DirectByteBuffer::ix (10 bytes)   inline (hot)
        @ 11   jdk.internal.misc.Unsafe::getByte (7 bytes)   force inline by annotation
        @ 3   jdk.internal.misc.Unsafe::getByte (0 bytes)   (intrinsic)
        @ 16   java.lang.ref.Reference::reachabilityFence (1 bytes)   force inline by annotation
        @ 39   org.apache.lucene.util.GroupVIntUtil::readLongInGroup (81 bytes)   inline (hot)
     !  @ 49   org.apache.lucene.store.ByteBufferIndexInput::readShort (25 bytes)   inline (hot)
        @ 8   org.apache.lucene.store.ByteBufferGuard::getShort (9 bytes)   inline (hot)
        @ 1   org.apache.lucene.store.ByteBufferGuard::ensureValid (16 bytes)   inline (hot)
     !  @ 5   java.nio.DirectByteBuffer::getShort (27 bytes)   inline (hot)
        \-> TypeProfile (59062/59062 counts) = java/nio/DirectByteBufferR
        @ 4   java.nio.Buffer::nextGetIndex (38 bytes)   inline (hot)
        @ 7   java.nio.DirectByteBuffer::ix (10 bytes)   inline (hot)
     !  @ 10   java.nio.DirectByteBuffer::getShort (32 bytes)   inline (hot)
        @ 9   jdk.internal.misc.Unsafe::getShortUnaligned (12 bytes)   inline (hot)
        @ 5   jdk.internal.misc.Unsafe::getShortUnaligned (33 bytes)   (intrinsic)
        @ 8   jdk.internal.misc.Unsafe::convEndian (16 bytes)   inline (hot)
        @ 17   java.lang.ref.Reference::reachabilityFence (1 bytes)   force inline by annotation
        @ 15   java.lang.ref.Reference::reachabilityFence (1 bytes)   force inline by annotation
   ```
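
For readers unfamiliar with what `readGroupVInt` decodes, below is a minimal, self-contained sketch of the general group-varint scheme: four ints share one flag byte that stores each value's byte length minus one in two bits, followed by the values' bytes in little-endian order. The class `GroupVIntSketch` is hypothetical, and whether this layout matches Lucene's on-disk format bit for bit is an assumption; it is not the code that was benchmarked.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Illustrative group-varint codec for a group of exactly four ints.
class GroupVIntSketch {
  // Number of bytes (1..4) needed to hold v when treated as unsigned.
  static int numBytes(int v) {
    // "| 1" makes 0 behave like 1, which still needs one byte.
    return 4 - (Integer.numberOfLeadingZeros(v | 1) >>> 3);
  }

  static byte[] encode(int[] values) {
    if (values.length != 4) {
      throw new IllegalArgumentException("expected a group of 4 values");
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int flag = 0;
    for (int v : values) {
      flag = (flag << 2) | (numBytes(v) - 1); // first value ends up in the top 2 bits
    }
    out.write(flag);
    for (int v : values) {
      for (int i = 0, n = numBytes(v); i < n; i++) {
        out.write(v >>> (8 * i)); // write(int) keeps only the low 8 bits
      }
    }
    return out.toByteArray();
  }

  static int[] decode(byte[] bytes) {
    int flag = bytes[0] & 0xFF;
    int pos = 1;
    int[] values = new int[4];
    for (int k = 0; k < 4; k++) {
      int n = ((flag >>> (6 - 2 * k)) & 0x3) + 1; // byte length of value k
      int v = 0;
      for (int i = 0; i < n; i++) {
        v |= (bytes[pos++] & 0xFF) << (8 * i);
      }
      values[k] = v;
    }
    return values;
  }

  public static void main(String[] args) {
    int[] group = {1, 300, 70000, -1}; // needs 1, 2, 3 and 4 bytes
    byte[] encoded = encode(group);
    System.out.println(encoded.length); // 11: one flag byte plus 10 payload bytes
    System.out.println(Arrays.toString(decode(encoded)));
  }
}
```

The appeal of the scheme is that the decoder learns all four lengths from one byte instead of testing a continuation bit on every byte, as plain VInt decoding does.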



Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-26 Thread via GitHub


uschindler commented on PR #12841:
URL: https://github.com/apache/lucene/pull/12841#issuecomment-1869642731

   Hi,
   
   Thanks for the measurements, @easyice. So basically, we can backport the 
commit without any modifications. I can do this and move the change entries 
afterwards.
   
   No PR is needed for that. It's easy.
   
   Uwe





Re: [PR] Make FSTPostingFormat to build FST off-heap [lucene]

2023-12-26 Thread via GitHub


dungba88 commented on PR #12980:
URL: https://github.com/apache/lucene/pull/12980#issuecomment-1869867635

   Only one failing test is left: TestFSTPostingFormat.testRandomException
   
   ```
  > Caused by:
  > java.lang.RuntimeException: unclosed IndexOutput: _s_FST50_0.tfp.meta
  > at org.apache.lucene.tests.store.MockDirectoryWrapper.addFileHandle(MockDirectoryWrapper.java:783)
   ```
   
   It seems that some file is not closed correctly when an exception is thrown.





Re: [PR] Make FSTPostingFormat to build FST off-heap [lucene]

2023-12-26 Thread via GitHub


dungba88 commented on PR #12980:
URL: https://github.com/apache/lucene/pull/12980#issuecomment-1869917203

   Fixed the above unclosed-file issue by moving the `openInput` and 
`createOutput` calls into the try-catch block.
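
The general shape of such a fix can be sketched as follows. This is not the PR's code: `Tracked`, `open` and `copy` are hypothetical stand-ins for the Lucene input/output handles and the code that uses them. The point is only that when both opens happen inside the `try`, a failure in the second open still closes the first resource.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

// Sketch of opening two resources inside the try block so that a failure
// while opening the second one cannot leak the first.
class CloseOnFailureSketch {
  // Hypothetical resource that records whether it is still open.
  static final class Tracked implements Closeable {
    final String name;
    boolean open = true;

    Tracked(String name) {
      this.name = name;
    }

    @Override
    public void close() {
      open = false;
    }
  }

  // Every resource that was successfully opened, for leak checking.
  static final List<Tracked> opened = new ArrayList<>();

  static Tracked open(String name, boolean fail) {
    if (fail) {
      throw new IllegalStateException("simulated failure opening " + name);
    }
    Tracked t = new Tracked(name);
    opened.add(t);
    return t;
  }

  static void copy(boolean failOnOutput) {
    Tracked in = null;
    Tracked out = null;
    try {
      in = open("input", false);
      out = open("output", failOnOutput); // may throw after "input" is open
      // ... the actual work would go here ...
    } finally {
      if (out != null) out.close();
      if (in != null) in.close();
    }
  }

  static long stillOpen() {
    return opened.stream().filter(t -> t.open).count();
  }

  public static void main(String[] args) {
    try {
      copy(true);
    } catch (IllegalStateException e) {
      System.out.println("still open after failure: " + stillOpen());
    }
  }
}
```

Had the `open("input", ...)` call sat before the `try`, an exception from the second open would have skipped the `finally` block and left the input unclosed, which is the kind of leak MockDirectoryWrapper reports.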

