Re: [I] [BUG] FSDirectory stuck at open(Path path) method when ran from .jar file [lucene]
bajibalu commented on issue #12968:
URL: https://github.com/apache/lucene/issues/12968#issuecomment-1868838686

Hi @setokk, I am a new contributor to this repo. Unfortunately, I don't have access to either Windows 10 or Arch Linux. However, I ran the sample code above in the environment below and got the expected output; it did not behave the way you described. Try moving the data and index locations into the project folder and see if that resolves the issue.

OS: Debian 12
Lucene Version: Lucene 9.8.0
JDK Version: openjdk 21.0.1

Command
`java -classpath test.jar org.eample.Main`

Output
```
/temp/index
/temp/data.txt
Dec 25, 2023 1:46:15 AM org.apache.lucene.store.MemorySegmentIndexInputProvider
INFO: Using MemorySegmentIndexInput with Java 21; to disable start with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
Rebuilding index...
0
Done! 1 documents indexed.
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] [BUG] FSDirectory stuck at open(Path path) method when ran from .jar file [lucene]
setokk commented on issue #12968:
URL: https://github.com/apache/lucene/issues/12968#issuecomment-1868911761

I tried that, but to no avail. I'm using Maven to package the app as a .jar.

Folder structure:

Output:

I also ran `mvn clean` to see whether the problem was related to stale build output, but I still get the same output.
[PR] Make FSTPostingFormat to build FST off-heap [lucene]
dungba88 opened a new pull request, #12980:
URL: https://github.com/apache/lucene/pull/12980

### Description

Note: This PR is not ready yet. There are still some failing tests that I'm trying to figure out.

This is an attempt to make FSTPostingFormat write the FST off-heap. Instead of building it on-heap and then saving it to disk, we configure the compiler to write the FST off-heap right from the start.

Some additional changes:
- As we can't write the FST metadata and the FST data to the same file, we now need to break the `tfp` file into 2 files: `tfp.meta` and `tfp.data`
- We need to write the starting address of the FST data in the posting metadata file, then seek to that address when reading
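The meta/data split described above can be illustrated with a minimal sketch that uses only JDK streams. It is not the PR's code and not the Lucene API: the class `MetaDataSplitSketch`, the `TwoFiles` holder, and the fixed-width `writeLong`/`writeInt` layout are assumptions made for illustration, and in-memory byte arrays stand in for `tfp.meta` and `tfp.data`. The point it shows is that the start address of each blob is recorded in the meta stream before the blob is streamed to the data stream, so a reader can seek straight to it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a meta/data file split; not the Lucene API. */
public class MetaDataSplitSketch {

  /** Holds the two in-memory "files" (stand-ins for tfp.meta and tfp.data). */
  public static final class TwoFiles {
    public final byte[] meta;
    public final byte[] data;

    public TwoFiles(byte[] meta, byte[] data) {
      this.meta = meta;
      this.data = data;
    }
  }

  /** Streams each blob to the data file and records its start address and length in the meta file. */
  public static TwoFiles write(List<byte[]> blobs) throws IOException {
    ByteArrayOutputStream metaBytes = new ByteArrayOutputStream();
    ByteArrayOutputStream dataBytes = new ByteArrayOutputStream();
    try (DataOutputStream metaOut = new DataOutputStream(metaBytes);
        DataOutputStream dataOut = new DataOutputStream(dataBytes)) {
      metaOut.writeInt(blobs.size());
      for (byte[] blob : blobs) {
        // Record the start address BEFORE the blob is written to the data file.
        metaOut.writeLong(dataOut.size());
        metaOut.writeInt(blob.length);
        dataOut.write(blob);
      }
    }
    return new TwoFiles(metaBytes.toByteArray(), dataBytes.toByteArray());
  }

  /** Reads the meta file, then "seeks" to each recorded address in the data file. */
  public static List<byte[]> read(TwoFiles files) throws IOException {
    List<byte[]> blobs = new ArrayList<>();
    try (DataInputStream metaIn = new DataInputStream(new ByteArrayInputStream(files.meta))) {
      int count = metaIn.readInt();
      for (int i = 0; i < count; i++) {
        int start = (int) metaIn.readLong();
        int length = metaIn.readInt();
        blobs.add(Arrays.copyOfRange(files.data, start, start + length));
      }
    }
    return blobs;
  }

  public static void main(String[] args) throws IOException {
    List<byte[]> in =
        List.of(
            "title-fst".getBytes(StandardCharsets.UTF_8),
            "body-fst".getBytes(StandardCharsets.UTF_8));
    for (byte[] blob : read(write(in))) {
      System.out.println(new String(blob, StandardCharsets.UTF_8));
    }
  }
}
```

A real implementation would presumably use Lucene's `IndexOutput`/`IndexInput` and variable-length encodings instead; the fixed-width fields here only keep the sketch short.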
Re: [PR] Make FSTPostingFormat to build FST off-heap [lucene]
dungba88 commented on code in PR #12980:
URL: https://github.com/apache/lucene/pull/12980#discussion_r1436285373

## lucene/codecs/src/java/org/apache/lucene/codecs/memory/FSTTermsWriter.java: ##
@@ -278,6 +298,7 @@ public void finish(long sumTotalTermFreq, long sumDocFreq, int docCount) throws
       // save FST dict
       if (numTerms > 0) {
         final FST fst = fstCompiler.compile();
+        fst.saveMetadata(metaOut);

Review Comment:
This seems to be incorrect; we don't need to save the metadata here.
Re: [PR] Make FSTPostingFormat to build FST off-heap [lucene]
dungba88 commented on PR #12980:
URL: https://github.com/apache/lucene/pull/12980#issuecomment-1869323228

I temporarily changed this PR to only split the meta and data into 2 files while still using an on-heap DataOutput. The tests seem to pass with that change.
Re: [PR] Make FSTPostingFormat to build FST off-heap [lucene]
dungba88 commented on code in PR #12980:
URL: https://github.com/apache/lucene/pull/12980#discussion_r1436295737

## lucene/codecs/src/java/org/apache/lucene/codecs/memory/FSTTermsWriter.java: ##
@@ -187,33 +200,38 @@ public void write(Fields fields, NormsProducer norms) throws IOException {
   @Override
   public void close() throws IOException {
-    if (out != null) {
+    if (metaOut != null) {
+      assert dataOut != null;
       boolean success = false;
       try {
         // write field summary
-        final long dirStart = out.getFilePointer();
+        final long dirStart = metaOut.getFilePointer();

-        out.writeVInt(fields.size());
+        metaOut.writeVInt(fields.size());
         for (FieldMetaData field : fields) {
-          out.writeVInt(field.fieldInfo.number);
-          out.writeVLong(field.numTerms);
+          metaOut.writeVInt(field.fieldInfo.number);
+          metaOut.writeVLong(field.numTerms);
           if (field.fieldInfo.getIndexOptions() != IndexOptions.DOCS) {
-            out.writeVLong(field.sumTotalTermFreq);
+            metaOut.writeVLong(field.sumTotalTermFreq);
           }
-          out.writeVLong(field.sumDocFreq);
-          out.writeVInt(field.docCount);
-          field.dict.save(out, out);
+          metaOut.writeVLong(field.sumDocFreq);
+          metaOut.writeVInt(field.docCount);
+          // write the starting file pointer
+          metaOut.writeVLong(dataOut.getFilePointer());

Review Comment:
Oh, I think I found the bug, and why using an on-heap DataOutput works. This `close()` method is called after the FSTs for all fields have already been saved (streamed), so `dataOut.getFilePointer()` always returns the same position (EOF) for every field.
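The effect described in this comment can be reproduced with a small JDK-only sketch. It is an illustration under assumptions, not the PR's code: `StartPointerSketch` and its two methods are hypothetical names, `DataOutputStream.size()` stands in for `dataOut.getFilePointer()`, and capturing the offset just before each write is one possible fix rather than the one the PR necessarily adopted.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of recording start offsets too late versus at write time. */
public class StartPointerSketch {

  /** Buggy variant: offsets are read only after every blob has already been streamed. */
  public static List<Long> offsetsRecordedAtClose(List<byte[]> blobs) throws IOException {
    DataOutputStream dataOut = new DataOutputStream(new ByteArrayOutputStream());
    for (byte[] blob : blobs) {
      dataOut.write(blob);
    }
    List<Long> offsets = new ArrayList<>();
    for (int i = 0; i < blobs.size(); i++) {
      // By now the stream position is at the end of the data for every field.
      offsets.add((long) dataOut.size());
    }
    return offsets;
  }

  /** Assumed fix: capture the offset just before each blob is written. */
  public static List<Long> offsetsRecordedPerField(List<byte[]> blobs) throws IOException {
    DataOutputStream dataOut = new DataOutputStream(new ByteArrayOutputStream());
    List<Long> offsets = new ArrayList<>();
    for (byte[] blob : blobs) {
      offsets.add((long) dataOut.size());
      dataOut.write(blob);
    }
    return offsets;
  }

  public static void main(String[] args) throws IOException {
    List<byte[]> blobs = List.of(new byte[3], new byte[5], new byte[2]);
    System.out.println(offsetsRecordedAtClose(blobs)); // prints [10, 10, 10]
    System.out.println(offsetsRecordedPerField(blobs)); // prints [0, 3, 8]
  }
}
```

With three blobs of 3, 5, and 2 bytes, the late variant records the end position 10 for all of them, while the per-field variant records 0, 3, and 8.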