Re: [I] [BUG] FSDirectory stuck at open(Path path) method when ran from .jar file [lucene]

2023-12-25 Thread via GitHub


bajibalu commented on issue #12968:
URL: https://github.com/apache/lucene/issues/12968#issuecomment-1868838686

   Hi @setokk, I am a new contributor to this repo. Unfortunately, I don't have 
access to either Windows 10 or Arch Linux. However, I ran the sample code above 
in the setup below and got the expected output, so I could not reproduce the 
behavior you described. Try moving the data and index locations into the 
project folder and see if that resolves the issue.
   
   
   OS: Debian 12
   Lucene Version: Lucene 9.8.0
   JDK Version: openjdk 21.0.1
   
   Command
   `java -classpath test.jar org.eample.Main`
   
   Output
   ```
   /temp/index
   /temp/data.txt
   Dec 25, 2023 1:46:15 AM org.apache.lucene.store.MemorySegmentIndexInputProvider
   INFO: Using MemorySegmentIndexInput with Java 21; to disable start with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
   Rebuilding index...
   0
   Done!
   1 documents indexed.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] [BUG] FSDirectory stuck at open(Path path) method when ran from .jar file [lucene]

2023-12-25 Thread via GitHub


setokk commented on issue #12968:
URL: https://github.com/apache/lucene/issues/12968#issuecomment-1868911761

   I tried, but to no avail. I'm using Maven to package the app as a .jar.
   
   Folder structure:
   
![image](https://github.com/apache/lucene/assets/73780295/fb6055f2-0df5-4897-ac93-cdf237f1c620)
   
   Output:
   
![image](https://github.com/apache/lucene/assets/73780295/a32c8be7-22f6-4ff0-b707-ad46bdb9f302)
   
   I ran `mvn clean` to see if there were any problems related to that, but I 
still get the same output.






[PR] Make FSTPostingFormat to build FST off-heap [lucene]

2023-12-25 Thread via GitHub


dungba88 opened a new pull request, #12980:
URL: https://github.com/apache/lucene/pull/12980

   ### Description
   
   Note: This PR is not ready yet. There are still some failing tests I'm 
trying to figure out.
   
   This is an attempt to make FSTPostingFormat write the FST off-heap. 
Instead of building it on-heap and then saving it to disk, we configure the 
compiler to write the FST off-heap right from the start.
   
   Some additional changes:
   - As we can't write the FST metadata and FST data to the same file, we now 
need to split the `tfp` file into 2 files: `tfp.meta` and `tfp.data`
   - We need to write the starting address of the FST data in the posting 
metadata file, then seek to that address when reading
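
To illustrate the two-file layout described above, here is a minimal sketch that uses only `java.io` and `java.nio`, not the Lucene API. The class `MetaDataSplitSketch`, its `Files` holder, and the fixed-width entry layout (a `long` start address plus an `int` length) are assumptions made for illustration; the PR itself writes variable-length values through Lucene outputs. The meta stream stores where each payload begins in the data stream, and the reader jumps straight to that address.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class MetaDataSplitSketch {

  /** Hypothetical holder for the two in-memory "files" (stand-ins for tfp.meta and tfp.data). */
  static final class Files {
    final byte[] meta;
    final byte[] data;

    Files(byte[] meta, byte[] data) {
      this.meta = meta;
      this.data = data;
    }
  }

  /** Writes each payload to the data stream and its start address and length to the meta stream. */
  static Files write(String[] payloads) throws IOException {
    ByteArrayOutputStream metaBytes = new ByteArrayOutputStream();
    ByteArrayOutputStream dataBytes = new ByteArrayOutputStream();
    DataOutputStream metaOut = new DataOutputStream(metaBytes);
    DataOutputStream dataOut = new DataOutputStream(dataBytes);

    metaOut.writeInt(payloads.length);
    for (String payload : payloads) {
      byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
      metaOut.writeLong(dataOut.size()); // start address, taken before the payload is written
      metaOut.writeInt(bytes.length);
      dataOut.write(bytes, 0, bytes.length);
    }
    return new Files(metaBytes.toByteArray(), dataBytes.toByteArray());
  }

  /** Looks up the start address in the meta bytes, then reads the payload at that address. */
  static String read(Files files, int index) {
    ByteBuffer meta = ByteBuffer.wrap(files.meta); // big-endian, same as DataOutputStream
    int count = meta.getInt();
    if (index < 0 || index >= count) {
      throw new IndexOutOfBoundsException("index " + index + " out of " + count);
    }
    // Each meta entry is a long (start address) followed by an int (length).
    meta.position(Integer.BYTES + index * (Long.BYTES + Integer.BYTES));
    long start = meta.getLong();
    int length = meta.getInt();
    return new String(files.data, (int) start, length, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    Files files = write(new String[] {"alpha", "be", "gamma"});
    System.out.println(read(files, 2)); // prints "gamma"
  }
}
```

Keeping the addresses in a separate stream means the data stream can be written strictly forward, which is what streaming the FST to disk requires.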





Re: [PR] Make FSTPostingFormat to build FST off-heap [lucene]

2023-12-25 Thread via GitHub


dungba88 commented on code in PR #12980:
URL: https://github.com/apache/lucene/pull/12980#discussion_r1436285373


##
lucene/codecs/src/java/org/apache/lucene/codecs/memory/FSTTermsWriter.java:
##
@@ -278,6 +298,7 @@ public void finish(long sumTotalTermFreq, long sumDocFreq, int docCount) throws
   // save FST dict
   if (numTerms > 0) {
 final FST fst = fstCompiler.compile();
+fst.saveMetadata(metaOut);

Review Comment:
   This seems to be incorrect; we don't need to save the metadata here.






Re: [PR] Make FSTPostingFormat to build FST off-heap [lucene]

2023-12-25 Thread via GitHub


dungba88 commented on PR #12980:
URL: https://github.com/apache/lucene/pull/12980#issuecomment-1869323228

   I temporarily changed this PR to only split the meta and data into 2 files 
while still using an on-heap DataOutput. The test seems to pass that way.





Re: [PR] Make FSTPostingFormat to build FST off-heap [lucene]

2023-12-25 Thread via GitHub


dungba88 commented on code in PR #12980:
URL: https://github.com/apache/lucene/pull/12980#discussion_r1436295737


##
lucene/codecs/src/java/org/apache/lucene/codecs/memory/FSTTermsWriter.java:
##
@@ -187,33 +200,38 @@ public void write(Fields fields, NormsProducer norms) throws IOException {
 
   @Override
   public void close() throws IOException {
-if (out != null) {
+if (metaOut != null) {
+  assert dataOut != null;
   boolean success = false;
   try {
 // write field summary
-final long dirStart = out.getFilePointer();
+final long dirStart = metaOut.getFilePointer();
 
-out.writeVInt(fields.size());
+metaOut.writeVInt(fields.size());
 for (FieldMetaData field : fields) {
-  out.writeVInt(field.fieldInfo.number);
-  out.writeVLong(field.numTerms);
+  metaOut.writeVInt(field.fieldInfo.number);
+  metaOut.writeVLong(field.numTerms);
   if (field.fieldInfo.getIndexOptions() != IndexOptions.DOCS) {
-out.writeVLong(field.sumTotalTermFreq);
+metaOut.writeVLong(field.sumTotalTermFreq);
   }
-  out.writeVLong(field.sumDocFreq);
-  out.writeVInt(field.docCount);
-  field.dict.save(out, out);
+  metaOut.writeVLong(field.sumDocFreq);
+  metaOut.writeVInt(field.docCount);
+  // write the starting file pointer
+  metaOut.writeVLong(dataOut.getFilePointer());

Review Comment:
   Oh, I think I found the bug, and why using an on-heap DataOutput works.
   
   This `close()` method is called after the FSTs for all fields have been 
saved (streamed), and thus `dataOut.getFilePointer()` returns the same 
pointer (EOF) for every field.
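
The effect can be reproduced outside Lucene. The sketch below is an assumption-laden stand-in: it uses `java.io.DataOutputStream.size()` in place of `getFilePointer()`, and the class and method names are hypothetical. It contrasts sampling the offset after everything has been streamed with sampling it just before each field's data is written.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class StartPointerSketch {

  /** Mirrors the bug: offsets are sampled only after every field's data has been streamed. */
  static List<Long> offsetsRecordedAtClose(byte[][] fieldData) throws IOException {
    DataOutputStream dataOut = new DataOutputStream(new ByteArrayOutputStream());
    for (byte[] data : fieldData) {
      dataOut.write(data, 0, data.length);
    }
    List<Long> offsets = new ArrayList<>();
    for (int i = 0; i < fieldData.length; i++) {
      offsets.add((long) dataOut.size()); // the stream is already at EOF for every field
    }
    return offsets;
  }

  /** One possible fix: sample the offset immediately before each field's data is written. */
  static List<Long> offsetsRecordedPerField(byte[][] fieldData) throws IOException {
    DataOutputStream dataOut = new DataOutputStream(new ByteArrayOutputStream());
    List<Long> offsets = new ArrayList<>();
    for (byte[] data : fieldData) {
      offsets.add((long) dataOut.size()); // start address of this field's data
      dataOut.write(data, 0, data.length);
    }
    return offsets;
  }

  public static void main(String[] args) throws IOException {
    byte[][] fieldData = {new byte[3], new byte[5], new byte[2]};
    System.out.println(offsetsRecordedAtClose(fieldData));  // prints [10, 10, 10]
    System.out.println(offsetsRecordedPerField(fieldData)); // prints [0, 3, 8]
  }
}
```

In the PR's terms, this suggests capturing the data file pointer per field at the time its FST is streamed and storing it for later use in `close()`, though how the PR ultimately resolves this is not stated here.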


