Re: [PR] Make FSTPostingFormat to build FST off-heap [lucene]

2023-12-27 Thread via GitHub


dungba88 commented on code in PR #12980:
URL: https://github.com/apache/lucene/pull/12980#discussion_r1436295737


##
lucene/codecs/src/java/org/apache/lucene/codecs/memory/FSTTermsWriter.java:
##
@@ -187,33 +200,38 @@ public void write(Fields fields, NormsProducer norms) throws IOException {
 
   @Override
   public void close() throws IOException {
-    if (out != null) {
+    if (metaOut != null) {
+      assert dataOut != null;
       boolean success = false;
       try {
         // write field summary
-        final long dirStart = out.getFilePointer();
+        final long dirStart = metaOut.getFilePointer();
 
-        out.writeVInt(fields.size());
+        metaOut.writeVInt(fields.size());
         for (FieldMetaData field : fields) {
-          out.writeVInt(field.fieldInfo.number);
-          out.writeVLong(field.numTerms);
+          metaOut.writeVInt(field.fieldInfo.number);
+          metaOut.writeVLong(field.numTerms);
           if (field.fieldInfo.getIndexOptions() != IndexOptions.DOCS) {
-            out.writeVLong(field.sumTotalTermFreq);
+            metaOut.writeVLong(field.sumTotalTermFreq);
           }
-          out.writeVLong(field.sumDocFreq);
-          out.writeVInt(field.docCount);
-          field.dict.save(out, out);
+          metaOut.writeVLong(field.sumDocFreq);
+          metaOut.writeVInt(field.docCount);
+          // write the starting file pointer
+          metaOut.writeVLong(dataOut.getFilePointer());

Review Comment:
   Oh, I think I found the bug, and also why using an on-heap DataOutput works.
   
   This `close()` method is called after the FSTs for all fields have already been saved (streamed), so `dataOut.getFilePointer()` always returns the same position (EOF) for every field.
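   To make the failure mode concrete, here is a toy model in plain Java. It is not Lucene code: `StartPointerSketch`, `FieldMeta`, `writeField` and the `dataStart` field are hypothetical stand-ins for `FSTTermsWriter`, `FieldMetaData` and the data output, and a `ByteArrayOutputStream` plays the role of `dataOut`. It assumes one possible fix (record each field's start pointer at the moment its FST is streamed, then write that saved value in `close()`); this is an illustration, not necessarily the change the PR ended up making.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of the pointer bug; NOT Lucene code. All names are hypothetical
 * stand-ins for FSTTermsWriter, FieldMetaData and the dataOut stream.
 */
final class StartPointerSketch {

  /** Hypothetical per-field metadata; dataStart is the assumed fix. */
  record FieldMeta(String name, long dataStart) {}

  // Stand-in for dataOut: size() plays the role of getFilePointer().
  private final ByteArrayOutputStream dataOut = new ByteArrayOutputStream();
  private final List<FieldMeta> fields = new ArrayList<>();

  /** Called once per field: streams that field's (fake) FST bytes to dataOut. */
  void writeField(String name, byte[] fstBytes) {
    // Assumed fix: capture the pointer BEFORE this field's bytes are written.
    long start = dataOut.size();
    dataOut.write(fstBytes, 0, fstBytes.length);
    fields.add(new FieldMeta(name, start));
  }

  /** Buggy close(): reads the pointer only after every FST was streamed. */
  List<Long> buggyStartPointers() {
    List<Long> out = new ArrayList<>();
    for (FieldMeta ignored : fields) {
      out.add((long) dataOut.size()); // always EOF, the same value for each field
    }
    return out;
  }

  /** Fixed close(): writes the pointer recorded in writeField(). */
  List<Long> fixedStartPointers() {
    List<Long> out = new ArrayList<>();
    for (FieldMeta f : fields) {
      out.add(f.dataStart());
    }
    return out;
  }

  public static void main(String[] args) {
    StartPointerSketch w = new StartPointerSketch();
    w.writeField("title", new byte[10]);
    w.writeField("body", new byte[25]);
    System.out.println("buggy: " + w.buggyStartPointers()); // buggy: [35, 35]
    System.out.println("fixed: " + w.fixedStartPointers()); // fixed: [0, 10]
  }
}
```

   With two fields of 10 and 25 bytes, the buggy variant reports 35 (EOF) for both, while the recorded pointers are 0 and 10, one per field.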



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add support for index sorting with document blocks [lucene]

2023-12-27 Thread via GitHub


s1monw commented on code in PR #12829:
URL: https://github.com/apache/lucene/pull/12829#discussion_r1437134740


##
lucene/core/src/java/org/apache/lucene/index/FieldInfos.java:
##
@@ -188,6 +200,26 @@ public static FieldInfos getMergedFieldInfos(IndexReader reader) {
     }
   }
 
+  private static String getAndValidateParentField(List leaves) {
+    boolean set = false;
+    String theField = null;
+    for (LeafReaderContext ctx : leaves) {
+      String field = ctx.reader().getFieldInfos().getParentField();
+      if (set && Objects.equals(field, theField) == false) {
+        throw new IllegalStateException(

Review Comment:
   yeah I think so too
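   For readers following the thread, the check in `getAndValidateParentField` can be sketched without Lucene. This is a minimal standalone version under stated assumptions: each `String` stands in for one leaf's `getParentField()` value (null meaning no parent field), the class name `ParentFieldCheck` is hypothetical, the exception message is illustrative, and the part of the loop after the `throw` (recording the field and setting `set`) is not shown in the diff above, so it is assumed here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Standalone sketch of the parent-field consistency check; not Lucene code. */
final class ParentFieldCheck {

  /**
   * Each element stands in for ctx.reader().getFieldInfos().getParentField()
   * of one leaf. Returns the common value (possibly null) or throws.
   */
  static String getAndValidateParentField(List<String> perLeafParentFields) {
    // 'set' separates "no leaf seen yet" from "first leaf had a null field",
    // so a null first value still becomes the reference for later leaves.
    boolean set = false;
    String theField = null;
    for (String field : perLeafParentFields) {
      if (set && Objects.equals(field, theField) == false) {
        // Illustrative message; the real one is not shown in the diff.
        throw new IllegalStateException(
            "parent field mismatch across leaves: " + theField + " vs " + field);
      }
      // Assumed continuation of the loop (not visible in the diff).
      theField = field;
      set = true;
    }
    return theField;
  }

  public static void main(String[] args) {
    System.out.println(getAndValidateParentField(List.of("parent", "parent"))); // parent
    try {
      getAndValidateParentField(Arrays.asList("parent", null));
    } catch (IllegalStateException e) {
      System.out.println("mismatch rejected");
    }
  }
}
```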






Re: [PR] Add support for index sorting with document blocks [lucene]

2023-12-27 Thread via GitHub


s1monw commented on PR #12829:
URL: https://github.com/apache/lucene/pull/12829#issuecomment-1870447436

   @mikemccand @jpountz I think it's ready. I added some more tests and removed storing the number of children in the doc values (DV) field to keep the change as low-impact as possible. We can still optimize this internally later if we want or need to.
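   Why a stored child count is not strictly needed: assuming Lucene's usual block convention, where a block's child documents are indexed contiguously and immediately before their parent, every block's doc ID range can be recovered from the positions of the parent documents alone. A minimal sketch under that assumption; the boolean array is a hypothetical stand-in for "this document carries the parent field", and `BlockRanges` is an illustrative name, not a Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: derive block boundaries from parent positions; not Lucene code. */
final class BlockRanges {

  /**
   * Returns one inclusive [start, parent] doc ID range per block, assuming
   * children precede their parent and every document belongs to a block.
   */
  static List<int[]> blocks(boolean[] isParent) {
    List<int[]> ranges = new ArrayList<>();
    int start = 0;
    for (int doc = 0; doc < isParent.length; doc++) {
      if (isParent[doc]) {
        // Children occupy start..doc-1; the parent sits at doc.
        ranges.add(new int[] {start, doc});
        start = doc + 1;
      }
    }
    if (start != isParent.length) {
      throw new IllegalArgumentException("trailing documents without a parent");
    }
    return ranges;
  }

  /** Child count falls out of the range, so it need not be stored. */
  static int childCount(int[] block) {
    return block[1] - block[0];
  }

  public static void main(String[] args) {
    boolean[] isParent = {false, false, true, true, false, true};
    for (int[] block : blocks(isParent)) {
      System.out.println(Arrays.toString(block) + " children=" + childCount(block));
    }
    // [0, 2] children=2
    // [3, 3] children=0
    // [4, 5] children=1
  }
}
```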





Re: [I] Where should we stream FST to disk directly? [lucene]

2023-12-27 Thread via GitHub


dungba88 commented on issue #12902:
URL: https://github.com/apache/lucene/issues/12902#issuecomment-1870848548

   I put up the first PR for `FSTPostingsFormat`: https://github.com/apache/lucene/pull/12980





Re: [I] TransitionAccessor for NFA: get transitions for a given state via random-access leads to wrong results. [lucene]

2023-12-27 Thread via GitHub


zhaih closed issue #12906: TransitionAccessor for NFA: get transitions for a 
given state via random-access leads to wrong results. 
URL: https://github.com/apache/lucene/issues/12906





Re: [PR] Fix bug where NFARunAutomaton#getTransition does not set Transition correctly [lucene]

2023-12-27 Thread via GitHub


zhaih merged PR #12909:
URL: https://github.com/apache/lucene/pull/12909

