date:20231127

Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

2023-11-27 Thread via GitHub



vsop-479 commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1405891395


##
lucene/core/src/java/org/apache/lucene/codecs/CompetitiveImpactAccumulator.java:
##
@@ -93,6 +93,21 @@ public void addAll(CompetitiveImpactAccumulator acc) {
 assert assertConsistent();
   }
 
+  /** Copy {@code acc} into this empty acc. */
+  public void copy(CompetitiveImpactAccumulator acc) {
+int[] maxFreqs = this.maxFreqs;
+int[] otherMaxFreqs = acc.maxFreqs;
+assert Arrays.stream(maxFreqs).sum() == 0;
+
+System.arraycopy(otherMaxFreqs, 0, maxFreqs, 0, maxFreqs.length);
+
+for (Impact entry : acc.otherFreqNormPairs) {
+  add(entry, otherFreqNormPairs);
+}

Review Comment:
   > Did you observe a speedup with this change? 
   
   I measured it with TestTermScorer.testRandomTopDocs without assertion, the 
result it not stable.
   I will try other benchmarks.
   
   > Also can you add a test?
   > It looks like we can optimize this bit as well by copying entries from the 
treeset directly?
   
   Thanks for your suggestion, i am working on this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-11-27 Thread via GitHub



mikemccand commented on code in PR #12829:
URL: https://github.com/apache/lucene/pull/12829#discussion_r1406124683


##
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java:
##
@@ -262,6 +277,73 @@ long updateDocuments(
 }
   }
 
+  private interface DocValidator extends Consumer {

Review Comment:
   Do we really need a new interface here?  We have only one validator now 
(`ParentBlockValidator`) -- can we just make a concrete instance of that which 
IW calls?



##
lucene/core/src/java/org/apache/lucene/search/Sort.java:
##
@@ -59,10 +61,28 @@ public Sort() {
* is still a tie after all SortFields are checked, the internal Lucene 
docid is used to break it.
*/
   public Sort(SortField... fields) {
+this(null, fields);
+  }
+
+  /**
+   * Sets the sort to the given criteria in succession: the first SortField is 
checked first, but if
+   * it produces a tie, then the second SortField is used to break the tie, 
etc. Finally, if there
+   * is still a tie after all SortFields are checked, the internal Lucene 
docid is used to break it.
+   *
+   * @param parentField the name of a numeric doc values field that marks the 
last document of a
+   * document blocks indexed with {@link
+   * org.apache.lucene.index.IndexWriter#addDocuments(Iterable)} or it's 
update relatives. This
+   * is required for indices that use index sorting in combination with 
document blocks in order
+   * to maintain the document order of the blocks documents. Index sorting 
will effectively
+   * compare the parent (last document) of a block in order to stable sort 
all it's adjacent

Review Comment:
   s/it's/its



##
lucene/core/src/java/org/apache/lucene/search/Sort.java:
##
@@ -59,10 +61,28 @@ public Sort() {
* is still a tie after all SortFields are checked, the internal Lucene 
docid is used to break it.
*/
   public Sort(SortField... fields) {
+this(null, fields);
+  }
+
+  /**
+   * Sets the sort to the given criteria in succession: the first SortField is 
checked first, but if
+   * it produces a tie, then the second SortField is used to break the tie, 
etc. Finally, if there
+   * is still a tie after all SortFields are checked, the internal Lucene 
docid is used to break it.
+   *
+   * @param parentField the name of a numeric doc values field that marks the 
last document of a

Review Comment:
   Can we make it clearer that you only use this for index-time sorting?  It 
has no meaning at search time (and should not be set)?
   
   Or ... does it have search-time meaning?  Can I use this to retrieve whole 
blocks in my search hits?



##
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java:
##
@@ -262,6 +277,73 @@ long updateDocuments(
 }
   }
 
+  private interface DocValidator extends Consumer {
+
+default void afterDocument() {}
+
+default void beforeDocument() {}
+
+default void afterDocuments(int numDocs) {}
+
+default void reset() {}
+  }
+
+  private static class ParentBlockValidator implements DocValidator {
+private boolean lastDocHasParentField = false;

Review Comment:
   You don't need the `= false` -- it's java's default already :)



##
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java:
##
@@ -262,6 +277,73 @@ long updateDocuments(
 }
   }
 
+  private interface DocValidator extends Consumer {
+
+default void afterDocument() {}
+
+default void beforeDocument() {}
+
+default void afterDocuments(int numDocs) {}
+
+default void reset() {}
+  }
+
+  private static class ParentBlockValidator implements DocValidator {
+private boolean lastDocHasParentField = false;
+private boolean childDocHasParentField = false;
+private final String parentFieldName;
+
+@Override
+public void reset() {
+  lastDocHasParentField = false;
+  childDocHasParentField = false;
+}
+
+private ParentBlockValidator(Sort sort) {
+  this.parentFieldName = sort.getParentField();
+}
+
+@Override
+public void beforeDocument() {
+  if (childDocHasParentField == false && lastDocHasParentField) {
+childDocHasParentField = true;
+  }
+  lastDocHasParentField = false;
+}
+
+@Override
+public void accept(IndexableField field) {
+  if (parentFieldName != null) {
+if (parentFieldName.equals(field.name())
+&& DocValuesType.NUMERIC == field.fieldType().docValuesType()) {
+  lastDocHasParentField = true;
+}
+  }
+}
+
+@Override
+public void afterDocument() {
+  if (childDocHasParentField) {
+throw new IllegalArgumentException(
+"only the last document in the block must contain a numeric doc 
values field named: "

Review Comment:
   Maybe reword to `"child documents must not contain the parent doc values 
field \"" + parentFieldName + "\""` or so?



##

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-11-27 Thread via GitHub



mikemccand commented on PR #12829:
URL: https://github.com/apache/lucene/pull/12829#issuecomment-1827825175

   > using a doc-value field where only parents documents have a value for the 
field, and the value must be the number of child documents that the parent has
   
   This is a neat idea too -- would this maybe make completing the join at 
search time more efficient?  Today we scan for the next set bit, but I guess 
with this approach, we'd still scan for the next `NumericDocValues` that is 
set, and then see the child count, but that would be sort of redundant (we 
could still subtract docids to get the number of children)?  Also, I'm not sure 
how the nested case would be supported (same as the current PR's approach too).
   
   > This makes me wonder if Lucene could be adding this doc value field for 
root documents automatically, similarly to what it is doing for soft deletes? 
(Possibly in a follow-up PR)
   
   This is a nice idea too, but I think we'd still need user input to give us a 
free field name to use for this purpose.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [I] Upgrade to OpenNLP 2.0 and add [LUCENE-10621] [lucene]

2023-11-27 Thread via GitHub



epugh commented on issue #11657:
URL: https://github.com/apache/lucene/issues/11657#issuecomment-1827887052

   OpenNLP 2.3.1 was recently released and would be nice to have Lucene pick it 
up.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Add static function in TaskExecutor to retrieve the results for a collection of Future [lucene]

2023-11-27 Thread via GitHub



javanna commented on PR #12798:
URL: https://github.com/apache/lucene/pull/12798#issuecomment-1828210981

   With the latest updates, I am not convinced about this change. I think it's 
great to use TaskExecutor to execute parallel tasks, like you did in #12799, 
but I am under the impression that this PR takes it a bit too far in terms of 
responsibilities of TaskExecutor and functionalities it exposes. I woudl be ok 
with those consumers that deal with futures keeping their current logic that 
does not depend on TaskExecutor, unless they use TaskExecutor directly to run 
their futures. What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] upgrade to OpenNLP 2.3.0 [lucene]

2023-11-27 Thread via GitHub



epugh commented on PR #12674:
URL: https://github.com/apache/lucene/pull/12674#issuecomment-1828340906

   FYI 2.3.1 was just released.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-11-27 Thread via GitHub



msokolov commented on PR #12829:
URL: https://github.com/apache/lucene/pull/12829#issuecomment-1828402628

   > I don't think we give up any functionality. can you elaborate what 
functionality you are referring to? I don't think we should have a list of 
parent fields that IW requires, what would that list be used for? I also don't 
understand how the APU lets you define overlapping relations that this change 
would prevent you from?
   
   I think we have always been able to model a multi-level hierarchy. Suppose I 
index a book broken into documents representing paragraphs grouped into 
sections, chapters, and parts; so it's a five-level hierarchy. There can be 
documents at every level and we want to be able to do roll-ups to any level. If 
I index a book as a block, we support this today. 
   
   I might be confused, but I didn't think that would work with this 
arrangement. However maybe it does? I guess one describes every component 
document as having the book as a parent, and then later can still use a 
different parent bitset to do the roll-ups to other levels?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

2023-11-27 Thread via GitHub



mikemccand commented on code in PR #12699:
URL: https://github.com/apache/lucene/pull/12699#discussion_r140662


##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java:
##
@@ -104,13 +104,9 @@ public SegmentTermsEnumFrame(SegmentTermsEnum ste, int 
ord) throws IOException {
 suffixLengthsReader = new ByteArrayDataInput();
   }
 
-  public void setFloorData(ByteArrayDataInput in, BytesRef source) {
-final int numBytes = source.length - (in.getPosition() - source.offset);
-if (numBytes > floorData.length) {
-  floorData = new byte[ArrayUtil.oversize(numBytes, 1)];
-}
-System.arraycopy(source.bytes, source.offset + in.getPosition(), 
floorData, 0, numBytes);
-floorDataReader.reset(floorData, 0, numBytes);
+  public void setFloorData(SegmentTermsEnum.OutputAccumulator 
outputAccumulator) {
+outputAccumulator.setFloorData(floorDataReader);
+rewindPos = floorDataReader.getPosition();

Review Comment:
   Woops you're right!  Sorry :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] BaseTokenStreamTestCase.assertAnalyzesTo fails when Analyzer contains… [lucene]

2023-11-27 Thread via GitHub



msfroh commented on PR #12750:
URL: https://github.com/apache/lucene/pull/12750#issuecomment-1828469855

   I was looking into this and the approach used for (Edge)NGramTokenizer back 
in 2013: 
https://github.com/apache/lucene/commit/a03e38d5d05008aaef969a200071c03a1d6cb991
   
   The solution there is to *always* set the position increment and length to 
1: 
https://github.com/apache/lucene/blob/8ef6a0da56878177ff8d6880c92e8f7d0321d076/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/NGramTokenizer.java#L186-L187
   
   With that change, your test passes (but I had to change every other test): 
https://github.com/msfroh/lucene/commit/0d05366c65a79aabc407e0662537520ba9c56737
   
   Given that it's not backward-compatible, I imagine it would have to be a 
change for 10.0? Also, whatever we do should probably also be applied to 
ReversePathHierarchyTokenizer too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-27 Thread via GitHub



mikemccand commented on PR #12624:
URL: https://github.com/apache/lucene/pull/12624#issuecomment-1828548480

   Hmm I'm running `Test2BFSTs` on this patch and noticed it seems to take very 
much longer during the `TEST: now verify` step where it confirms the built FST 
accepts all the inputs it just compiled into it.
   
   I `jstack`'d it and found this:
   
   ```
  java.lang.Thread.State: RUNNABLE
   at 
java.nio.ByteBuffer.position(java.base@17.0.9/ByteBuffer.java:1516)
   at java.nio.ByteBuffer.position(java.base@17.0.9/ByteBuffer.java:267)
   at java.nio.Buffer.(java.base@17.0.9/Buffer.java:246)
   at java.nio.ByteBuffer.(java.base@17.0.9/ByteBuffer.java:288)
   at 
java.nio.HeapByteBuffer.(java.base@17.0.9/HeapByteBuffer.java:95)
   at 
java.nio.HeapByteBufferR.(java.base@17.0.9/HeapByteBufferR.java:102)
   at 
java.nio.HeapByteBufferR.duplicate(java.base@17.0.9/HeapByteBufferR.java:135)
   at 
java.nio.HeapByteBufferR.asReadOnlyBuffer(java.base@17.0.9/HeapByteBufferR.java:148)
   at 
org.apache.lucene.store.ByteBuffersDataInput.(ByteBuffersDataInput.java:60)
   at 
org.apache.lucene.store.ByteBuffersDataOutput.toDataInput(ByteBuffersDataOutput.java:279)
   at 
org.apache.lucene.util.fst.ReadWriteDataOutput.getReverseBytesReader(ReadWriteDataOutput.java:52)
   at org.apache.lucene.util.fst.FST.getBytesReader(FST.java:1181)
   at org.apache.lucene.util.fst.Util.get(Util.java:50)
   at org.apache.lucene.util.fst.Test2BFST.test(Test2BFST.java:113)
   ```
   
   It looks like getting the reversed reader has maybe become quite a bit more 
expensive than `BytesStore` was?
   
   Note that `Test2BFSTs` is quite silly -- it uses `Util.get` for every 
lookup, instead of pulling and reusing a `ReverseBytesReader` like most "real" 
usages of FST (e.g. terms dict) will do.  So perhaps this performance change 
doesn't really matter in practice?  Though I wonder if all consumers of FST are 
re-using their reversed readers?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-27 Thread via GitHub



mikemccand commented on PR #12624:
URL: https://github.com/apache/lucene/pull/12624#issuecomment-1828590265

   Hmm, also the `FSTCompiler.ramBytesUsed()` seems to no longer return the 
growing FST size:
   
   ```
 1> 310: 560 bytes; 594876500 nodes
 1> 320: 560 bytes; 614066389 nodes
   ```
   
   vs 9.x output:
   
   ```
 1> 310: 1220879080 bytes; 594876036 nodes
 1> 320: 1260262272 bytes; 614066120 nodes
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-27 Thread via GitHub



mikemccand commented on PR #12624:
URL: https://github.com/apache/lucene/pull/12624#issuecomment-1828597325

   Hmm, also oddly -- why do the number of nodes differ between `main` and 9.x? 
 This PR should not have altered how many nodes are created as a function of 
FST inputs right?  Or maybe how the RAM accounting is done for NodeHash changes 
a bit, causing a slightly larger FST on `main` than 9.x?  So many WTFs ("wow 
that's funny") suddenly :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-27 Thread via GitHub



dungba88 commented on PR #12624:
URL: https://github.com/apache/lucene/pull/12624#issuecomment-1828839806

   Ah I think since we removed the finish(), getting the reverse bytes reader 
is expectedly slower. We have to copy the bytes to a readonly buffer every 
time. If this is a problem maybe let try to add the finish() again? WDYT
   
   The ram bytes used is an easy fix. I'll change it in next rev.
   
   The difference in number of nodes is interesting. Lwt me look deeper. But 
did the 9.x you are testing has incorporated the value-based node hash?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

2023-11-27 Thread via GitHub



zacharymorn merged PR #240:
URL: https://github.com/apache/lucene/pull/240


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-27 Thread via GitHub



dungba88 commented on PR #12624:
URL: https://github.com/apache/lucene/pull/12624#issuecomment-1828936176

   I checked some of the usage in the analysis module. SynonymGraphFilter cache 
the `BytesReader` on constructor, and I think TokenFilter by default are cached 
per field? But lots of other places does not have this cache, such as:
   - Stemmer
   - GeneratingSuggester
   - ...
   
   I think the main bottleneck are the non-trivial creation of the BufferList 
(in `toBufferList` or `toWritableBufferList`) and the ByteBufferDataInput.
   
   In the worst case, how do you think about reviving the BytesStore (with a 
much simpler implementation as it does have the scratch bytes operation)? Or 
alternatively, we could create a reusable block-based byte array DataOutput 
(not sure if there's already something like that).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[PR] Report the time it took for building the FST [lucene]

2023-11-27 Thread via GitHub



dungba88 opened a new pull request, #12847:
URL: https://github.com/apache/lucene/pull/12847

   ### Description
   
   - Report the time it took for building the FST
   - Report the FST actual size, as it can differ from the RAM bytes used once 
the test is moved to off-heap


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

2023-11-27 Thread via GitHub



gf2121 commented on PR #12699:
URL: https://github.com/apache/lucene/pull/12699#issuecomment-1829112668

   Thanks for review and great suggestions @mikemccand !
   
   > you want to merge and backport to 9.x?
   
   Yes. I'll merge and backport this this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

2023-11-27 Thread via GitHub



gf2121 merged PR #12699:
URL: https://github.com/apache/lucene/pull/12699


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-27 Thread via GitHub



dungba88 commented on PR #12624:
URL: https://github.com/apache/lucene/pull/12624#issuecomment-1829144978

   Tested Test2BFST with `-Dtests.seed=D193E7FD4B9E68C4`
   
   **mainline**
   
   ```
   110: 432584968 RAM bytes used; 432367203 FST bytes; 211082699 nodes; 
took 248 seconds
   ```
   
   **pr**
   
   ```
   110: 432427020 RAM bytes used; 432367203 FST bytes; 211082699 nodes; 
took 236 seconds
   ```
   
   **9_x**
   
   ```
   110: 432584968 RAM bytes used; 432367203 FST bytes; 211082699 nodes; 
took 260 seconds
   ```
   
   The FST size, number nodes are exactly the same, while the running time 
doesn't seem to differ much (running with my personal computer so the 
background process could be noisy).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Re: [PR] Add support for index sorting with document blocks [lucene]

Re: [PR] Add support for index sorting with document blocks [lucene]

Re: [I] Upgrade to OpenNLP 2.0 and add [LUCENE-10621] [lucene]

Re: [PR] Add static function in TaskExecutor to retrieve the results for a collection of Future [lucene]

Re: [PR] upgrade to OpenNLP 2.3.0 [lucene]

Re: [PR] Add support for index sorting with document blocks [lucene]

Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

Re: [PR] BaseTokenStreamTestCase.assertAnalyzesTo fails when Analyzer contains… [lucene]

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

[PR] Report the time it took for building the FST [lucene]

Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

19 matches

Site Navigation

Mail list logo

Footer information