date:20231127

Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

2023-11-27 Thread via GitHub

vsop-479 commented on code in PR #12846: URL: https://github.com/apache/lucene/pull/12846#discussion_r1405891395 ## lucene/core/src/java/org/apache/lucene/codecs/CompetitiveImpactAccumulator.java: ## @@ -93,6 +93,21 @@ public void addAll(CompetitiveImpactAccumulator acc) {
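The PR title describes a fast path for `CompetitiveImpactAccumulator.addAll`: when the receiving accumulator is still empty, copying the other accumulator's per-norm max frequencies gives the same result as merging them, at lower cost. The sketch below illustrates only that idea. `MaxFreqAccumulator` is a hypothetical, simplified class (byte norms only, an `int[256]` table, an `empty` flag). It is not Lucene's accumulator and not the PR's code.

```java
/**
 * Hypothetical, simplified accumulator that tracks the maximum term frequency
 * seen for each byte-valued norm. It is NOT Lucene's CompetitiveImpactAccumulator;
 * it only illustrates "copy into an empty accumulator instead of merging".
 */
final class MaxFreqAccumulator {
  // Index = norm + 128, value = max freq observed for that norm (0 = none seen).
  private final int[] maxFreqs = new int[256];
  private boolean empty = true;

  void add(int freq, byte norm) {
    int i = norm + 128;
    maxFreqs[i] = Math.max(maxFreqs[i], freq);
    empty = false;
  }

  void addAll(MaxFreqAccumulator other) {
    if (other.empty) {
      return; // nothing to merge
    }
    if (empty) {
      // Fast path: every slot here is still 0, so max(0, x) == x for x >= 0
      // and a bulk copy is equivalent to an element-wise merge.
      System.arraycopy(other.maxFreqs, 0, maxFreqs, 0, maxFreqs.length);
    } else {
      // General path: element-wise max of the two tables.
      for (int i = 0; i < maxFreqs.length; i++) {
        maxFreqs[i] = Math.max(maxFreqs[i], other.maxFreqs[i]);
      }
    }
    empty = false;
  }

  int maxFreq(byte norm) {
    return maxFreqs[norm + 128];
  }
}
```

The copy writes into this accumulator's own array, so later updates to either accumulator do not affect the other.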

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-11-27 Thread via GitHub

mikemccand commented on code in PR #12829: URL: https://github.com/apache/lucene/pull/12829#discussion_r1406124683 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java: ## @@ -262,6 +277,73 @@ long updateDocuments( } } + private interface DocV

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-11-27 Thread via GitHub

mikemccand commented on PR #12829: URL: https://github.com/apache/lucene/pull/12829#issuecomment-1827825175 > using a doc-value field where only parents documents have a value for the field, and the value must be the number of child documents that the parent has This is a neat idea to
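The quoted proposal is to use a doc-values field that only parent documents carry, whose value is the parent's number of child documents. With Lucene's block convention (children written first, parent last), that value alone is enough to recover each block's boundaries. The sketch below is a hypothetical illustration, not the PR's implementation. An `int[]` stands in for the doc-values field, with `-1` (`NO_VALUE`) marking documents that have no value. `BlockBoundaries` and `blocks` are invented names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: derive [start, parent] doc ranges from a per-parent child count. */
final class BlockBoundaries {
  /** Sentinel standing in for "this document has no value for the parent field". */
  static final int NO_VALUE = -1;

  /**
   * Returns one {start, parentDoc} pair per block. Assumes children precede their
   * parent and blocks are contiguous, so parent p with c children spans [p - c, p].
   */
  static List<int[]> blocks(int[] childCounts) {
    List<int[]> result = new ArrayList<>();
    int expectedStart = 0; // first doc of the next block
    for (int doc = 0; doc < childCounts.length; doc++) {
      int count = childCounts[doc];
      if (count == NO_VALUE) {
        continue; // a child document; its parent comes later
      }
      int start = doc - count;
      if (start != expectedStart) {
        throw new IllegalStateException(
            "block of parent " + doc + " starts at " + start + ", expected " + expectedStart);
      }
      result.add(new int[] {start, doc});
      expectedStart = doc + 1;
    }
    if (expectedStart != childCounts.length) {
      throw new IllegalStateException("trailing documents without a parent");
    }
    return result;
  }
}
```

A count that does not line up with the previous parent is reported as an error rather than silently re-grouping documents, which also makes the field usable as a consistency check.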

Re: [I] Upgrade to OpenNLP 2.0 and add [LUCENE-10621] [lucene]

2023-11-27 Thread via GitHub

epugh commented on issue #11657: URL: https://github.com/apache/lucene/issues/11657#issuecomment-1827887052 OpenNLP 2.3.1 was recently released and would be nice to have Lucene pick it up. -- This is an automated message from the Apache Git Service. To respond to the message, please log o

Re: [PR] Add static function in TaskExecutor to retrieve the results for a collection of Future [lucene]

2023-11-27 Thread via GitHub

javanna commented on PR #12798: URL: https://github.com/apache/lucene/pull/12798#issuecomment-1828210981 With the latest updates, I am not convinced about this change. I think it's great to use TaskExecutor to execute parallel tasks, like you did in #12799, but I am under the impression tha
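PR #12798 proposes a static helper that retrieves the results of a collection of `Future`s, and the comment above questions whether it belongs in `TaskExecutor`. For reference, the sketch below shows what such a helper generally looks like using only `java.util.concurrent`. `FutureResults.collectResults` is a hypothetical name, not `TaskExecutor`'s API. Cancelling the remaining futures on the first failure is a design choice of this sketch, not something stated in the thread.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical helper; not Lucene's TaskExecutor API. */
final class FutureResults {
  /**
   * Waits for each future in iteration order and returns their results in that order.
   * If any future fails (or the wait is interrupted), the others are cancelled and
   * the exception is rethrown.
   */
  static <T> List<T> collectResults(Collection<? extends Future<T>> futures)
      throws InterruptedException, ExecutionException {
    List<T> results = new ArrayList<>(futures.size());
    try {
      for (Future<T> f : futures) {
        results.add(f.get()); // blocks until this task completes
      }
    } catch (InterruptedException | ExecutionException e) {
      for (Future<T> f : futures) {
        f.cancel(true); // no-op for futures that already completed
      }
      throw e;
    }
    return results;
  }
}
```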

Re: [PR] upgrade to OpenNLP 2.3.0 [lucene]

2023-11-27 Thread via GitHub

epugh commented on PR #12674: URL: https://github.com/apache/lucene/pull/12674#issuecomment-1828340906 FYI 2.3.1 was just released.

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-11-27 Thread via GitHub

msokolov commented on PR #12829: URL: https://github.com/apache/lucene/pull/12829#issuecomment-1828402628 > I don't think we give up any functionality. can you elaborate what functionality you are referring to? I don't think we should have a list of parent fields that IW requires, what woul

Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

2023-11-27 Thread via GitHub

mikemccand commented on code in PR #12699: URL: https://github.com/apache/lucene/pull/12699#discussion_r140662 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ## @@ -104,13 +104,9 @@ public SegmentTermsEnumFrame(SegmentTermsEnu

Re: [PR] BaseTokenStreamTestCase.assertAnalyzesTo fails when Analyzer contains… [lucene]

2023-11-27 Thread via GitHub

msfroh commented on PR #12750: URL: https://github.com/apache/lucene/pull/12750#issuecomment-1828469855 I was looking into this and the approach used for (Edge)NGramTokenizer back in 2013: https://github.com/apache/lucene/commit/a03e38d5d05008aaef969a200071c03a1d6cb991 The solution t

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-27 Thread via GitHub

mikemccand commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1828548480 Hmm I'm running `Test2BFSTs` on this patch and noticed it seems to take very much longer during the `TEST: now verify` step where it confirms the built FST accepts all the inputs it j

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-27 Thread via GitHub

mikemccand commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1828590265 Hmm, also the `FSTCompiler.ramBytesUsed()` seems to no longer return the growing FST size: ``` 1> 310: 560 bytes; 594876500 nodes 1> 320: 560 bytes; 614066389

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-27 Thread via GitHub

mikemccand commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1828597325 Hmm, also oddly -- why does the number of nodes differ between `main` and 9.x? This PR should not have altered how many nodes are created as a function of FST inputs right? Or maybe h

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-27 Thread via GitHub

dungba88 commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1828839806 Ah I think since we removed the finish(), getting the reverse bytes reader is expectedly slower. We have to copy the bytes to a readonly buffer every time. If this is a problem maybe le
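The comment explains that, with `finish()` removed, every request for a reverse bytes reader copies the written bytes into a read-only buffer. One common way to bound that cost is to cache the read-only snapshot and invalidate it only when new bytes are written. The sketch below illustrates that general technique. `ByteStore` and `ReverseReader` are hypothetical classes, not Lucene's FST writer or `BytesReader`, and the caching strategy is an assumption, not what the PR does.

```java
import java.util.Arrays;

/** Hypothetical append-only byte store; not Lucene's FST byte storage. */
final class ByteStore {
  private byte[] buf = new byte[16];
  private int size;
  private byte[] frozen; // cached read-only snapshot; null once new bytes are written
  int copies;            // number of snapshot copies made, for illustration

  void writeByte(byte b) {
    if (size == buf.length) {
      buf = Arrays.copyOf(buf, size * 2); // grow the backing array
    }
    buf[size++] = b;
    frozen = null; // snapshot is stale now
  }

  /** Returns a reader over a snapshot, copying only if bytes changed since the last call. */
  ReverseReader reverseReader() {
    if (frozen == null) {
      frozen = Arrays.copyOf(buf, size);
      copies++;
    }
    return new ReverseReader(frozen);
  }

  /** Reads a snapshot from its last byte to its first. */
  static final class ReverseReader {
    private final byte[] bytes;
    private int pos;

    ReverseReader(byte[] bytes) {
      this.bytes = bytes;
      this.pos = bytes.length - 1;
    }

    boolean hasNext() {
      return pos >= 0;
    }

    byte readByte() {
      return bytes[pos--];
    }
  }
}
```

Readers handed out earlier keep pointing at their own snapshot, so later writes never change what they see; the trade-off is one extra copy of the data per write/read cycle.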

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

2023-11-27 Thread via GitHub

zacharymorn merged PR #240: URL: https://github.com/apache/lucene/pull/240

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-27 Thread via GitHub

dungba88 commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1828936176 I checked some of the usage in the analysis module. SynonymGraphFilter caches the `BytesReader` in its constructor, and I think TokenFilters are cached per field by default? But lots of other

[PR] Report the time it took for building the FST [lucene]

2023-11-27 Thread via GitHub

dungba88 opened a new pull request, #12847: URL: https://github.com/apache/lucene/pull/12847

### Description

- Report the time it took to build the FST
- Report the actual FST size, as it can differ from the RAM bytes used once the test is moved to off-heap

Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

2023-11-27 Thread via GitHub

gf2121 commented on PR #12699: URL: https://github.com/apache/lucene/pull/12699#issuecomment-1829112668 Thanks for the review and great suggestions @mikemccand! > you want to merge and backport to 9.x? Yes, I'll merge and backport this.

Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

2023-11-27 Thread via GitHub

gf2121 merged PR #12699: URL: https://github.com/apache/lucene/pull/12699

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-27 Thread via GitHub

dungba88 commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1829144978 Tested Test2BFST with `-Dtests.seed=D193E7FD4B9E68C4`

**mainline**
```
110: 432584968 RAM bytes used; 432367203 FST bytes; 211082699 nodes; took 248 seconds
```


19 matches
