[GitHub] [lucene] jpountz commented on pull request #12053: Allow reusing indexed binary fields.

2023-01-12 Thread GitBox
jpountz commented on PR #12053: URL: https://github.com/apache/lucene/pull/12053#issuecomment-1379974864 Thanks @rmuir ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [lucene] jpountz merged pull request #12053: Allow reusing indexed binary fields.

2023-01-12 Thread GitBox
jpountz merged PR #12053: URL: https://github.com/apache/lucene/pull/12053

[GitHub] [lucene] jpountz commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-12 Thread GitBox
jpountz commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1067925752 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -203,9 +203,18 @@ BooleanQuery rewriteNoScoring(IndexSearcher indexSearcher) throws IOExcept

[GitHub] [lucene] jpountz opened a new pull request, #12079: Speed up 1D BKD merging.

2023-01-12 Thread GitBox
jpountz opened a new pull request, #12079: URL: https://github.com/apache/lucene/pull/12079 On the NYC taxis dataset on my local machine, switching from `Arrays#compareUnsigned` to `ArrayUtil#getUnsignedComparator` yielded a 15% speedup of BKD merging.
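
For readers unfamiliar with the two calls, here is a minimal JDK-only sketch of why a width-specialized comparator can beat the general-purpose one. It is not Lucene's code: the class name `UnsignedCompareSketch` and both helper methods are hypothetical, and the assumption that a fixed 8-byte comparison can be done as a single big-endian `long` read is an illustration of the general technique, not a statement about how `ArrayUtil#getUnsignedComparator` is implemented.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration class; not part of Lucene.
final class UnsignedCompareSketch {
  // Views 8 consecutive bytes of a byte[] as one big-endian long.
  private static final VarHandle BE_LONG =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.BIG_ENDIAN);

  /** General-purpose path: handles any length via the JDK range comparison. */
  static int compareGeneric(byte[] a, int aOff, byte[] b, int bOff, int len) {
    return Arrays.compareUnsigned(a, aOff, aOff + len, b, bOff, bOff + len);
  }

  /** Specialized path: the width is known to be exactly 8 bytes. */
  static int compare8(byte[] a, int aOff, byte[] b, int bOff) {
    long x = (long) BE_LONG.get(a, aOff);
    long y = (long) BE_LONG.get(b, bOff);
    // Big-endian order makes unsigned long order match unsigned byte order.
    return Long.compareUnsigned(x, y);
  }

  public static void main(String[] args) {
    byte[] lo = {0, 0, 0, 0, 0, 0, 0, 0x7f};
    byte[] hi = {0, 0, 0, 0, 0, 0, 0, (byte) 0x80};
    // Both paths agree that lo sorts before hi when bytes are unsigned.
    System.out.println(Integer.signum(compareGeneric(lo, 0, hi, 0, 8)));
    System.out.println(Integer.signum(compare8(lo, 0, hi, 0)));
  }
}
```

Both paths order the values identically; the specialized one simply skips range checks and length handling that a fixed-width caller does not need. Whether that accounts for the 15% reported above is not something this sketch can confirm.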

[GitHub] [lucene] romseygeek commented on issue #10458: Ordered intervals can give inaccurate hits on interleaved terms [LUCENE-9418]

2023-01-12 Thread GitBox
romseygeek commented on issue #10458: URL: https://github.com/apache/lucene/issues/10458#issuecomment-1380139574 Hi @Brain2000, yes please open a separate issue. If you could include a reproduction then we can work out if it's an issue with TopFieldCollector itself, or with how you're usin

[GitHub] [lucene] javanna commented on pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-12 Thread GitBox
javanna commented on PR #12072: URL: https://github.com/apache/lucene/pull/12072#issuecomment-1380141646 @benwtrent could you add a changelog entry too?

[GitHub] [lucene] iverase commented on pull request #12079: Speed up 1D BKD merging.

2023-01-12 Thread GitBox
iverase commented on PR #12079: URL: https://github.com/apache/lucene/pull/12079#issuecomment-1380193686 I wonder if we should add `Arrays.compareUnsigned` to forbidden APIs to force always to use the faster comparators.

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

2023-01-12 Thread GitBox
rmuir commented on code in PR #12054: URL: https://github.com/apache/lucene/pull/12054#discussion_r1068060646 ## lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java: ## @@ -234,8 +234,8 @@ void indexDoc(IndexWriter writer, Path file, long lastModified) throws IOExcepti

[GitHub] [lucene] jpountz commented on pull request #12079: Speed up 1D BKD merging.

2023-01-12 Thread GitBox
jpountz commented on PR #12079: URL: https://github.com/apache/lucene/pull/12079#issuecomment-1380286154 I remember thinking about it, and there are legitimate use-cases for `Arrays#compareUnsigned` like `BytesRef#compareTo`. Another thing is that `ArrayUtil#getUnsignedComparator` only help

[GitHub] [lucene] jpountz commented on a diff in pull request #12054: Introduce a new `KeywordField`.

2023-01-12 Thread GitBox
jpountz commented on code in PR #12054: URL: https://github.com/apache/lucene/pull/12054#discussion_r1068102375 ## lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java: ## @@ -234,8 +234,8 @@ void indexDoc(IndexWriter writer, Path file, long lastModified) throws IOExcept

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

2023-01-12 Thread GitBox
rmuir commented on code in PR #12054: URL: https://github.com/apache/lucene/pull/12054#discussion_r1068103957 ## lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java: ## @@ -234,8 +234,8 @@ void indexDoc(IndexWriter writer, Path file, long lastModified) throws IOExcepti

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

2023-01-12 Thread GitBox
rmuir commented on code in PR #12054: URL: https://github.com/apache/lucene/pull/12054#discussion_r1068105620 ## lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java: ## @@ -234,8 +234,8 @@ void indexDoc(IndexWriter writer, Path file, long lastModified) throws IOExcepti

[GitHub] [lucene] mmatela opened a new issue, #12080: SynonymGraphFilter: wrong output token position when input positions overlap

2023-01-12 Thread GitBox
mmatela opened a new issue, #12080: URL: https://github.com/apache/lucene/issues/12080 ### Description In my example, the query is 'test polskie'. I use MorfologikFilter for Polish stemming, it turns 'polskie' into 'polski' + 'polskie'. I also use SynonymGraphFilter which turns

[GitHub] [lucene] jpountz opened a new pull request, #12081: Speed up DocIdMerger on sorted indexes.

2023-01-12 Thread GitBox
jpountz opened a new pull request, #12081: URL: https://github.com/apache/lucene/pull/12081 In the case when an index is sorted on a low-cardinality field, or the index sort order correlates with the order in which documents get ingested, we can optimize `SortedDocIDMerger` by doing a singl

[GitHub] [lucene] jpountz commented on pull request #12081: Speed up DocIdMerger on sorted indexes.

2023-01-12 Thread GitBox
jpountz commented on PR #12081: URL: https://github.com/apache/lucene/pull/12081#issuecomment-1380369577 Here are timings of the first doc value merges on the IndexTaxis benchmark and a sorted dense index: ``` SM 0 [2023-01-12T13:17:47.581987785Z; Thread-0]: 564 ms to merge doc val

[GitHub] [lucene] jpountz commented on a diff in pull request #12054: Introduce a new `KeywordField`.

2023-01-12 Thread GitBox
jpountz commented on code in PR #12054: URL: https://github.com/apache/lucene/pull/12054#discussion_r1068141063 ## lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java: ## @@ -234,8 +234,8 @@ void indexDoc(IndexWriter writer, Path file, long lastModified) throws IOExcept

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

2023-01-12 Thread GitBox
rmuir commented on code in PR #12054: URL: https://github.com/apache/lucene/pull/12054#discussion_r1068144382 ## lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java: ## @@ -234,8 +234,8 @@ void indexDoc(IndexWriter writer, Path file, long lastModified) throws IOExcepti

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

2023-01-12 Thread GitBox
rmuir commented on code in PR #12054: URL: https://github.com/apache/lucene/pull/12054#discussion_r1068145212 ## lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java: ## @@ -234,8 +234,8 @@ void indexDoc(IndexWriter writer, Path file, long lastModified) throws IOExcepti

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

2023-01-12 Thread GitBox
rmuir commented on code in PR #12054: URL: https://github.com/apache/lucene/pull/12054#discussion_r1068154625 ## lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java: ## @@ -234,8 +234,8 @@ void indexDoc(IndexWriter writer, Path file, long lastModified) throws IOExcepti

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

2023-01-12 Thread GitBox
rmuir commented on code in PR #12054: URL: https://github.com/apache/lucene/pull/12054#discussion_r1068156278 ## lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java: ## @@ -234,8 +234,8 @@ void indexDoc(IndexWriter writer, Path file, long lastModified) throws IOExcepti

[GitHub] [lucene] dantuzi commented on pull request #12048: Move HNSW parameters to the HnswGraphBuilder class

2023-01-12 Thread GitBox
dantuzi commented on PR #12048: URL: https://github.com/apache/lucene/pull/12048#issuecomment-1380490660 @msokolov I'm finalizing another PR that works with HnswGraph and, to create the graph, I use the constants `DEFAULT_MAX_CONN` and `DEFAULT_BEAM_WIDTH` already defined. So, in the future

[GitHub] [lucene] jpountz commented on a diff in pull request #12078: Enhance XXXField#newRangeQuery

2023-01-12 Thread GitBox
jpountz commented on code in PR #12078: URL: https://github.com/apache/lucene/pull/12078#discussion_r1068260998 ## lucene/core/src/java/org/apache/lucene/document/LongField.java: ## @@ -108,8 +109,9 @@ public static Query newExactQuery(String field, long value) { */ publ

[GitHub] [lucene] javanna merged pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-12 Thread GitBox
javanna merged PR #12072: URL: https://github.com/apache/lucene/pull/12072

[GitHub] [lucene] javanna closed issue #12069: Long rewrite times for deeply nested, non-scoring Boolean queries

2023-01-12 Thread GitBox
javanna closed issue #12069: Long rewrite times for deeply nested, non-scoring Boolean queries URL: https://github.com/apache/lucene/issues/12069

[GitHub] [lucene] javanna commented on pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-12 Thread GitBox
javanna commented on PR #12072: URL: https://github.com/apache/lucene/pull/12072#issuecomment-1380596699 Thanks @benwtrent !

[GitHub] [lucene] jpountz merged pull request #12079: Speed up 1D BKD merging.

2023-01-12 Thread GitBox
jpountz merged PR #12079: URL: https://github.com/apache/lucene/pull/12079

[GitHub] [lucene] jpountz merged pull request #12081: Speed up DocIdMerger on sorted indexes.

2023-01-12 Thread GitBox
jpountz merged PR #12081: URL: https://github.com/apache/lucene/pull/12081

[GitHub] [lucene] jpountz commented on pull request #11779: GITHUB#11778: Add detailed part-of-speech tag for particle and ending on Nori

2023-01-12 Thread GitBox
jpountz commented on PR #11779: URL: https://github.com/apache/lucene/pull/11779#issuecomment-1380774395 @danmuzi Should this be backported to branch_9x?

[GitHub] [lucene] jpountz commented on pull request #11807: No need to rewrite queries in unified highlighter

2023-01-12 Thread GitBox
jpountz commented on PR #11807: URL: https://github.com/apache/lucene/pull/11807#issuecomment-1380775336 @romseygeek Should it be backported to branch_9x?

[GitHub] [lucene] Brain2000 opened a new issue, #12082: LeafFieldComparator setBottom not being called before compareBottom

2023-01-12 Thread GitBox
Brain2000 opened a new issue, #12082: URL: https://github.com/apache/lucene/issues/12082 ### Description It looks like there's a problem in the TopFieldCollector.java where it calls "compareBottom" without calling "setBottom" first. I believe this is an issue if getLeafComparer( )

[GitHub] [lucene] jmazanec15 commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-12 Thread GitBox
jmazanec15 commented on code in PR #12050: URL: https://github.com/apache/lucene/pull/12050#discussion_r1068562367 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -94,36 +93,83 @@ public int size() { } /** - * Add node on the given level

[GitHub] [lucene] zhaih commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-12 Thread GitBox
zhaih commented on code in PR #12050: URL: https://github.com/apache/lucene/pull/12050#discussion_r1068747982 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -94,36 +93,83 @@ public int size() { } /** - * Add node on the given level +

[GitHub] [lucene] gsmiller commented on pull request #12073: Move ReqExclScorer exclusion checking into first-phase when the exclusion Scorer has no second-phase check

2023-01-12 Thread GitBox
gsmiller commented on PR #12073: URL: https://github.com/apache/lucene/pull/12073#issuecomment-1381092287 Moved this into "draft" state until I'm able to come back and figure out why there was a benchmark improvement with this change, given the feedback that `ReqExclBulkScorer` would be exp

[GitHub] [lucene] LuXugang commented on a diff in pull request #12078: Enhance XXXField#newRangeQuery

2023-01-12 Thread GitBox
LuXugang commented on code in PR #12078: URL: https://github.com/apache/lucene/pull/12078#discussion_r1068923445 ## lucene/core/src/java/org/apache/lucene/document/LongField.java: ## @@ -108,8 +109,9 @@ public static Query newExactQuery(String field, long value) { */ pub