[jira] [Created] (LUCENE-10511) IntersectIterators is not necessary under matchAll case in Facet

2022-04-11 Thread Lu Xugang (Jira)
Lu Xugang created LUCENE-10511:
--

 Summary: IntersectIterators is not necessary under matchAll case 
in Facet
 Key: LUCENE-10511
 URL: https://issues.apache.org/jira/browse/LUCENE-10511
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Lu Xugang


If the number of hits in FacetsCollector equals reader.maxDoc(), and the 
DocValues cost() is precise, could we skip 
ConjunctionUtils.intersectIterators(List) and use 
DocIdSetIterator.all(int maxDoc) instead?
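
One reading of the proposal, as a minimal sketch only: when every document is a 
hit, intersecting the hits with the documents that have a value returns exactly 
the documents that have a value, so the intersection can be skipped. The class 
and method names below are hypothetical, sorted int arrays stand in for Lucene's 
iterators, and doc IDs are assumed to be distinct and to lie in [0, maxDoc). 
This is not Lucene's API and not the reporter's patch.

```java
import java.util.Arrays;

// Hypothetical stand-in: sorted int arrays model doc-ID iterators.
final class MatchAllShortcut {

  /** General path: intersects two sorted doc-ID arrays. */
  static int[] intersect(int[] a, int[] b) {
    int[] out = new int[Math.min(a.length, b.length)];
    int i = 0, j = 0, n = 0;
    while (i < a.length && j < b.length) {
      if (a[i] == b[j]) {
        out[n++] = a[i];
        i++;
        j++;
      } else if (a[i] < b[j]) {
        i++;
      } else {
        j++;
      }
    }
    return Arrays.copyOf(out, n);
  }

  /** Returns the docs to count, skipping the intersection when all docs are hits. */
  static int[] docsToCount(int[] hits, int[] docsWithValue, int maxDoc) {
    if (hits.length == maxDoc) {
      // Assumption: IDs are distinct and in [0, maxDoc), so hits covers every doc
      // and (hits intersect docsWithValue) equals docsWithValue.
      return docsWithValue;
    }
    return intersect(hits, docsWithValue);
  }
}
```

In this model the shortcut only changes how much work is done, not which 
documents are counted.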

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss merged pull request #803: LUCENE-10229: return -1 for unknown offsets in ExtendedIntervalsSource. Modify highlighting to work properly with or without offsets (depending on th

2022-04-11 Thread GitBox


dweiss merged PR #803:
URL: https://github.com/apache/lucene/pull/803


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10229) Match offsets should be consistent for fields with positions and fields with offsets

2022-04-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520468#comment-17520468
 ] 

ASF subversion and git services commented on LUCENE-10229:
--

Commit 2c1f9381390c7d81adb37424827bcebc4d81ae95 in lucene's branch 
refs/heads/main from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=2c1f9381390 ]

LUCENE-10229: return -1 for unknown offsets in ExtendedIntervalsSource. Modify 
highlighting to work properly with or without offsets (depending on their 
availability). (#803)

Thanks @romseygeek 

> Match offsets should be consistent for fields with positions and fields with 
> offsets
> 
>
> Key: LUCENE-10229
> URL: https://issues.apache.org/jira/browse/LUCENE-10229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> This is a follow-up of LUCENE-10223 in which it was discovered that fields 
> with
> offsets don't highlight some more complex interval queries properly.  Alan 
> says:
> {quote}
> It's because it returns the position of the inner match, but the offsets of 
> the outer.  And so if you're re-analyzing and retrieving offsets by looking 
> at the positions, you get the 'right' thing.  It's not obvious to me what the 
> correct response is here, but thinking about it the current behaviour is kind 
> of the worst of both worlds, and perhaps we should change it so that you get 
> offsets of the inner match as standard, and then the outer match is returned 
> as part of the sub matches.
> {quote}
> Intervals are nicely separated into "basic intervals" and "filters" which 
> restrict some other source of intervals, here is the original documentation:
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/intervals/package-info.java#L29-L50
> My experience from an extended period of using interval queries in a frontend 
> where they're highlighted is that filters are restrictions that should not be 
> highlighted - it's the source intervals that people care about. Filters are 
> what you remove or where you give proper context to source intervals.
> The test code contributed in LUCENE-10223 contains numerous query-highlight 
> examples (on fields with positions) where this intuition is demonstrated on 
> all kinds of interval functions:
> https://github.com/apache/lucene/blob/main/lucene/highlighter/src/test/org/apache/lucene/search/matchhighlight/TestMatchHighlighter.java#L335-L542
> This issue is about making the internals work consistently for fields with 
> positions and fields with offsets.






[jira] [Updated] (LUCENE-10229) Match offsets should be consistent for fields with positions and fields with offsets

2022-04-11 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-10229:
-
Fix Version/s: 9.2

> Match offsets should be consistent for fields with positions and fields with 
> offsets
> 
>
> Key: LUCENE-10229
> URL: https://issues.apache.org/jira/browse/LUCENE-10229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Priority: Major
> Fix For: 9.2
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>



[jira] [Assigned] (LUCENE-10229) Match offsets should be consistent for fields with positions and fields with offsets

2022-04-11 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss reassigned LUCENE-10229:


Assignee: Dawid Weiss

> Match offsets should be consistent for fields with positions and fields with 
> offsets
> 
>
> Key: LUCENE-10229
> URL: https://issues.apache.org/jira/browse/LUCENE-10229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Major
> Fix For: 9.2
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>



[jira] [Updated] (LUCENE-10229) Match offsets should be consistent for fields with positions and fields with offsets

2022-04-11 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-10229:
-
Priority: Minor  (was: Major)

> Match offsets should be consistent for fields with positions and fields with 
> offsets
> 
>
> Key: LUCENE-10229
> URL: https://issues.apache.org/jira/browse/LUCENE-10229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 9.2
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>



[GitHub] [lucene] iverase commented on pull request #804: LUCENE-10508: Fixes some failures where a GeoArea is built with degenerated latitudes

2022-04-11 Thread GitBox


iverase commented on PR #804:
URL: https://github.com/apache/lucene/pull/804#issuecomment-1094868606

   @DaddyWri  would you be able to have a look?





[jira] [Commented] (LUCENE-10229) Match offsets should be consistent for fields with positions and fields with offsets

2022-04-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520487#comment-17520487
 ] 

ASF subversion and git services commented on LUCENE-10229:
--

Commit 62fe8e28747f53a2ad06f4e0c6376c1de593dc63 in lucene's branch 
refs/heads/branch_9x from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=62fe8e28747 ]

LUCENE-10229: return -1 for unknown offsets in ExtendedIntervalsSource. Modify 
highlighting to work properly with or without offsets (depending on their 
availability). (#803)

Thanks @romseygeek 

> Match offsets should be consistent for fields with positions and fields with 
> offsets
> 
>
> Key: LUCENE-10229
> URL: https://issues.apache.org/jira/browse/LUCENE-10229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 9.2
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>



[GitHub] [lucene] DaddyWri commented on pull request #804: LUCENE-10508: Fixes some failures where a GeoArea is built with degenerated latitudes

2022-04-11 Thread GitBox


DaddyWri commented on PR #804:
URL: https://github.com/apache/lucene/pull/804#issuecomment-1094949367

   As you suggest, it is essential that the limits of the resolution of planes 
be taken into account when building structures in Geo3D.  This change looks 
like it would possibly do a better job of building those planes under 
degenerate conditions, so I am in favor of it.
   





[jira] [Commented] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil

2022-04-11 Thread Ignacio Vera (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520523#comment-17520523
 ] 

Ignacio Vera commented on LUCENE-10315:
---

> I'm seeing readInts24ForUtil runs 3 times faster than readInts24Legacy. This 
> speed is attractive to me.

In theory we should call this method several times less frequently than the 
visitor method, so we should not try to optimise for it.  For example, in 1D 
this method should be called at most twice, but the visitor method can be 
called many times.

> Speed up BKD leaf block ids codec by a 512 ints ForUtil
> ---
>
> Key: LUCENE-10315
> URL: https://issues.apache.org/jira/browse/LUCENE-10315
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Feng Guo
>Assignee: Feng Guo
>Priority: Major
> Attachments: addall.svg, cpu_profile_baseline.html, 
> cpu_profile_path.html
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Elasticsearch (which is based on Lucene) can automatically infer types for 
> users with its dynamic mapping feature. When users index low-cardinality 
> fields such as gender / age / status, they often use numbers to represent 
> the values, so ES infers these fields as {{long}} and uses BKD as the index 
> for {{long}} fields. When the data volume grows, building the result set of 
> low-cardinality fields makes the CPU usage and load very high.
> This is a flame graph we obtained from the production environment:
> [^addall.svg]
> It can be seen that almost all CPU is used in addAll. When we reindexed 
> {{long}} to {{keyword}}, the cluster load and search latency were greatly 
> reduced (we spent weeks reindexing all indices...). I know that the ES 
> documentation recommends {{keyword}} for term/terms queries and {{long}} for 
> range queries, but there are always some users who don't realize this and 
> keep their habits from SQL databases, or dynamic mapping automatically 
> selects the type for them. All in all, users won't realize that there is 
> such a big difference in performance between {{long}} and {{keyword}} for 
> low-cardinality fields. So from my point of view it makes sense to make BKD 
> work better for low/medium-cardinality fields.
> As far as I can see, for low-cardinality fields, {{keyword}} has two 
> advantages over {{long}}:
> 1. {{ForUtil}} used in {{keyword}} postings is much more efficient than BKD's 
> delta VInt, because of its batch reading (readLongs) and SIMD decoding.
> 2. When the query term count is less than 16, {{TermsInSetQuery}} can lazily 
> materialize its result set, and when another clause with a small result set 
> intersects with this low-cardinality condition, the low-cardinality field can 
> avoid reading all docIds into memory.
> This issue targets the first point. The basic idea is to use a 512-int 
> {{ForUtil}} for the BKD ids codec. I benchmarked this optimization by 
> generating random {{LongPoint}} values and querying them with 
> {{PointInSetQuery}}.
> *Benchmark Result*
> |doc count|field cardinality|query point|baseline QPS|candidate QPS|diff percentage|
> |1|32|1|51.44|148.26|188.22%|
> |1|32|2|26.8|101.88|280.15%|
> |1|32|4|14.04|53.52|281.20%|
> |1|32|8|7.04|28.54|305.40%|
> |1|32|16|3.54|14.61|312.71%|
> |1|128|1|110.56|350.26|216.81%|
> |1|128|8|16.6|89.81|441.02%|
> |1|128|16|8.45|48.07|468.88%|
> |1|128|32|4.2|25.35|503.57%|
> |1|128|64|2.13|13.02|511.27%|
> |1|1024|1|536.19|843.88|57.38%|
> |1|1024|8|109.71|251.89|129.60%|
> |1|1024|32|33.24|104.11|213.21%|
> |1|1024|128|8.87|30.47|243.52%|
> |1|1024|512|2.24|8.3|270.54%|
> |1|8192|1|.33|5000|50.00%|
> |1|8192|32|139.47|214.59|53.86%|
> |1|8192|128|54.59|109.23|100.09%|
> |1|8192|512|15.61|36.15|131.58%|
> |1|8192|2048|4.11|11.14|171.05%|
> |1|1048576|1|2597.4|3030.3|16.67%|
> |1|1048576|32|314.96|371.75|18.03%|
> |1|1048576|128|99.7|116.28|16.63%|
> |1|1048576|512|30.5|37.15|21.80%|
> |1|1048576|2048|10.38|12.3|18.50%|
> |1|8388608|1|2564.1|3174.6|23.81%|
> |1|8388608|32|196.27|238.95|21.75%|
> |1|8388608|128|55.36|68.03|22.89%|
> |1|8388608|512|15.58|19.24|23.49%|
> |1|8388608|2048|4.56|5.71|25.22%|
> The indices size is reduced for low cardinality fields and flat for high 
> cardinality fields.
> {code:java}
> 113Mindex_1_doc_32_cardinality_baseline
> 114Mindex_1_doc_32_cardinality_candidate
> 140Mindex_1_doc_128_cardinality_baseline
> 133Mindex_1_doc_128_ca

[jira] [Commented] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil

2022-04-11 Thread Ignacio Vera (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520524#comment-17520524
 ] 

Ignacio Vera commented on LUCENE-10315:
---

I think adding BulkAdder#add(int[] docs, int count) should be done in a 
different issue as it is hiding the performance issues with the current 
approach. 

> Speed up BKD leaf block ids codec by a 512 ints ForUtil
> ---
>
> Key: LUCENE-10315
> URL: https://issues.apache.org/jira/browse/LUCENE-10315
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Feng Guo
>Assignee: Feng Guo
>Priority: Major
> Attachments: addall.svg, cpu_profile_baseline.html, 
> cpu_profile_path.html
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> The indices size is reduced for low cardinality fields and flat for high 
> cardinality fields.
> {code:java}
> 113Mindex_1_doc_32_cardinality_baseline
> 114Mindex_1_doc_32_cardinality_candidate
> 140Mindex_1_doc_128_cardinality_baseline
> 133Mindex_1_doc_128_cardinality_candidate
> 193Mindex_1_doc_1024_cardinality_baseline
> 174Mindex_1_doc_1024_cardinality_candidate
> 241Mindex_1_doc_8192_cardinality_baseline
> 233Mindex_1000

[GitHub] [lucene] mocobeta commented on pull request #805: LUCENE-10493: factor out Viterbi algorithm and share it between kuromoji and nori

2022-04-11 Thread GitBox


mocobeta commented on PR #805:
URL: https://github.com/apache/lucene/pull/805#issuecomment-1094989501

   I took the same bottom-up approach as 
https://issues.apache.org/jira/browse/LUCENE-10393 here again (determine the 
duplicate code and sort out the interfaces).
   I'll look through this again but - some private/package-private members have 
to be made public/protected at this moment. Once we further distill the 
`backtrace()` and n-best logic and move it to analysis-common from 
kuromoji/nori, they can be private again; I hope (and believe) it can be 
possible but I'd leave it for future examination.
   
   I tried but couldn't break it up into smaller patches... I will keep this 
open while waiting for feedback. I hope these changes make sense.





[jira] [Commented] (LUCENE-9269) Blended queries with boolean rewrite can result in inconsistent scores

2022-04-11 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520560#comment-17520560
 ] 

Michael Sokolov commented on LUCENE-9269:
-

 # Does the checkBoosts test case you refer to fail when you attempt your 
change? If so, please address it. Otherwise, I think it should be fixed 
separately.
 # It's OK for two different queries to behave the same, and I don't see how 
you can know that they will in this case, so I think they should compare as 
different.
 # Again, toString() is not guaranteed to be different for different queries; I 
think it's OK.

> Blended queries with boolean rewrite can result in inconsistent scores
> --
>
> Key: LUCENE-9269
> URL: https://issues.apache.org/jira/browse/LUCENE-9269
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 8.4
>Reporter: Michele Palmia
>Priority: Minor
> Attachments: LUCENE-9269-test.patch
>
>
> If two blended queries are should clauses of a boolean query and are built so 
> that
>  * some of their terms are the same
>  * their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE
> the docFreq for the overlapping terms used for scoring is picked as follows:
>  # if the overlapping terms are not boosted, the df of the term in the first 
> blended query is used
>  # if any of the overlapping terms is boosted, the df is picked at (what 
> looks like) random.
> A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3).
> {code:java}
> a)
> Blended(f:a f:b) Blended(f:a)
>      df: 3         df: 2
> gets rewritten to:
> (f:a)^2.0 (f:b)
>   df: 3    df: 2
> b)
> Blended(f:a) Blended(f:a f:b)
>    df: 2         df: 3
> gets rewritten to:
> (f:a)^2.0 (f:b)
>   df: 2    df: 2
> c)
> Blended(f:a f:b^0.66) Blended(f:a^0.75)
>        df: 3             df: 2
> gets rewritten to:
> (f:a)^1.75 (f:b)^0.66
>   df: ?      df: 2
> {code}
> with ? either 2 or 3, depending on the run.
>  






[jira] [Created] (LUCENE-10512) Trivial: Identify and fix "the the" in comments, docs

2022-04-11 Thread Rich Bowen (Jira)
Rich Bowen created LUCENE-10512:
---

 Summary: Trivial: Identify and fix "the the" in comments, docs
 Key: LUCENE-10512
 URL: https://issues.apache.org/jira/browse/LUCENE-10512
 Project: Lucene - Core
  Issue Type: Task
Reporter: Rich Bowen


In reading, and attempting to familiarize myself with, the Lucene code, I 
noticed a number of occurrences of "the the" (i.e., a repeated word) in docs 
and comments. I am preparing a PR to fix them.
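
A search of this kind can be sketched with a regular expression. The snippet 
below is only an illustration, not the script used for the PR; the class and 
method names are hypothetical. It matches "the", then whitespace (which may 
include a line break), then "the" again, with word boundaries so that text such 
as "bathe the" or "the then" is not reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for spotting a doubled "the" in comments or docs.
final class RepeatedTheFinder {

  // Case-insensitive: "the", one or more whitespace characters, "the".
  private static final Pattern THE_THE =
      Pattern.compile("\\b(the)\\s+the\\b", Pattern.CASE_INSENSITIVE);

  /** Returns the start offset of each "the the" occurrence in the text. */
  static List<Integer> find(String text) {
    List<Integer> offsets = new ArrayList<>();
    Matcher m = THE_THE.matcher(text);
    while (m.find()) {
      offsets.add(m.start());
    }
    return offsets;
  }

  /** Collapses each matched pair into one word, keeping the first word's casing. */
  static String fix(String text) {
    return THE_THE.matcher(text).replaceAll("$1");
  }
}
```

Each hit is still worth reviewing by hand, since a blind replacement cannot tell 
an accidental repetition from an intentional one.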






[GitHub] [lucene] rbowen opened a new pull request, #807: LUCENE-10512: Grammar: Remove incidents of "the the" in comments.

2022-04-11 Thread GitBox


rbowen opened a new pull request, #807:
URL: https://github.com/apache/lucene/pull/807

   # Description
   
   Identify and fix "the the" repeated words in comments/docs.
   
   # Solution
   
   Purely cosmetic/grammar: Remove/replace "the the" in comments, documentation.
   
   # Tests
   
   No tests, because no functional change.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my 
code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Lucene maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [ ] I have added tests for my changes. (No: This makes no functional 
change)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mikemccand commented on pull request #807: LUCENE-10512: Grammar: Remove incidents of "the the" in comments.

2022-04-11 Thread GitBox


mikemccand commented on PR #807:
URL: https://github.com/apache/lucene/pull/807#issuecomment-1095108824

   Whoa, thanks @rbowen for the attention to detail!  This reminds me of the 
world's hardest band for search engines to find: [The 
The](https://en.wikipedia.org/wiki/The_The).





[GitHub] [lucene] mikemccand commented on pull request #807: LUCENE-10512: Grammar: Remove incidents of "the the" in comments.

2022-04-11 Thread GitBox


mikemccand commented on PR #807:
URL: https://github.com/apache/lucene/pull/807#issuecomment-109589

   That CI build failure is a code styling issue.  Lucene uses a [strict 
code-styling plugin called 
Spotless](https://issues.apache.org/jira/browse/LUCENE-9564), which removes all 
ambiguity and demands precise adherence, which is awesome (no more flame wars 
about whitespace).  You should be able to re-style your code automatically by 
running `./gradlew :lucene:core:spotlessApply`





[jira] [Commented] (LUCENE-10512) Trivial: Identify and fix "the the" in comments, docs

2022-04-11 Thread Rich Bowen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520603#comment-17520603
 ] 

Rich Bowen commented on LUCENE-10512:
-

[https://github.com/apache/lucene/pull/807] fixes this; however, since this is my 
first Lucene patch, I have done some things wrong. I am attempting to fix them and 
will try again ASAP.

> Trivial: Identify and fix "the the" in comments, docs
> -
>
> Key: LUCENE-10512
> URL: https://issues.apache.org/jira/browse/LUCENE-10512
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Rich Bowen
>Priority: Trivial
>  Time Spent: 0.5h
>  Remaining Estimate: 0h






[GitHub] [lucene] rmuir commented on pull request #807: LUCENE-10512: Grammar: Remove incidents of "the the" in comments.

2022-04-11 Thread GitBox


rmuir commented on PR #807:
URL: https://github.com/apache/lucene/pull/807#issuecomment-1095141887

   > You should be able to re-style your code automatically by running 
`./gradlew :lucene:core:spotlessApply`
   
   Personally I would not recommend running it this way. I run `./gradlew 
tidy`, across the entire codebase/modules. It doesn't take too long and has 
never given me a problem. 





[GitHub] [lucene] mikemccand commented on pull request #807: LUCENE-10512: Grammar: Remove incidents of "the the" in comments.

2022-04-11 Thread GitBox


mikemccand commented on PR #807:
URL: https://github.com/apache/lucene/pull/807#issuecomment-1095144308

   > > You should be able to re-style your code automatically by running 
`./gradlew :lucene:core:spotlessApply`
   > 
   > Personally I would not recommend running it this way. I run `./gradlew 
tidy`, across the entire codebase/modules. It doesn't take too long and has 
never given me a problem.
   
   +1
   
   Sorry, the command I suggested only fixes styling for `lucene/core`.  
@rmuir's command will fix ALL styling issues across ALL modules.
   
   Since the Spotless check seems to be fail-fast, maybe we should fix the 
exception message to just suggest `./gradlew tidy` instead?





[jira] [Created] (LUCENE-10513) Make it more obvious how to fix Spotless issues for new users

2022-04-11 Thread Rich Bowen (Jira)
Rich Bowen created LUCENE-10513:
---

 Summary: Make it more obvious how to fix Spotless issues for new 
users
 Key: LUCENE-10513
 URL: https://issues.apache.org/jira/browse/LUCENE-10513
 Project: Lucene - Core
  Issue Type: Task
Reporter: Rich Bowen


I just made my first PR to Lucene (yay me!) and in the process stumbled on 
various things that were non-obvious.

I request, for The Next Person, that the error messaging in `gradlew` make it 
more obvious that one should run `./gradlew tidy` the first time around, so as 
to avoid the low-hanging formatting problems that cause everything else to fail.

During the course of my fumbling around, I was encouraged to run:

./gradlew :lucene:suggest:spotlessJavaCheck

./gradlew :lucene:suggest:spotlessApply

./gradlew :lucene:test-framework:spotlessApply

and

./gradlew check -Ptests.nightly=true

various times, by the error messages in `./gradlew check`, and while I got 
there eventually (again, yay me!) perhaps encouraging folks to run `./gradlew 
tidy` first may have saved some frustration.

That said, I cannot overstate how impressed I am with the thoroughness of the 
testing/verification tools, and wish more projects had this kind of tooling. 
Thank you.

 






[GitHub] [lucene] mikemccand merged pull request #807: LUCENE-10512: Grammar: Remove incidents of "the the" in comments.

2022-04-11 Thread GitBox


mikemccand merged PR #807:
URL: https://github.com/apache/lucene/pull/807





[jira] [Commented] (LUCENE-10512) Trivial: Identify and fix "the the" in comments, docs

2022-04-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520638#comment-17520638
 ] 

ASF subversion and git services commented on LUCENE-10512:
--

Commit 0a069ed4542ab672230d3610d91a9eababead199 in lucene's branch 
refs/heads/main from Rich Bowen
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=0a069ed4542 ]

LUCENE-10512: Grammar: Remove incidents of "the the" in comments. (#807)

* Grammar: Remove incidents of "the the" in comments.

* fixes formatting, as per helpful comment from Mike

* Running ./gradlew :lucene:misc:spotlessApply again made more changes.

* It keeps finding new things ... what's up with this?

* Fixing more nits that gradlew finds. Sorry, folks. I am new at this.

> Trivial: Identify and fix "the the" in comments, docs
> -
>
> Key: LUCENE-10512
> URL: https://issues.apache.org/jira/browse/LUCENE-10512
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Rich Bowen
>Priority: Trivial
>  Time Spent: 1h
>  Remaining Estimate: 0h






[jira] [Commented] (LUCENE-10513) Make it more obvious how to fix Spotless issues for new users

2022-04-11 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520641#comment-17520641
 ] 

Dawid Weiss commented on LUCENE-10513:
--

You should familiarize yourself with the various help files under help/; here is 
one of them that talks explicitly about formatting:

[https://github.com/apache/lucene/blob/main/help/formatting.txt]

I don't think more can be done about it, to be honest.

> Make it more obvious how to fix Spotless issues for new users
> -
>
> Key: LUCENE-10513
> URL: https://issues.apache.org/jira/browse/LUCENE-10513
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Rich Bowen
>Priority: Minor






[jira] [Commented] (LUCENE-10513) Make it more obvious how to fix Spotless issues for new users

2022-04-11 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520643#comment-17520643
 ] 

Dawid Weiss commented on LUCENE-10513:
--

Perhaps you could add a line to:

[https://github.com/apache/lucene/blob/main/help/workflow.txt]

and mention the tidy task that reformats the code prior to check.

> Make it more obvious how to fix Spotless issues for new users
> -
>
> Key: LUCENE-10513
> URL: https://issues.apache.org/jira/browse/LUCENE-10513
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Rich Bowen
>Priority: Minor






[GitHub] [lucene] dweiss commented on pull request #807: LUCENE-10512: Grammar: Remove incidents of "the the" in comments.

2022-04-11 Thread GitBox


dweiss commented on PR #807:
URL: https://github.com/apache/lucene/pull/807#issuecomment-1095199898

   > Since the Spotless check seems to be fail-fast, maybe we should fix the 
exception message to just suggest ./gradlew tidy instead?
   
   Gradle runs tasks in parallel so it's not really "fail fast". It's "abort 
anything not yet started because the build will fail". And if multiple things fail, 
Gradle will report all of them (as a list of problems). As with any tool, it takes 
some getting used to - I think these messages are quite fine (and 'tidy' is in 
fact a non-standard invention of mine... but it's a four letter word so I 
couldn't resist).





[GitHub] [lucene] rbowen opened a new pull request, #808: LUCENE-10513: Run `gradlew tidy` first

2022-04-11 Thread GitBox


rbowen opened a new pull request, #808:
URL: https://github.com/apache/lucene/pull/808

   Encourage running `gradlew tidy` first, which, in turn, prevents failures in 
later steps.
   
   # Description
   
   In contributing my first change, I encountered formatting advice that would 
have been rendered unnecessary if I had first run `gradlew tidy`
   
   # Solution
   
   Recommend `gradlew tidy` as first step of workflow.
   
   # Tests
   
   Docs-only change - no tests.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my 
code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Lucene maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   





[jira] [Commented] (LUCENE-10513) Make it more obvious how to fix Spotless issues for new users

2022-04-11 Thread Rich Bowen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520651#comment-17520651
 ] 

Rich Bowen commented on LUCENE-10513:
-

Thanks. [https://github.com/apache/lucene/pull/808] proposed.

 

> Make it more obvious how to fix Spotless issues for new users
> -
>
> Key: LUCENE-10513
> URL: https://issues.apache.org/jira/browse/LUCENE-10513
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Rich Bowen
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h






[GitHub] [lucene] rmuir commented on pull request #808: LUCENE-10513: Run `gradlew tidy` first

2022-04-11 Thread GitBox


rmuir commented on PR #808:
URL: https://github.com/apache/lucene/pull/808#issuecomment-1095247184

   Thanks for following up here! These changes look fine. I'm wondering if 
there's anything we could improve in CONTRIBUTING.md to make this easier, 
too, maybe something in the "checks" section. Maybe it is as simple as adding some 
markdown links to this file to reference the appropriate stuff in help/.
   
   Currently, CONTRIBUTING.md doesn't mention a thing about code formatting 
directly... which doesn't seem right to me.





[jira] [Resolved] (LUCENE-10512) Trivial: Identify and fix "the the" in comments, docs

2022-04-11 Thread Rich Bowen (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Bowen resolved LUCENE-10512.
-
Resolution: Fixed

Thank you, folks. PR merged, and much learned about the process and tooling 
around contributing to Lucene.

> Trivial: Identify and fix "the the" in comments, docs
> -
>
> Key: LUCENE-10512
> URL: https://issues.apache.org/jira/browse/LUCENE-10512
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Rich Bowen
>Priority: Trivial
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h






[GitHub] [lucene] dweiss commented on pull request #808: LUCENE-10513: Run `gradlew tidy` first

2022-04-11 Thread GitBox


dweiss commented on PR #808:
URL: https://github.com/apache/lucene/pull/808#issuecomment-1095406753

   I allowed myself to push minor changes to your branch, including what @rmuir 
suggested, which indeed seems like an omission.





[GitHub] [lucene] mocobeta commented on pull request #808: LUCENE-10513: Run `gradlew tidy` first

2022-04-11 Thread GitBox


mocobeta commented on PR #808:
URL: https://github.com/apache/lucene/pull/808#issuecomment-1096056476

   Elaborating on CONTRIBUTING.md might be good; on the other hand, it also makes 
the file wordy and increases the maintenance cost (I know few people care about it, 
so it would easily become out of date).
   Just an idea, but how about removing the entire `Checks` section and just 
mentioning `gradlew helpWorkflow`...
   





[GitHub] [lucene] iverase merged pull request #804: LUCENE-10508: Fixes some failures where a GeoArea is built with degenerated latitudes

2022-04-11 Thread GitBox


iverase merged PR #804:
URL: https://github.com/apache/lucene/pull/804





[jira] [Commented] (LUCENE-10508) GeoArea failure with degenerated latitude

2022-04-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520926#comment-17520926
 ] 

ASF subversion and git services commented on LUCENE-10508:
--

Commit eb2df13bbadccee7c05397886d2448fb91f25f0d in lucene's branch 
refs/heads/main from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=eb2df13bbad ]

LUCENE-10508:  Fixes some failures where a GeoArea is built with degenerated 
latitudes (#804)

Fixes some edge cases where a GeoArea was built in a way that
vertical planes could not evaluate their sign, either because the planes
were the same or the center between those planes was lying on top of one
of the planes.
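
The failure mode described above can be sketched in isolation. This is a simplified illustration under stated assumptions, not the spatial3d code: the `Plane` class, the `sideOf` helper, and the `EPSILON` tolerance are all hypothetical names and values chosen for the example. It shows that a check point yields a usable sign only when it lies strictly off the plane, so two coinciding planes (whose "center" sits on both) cannot be given a side.

```java
// Hypothetical, simplified stand-in for a sidedness check; not Lucene's API.
class SidednessSketch {
  // Assumed tolerance below which a point counts as lying on the plane.
  static final double EPSILON = 1e-12;

  // A plane a*x + b*y + c*z + d = 0.
  static final class Plane {
    final double a, b, c, d;

    Plane(double a, double b, double c, double d) {
      this.a = a;
      this.b = b;
      this.c = c;
      this.d = d;
    }

    // Signed value of the plane equation at the given point.
    double evaluate(double x, double y, double z) {
      return a * x + b * y + c * z + d;
    }
  }

  // Returns +1.0 or -1.0 for the side the point is on; throws if it is on the plane.
  static double sideOf(Plane plane, double x, double y, double z) {
    double value = plane.evaluate(x, y, z);
    if (Math.abs(value) < EPSILON) {
      throw new IllegalArgumentException(
          "Cannot determine sidedness because check point is on plane.");
    }
    return Math.signum(value);
  }

  public static void main(String[] args) {
    // Two distinct planes x = 0.25 and x = 0.75: the midpoint x = 0.5 has a clear side.
    Plane west = new Plane(1, 0, 0, -0.25);
    Plane east = new Plane(1, 0, 0, -0.75);
    System.out.println(sideOf(west, 0.5, 0, 0));
    System.out.println(sideOf(east, 0.5, 0, 0));

    // Two coinciding planes x = 0.5: the midpoint lies on the plane, so no side exists.
    Plane same = new Plane(1, 0, 0, -0.5);
    try {
      sideOf(same, 0.5, 0, 0);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```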

> GeoArea failure with degenerated latitude
> -
>
> Key: LUCENE-10508
> URL: https://issues.apache.org/jira/browse/LUCENE-10508
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/spatial3d
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I hit a failure when trying to build a GeoArea using the GeoAreaFactory. The 
> issue seems to happen when you have an almost degenerated minLatitude and 
> maxLatitude and you are close to the poles. Then you might hit the following 
> exception:
> {code}
> java.lang.IllegalArgumentException: Cannot determine sidedness because check point is on plane.
>   at __randomizedtesting.SeedInfo.seed([EA56BB13E754A996:C7560EE2BA56A507]:0)
>   at org.apache.lucene.spatial3d.geom.SidedPlane.<init>(SidedPlane.java:137)
>   at org.apache.lucene.spatial3d.geom.GeoDegenerateVerticalLine.<init>(GeoDegenerateVerticalLine.java:110)
>   at org.apache.lucene.spatial3d.geom.GeoBBoxFactory.makeGeoBBox(GeoBBoxFactory.java:100)
>   at org.apache.lucene.spatial3d.geom.GeoAreaFactory.makeGeoArea(GeoAreaFactory.java:43)
> {code}
> The situation is easy to reproduce with the following test:
> {code:java}
>   public void testBBoxRandomDegenerate() {
>     double minX = Geo3DUtil.fromDegrees(GeoTestUtil.nextLongitude());
>     double maxX = Math.nextUp(minX + Vector.MINIMUM_ANGULAR_RESOLUTION);
>     double minY = Geo3DUtil.fromDegrees(GeoTestUtil.nextLatitude());
>     double maxY = Math.nextUp(minY + Vector.MINIMUM_ANGULAR_RESOLUTION);
>     assertNotNull(GeoAreaFactory.makeGeoArea(PlanetModel.SPHERE, maxY, minY, minX, maxX));
>   }
> {code}






[jira] [Commented] (LUCENE-10508) GeoArea failure with degenerated latitude

2022-04-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520929#comment-17520929
 ] 

ASF subversion and git services commented on LUCENE-10508:
--

Commit f4f1f7086f9ae6d8ed0351ca07ddd4d0497386f1 in lucene's branch 
refs/heads/branch_9x from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=f4f1f7086f9 ]

LUCENE-10508:  Fixes some failures where a GeoArea is built with degenerated 
latitudes (#804)

Fixes some edge cases where a GeoArea was built in a way that
vertical planes could not evaluate their sign, either because the planes
were the same or the center between those planes was lying on top of one
of the planes.

> GeoArea failure with degenerated latitude
> -
>
> Key: LUCENE-10508
> URL: https://issues.apache.org/jira/browse/LUCENE-10508
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/spatial3d
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h






[jira] [Resolved] (LUCENE-10508) GeoArea failure with degenerated latitude

2022-04-11 Thread Ignacio Vera (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-10508.
---
Fix Version/s: 9.2
 Assignee: Ignacio Vera
   Resolution: Fixed

> GeoArea failure with degenerated latitude
> -
>
> Key: LUCENE-10508
> URL: https://issues.apache.org/jira/browse/LUCENE-10508
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/spatial3d
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: 9.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h






[GitHub] [lucene] iverase commented on pull request #756: LUCENE-10470: [Tessellator] Prevent bridges that introduce collinear edges

2022-04-11 Thread GitBox


iverase commented on PR #756:
URL: https://github.com/apache/lucene/pull/756#issuecomment-1096194673

   @yixunx if there is no further input I am planning to push this change 
shortly.

