[jira] [Created] (LUCENE-10511) IntersectIterators is not necessary under matchAll case in Facet
Lu Xugang created LUCENE-10511:
--
Summary: IntersectIterators is not necessary under matchAll case in Facet
Key: LUCENE-10511
URL: https://issues.apache.org/jira/browse/LUCENE-10511
Project: Lucene - Core
Issue Type: Improvement
Reporter: Lu Xugang

If the number of hits in FacetsCollector equals reader.maxDoc(), and the DocValues' cost() (which is precise) equals it as well, could we skip ConjunctionUtils.intersectIterators(List) and use DocIdSetIterator.all(int maxDoc) instead?

--
This message was sent by Atlassian Jira (v8.20.1#820001)
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
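The shortcut proposed in this issue can be sketched roughly as follows. This is only an illustration under stated assumptions: `DocIterator`, `all`, `of`, `intersect` and `choose` are hypothetical stand-ins written for this example, not Lucene's `DocIdSetIterator` or `ConjunctionUtils` API, and the condition checked in `choose` is one reading of the issue text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Lucene's iterator types; this does not use the Lucene API.
class MatchAllSketch {

  /** Minimal iterator over strictly increasing doc IDs; returns -1 when exhausted. */
  interface DocIterator {
    int nextDoc();
  }

  /** Iterates every doc ID in [0, maxDoc), like a match-all iterator. */
  static DocIterator all(final int maxDoc) {
    return new DocIterator() {
      private int doc = -1;

      public int nextDoc() {
        doc++;
        return doc < maxDoc ? doc : -1;
      }
    };
  }

  /** Iterates a fixed, strictly increasing list of doc IDs. */
  static DocIterator of(final int... sortedDocs) {
    return new DocIterator() {
      private int idx = 0;

      public int nextDoc() {
        return idx < sortedDocs.length ? sortedDocs[idx++] : -1;
      }
    };
  }

  /** Simple two-way intersection: advance whichever side is behind until both agree. */
  static DocIterator intersect(final DocIterator a, final DocIterator b) {
    return new DocIterator() {
      public int nextDoc() {
        int x = a.nextDoc();
        int y = b.nextDoc();
        while (x != -1 && y != -1 && x != y) {
          if (x < y) {
            x = a.nextDoc();
          } else {
            y = b.nextDoc();
          }
        }
        return (x == -1 || y == -1) ? -1 : x;
      }
    };
  }

  /** Assumed condition: every doc is a hit and every doc has a value, so no intersection is needed. */
  static DocIterator choose(int hitCount, int maxDoc, long valuesCost, DocIterator hits, DocIterator values) {
    if (hitCount == maxDoc && valuesCost == maxDoc) {
      return all(maxDoc);
    }
    return intersect(hits, values);
  }

  /** Collects all remaining doc IDs of an iterator into a list. */
  static List<Integer> drain(DocIterator it) {
    List<Integer> docs = new ArrayList<>();
    for (int d = it.nextDoc(); d != -1; d = it.nextDoc()) {
      docs.add(d);
    }
    return docs;
  }

  public static void main(String[] args) {
    System.out.println(drain(choose(4, 4, 4L, of(0, 1, 2, 3), of(0, 1, 2, 3))));
    System.out.println(drain(choose(2, 4, 3L, of(1, 3), of(0, 1, 3))));
  }
}
```

The fast path is only valid when the cost really is an exact count of documents with a value; with an estimated cost the intersection would still be needed.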
[GitHub] [lucene] dweiss merged pull request #803: LUCENE-10229: return -1 for unknown offsets in ExtendedIntervalsSource. Modify highlighting to work properly with or without offsets (depending on th
dweiss merged PR #803: URL: https://github.com/apache/lucene/pull/803 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10229) Match offsets should be consistent for fields with positions and fields with offsets
[ https://issues.apache.org/jira/browse/LUCENE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520468#comment-17520468 ]

ASF subversion and git services commented on LUCENE-10229:
--
Commit 2c1f9381390c7d81adb37424827bcebc4d81ae95 in lucene's branch refs/heads/main from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=2c1f9381390 ]

LUCENE-10229: return -1 for unknown offsets in ExtendedIntervalsSource. Modify highlighting to work properly with or without offsets (depending on their availability). (#803)

Thanks @romseygeek

> Match offsets should be consistent for fields with positions and fields with offsets
>
> Key: LUCENE-10229
> URL: https://issues.apache.org/jira/browse/LUCENE-10229
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Dawid Weiss
> Priority: Major
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> This is a follow-up of LUCENE-10223 in which it was discovered that fields with offsets don't highlight some more complex interval queries properly. Alan says:
> {quote}
> It's because it returns the position of the inner match, but the offsets of the outer. And so if you're re-analyzing and retrieving offsets by looking at the positions, you get the 'right' thing. It's not obvious to me what the correct response is here, but thinking about it the current behaviour is kind of the worst of both worlds, and perhaps we should change it so that you get offsets of the inner match as standard, and then the outer match is returned as part of the sub matches.
> {quote}
> Intervals are nicely separated into "basic intervals" and "filters" which restrict some other source of intervals, here is the original documentation:
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/intervals/package-info.java#L29-L50
> My experience from an extended period of using interval queries in a frontend where they're highlighted is that filters are restrictions that should not be highlighted - it's the source intervals that people care about. Filters are what you remove or where you give proper context to source intervals.
> The test code contributed in LUCENE-10223 contains numerous query-highlight examples (on fields with positions) where this intuition is demonstrated on all kinds of interval functions:
> https://github.com/apache/lucene/blob/main/lucene/highlighter/src/test/org/apache/lucene/search/matchhighlight/TestMatchHighlighter.java#L335-L542
> This issue is about making the internals work consistently for fields with positions and fields with offsets.
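The idea of reporting -1 for unknown offsets and letting the highlighter cope either way can be illustrated with a small model. This is a sketch under assumptions, not the code from the commit: `Match`, `Span`, `analyze`, `resolve` and `highlight` are hypothetical names, and the whitespace tokenizer merely stands in for re-analyzing the field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model; it does not use Lucene's MatchHighlighter or intervals API.
class OffsetFallbackSketch {

  /** Character span [start, end) in the field text. */
  static final class Span {
    final int start;
    final int end;

    Span(int start, int end) {
      this.start = start;
      this.end = end;
    }
  }

  /** A match given in token positions; offsets are -1 when the source cannot supply them. */
  static final class Match {
    final int startPos;
    final int endPos;
    final int startOffset;
    final int endOffset;

    Match(int startPos, int endPos, int startOffset, int endOffset) {
      this.startPos = startPos;
      this.endPos = endPos;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
    }
  }

  /** Stand-in for re-analysis: whitespace tokens, where the list index is the token position. */
  static List<Span> analyze(String text) {
    List<Span> tokens = new ArrayList<>();
    int i = 0;
    while (i < text.length()) {
      while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
        i++;
      }
      int start = i;
      while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
        i++;
      }
      if (i > start) {
        tokens.add(new Span(start, i));
      }
    }
    return tokens;
  }

  /** Uses the reported offsets when known, otherwise derives them from positions. */
  static Span resolve(Match m, String text) {
    if (m.startOffset >= 0 && m.endOffset >= 0) {
      return new Span(m.startOffset, m.endOffset);
    }
    List<Span> tokens = analyze(text);
    return new Span(tokens.get(m.startPos).start, tokens.get(m.endPos).end);
  }

  /** Wraps the resolved span in bold tags. */
  static String highlight(Match m, String text) {
    Span s = resolve(m, text);
    return text.substring(0, s.start) + "<b>" + text.substring(s.start, s.end) + "</b>" + text.substring(s.end);
  }

  public static void main(String[] args) {
    String text = "quick brown fox jumps";
    System.out.println(highlight(new Match(1, 2, -1, -1), text));
    System.out.println(highlight(new Match(1, 2, 6, 11), text));
  }
}
```

Using -1 as an explicit "unknown" marker lets a consumer tell the two cases apart, rather than receiving offsets that belong to a different (outer) match than the reported positions.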
[jira] [Updated] (LUCENE-10229) Match offsets should be consistent for fields with positions and fields with offsets
[ https://issues.apache.org/jira/browse/LUCENE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated LUCENE-10229:
-
Fix Version/s: 9.2
[jira] [Assigned] (LUCENE-10229) Match offsets should be consistent for fields with positions and fields with offsets
[ https://issues.apache.org/jira/browse/LUCENE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss reassigned LUCENE-10229:

Assignee: Dawid Weiss
[jira] [Updated] (LUCENE-10229) Match offsets should be consistent for fields with positions and fields with offsets
[ https://issues.apache.org/jira/browse/LUCENE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated LUCENE-10229:
-
Priority: Minor (was: Major)
[GitHub] [lucene] iverase commented on pull request #804: LUCENE-10508: Fixes some failures where a GeoArea is built with degenerated latitudes
iverase commented on PR #804:
URL: https://github.com/apache/lucene/pull/804#issuecomment-1094868606

@DaddyWri would you be able to have a look?
[jira] [Commented] (LUCENE-10229) Match offsets should be consistent for fields with positions and fields with offsets
[ https://issues.apache.org/jira/browse/LUCENE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520487#comment-17520487 ]

ASF subversion and git services commented on LUCENE-10229:
--
Commit 62fe8e28747f53a2ad06f4e0c6376c1de593dc63 in lucene's branch refs/heads/branch_9x from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=62fe8e28747 ]

LUCENE-10229: return -1 for unknown offsets in ExtendedIntervalsSource. Modify highlighting to work properly with or without offsets (depending on their availability). (#803)

Thanks @romseygeek
[GitHub] [lucene] DaddyWri commented on pull request #804: LUCENE-10508: Fixes some failures where a GeoArea is built with degenerated latitudes
DaddyWri commented on PR #804:
URL: https://github.com/apache/lucene/pull/804#issuecomment-1094949367

As you suggest, it is essential that the limits of the resolution of planes be taken into account when building structures in Geo3D. This change looks like it would possibly do a better job of building those planes under degenerate conditions, so I am in favor of it.
[jira] [Commented] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520523#comment-17520523 ]

Ignacio Vera commented on LUCENE-10315:
---
> I'm seeing readInts24ForUtil runs 3 times faster than readInts24Legacy. This speed is attractive to me.

In theory we should call this method several times less frequently than the visitor method, so we should not try to optimise for that. For example, in 1D this method should only be called at most two times, but the visitor one can be called several times.
[jira] [Commented] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520524#comment-17520524 ]

Ignacio Vera commented on LUCENE-10315:
---
I think adding BulkAdder#add(int[] docs, int count) should be done in a different issue as it is hiding the performance issues with the current approach.

> Speed up BKD leaf block ids codec by a 512 ints ForUtil
>
> Key: LUCENE-10315
> URL: https://issues.apache.org/jira/browse/LUCENE-10315
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Feng Guo
> Assignee: Feng Guo
> Priority: Major
> Attachments: addall.svg, cpu_profile_baseline.html, cpu_profile_path.html
> Time Spent: 6.5h
> Remaining Estimate: 0h
>
> Elasticsearch (which is based on Lucene) can automatically infer types for users with its dynamic mapping feature. When users index some low cardinality fields, such as gender / age / status... they often use some numbers to represent the values, while ES will infer these fields as {{long}}, and ES uses BKD as the index of {{long}} fields. When the data volume grows, building the result set of low-cardinality fields will make the CPU usage and load very high.
> This is a flame graph we obtained from the production environment:
> [^addall.svg]
> It can be seen that almost all CPU is used in addAll. When we reindex {{long}} to {{keyword}}, the cluster load and search latency are greatly reduced (we spent weeks reindexing all indices...). I know that ES recommends using {{keyword}} for term/terms queries and {{long}} for range queries in its documentation, but there are always some users who don't realize this and keep their habits from SQL databases, or dynamic mapping automatically selects the type for them. All in all, users won't realize that there would be such a big difference in performance between {{long}} and {{keyword}} fields for low cardinality fields. So from my point of view it makes sense to make BKD work better for low/medium cardinality fields.
> As far as I can see, for low cardinality fields, there are two advantages of {{keyword}} over {{long}}:
> 1. {{ForUtil}} used in {{keyword}} postings is much more efficient than BKD's delta VInt, because of its batch reading (readLongs) and SIMD decode.
> 2. When the query term count is less than 16, {{TermsInSetQuery}} can lazily materialize its result set, and when another small result clause intersects with this low cardinality condition, the low cardinality field can avoid reading all docIds into memory.
> This issue targets the first point. The basic idea is to use a 512 ints {{ForUtil}} for the BKD ids codec. I benchmarked this optimization by mocking some random {{LongPoint}} and querying them with {{PointInSetQuery}}.
> *Benchmark Result*
> |doc count|field cardinality|query point|baseline QPS|candidate QPS|diff percentage|
> |1|32|1|51.44|148.26|188.22%|
> |1|32|2|26.8|101.88|280.15%|
> |1|32|4|14.04|53.52|281.20%|
> |1|32|8|7.04|28.54|305.40%|
> |1|32|16|3.54|14.61|312.71%|
> |1|128|1|110.56|350.26|216.81%|
> |1|128|8|16.6|89.81|441.02%|
> |1|128|16|8.45|48.07|468.88%|
> |1|128|32|4.2|25.35|503.57%|
> |1|128|64|2.13|13.02|511.27%|
> |1|1024|1|536.19|843.88|57.38%|
> |1|1024|8|109.71|251.89|129.60%|
> |1|1024|32|33.24|104.11|213.21%|
> |1|1024|128|8.87|30.47|243.52%|
> |1|1024|512|2.24|8.3|270.54%|
> |1|8192|1|.33|5000|50.00%|
> |1|8192|32|139.47|214.59|53.86%|
> |1|8192|128|54.59|109.23|100.09%|
> |1|8192|512|15.61|36.15|131.58%|
> |1|8192|2048|4.11|11.14|171.05%|
> |1|1048576|1|2597.4|3030.3|16.67%|
> |1|1048576|32|314.96|371.75|18.03%|
> |1|1048576|128|99.7|116.28|16.63%|
> |1|1048576|512|30.5|37.15|21.80%|
> |1|1048576|2048|10.38|12.3|18.50%|
> |1|8388608|1|2564.1|3174.6|23.81%|
> |1|8388608|32|196.27|238.95|21.75%|
> |1|8388608|128|55.36|68.03|22.89%|
> |1|8388608|512|15.58|19.24|23.49%|
> |1|8388608|2048|4.56|5.71|25.22%|
> The index size is reduced for low cardinality fields and flat for high cardinality fields.
> {code:java}
> 113M index_1_doc_32_cardinality_baseline
> 114M index_1_doc_32_cardinality_candidate
> 140M index_1_doc_128_cardinality_baseline
> 133M index_1_doc_128_cardinality_candidate
> 193M index_1_doc_1024_cardinality_baseline
> 174M index_1_doc_1024_cardinality_candidate
> 241M index_1_doc_8192_cardinality_baseline
> 233M index_1000
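The contrast the description draws between delta VInt decoding and fixed-width batch decoding can be sketched as below. This is a simplified stand-in, not Lucene's `ForUtil` or BKD writer: the class and method names are made up for the example, and the fixed 24-bit layout is just one possible fixed width (it assumes doc IDs below 2^24).

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Simplified stand-in; it does not reproduce Lucene's ForUtil or BKD on-disk format.
class DocIdCodecSketch {

  /** Encodes sorted doc IDs as deltas, each written as a variable-length int (7 bits per byte). */
  static byte[] encodeDeltaVInt(int[] sortedDocs) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int prev = 0;
    for (int doc : sortedDocs) {
      int delta = doc - prev;
      while ((delta & ~0x7F) != 0) {
        out.write((delta & 0x7F) | 0x80); // high bit set: more bytes follow
        delta >>>= 7;
      }
      out.write(delta);
      prev = doc;
    }
    return out.toByteArray();
  }

  /** Decoding has a data-dependent inner loop: the length of each value is only known byte by byte. */
  static int[] decodeDeltaVInt(byte[] in, int count) {
    int[] docs = new int[count];
    int pos = 0;
    int prev = 0;
    for (int i = 0; i < count; i++) {
      int delta = 0;
      int shift = 0;
      byte b;
      do {
        b = in[pos++];
        delta |= (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      prev += delta;
      docs[i] = prev;
    }
    return docs;
  }

  /** Writes each doc ID in exactly 3 bytes (big-endian); requires 0 <= doc < 2^24. */
  static byte[] encodeFixed24(int[] docs) {
    byte[] out = new byte[docs.length * 3];
    for (int i = 0; i < docs.length; i++) {
      out[3 * i] = (byte) (docs[i] >>> 16);
      out[3 * i + 1] = (byte) (docs[i] >>> 8);
      out[3 * i + 2] = (byte) docs[i];
    }
    return out;
  }

  /** Decoding is a branch-free loop over a known layout, which is what makes batch decoding attractive. */
  static int[] decodeFixed24(byte[] in, int count) {
    int[] docs = new int[count];
    for (int i = 0; i < count; i++) {
      docs[i] = ((in[3 * i] & 0xFF) << 16) | ((in[3 * i + 1] & 0xFF) << 8) | (in[3 * i + 2] & 0xFF);
    }
    return docs;
  }

  public static void main(String[] args) {
    int[] docs = {3, 130, 70000, 16777215};
    System.out.println(Arrays.toString(decodeDeltaVInt(encodeDeltaVInt(docs), docs.length)));
    System.out.println(Arrays.toString(decodeFixed24(encodeFixed24(docs), docs.length)));
  }
}
```

Both codecs round-trip the same IDs. The difference that matters for the issue is the shape of the decode loop, not the output.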
[GitHub] [lucene] mocobeta commented on pull request #805: LUCENE-10493: factor out Viterbi algorithm and share it between kuromoji and nori
mocobeta commented on PR #805:
URL: https://github.com/apache/lucene/pull/805#issuecomment-1094989501

I took the same bottom-up approach as https://issues.apache.org/jira/browse/LUCENE-10393 here again (determine the duplicate code and sort out the interfaces).

I'll look through this again, but some private/package-private members have to be made public/protected at this moment. Once we further distill the `backtrace()` and n-best logic and move it to analysis-common from kuromoji/nori, they can be private again; I hope (and believe) this is possible, but I'd leave it for future examination.

I tried but couldn't break it up into smaller patches... I will keep this open while waiting for feedback. Hope these changes make sense.
[jira] [Commented] (LUCENE-9269) Blended queries with boolean rewrite can result in inconsistent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520560#comment-17520560 ]

Michael Sokolov commented on LUCENE-9269:
-
# Does the checkBoosts test case you refer to fail when you attempt your change? If so, please address it. Otherwise, I think it should be fixed separately
# It's OK for two different queries to behave the same, and I don't see how you can know that they will in this case, so they should compare as different, I think
# again, toString() is not guaranteed to be different for different queries; I think it's OK

> Blended queries with boolean rewrite can result in inconsistent scores
>
> Key: LUCENE-9269
> URL: https://issues.apache.org/jira/browse/LUCENE-9269
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: 8.4
> Reporter: Michele Palmia
> Priority: Minor
> Attachments: LUCENE-9269-test.patch
>
> If two blended queries are should clauses of a boolean query and are built so that
> * some of their terms are the same
> * their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE
> the docFreq for the overlapping terms used for scoring is picked as follows:
> # if the overlapping terms are not boosted, the df of the term in the first blended query is used
> # if any of the overlapping terms is boosted, the df is picked at (what looks like) random.
> A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3).
> {code:java}
> a)
> Blended(f:a f:b) Blended (f:a)
> df: 3 df: 2
> gets rewritten to:
> (f:a)^2.0 (f:b)
> df: 3 df:2
> b)
> Blended(f:a) Blended(f:a f:b)
> df: 2 df: 3
> gets rewritten to:
> (f:a)^2.0 (f:b)
> df: 2 df:2
> c)
> Blended(f:a f:b^0.66) Blended (f:a^0.75)
> df: 3 df: 2
> gets rewritten to:
> (f:a)^1.75 (f:b)^0.66
> df:? df:2
> {code}
> with ? either 2 or 3, depending on the run.
[jira] [Created] (LUCENE-10512) Trivial: Identify and fix "the the" in comments, docs
Rich Bowen created LUCENE-10512:
---
Summary: Trivial: Identify and fix "the the" in comments, docs
Key: LUCENE-10512
URL: https://issues.apache.org/jira/browse/LUCENE-10512
Project: Lucene - Core
Issue Type: Task
Reporter: Rich Bowen

In reading, and attempting to familiarize myself with, the Lucene code, I noticed a number of occurrences of "the the" (i.e., a repeated word) in docs and comments. Preparing a PR to fix.
[GitHub] [lucene] rbowen opened a new pull request, #807: LUCENE-10512: Grammar: Remove incidents of "the the" in comments.
rbowen opened a new pull request, #807:
URL: https://github.com/apache/lucene/pull/807

# Description

Identify and fix "the the" repeated words in comments/docs.

# Solution

Purely cosmetic/grammar: Remove/replace "the the" in comments, documentation.

# Tests

No tests, because no functional change.

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `main` branch.
- [x] I have run `./gradlew check`.
- [ ] I have added tests for my changes. (No: This makes no functional change)
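For illustration, repeated words such as "the the" can be located with a back-referencing regular expression. The sketch below is an assumption about one way to do this, not the method used to prepare the PR; `RepeatedWordSketch`, `findRepeats` and `collapseRepeats` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the pull request.
class RepeatedWordSketch {

  // A whole word, whitespace, then the same whole word again (case-insensitive).
  private static final Pattern REPEATED =
      Pattern.compile("\\b(\\w+)\\s+\\1\\b", Pattern.CASE_INSENSITIVE);

  /** Returns every repeated-word occurrence found in the text. */
  static List<String> findRepeats(String text) {
    List<String> hits = new ArrayList<>();
    Matcher m = REPEATED.matcher(text);
    while (m.find()) {
      hits.add(m.group());
    }
    return hits;
  }

  /** Replaces each repeated pair with a single copy of the first word. */
  static String collapseRepeats(String text) {
    return REPEATED.matcher(text).replaceAll("$1");
  }

  public static void main(String[] args) {
    String comment = "Returns the the doc ID of the term";
    System.out.println(findRepeats(comment));
    System.out.println(collapseRepeats(comment));
  }
}
```

A pattern like this also flags legitimate repetitions (for example "had had" or "that that"), so each hit would still need a manual look before being changed.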
[GitHub] [lucene] mikemccand commented on pull request #807: LUCENE-10512: Grammar: Remove incidents of "the the" in comments.
mikemccand commented on PR #807:
URL: https://github.com/apache/lucene/pull/807#issuecomment-1095108824

Whoa, thanks @rbowen for the attention to detail! This reminds me of the world's hardest band for search engines to find: [The The](https://en.wikipedia.org/wiki/The_The).
[GitHub] [lucene] mikemccand commented on pull request #807: LUCENE-10512: Grammar: Remove incidents of "the the" in comments.
mikemccand commented on PR #807: URL: https://github.com/apache/lucene/pull/807#issuecomment-109589 That CI build failure is a code styling issue. Lucene uses a [strict code-styling plugin called Spotless](https://issues.apache.org/jira/browse/LUCENE-9564), which removes all ambiguity and demands precise adherence, which is awesome (no more flame wars about whitespace). You should be able to re-style your code automatically by running `./gradlew :lucene:core:spotlessApply`
[jira] [Commented] (LUCENE-10512) Trivial: Identify and fix "the the" in comments, docs
[ https://issues.apache.org/jira/browse/LUCENE-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520603#comment-17520603 ] Rich Bowen commented on LUCENE-10512: - [https://github.com/apache/lucene/pull/807] fixes this; however, since this is my first Lucene patch, I have done some things wrong. Am attempting to fix, and will try again ASAP. > Trivial: Identify and fix "the the" in comments, docs > - > > Key: LUCENE-10512 > URL: https://issues.apache.org/jira/browse/LUCENE-10512 > Project: Lucene - Core > Issue Type: Task >Reporter: Rich Bowen >Priority: Trivial > Time Spent: 0.5h > Remaining Estimate: 0h > > In reading, and attempting to familiarize myself with, the Lucene code, I > noticed a number of occurrences of "the the" (i.e., repeated word) in docs and > comments. Preparing a PR to fix. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #807: LUCENE-10512: Grammar: Remove incidents of "the the" in comments.
rmuir commented on PR #807: URL: https://github.com/apache/lucene/pull/807#issuecomment-1095141887 > You should be able to re-style your code automatically by running `./gradlew :lucene:core:spotlessApply` Personally I would not recommend running it this way. I run `./gradlew tidy`, across the entire codebase/modules. It doesn't take too long and has never given me a problem.
[GitHub] [lucene] mikemccand commented on pull request #807: LUCENE-10512: Grammar: Remove incidents of "the the" in comments.
mikemccand commented on PR #807: URL: https://github.com/apache/lucene/pull/807#issuecomment-1095144308 > > You should be able to re-style your code automatically by running `./gradlew :lucene:core:spotlessApply` > > Personally I would not recommend running it this way. I run `./gradlew tidy`, across the entire codebase/modules. It doesn't take too long and has never given me a problem. +1 Sorry, the command I suggested only fixes styling for `lucene/core`. @rmuir's command will fix ALL styling issues across ALL modules. Since the Spotless check seems to be fail-fast, maybe we should fix the exception message to just suggest `./gradlew tidy` instead?
[jira] [Created] (LUCENE-10513) Make it more obvious how to fix Spotless issues for new users
Rich Bowen created LUCENE-10513: --- Summary: Make it more obvious how to fix Spotless issues for new users Key: LUCENE-10513 URL: https://issues.apache.org/jira/browse/LUCENE-10513 Project: Lucene - Core Issue Type: Task Reporter: Rich Bowen I just made my first PR to Lucene (yay me!) and in the process stumbled on various things that were non-obvious. I request, for The Next Person, that the error messaging in `gradlew` make it more obvious that one should run `./gradlew tidy` the first time around, so as to avoid the low-hanging formatting problems that cause everything else to fail. During the course of my fumbling around, I was encouraged to run: ./gradlew :lucene:suggest:spotlessJavaCheck ./gradlew :lucene:suggest:spotlessApply ./gradlew :lucene:test-framework:spotlessApply and ./gradlew check -Ptests.nightly=true various times, by the error messages in `./gradlew check`, and while I got there eventually (again, yay me!) perhaps encouraging folks to run `./gradlew tidy` first may have saved some frustration. That said, I cannot overstate how impressed I am with the thoroughness of the testing/verification tools, and wish more projects had this kind of tooling. Thank you.
[GitHub] [lucene] mikemccand merged pull request #807: LUCENE-10512: Grammar: Remove incidents of "the the" in comments.
mikemccand merged PR #807: URL: https://github.com/apache/lucene/pull/807
[jira] [Commented] (LUCENE-10512) Trivial: Identify and fix "the the" in comments, docs
[ https://issues.apache.org/jira/browse/LUCENE-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520638#comment-17520638 ] ASF subversion and git services commented on LUCENE-10512: -- Commit 0a069ed4542ab672230d3610d91a9eababead199 in lucene's branch refs/heads/main from Rich Bowen [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=0a069ed4542 ] LUCENE-10512: Grammar: Remove incidents of "the the" in comments. (#807) * Grammar: Remove incidents of "the the" in comments. * fixes formatting, as per helpful comment from Mike * Running ./gradlew :lucene:misc:spotlessApply again made more changes. * It keeps finding new things ... what's up with this? * Fixing more nits that gradlew finds. Sorry, folks. I am new at this. > Trivial: Identify and fix "the the" in comments, docs > - > > Key: LUCENE-10512 > URL: https://issues.apache.org/jira/browse/LUCENE-10512 > Project: Lucene - Core > Issue Type: Task >Reporter: Rich Bowen >Priority: Trivial > Time Spent: 1h > Remaining Estimate: 0h
[jira] [Commented] (LUCENE-10513) Make it more obvious how to fix Spotless issues for new users
[ https://issues.apache.org/jira/browse/LUCENE-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520641#comment-17520641 ] Dawid Weiss commented on LUCENE-10513: -- You should make yourself familiar with various help files under help/, here is one of them explicitly talking about formatting: [https://github.com/apache/lucene/blob/main/help/formatting.txt] I don't think more can be done about it, to be honest. > Make it more obvious how to fix Spotless issues for new users > - > > Key: LUCENE-10513 > URL: https://issues.apache.org/jira/browse/LUCENE-10513 > Project: Lucene - Core > Issue Type: Task >Reporter: Rich Bowen >Priority: Minor
[jira] [Commented] (LUCENE-10513) Make it more obvious how to fix Spotless issues for new users
[ https://issues.apache.org/jira/browse/LUCENE-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520643#comment-17520643 ] Dawid Weiss commented on LUCENE-10513: -- Perhaps you could add a line to: [https://github.com/apache/lucene/blob/main/help/workflow.txt] and mention the tidy task that reformats the code prior to check.
[GitHub] [lucene] dweiss commented on pull request #807: LUCENE-10512: Grammar: Remove incidents of "the the" in comments.
dweiss commented on PR #807: URL: https://github.com/apache/lucene/pull/807#issuecomment-1095199898 bq. Since the Spotless check seems to be fail-fast, maybe we should fix the exception message to just suggest ./gradlew tidy instead? Gradle runs tasks in parallel so it's not really "fail fast". It's "abort anything not yet started because the build will fail". And if multiple things fail, gradle will report all of them (as a list of problems). As with any tool, it takes some getting used to - I think these messages are quite fine (and 'tidy' is in fact a non-standard invention of mine... but it's a four letter word so I couldn't resist).
[GitHub] [lucene] rbowen opened a new pull request, #808: LUCENE-10513: Run `gradlew tidy` first
rbowen opened a new pull request, #808: URL: https://github.com/apache/lucene/pull/808 Encourage running `gradlew tidy` first, which, in turn, prevents failures in later steps. # Description In contributing my first change, I encountered formatting advice that would have been rendered unnecessary if I had first run `gradlew tidy` # Solution Recommend `gradlew tidy` as first step of workflow. # Tests Docs-only change - no tests. # Checklist Please review the following and check all that apply: - [x ] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability. - [x ] I have created a Jira issue and added the issue ID to my pull request title. - [ ] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x ] I have developed this patch against the `main` branch. - [x ] I have run `./gradlew check`. - [ ] I have added tests for my changes.
[jira] [Commented] (LUCENE-10513) Make it more obvious how to fix Spotless issues for new users
[ https://issues.apache.org/jira/browse/LUCENE-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520651#comment-17520651 ] Rich Bowen commented on LUCENE-10513: - Thanks. [https://github.com/apache/lucene/pull/808] proposed. > Make it more obvious how to fix Spotless issues for new users > - > > Key: LUCENE-10513 > URL: https://issues.apache.org/jira/browse/LUCENE-10513 > Project: Lucene - Core > Issue Type: Task >Reporter: Rich Bowen >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h
[GitHub] [lucene] rmuir commented on pull request #808: LUCENE-10513: Run `gradlew tidy` first
rmuir commented on PR #808: URL: https://github.com/apache/lucene/pull/808#issuecomment-1095247184 Thanks for following up here! These changes look fine. I'm wondering if there's anything we could improve in the CONTRIBUTING.md to make this easier, too, maybe something in the "checks" section. Maybe it is as simple as adding some markdown-links to this file to reference appropriate stuff in help/. Currently, CONTRIBUTING.md doesn't mention a thing about code formatting directly... which doesn't seem right to me.
[jira] [Resolved] (LUCENE-10512) Trivial: Identify and fix "the the" in comments, docs
[ https://issues.apache.org/jira/browse/LUCENE-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Bowen resolved LUCENE-10512. - Resolution: Fixed Thank you, folks. PR merged, and much learned about the process and tooling around contributing to Lucene. > Trivial: Identify and fix "the the" in comments, docs > - > > Key: LUCENE-10512 > URL: https://issues.apache.org/jira/browse/LUCENE-10512 > Project: Lucene - Core > Issue Type: Task >Reporter: Rich Bowen >Priority: Trivial > Time Spent: 1h 10m > Remaining Estimate: 0h
[GitHub] [lucene] dweiss commented on pull request #808: LUCENE-10513: Run `gradlew tidy` first
dweiss commented on PR #808: URL: https://github.com/apache/lucene/pull/808#issuecomment-1095406753 I allowed myself to push minor changes to your branch, including what @rmuir suggested, which indeed seems like an omission.
[GitHub] [lucene] mocobeta commented on pull request #808: LUCENE-10513: Run `gradlew tidy` first
mocobeta commented on PR #808: URL: https://github.com/apache/lucene/pull/808#issuecomment-1096056476 Elaborating on CONTRIBUTING.md might be good; on the other hand, it would also make it wordy and increase the maintenance cost (I know few people care about it, so it would easily become out-of-date). Just an idea, but how about removing the entire `Checks` section and just mentioning `gradlew helpWorkflow`...
[GitHub] [lucene] iverase merged pull request #804: LUCENE-10508: Fixes some failures where a GeoArea is built with degenerated latitudes
iverase merged PR #804: URL: https://github.com/apache/lucene/pull/804
[jira] [Commented] (LUCENE-10508) GeoArea failure with degenerated latitude
[ https://issues.apache.org/jira/browse/LUCENE-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520926#comment-17520926 ] ASF subversion and git services commented on LUCENE-10508: -- Commit eb2df13bbadccee7c05397886d2448fb91f25f0d in lucene's branch refs/heads/main from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=eb2df13bbad ] LUCENE-10508: Fixes some failures where a GeoArea is built with degenerated latitudes (#804) Fixes some edge cases where GeoArea were built in a way that vertical planes could not evaluate their sign, either because the planes were the same or the center between those planes was lying on top of one of the planes. > GeoArea failure with degenerated latitude > - > > Key: LUCENE-10508 > URL: https://issues.apache.org/jira/browse/LUCENE-10508 > Project: Lucene - Core > Issue Type: Bug > Components: modules/spatial3d >Reporter: Ignacio Vera >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > I hit a failure when trying to build a GeoArea using the GeoAreaFactory. The > issue seems to happen when you have an almost degenerated minLatitude and > maxLatitude and you are close to the poles. Then you might hit the following > exception: > {code} > java.lang.IllegalArgumentException: Cannot determine sidedness because check > point is on plane. > at > __randomizedtesting.SeedInfo.seed([EA56BB13E754A996:C7560EE2BA56A507]:0) > at > org.apache.lucene.spatial3d.geom.SidedPlane.<init>(SidedPlane.java:137) > at > org.apache.lucene.spatial3d.geom.GeoDegenerateVerticalLine.<init>(GeoDegenerateVerticalLine.java:110) > at > org.apache.lucene.spatial3d.geom.GeoBBoxFactory.makeGeoBBox(GeoBBoxFactory.java:100) > at > org.apache.lucene.spatial3d.geom.GeoAreaFactory.makeGeoArea(GeoAreaFactory.java:43) > {code} > The situation is easy to reproduce with the following test: > {code:java} > public void testBBoxRandomDegenerate() { > double minX = Geo3DUtil.fromDegrees(GeoTestUtil.nextLongitude()); > double maxX = Math.nextUp(minX + Vector.MINIMUM_ANGULAR_RESOLUTION); > double minY = Geo3DUtil.fromDegrees(GeoTestUtil.nextLatitude()); > double maxY = Math.nextUp(minY + Vector.MINIMUM_ANGULAR_RESOLUTION); > assertNotNull(GeoAreaFactory.makeGeoArea(PlanetModel.SPHERE, maxY, minY, > minX, maxX)); > } > {code}
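The reproduction in the issue builds each upper bound as `Math.nextUp(min + Vector.MINIMUM_ANGULAR_RESOLUTION)`, which yields a box only marginally wider than the resolution threshold. The standalone sketch below illustrates just that arithmetic, without any Lucene dependency. The class `NearDegenerateBounds`, the helper `barelyAbove`, and the value of `ASSUMED_RESOLUTION` are all hypothetical; the constant is an assumed stand-in, not a value taken from the issue.

```java
public class NearDegenerateBounds {
  // Hypothetical stand-in for Vector.MINIMUM_ANGULAR_RESOLUTION; the value is an assumption.
  public static final double ASSUMED_RESOLUTION = Math.PI * 1.0e-12;

  /** Next representable double above the rounded sum (min + resolution), as in the issue's test. */
  public static double barelyAbove(double min, double resolution) {
    return Math.nextUp(min + resolution);
  }

  public static void main(String[] args) {
    double minY = Math.toRadians(89.9); // a latitude close to the pole, in radians
    double maxY = barelyAbove(minY, ASSUMED_RESOLUTION);
    // The box is wider than the assumed resolution, but only by a few ulps.
    System.out.println("resolution = " + ASSUMED_RESOLUTION);
    System.out.println("width      = " + (maxY - minY));
  }
}
```

Bounds this close together are what the issue calls "almost degenerated": just wide enough that they are not collapsed into a single line, yet narrow enough to stress the plane-sidedness checks.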
[jira] [Commented] (LUCENE-10508) GeoArea failure with degenerated latitude
[ https://issues.apache.org/jira/browse/LUCENE-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520929#comment-17520929 ] ASF subversion and git services commented on LUCENE-10508: -- Commit f4f1f7086f9ae6d8ed0351ca07ddd4d0497386f1 in lucene's branch refs/heads/branch_9x from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=f4f1f7086f9 ] LUCENE-10508: Fixes some failures where a GeoArea is built with degenerated latitudes (#804) Fixes some edge cases where GeoArea were built in a way that vertical planes could not evaluate their sign, either because the planes were the same or the center between those planes was lying on top of one of the planes.
[jira] [Resolved] (LUCENE-10508) GeoArea failure with degenerated latitude
[ https://issues.apache.org/jira/browse/LUCENE-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-10508. --- Fix Version/s: 9.2 Assignee: Ignacio Vera Resolution: Fixed
[GitHub] [lucene] iverase commented on pull request #756: LUCENE-10470: [Tessellator] Prevent bridges that introduce collinear edges
iverase commented on PR #756: URL: https://github.com/apache/lucene/pull/756#issuecomment-1096194673 @yixunx if there is no further input I am planning to push this change shortly.