[GitHub] [lucene] mkhludnev commented on issue #12259: Case insensitive search
mkhludnev commented on issue #12259: URL: https://github.com/apache/lucene/issues/12259#issuecomment-1534327232 [SrikanthMedisetti] please use TextField instead. Not an issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mkhludnev closed issue #12259: Case insensitive search
mkhludnev closed issue #12259: Case insensitive search URL: https://github.com/apache/lucene/issues/12259
[GitHub] [lucene] mkhludnev opened a new issue, #12264: Shouldn't StandardTokenizer keep aplanum dot joined?
mkhludnev opened a new issue, #12264: URL: https://github.com/apache/lucene/issues/12264

### Description

### AS-IS
`a9nine.com` -> `a9nine.com`
`3.14` -> `3.14`

### Problem
`a9.com` -> `a9` `com`

Should it keep them joined?
[GitHub] [lucene] romseygeek commented on issue #12264: Shouldn't StandardTokenizer keep aplanum dot joined?
romseygeek commented on issue #12264: URL: https://github.com/apache/lucene/issues/12264#issuecomment-1534586713 The tokenizer is based on http://unicode.org/reports/tr29/, which has rules for handling dots that appear in numbers or in URLs, but it does seem that URLs with a number before a dot are not handled here. The relevant rule, I think, is http://unicode.org/reports/tr29/#WB6, which tells the tokenizer not to break on letter + dot + letter, and WB11 then tells it not to break on number + dot + number, but there's nothing about number + dot + letter, possibly because there are also a bunch of cases where we *do* actually want to break here?
[GitHub] [lucene] mkhludnev commented on issue #12264: Shouldn't StandardTokenizer keep aplanum dot joined?
mkhludnev commented on issue #12264: URL: https://github.com/apache/lucene/issues/12264#issuecomment-1534733406 Thanks @romseygeek. Right, it's a question, and maybe it's worth discussing. For reference: https://lists.apache.org/thread/gpxz58jdb9n1sh2oxx161g4kkd7x94wn
[GitHub] [lucene] mkhludnev commented on issue #12264: Shouldn't StandardTokenizer keep aplanum dot joined?
mkhludnev commented on issue #12264: URL: https://github.com/apache/lucene/issues/12264#issuecomment-1534792087 The proposal around http://unicode.org/reports/tr29/#WB7 is to introduce (implement) two new do-not-break rules:

*WB6a* `AHLetter Numeric | × | (MidLetter | MidNumLetQ) AHLetter`
*WB7d* `AHLetter Numeric (MidLetter | MidNumLetQ) | × | AHLetter`
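To make the discussion in this thread concrete, here is a minimal plain-Java sketch of the dot handling only. It is not Lucene's `StandardTokenizer` and not a full UAX #29 implementation: the class `DotJoinSketch` and its methods are hypothetical, `.` is the only mid-character considered, and letters and digits are approximated with `Character.isLetter` and `Character.isDigit`. The `proposed` flag switches on a literal reading of the WB6a/WB7d proposal (letter, then one digit, then dot, then letter).

```java
import java.util.ArrayList;
import java.util.List;

public class DotJoinSketch {

  /** Returns true if the dot at index i should be kept inside the current token. */
  public static boolean keepsDot(String s, int i, boolean proposed) {
    if (i == 0 || i + 1 >= s.length()) {
      return false; // a leading or trailing dot never joins anything
    }
    char prev = s.charAt(i - 1);
    char next = s.charAt(i + 1);
    if (Character.isLetter(prev) && Character.isLetter(next)) {
      return true; // simplified WB6/WB7: letter + dot + letter
    }
    if (Character.isDigit(prev) && Character.isDigit(next)) {
      return true; // simplified WB11/WB12: number + dot + number
    }
    // Assumption: literal reading of the proposed WB6a/WB7d,
    // i.e. letter + digit + dot + letter.
    return proposed
        && i >= 2
        && Character.isLetter(s.charAt(i - 2))
        && Character.isDigit(prev)
        && Character.isLetter(next);
  }

  /** Splits s into runs of letters and digits, keeping dots that the rules allow. */
  public static List<String> tokenize(String s, boolean proposed) {
    List<String> tokens = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      boolean joinedDot = c == '.' && current.length() > 0 && keepsDot(s, i, proposed);
      if (Character.isLetterOrDigit(c) || joinedDot) {
        current.append(c);
      } else if (current.length() > 0) {
        tokens.add(current.toString());
        current.setLength(0);
      }
    }
    if (current.length() > 0) {
      tokens.add(current.toString());
    }
    return tokens;
  }

  public static void main(String[] args) {
    for (String s : new String[] {"a9nine.com", "3.14", "a9.com"}) {
      System.out.println(s + " -> " + tokenize(s, false) + " / proposed: " + tokenize(s, true));
    }
  }
}
```

With `proposed` set to `false`, the sketch keeps `a9nine.com` and `3.14` whole and splits `a9.com` into `a9` and `com`, matching the behaviour reported in the issue. With `proposed` set to `true`, `a9.com` stays joined. Under this literal reading an input such as `a99.com` would still split, because the character two positions before the dot is a digit.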
[GitHub] [lucene] msokolov commented on pull request #12254: add ConcurrentOnHeapHnswGraph and Builder
msokolov commented on PR #12254: URL: https://github.com/apache/lucene/pull/12254#issuecomment-1534829661 Yep, I plan to review, definitely interested to see this get committed, but it's a bit complex and I need to find some quiet time, which is rare. You know we're all volunteers here!
[GitHub] [lucene] mkhludnev commented on a diff in pull request #12245: `ToParentBlockJoinQuery` Explain Support Score Mode
mkhludnev commented on code in PR #12245: URL: https://github.com/apache/lucene/pull/12245#discussion_r1185121793

## lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java: ##
@@ -391,35 +391,75 @@ private void setScoreAndFreq() throws IOException {
       }
       this.score = (float) score;
     }
-
-    public Explanation explain(LeafReaderContext context, Weight childWeight) throws IOException {
+    /*
+     * This instance of Explanation requires three parameters, context, childWeight, and scoreMode.
+     * The scoreMode parameter considers Avg, Total, Min, Max, and None.
+     * */
+    public Explanation explain(LeafReaderContext context, Weight childWeight, ScoreMode scoreMode)
+        throws IOException {
       int prevParentDoc = parentBits.prevSetBit(parentApproximation.docID() - 1);
       int start = context.docBase + prevParentDoc + 1; // +1 b/c prevParentDoc is previous parent doc
       int end = context.docBase + parentApproximation.docID() - 1; // -1 b/c parentDoc is parent doc
 
       Explanation bestChild = null;
+      double childScoreSum = 0;
       int matches = 0;
       for (int childDoc = start; childDoc <= end; childDoc++) {
         Explanation child = childWeight.explain(context, childDoc - context.docBase);
         if (child.isMatch()) {
           matches++;
+          childScoreSum += child.getValue().doubleValue();
+
           if (bestChild == null
-              || child.getValue().floatValue() > bestChild.getValue().floatValue()) {
+              || child.getValue().doubleValue() > bestChild.getValue().doubleValue()) {
             bestChild = child;
           }
         }
       }
-
-      return Explanation.match(
-          score(),
-          String.format(
-              Locale.ROOT,
-              "Score based on %d child docs in range from %d to %d, best match:",
-              matches,
-              start,
-              end),
-          bestChild);
+      if (bestChild == null) {
+        if (scoreMode == ScoreMode.None) {
+          return Explanation.noMatch("No children matched");
+
+        } else {
+          return Explanation.match(
+              0.0f,
+              String.format(
+                  Locale.ROOT,
+                  "Score based on 0 child docs in range from %d to %d, using score mode %s",
+                  start,
+                  end,
+                  scoreMode));
+        }
+      }
+      if (scoreMode == ScoreMode.Avg) {
+        double avgScore = matches > 0 ? childScoreSum / (double) matches : 0;
+        return Explanation.match(
+            avgScore,
+            String.format(
+                Locale.ROOT,
+                "Score based on %d child docs in range from %d to %d, using score mode %s",
+                matches,
+                start,
+                end,
+                scoreMode),
+            bestChild);
+      }
+      if (scoreMode == ScoreMode.Total && matches > 0) {
+        double totalScore = childScoreSum;
+        return Explanation.match(
+            totalScore,
+            String.format(
+                Locale.ROOT,
+                "Score based on %d child docs in range from %d to %d, using score mode %s",
+                matches,
+                start,
+                end,
+                scoreMode),
+            bestChild);
+      } else {
+        return Explanation.noMatch("Unexpected score mode: " + scoreMode);

Review Comment:
Thanks. Here are a few questions:
- `ScoreMode.None` is still not handled.
- Why is this a chain of `if`s with near-identical copies rather than a single `switch(mode)`?
- I see that we can just assume a child match, since `ToPBJQ.explain()` ensures the match.
- I still don't understand why the score math is calculated here instead of just calling `this.score()`.
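As an illustration of the `switch(mode)` question in the review comment, the following is a rough, self-contained sketch of how the per-mode branches could collapse into one `switch` with a single shared description string. It is not the PR's code and does not use Lucene classes: `ScoreModeSketch`, `combine`, and `describe` are hypothetical names, the nested `ScoreMode` enum is a local stand-in that only mirrors the constant names mentioned in the diff, and returning 0 for `None` is an assumption.

```java
import java.util.Locale;

public class ScoreModeSketch {

  /** Local stand-in for the score modes named in the diff; not Lucene's class. */
  public enum ScoreMode {
    None,
    Avg,
    Max,
    Total,
    Min
  }

  /** Combines the scores of the matching children according to the mode. */
  public static double combine(ScoreMode mode, double[] childScores) {
    if (childScores.length == 0) {
      throw new IllegalArgumentException("expected at least one matching child");
    }
    double sum = 0;
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    for (double s : childScores) {
      sum += s;
      min = Math.min(min, s);
      max = Math.max(max, s);
    }
    switch (mode) {
      case None:
        return 0.0; // assumption: child scores are ignored in this mode
      case Avg:
        return sum / childScores.length;
      case Max:
        return max;
      case Min:
        return min;
      case Total:
        return sum;
      default:
        throw new AssertionError("Unexpected score mode: " + mode);
    }
  }

  /** Builds the description once instead of repeating it in every branch. */
  public static String describe(ScoreMode mode, int matches, int start, int end) {
    return String.format(
        Locale.ROOT,
        "Score based on %d child docs in range from %d to %d, using score mode %s",
        matches,
        start,
        end,
        mode);
  }

  public static void main(String[] args) {
    double[] scores = {1.0, 3.0};
    for (ScoreMode mode : ScoreMode.values()) {
      System.out.println(describe(mode, scores.length, 10, 11) + ": " + combine(mode, scores));
    }
  }
}
```

The last question in the review comment points at an even simpler route: if the scorer's own `score()` value is reused, as the removed lines of the diff did, the explanation would not need to recompute the per-mode math at all, and only the description string would depend on the mode.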