date:20230504

[GitHub] [lucene] mkhludnev commented on issue #12259: Case insensitive search

2023-05-04 Thread via GitHub



mkhludnev commented on issue #12259:
URL: https://github.com/apache/lucene/issues/12259#issuecomment-1534327232

   [SrikanthMedisetti] please use TextField instead. Not an issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] mkhludnev closed issue #12259: Case insensitive search

2023-05-04 Thread via GitHub



mkhludnev closed issue #12259: Case insensitive search
URL: https://github.com/apache/lucene/issues/12259



[GitHub] [lucene] mkhludnev opened a new issue, #12264: Shouldn't StandardTokenizer keep aplanum dot joined?

2023-05-04 Thread via GitHub



mkhludnev opened a new issue, #12264:
URL: https://github.com/apache/lucene/issues/12264

   ### Description
   
   ### AS-IS
   `a9nine.com` -> `a9nine.com`
   `3.14` -> `3.14`
   ### Problem
   `a9.com` -> `a9` `com`
   
   Should it keep them joined?



[GitHub] [lucene] romseygeek commented on issue #12264: Shouldn't StandardTokenizer keep aplanum dot joined?

2023-05-04 Thread via GitHub



romseygeek commented on issue #12264:
URL: https://github.com/apache/lucene/issues/12264#issuecomment-1534586713

   The tokenizer is based on http://unicode.org/reports/tr29/, which has rules
   for handling dots that appear in numbers or in URLs, but it does seem that
   URLs that have a number before a dot are not handled here. The relevant
   rule, I think, is http://unicode.org/reports/tr29/#WB6, which tells the
   tokenizer not to break on letter + dot + letter, and then WB11 tells it not
   to break on number + dot + number, but there's nothing about
   number + dot + letter, possibly because there are also a bunch of cases
   where we *do* actually want to break here?
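
To make the three cases above concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene's `StandardTokenizer` and not a full UAX #29 implementation: the class `DotBreakSketch` and its methods are hypothetical names, and "letter" and "number" are approximated with `Character.isLetter` and `Character.isDigit`.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of how '.' is treated between letters and digits; not Lucene's StandardTokenizer. */
final class DotBreakSketch {

    /** A dot joins its neighbours only for letter.letter (WB6-like) or digit.digit (WB11-like). */
    static boolean dotJoins(char before, char after) {
        boolean letterDotLetter = Character.isLetter(before) && Character.isLetter(after);
        boolean digitDotDigit = Character.isDigit(before) && Character.isDigit(after);
        return letterDotLetter || digitDotDigit;
    }

    /** Splits text into tokens, keeping a dot inside a token only when dotJoins allows it. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean interiorDot = c == '.' && i > 0 && i + 1 < text.length();
            if (Character.isLetterOrDigit(c)
                    || (interiorDot && dotJoins(text.charAt(i - 1), text.charAt(i + 1)))) {
                current.append(c);
            } else if (current.length() > 0) {
                // Any other character (including a non-joining dot) ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"a9nine.com", "3.14", "a9.com"}) {
            System.out.println(input + " -> " + tokenize(input));
        }
    }
}
```

In this sketch `a9nine.com` and `3.14` stay whole, while `a9.com` splits into `a9` and `com`, because the dot there sits between a digit and a letter and neither rule applies.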



[GitHub] [lucene] mkhludnev commented on issue #12264: Shouldn't StandardTokenizer keep aplanum dot joined?

2023-05-04 Thread via GitHub



mkhludnev commented on issue #12264:
URL: https://github.com/apache/lucene/issues/12264#issuecomment-1534733406

   Thanks @romseygeek. Right, it's a question. Maybe it's worth discussing.
   
   For reference:
https://lists.apache.org/thread/gpxz58jdb9n1sh2oxx161g4kkd7x94wn



[GitHub] [lucene] mkhludnev commented on issue #12264: Shouldn't StandardTokenizer keep aplanum dot joined?

2023-05-04 Thread via GitHub



mkhludnev commented on issue #12264:
URL: https://github.com/apache/lucene/issues/12264#issuecomment-1534792087

   The proposal around http://unicode.org/reports/tr29/#WB7 is to introduce
(implement) two new no-break rules:
   *WB6a*
   `AHLetter Numeric | × | (MidLetter | MidNumLetQ) AHLetter`
   *WB7d*
   `AHLetter Numeric (MidLetter | MidNumLetQ) | × | AHLetter`
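
As a rough illustration of what the two proposed rules would change, the following plain-Java sketch adds a "letter, digit, dot, letter" case to a toy dot-joining check. It is an assumption-laden simplification, not Lucene code and not a UAX #29 implementation: `ProposedDotRuleSketch` is a hypothetical name, `AHLetter` and `Numeric` are approximated with `Character.isLetter` and `Character.isDigit`, only `.` is treated as a mid character, and the two rules (no break before the dot, no break after it) collapse into a single check.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the proposed WB6a/WB7d behaviour for '.'; not Lucene's StandardTokenizer. */
final class ProposedDotRuleSketch {

    /** Returns true when the dot at index i should stay inside the surrounding token. */
    static boolean dotJoins(String text, int i) {
        if (i == 0 || i + 1 >= text.length()) {
            return false; // a leading or trailing dot never joins anything
        }
        char before = text.charAt(i - 1);
        char after = text.charAt(i + 1);
        if (Character.isLetter(before) && Character.isLetter(after)) {
            return true; // existing behaviour: letter . letter
        }
        if (Character.isDigit(before) && Character.isDigit(after)) {
            return true; // existing behaviour: digit . digit
        }
        // Proposed addition, read literally: letter digit . letter
        return i >= 2
                && Character.isLetter(text.charAt(i - 2))
                && Character.isDigit(before)
                && Character.isLetter(after);
    }

    /** Splits text into tokens, keeping a dot inside a token only when dotJoins allows it. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c) || (c == '.' && dotJoins(text, i))) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With this check, `a9.com` stays a single token. Read literally, the rule needs a letter immediately before the single digit, so in this sketch `a99.com` and `9.com` would still be split at the dot.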
   



[GitHub] [lucene] msokolov commented on pull request #12254: add ConcurrentOnHeapHnswGraph and Builder

2023-05-04 Thread via GitHub



msokolov commented on PR #12254:
URL: https://github.com/apache/lucene/pull/12254#issuecomment-1534829661

   Yep, I plan to review, definitely interested to see this get committed, but 
it's a bit complex and I need to find some quiet time, which is rare. You know 
we're all volunteers here!



[GitHub] [lucene] mkhludnev commented on a diff in pull request #12245: `ToParentBlockJoinQuery` Explain Support Score Mode

2023-05-04 Thread via GitHub



mkhludnev commented on code in PR #12245:
URL: https://github.com/apache/lucene/pull/12245#discussion_r1185121793


##
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java:
##
@@ -391,35 +391,75 @@ private void setScoreAndFreq() throws IOException {
   }
   this.score = (float) score;
 }
-
-public Explanation explain(LeafReaderContext context, Weight childWeight) throws IOException {
+/*
+ * This instance of Explanation requires three parameters, context, childWeight, and scoreMode.
+ * The scoreMode parameter considers Avg, Total, Min, Max, and None.
+ * */
+public Explanation explain(LeafReaderContext context, Weight childWeight, ScoreMode scoreMode)
+throws IOException {
   int prevParentDoc = parentBits.prevSetBit(parentApproximation.docID() - 1);
   int start =
   context.docBase + prevParentDoc + 1; // +1 b/c prevParentDoc is previous parent doc
   int end = context.docBase + parentApproximation.docID() - 1; // -1 b/c parentDoc is parent doc
 
   Explanation bestChild = null;
+  double childScoreSum = 0;
   int matches = 0;
   for (int childDoc = start; childDoc <= end; childDoc++) {
 Explanation child = childWeight.explain(context, childDoc - context.docBase);
 if (child.isMatch()) {
   matches++;
+  childScoreSum += child.getValue().doubleValue();
+
   if (bestChild == null
-  || child.getValue().floatValue() > bestChild.getValue().floatValue()) {
+  || child.getValue().doubleValue() > bestChild.getValue().doubleValue()) {
 bestChild = child;
   }
 }
   }
-
-  return Explanation.match(
-  score(),
-  String.format(
-  Locale.ROOT,
-  "Score based on %d child docs in range from %d to %d, best match:",
-  matches,
-  start,
-  end),
-  bestChild);
+  if (bestChild == null) {
+if (scoreMode == ScoreMode.None) {
+  return Explanation.noMatch("No children matched");
+
+} else {
+  return Explanation.match(
+  0.0f,
+  String.format(
+  Locale.ROOT,
+  "Score based on 0 child docs in range from %d to %d, using score mode %s",
+  start,
+  end,
+  scoreMode));
+}
+  }
+  if (scoreMode == ScoreMode.Avg) {
+double avgScore = matches > 0 ? childScoreSum / (double) matches : 0;
+return Explanation.match(
+avgScore,
+String.format(
+Locale.ROOT,
+"Score based on %d child docs in range from %d to %d, using score mode %s",
+matches,
+start,
+end,
+scoreMode),
+bestChild);
+  }
+  if (scoreMode == ScoreMode.Total && matches > 0) {
+double totalScore = childScoreSum;
+return Explanation.match(
+totalScore,
+String.format(
+Locale.ROOT,
+"Score based on %d child docs in range from %d to %d, using score mode %s",
+matches,
+start,
+end,
+scoreMode),
+bestChild);
+  } else {
+return Explanation.noMatch("Unexpected score mode: " + scoreMode);

Review Comment:
   Thanks. Here are a few questions:
   - `ScoreMode.None` is still not handled.
   - Why is this a chain of `if`s with near-identical copies rather than one
`switch(mode)`?
   - I see that we can just assume a child match, since ToPBJQ.explain()
ensures the match.
   - I still don't understand why the score math is calculated here instead of
just calling `this.score()`.
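
The `switch` suggestion in the review could look roughly like the sketch below. It is written against a hypothetical local enum `Mode` that only mirrors the constant names of the join module's `ScoreMode` (`None`, `Avg`, `Max`, `Total`, `Min`); the class `ScoreModeSketch` and the method `combine` are illustrative names, not part of the PR or of Lucene. It also assumes, as the review notes, that at least one child matched.

```java
import java.util.List;

/** Sketch: one switch over the score mode instead of a chain of near-identical ifs. */
final class ScoreModeSketch {

    /** Hypothetical stand-in that mirrors the constant names of the join ScoreMode enum. */
    enum Mode { None, Avg, Max, Total, Min }

    /** Combines the scores of the matching children according to the given mode. */
    static double combine(Mode mode, List<Double> childScores) {
        if (childScores.isEmpty()) {
            throw new IllegalArgumentException("expected at least one matching child");
        }
        double sum = 0;
        double max = Double.NEGATIVE_INFINITY;
        double min = Double.POSITIVE_INFINITY;
        for (double score : childScores) {
            sum += score;
            max = Math.max(max, score);
            min = Math.min(min, score);
        }
        switch (mode) {
            case None:
                return 0.0; // children do not contribute to the parent score
            case Avg:
                return sum / childScores.size();
            case Max:
                return max;
            case Total:
                return sum;
            case Min:
                return min;
            default:
                throw new AssertionError("Unexpected score mode: " + mode);
        }
    }
}
```

Every mode gets exactly one branch, so the explanation text could be formatted once after the `switch`. If the explanation simply reused `this.score()`, as the last question suggests, this arithmetic would not be needed at all.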
   



