[jira] [Commented] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-21 Thread Nathan Meisels (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569289#comment-17569289
 ] 

Nathan Meisels commented on LUCENE-10650:
-

Hi [~jpountz]!

I appreciate your help so far!

I have another question: after reindexing, I get different scores.

The query is:

 
{code:java}
{
  "query": {
    "term": {
      "sessionIds": "1234-1234"
    }
  }
}{code}
 

New index explain:
{code:java}
{
  "_index": "entities-new",
  "_type": "entity",
  "_id": "AWByRrSPIGshPfnDk4hN",
  "matched": true,
  "explanation": {
    "value": 22.941677,
    "description": "weight(sessionIds:1234-1234 in 1400) [PerFieldSimilarity], result of:",
    "details": [
      {
        "value": 22.941677,
        "description": "score from ScriptedSimilarity(weightScript=[Script{type=inline, lang='painless', idOrCode='return query.boost * Math.log((field.docCount+1.0)/(term.docFreq+0.5)) / Math.log(2);', options={}, params={}}], script=[Script{type=inline, lang='painless', idOrCode='return weight;', options={}, params={}}]) computed from:",
        "details": [
          {
            "value": 22.941677,
            "description": "weight",
            "details": []
          },
          {
            "value": 1.0,
            "description": "query.boost",
            "details": []
          },
          {
            "value": 12084378,
            "description": "field.docCount",
            "details": []
          },
          {
            "value": 4.730932E+7,
            "description": "field.sumDocFreq",
            "details": []
          },
          {
            "value": -1.0,
            "description": "field.sumTotalTermFreq",
            "details": []
          },
          {
            "value": 1.0,
            "description": "term.docFreq",
            "details": []
          },
          {
            "value": -1.0,
            "description": "term.totalTermFreq",
            "details": []
          },
          {
            "value": 1.0,
            "description": "doc.freq",
            "details": []
          },
          {
            "value": 1.0,
            "description": "doc.length",
            "details": []
          }
        ]
      }
    ]
  }
}{code}
 

Old index explain:
{code:java}
{
  "_index" : "entities-old",
  "_type" : "entity",
  "_id" : "AWByRrSPIGshPfnDk4hN",
  "matched" : true,
  "explanation" : {
    "value" : 21.23644,
    "description" : "weight(sessionIds:1234-1234 in 527154) [PerFieldSimilarity], result of:",
    "details" : [
      {
        "value" : 21.23644,
        "description" : "score(DFRSimilarity, doc=527154, freq=1.0), computed from:",
        "details" : [
          {
            "value" : 1.0,
            "description" : "no normalization",
            "details" : [ ]
          },
          {
            "value" : 21.23644,
            "description" : "BasicModelIn, computed from: ",
            "details" : [
              {
                "value" : 1.605901E7,
                "description" : "numberOfDocuments",
                "details" : [ ]
              },
              {
                "value" : 6.0,
                "description" : "docFreq",
                "details" : [ ]
              }
            ]
          },
          {
            "value" : 1.0,
            "description" : "no aftereffect",
            "details" : [ ]
          }
        ]
      }
    ]
  }
}{code}

Does this make sense? I need the scores to stay the same.

Thanks
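
For reference, here is a minimal sketch of how the two explain outputs above relate. It assumes the old DFR score reduces to tfn * log2((N + 1) / (n + 0.5)) with tfn = 1, and it reads the printed statistics at face value (1.605901E7 is taken as 16059010). The function name `basic_model_in` is only illustrative. The same formula is applied to both sets of statistics: numberOfDocuments and docFreq from the old index, field.docCount and term.docFreq from the new one.

```python
import math


def basic_model_in(num_docs, doc_freq, tfn=1.0):
    """tfn * log2((N + 1) / (n + 0.5)), the formula quoted from BasicModelIn."""
    return tfn * math.log2((num_docs + 1) / (doc_freq + 0.5))


# Old index: numberOfDocuments = 1.605901E7 (assumed to be 16059010), docFreq = 6
old_score = basic_model_in(16_059_010, 6)

# New index: field.docCount = 12084378, term.docFreq = 1
new_score = basic_model_in(12_084_378, 1)

print(old_score, new_score)
```

Under these assumptions the results are roughly 21.236 and 22.942, close to the two reported scores. That suggests the scripted similarity reproduces the old formula, and that the gap comes from the differing index statistics rather than from the script.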

 

> "after_effect": "no" was removed what replaces it?
> --
>
> Key: LUCENE-10650
> URL: https://issues.apache.org/jira/browse/LUCENE-10650
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Nathan Meisels
>Priority: Major
>
> Hi!
> We have been using an old version of Elasticsearch with the following settings:
>  
> {code:java}
>         "default": {
>           "queryNorm": "1",
>           "type": "DFR",
>           "basic_model": "in",
>           "after_effect": "no",
>           "normalization": "no"
>         }{code}
>  
> I see [here|https://issues.apache.org/jira/browse/LUCENE-8015] that 
> "after_effect": "no" was removed.
> In the [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33] version, the score was:
> {code:java}
> return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}
> In the [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43] version, it is:
> {code:java}
> long N = stats.getNumberOfDocuments();
> long n = stats.getDocFreq();
> double A = log2((N + 1) / (n + 0.5));
> // basic model I should return A * tfn
> // which we rewrite to A * (1 + tfn) - A
> /

[GitHub] [lucene] iverase commented on a diff in pull request #1017: LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape

2022-07-21 Thread GitBox


iverase commented on code in PR #1017:
URL: https://github.com/apache/lucene/pull/1017#discussion_r926470931


##
lucene/core/src/java/org/apache/lucene/document/ShapeDocValuesField.java:
##
@@ -0,0 +1,896 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.document;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.List;
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.document.ShapeField.DecodedTriangle.TYPE;
+import org.apache.lucene.document.ShapeField.QueryRelation;
+import org.apache.lucene.document.SpatialQuery.EncodedRectangle;
+import org.apache.lucene.index.DocValuesType;
+import org.apache.lucene.index.IndexableFieldType;
+import org.apache.lucene.index.PointValues.Relation;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.store.ByteArrayDataInput;
+import org.apache.lucene.store.ByteBuffersDataOutput;
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.ArrayUtil;
+import org.apache.lucene.util.BytesRef;
+
+/** A doc values field representation for {@link LatLonShape} and {@link XYShape} */
+public final class ShapeDocValuesField extends Field {
+  private final ShapeComparator shapeComparator;
+
+  private static final FieldType FIELD_TYPE = new FieldType();
+
+  static {
+    FIELD_TYPE.setDocValuesType(DocValuesType.BINARY);
+    FIELD_TYPE.setOmitNorms(true);
+    FIELD_TYPE.freeze();
+  }
+
+  /**
+   * Creates a {@ShapeDocValueField} instance from a shape tessellation
+   *
+   * @param name The Field Name (must not be null)
+   * @param tessellation The tessellation (must not be null)
+   */
+  ShapeDocValuesField(String name, List<ShapeField.DecodedTriangle> tessellation) {
+    super(name, FIELD_TYPE);
+    BytesRef b = computeBinaryValue(tessellation);
+    this.fieldsData = b;
+    try {
+      this.shapeComparator = new ShapeComparator(b);
+    } catch (IOException e) {
+      throw new IllegalArgumentException("unable to read binary shape doc value field. ", e);
+    }
+  }
+
+  /** Creates a {@code ShapeDocValue} field from a given serialized value */
+  ShapeDocValuesField(String name, BytesRef binaryValue) {
+    super(name, FIELD_TYPE);
+    this.fieldsData = binaryValue;
+    try {
+      this.shapeComparator = new ShapeComparator(binaryValue);
+    } catch (IOException e) {
+      throw new IllegalArgumentException("unable to read binary shape doc value field. ", e);
+    }
+  }
+
+  /** The name of the field */
+  @Override
+  public String name() {
+    return name;
+  }
+
+  /** Gets the {@code IndexableFieldType} for this ShapeDocValue field */
+  @Override
+  public IndexableFieldType fieldType() {
+    return FIELD_TYPE;
+  }
+
+  /** Currently there is no string representation for the ShapeDocValueField */
+  @Override
+  public String stringValue() {
+    return null;
+  }
+
+  /** TokenStreams are not yet supported */
+  @Override
+  public TokenStream tokenStream(Analyzer analyzer, TokenStream reuse) {
+    return null;
+  }
+
+  /** create a shape docvalue field from indexable fields */
+  public static ShapeDocValuesField createDocValueField(String fieldName, Field[] indexableFields) {
+    ArrayList<ShapeField.DecodedTriangle> tess = new ArrayList<>(indexableFields.length);
+    final byte[] scratch = new byte[7 * Integer.BYTES];
+    for (Field f : indexableFields) {
+      BytesRef br = f.binaryValue();
+      assert br.length == 7 * ShapeField.BYTES;
+      System.arraycopy(br.bytes, br.offset, scratch, 0, 7 * ShapeField.BYTES);
+      ShapeField.DecodedTriangle t = new ShapeField.DecodedTriangle();
+      ShapeField.decodeTriangle(scratch, t);
+      tess.add(t);
+    }
+    return new ShapeDocValuesField(fieldName, tess);
+  }
+
+  /** Returns the number of terms (tessellated triangles) for this shape */
+  public int numberOfTerms() {
+    return shapeComparator.numberOfTerms();
+  }
+
+  /** Creates a geometry query for shape docvalues */
+  public static Query newGeometryQuery(
+      final String field, final QueryRelation relation, Object... geometries) {
+    return null;
+    // TODO
+    //  return new ShapeDocValuesQuery(field, relation,

[GitHub] [lucene] iverase commented on a diff in pull request #1017: LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape

2022-07-21 Thread GitBox


iverase commented on code in PR #1017:
URL: https://github.com/apache/lucene/pull/1017#discussion_r926482543


##
lucene/core/src/java/org/apache/lucene/document/ShapeDocValuesField.java:
##
@@ -0,0 +1,896 @@

[GitHub] [lucene] mikemccand merged pull request #963: LUCENE-10583: Add docstring warning to not lock on Lucene objects

2022-07-21 Thread GitBox


mikemccand merged PR #963:
URL: https://github.com/apache/lucene/pull/963


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10583) Deadlock with MMapDirectory while waitForMerges

2022-07-21 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569344#comment-17569344
 ] 

ASF subversion and git services commented on LUCENE-10583:
--

Commit 25a842d87198af7b930d890a93b63093d9ca93c3 in lucene's branch 
refs/heads/main from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=25a842d8719 ]

LUCENE-10583: Add docstring warning to not lock on Lucene objects (#963)

* add locking warning to docstring

* git tidy

> Deadlock with MMapDirectory while waitForMerges
> ---
>
> Key: LUCENE-10583
> URL: https://issues.apache.org/jira/browse/LUCENE-10583
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 8.11.1
> Environment: Java 17
> OS: Windows 2016
>Reporter: Thomas Hoffmann
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hello,
> a deadlock occurred in our application. We are using MMapDirectory 
> on Windows 2016 and got the following stack trace:
> {code:java}
> "https-openssl-nio-443-exec-30" #166 daemon prio=5 os_prio=0 cpu=78703.13ms elapsed=81248.18s tid=0x2860af10 nid=0x237c in Object.wait()  [0x413fc000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>     at java.lang.Object.wait(java.base@17.0.2/Native Method)
>     - waiting on 
>     at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4983)
>     - locked <0x0006ef1fc020> (a org.apache.lucene.index.IndexWriter)
>     at org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2697)
>     - locked <0x0006ef1fc020> (a org.apache.lucene.index.IndexWriter)
>     at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1236)
>     at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1278)
>     at com.speed4trade.ebs.module.search.SearchService.updateSearchIndex(SearchService.java:1723)
>     - locked <0x0006d5c00208> (a org.apache.lucene.store.MMapDirectory)
>     at com.speed4trade.ebs.module.businessrelations.ticket.TicketChangedListener.postUpdate(TicketChangedListener.java:142)
> ...{code}
> All threads were waiting to lock <0x0006d5c00208>, which never got 
> released.
> A Lucene thread was also blocked; I don't know if this is relevant:
> {code:java}
> "Lucene Merge Thread #0" #18466 daemon prio=5 os_prio=0 cpu=15.63ms elapsed=3499.07s tid=0x459453e0 nid=0x1f8 waiting for monitor entry  [0x5da9e000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.lucene.store.FSDirectory.deletePendingFiles(FSDirectory.java:346)
>     - waiting to lock <0x0006d5c00208> (a org.apache.lucene.store.MMapDirectory)
>     at org.apache.lucene.store.FSDirectory.maybeDeletePendingFiles(FSDirectory.java:363)
>     at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:248)
>     at org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44)
>     at org.apache.lucene.index.ConcurrentMergeScheduler$1.createOutput(ConcurrentMergeScheduler.java:289)
>     at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
>     at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:121)
>     at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:130)
>     at org.apache.lucene.codecs.lucene87.Lucene87StoredFieldsFormat.fieldsWriter(Lucene87StoredFieldsFormat.java:141)
>     at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:227)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
>     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4757)
>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361)
>     at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920)
>     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684){code}
> It looks like the merge operation never finished and never released the lock.
> Is there any option to prevent this deadlock or how to investigate it further?
> A load-test didn't show this problem unfortunately.
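
The two thread dumps quoted above suggest a cycle: application code holds the MMapDirectory monitor while IndexWriter.close() waits for merges, and the merge thread needs that same monitor in FSDirectory.deletePendingFiles. The following is only a rough analogy in Python, not Lucene code. Python has no per-object monitors, so a `threading.Lock` attribute stands in for the Java monitor. `Directory`, `delete_pending_files`, and `close_writer` are hypothetical names, and the timeout exists only so the sketch terminates instead of hanging.

```python
import threading


class Directory:
    """Stand-in for a library object that synchronizes on itself internally."""

    def __init__(self):
        # Plays the role of the Java object monitor of the directory.
        self.monitor = threading.Lock()

    def delete_pending_files(self, timeout):
        # Library-internal work that needs the object's own monitor.
        if not self.monitor.acquire(timeout=timeout):
            return False  # without the timeout this call would block forever
        try:
            return True
        finally:
            self.monitor.release()


def close_writer(directory, app_lock, timeout=0.2):
    """Hold app_lock while waiting for a 'merge thread' that needs directory.monitor."""
    result = []
    with app_lock:
        merge = threading.Thread(
            target=lambda: result.append(directory.delete_pending_files(timeout))
        )
        merge.start()
        merge.join()  # analogous to close() waiting for running merges
    return result[0]


d = Directory()
# The application locks the library object itself: the merge thread cannot proceed.
blocked = close_writer(d, app_lock=d.monitor)
# The application uses its own private lock: the merge thread completes.
ok = close_writer(d, app_lock=threading.Lock())
print(blocked, ok)
```

The design point is the one the docstring warning makes: application code should guard its own critical sections with a lock it owns, rather than with an object whose internals may take the same lock.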



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issue

[GitHub] [lucene] mikemccand commented on pull request #963: LUCENE-10583: Add docstring warning to not lock on Lucene objects

2022-07-21 Thread GitBox


mikemccand commented on PR #963:
URL: https://github.com/apache/lucene/pull/963#issuecomment-1191325044

   I backported to 9.x as well.





[jira] [Commented] (LUCENE-10583) Deadlock with MMapDirectory while waitForMerges

2022-07-21 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569345#comment-17569345
 ] 

ASF subversion and git services commented on LUCENE-10583:
--

Commit 1884a8730a315e1e51e6ad0b43774e6714a3b9d1 in lucene's branch 
refs/heads/branch_9x from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=1884a8730a3 ]

LUCENE-10583: Add docstring warning to not lock on Lucene objects (#963)

* add locking warning to docstring

* git tidy


[GitHub] [lucene-jira-archive] mocobeta opened a new pull request, #57: Enable hyperlinks to a commit in commitbots' comments

2022-07-21 Thread GitBox


mocobeta opened a new pull request, #57:
URL: https://github.com/apache/lucene-jira-archive/pull/57

   Closes #11
   
   Removes `[` and `]` from a line if and only if the line contains a URL-like 
string. We could perhaps apply this to all comments, but I applied it only to 
jira-bot's comments so as not to accidentally break comments written by humans.
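
A minimal sketch of the bracket-stripping idea, assuming the regular expression shown in the PR diff. The function name `strip_brackets_around_url` and everything after the `re.match` call are assumptions made for illustration, not the PR's actual code:

```python
import re

# Pattern taken from the PR diff: a whole line of the form "[ <url> ]".
BRACKETED_URL = re.compile(r"^\[\s?(https?://\S+)\s?\]$")


def strip_brackets_around_url(comment_body: str) -> str:
    """Drop the surrounding '[' and ']' from lines that contain only a URL."""
    lines = []
    for line in comment_body.split("\n"):
        m = BRACKETED_URL.match(line.strip())
        # Assumption: a matching line is replaced by the bare URL,
        # and every other line is kept unchanged.
        lines.append(m.group(1) if m else line)
    return "\n".join(lines)


sample = "[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=25a842d8719 ]\n[not a link]"
converted = strip_brackets_around_url(sample)
print(converted)
```

Lines whose bracketed content is not URL-like, such as `[not a link]`, pass through untouched, which is the "if and only if" behavior described above.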
   
   An imported issue for testing:
   https://github.com/mocobeta/migration-test-3/issues/442
   
   Screenshot
   ![Screenshot from 2022-07-21 20-47-27](https://user-images.githubusercontent.com/1825333/180206434-af9669d7-d689-4a08-b12f-7ccf08cb5d08.png)
   





[GitHub] [lucene-jira-archive] mocobeta commented on pull request #57: Enable hyperlinks to a commit in commitbots' comments

2022-07-21 Thread GitBox


mocobeta commented on PR #57:
URL: 
https://github.com/apache/lucene-jira-archive/pull/57#issuecomment-1191402838

   Fortunately, this also works for old issues (from 2013).
   https://github.com/mocobeta/migration-test-3/issues/445
   
   ![Screenshot from 2022-07-21 21-03-18](https://user-images.githubusercontent.com/1825333/180209447-ca78faca-fe6c-4aea-a345-a5d5b881618d.png)
   





[GitHub] [lucene-jira-archive] mikemccand commented on a diff in pull request #57: Enable hyperlinks to a commit in commitbots' comments

2022-07-21 Thread GitBox


mikemccand commented on code in PR #57:
URL: https://github.com/apache/lucene-jira-archive/pull/57#discussion_r926614459


##
migration/src/jira2github_import.py:
##
@@ -123,6 +123,17 @@ def comment_author(author_name, author_dispname):
     author_gh = account_map.get(author_name)
     return f"{author_dispname} (@{author_gh})" if author_gh else author_dispname
 
+def enable_hyperlink_to_commit(comment_body: str):
+    lines = []
+    for line in comment_body.split("\n"):
+        # remove '[' and ']' iff it contains a URL (i.e. link to a commit in ASF GitBox repo).
+        m = re.match(r"^\[\s?(https?://\S+)\s?\]$", line.strip())

Review Comment:
   Maybe `\s*` instead of `\s?` after the opening `[` and before the closing 
`]` for better robustness?   Or are we sure it's always exactly 0 or 1 space?
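
   The difference between the two patterns can be checked with a minimal sketch. Only the two regular expressions come from the diff and from the suggestion above; the helper name `unwrap_url_line` and the sample URL are hypothetical and used purely for illustration.

```python
import re

# Pattern from the diff: at most one whitespace character inside the brackets.
STRICT = re.compile(r"^\[\s?(https?://\S+)\s?\]$")
# Suggested variant: any amount of whitespace inside the brackets.
LENIENT = re.compile(r"^\[\s*(https?://\S+)\s*\]$")


def unwrap_url_line(line: str, pattern: "re.Pattern") -> str:
    """Return the bare URL if the whole line is a bracketed URL, else the line unchanged."""
    m = pattern.match(line.strip())
    return m.group(1) if m else line


url = "https://example.org/repos/asf?p=lucene.git;h=abc123"  # made-up URL
one_space = f"[ {url} ]"
two_spaces = f"[  {url}  ]"

print(unwrap_url_line(one_space, STRICT))
print(unwrap_url_line(two_spaces, STRICT))   # left untouched by the strict pattern
print(unwrap_url_line(two_spaces, LENIENT))
```

   With `\s?`, a line padded with two spaces falls through unchanged, so the choice mostly depends on whether the bot's output is known to use a fixed amount of padding.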






[jira] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-21 Thread Nathan Meisels (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-10650 ]


Nathan Meisels deleted comment on LUCENE-10650:
-

was (Author: JIRAUSER292626):
Hi [~jpountz]!

I appreciate your help so far!

I have another question: I did a reindex and now I get different scores.

The query is:

 
{code:java}
{
  "query": {
    "term": {
      "sessionIds": "1234-1234"
    }
  }
}{code}
 

New index explain:
{code:java}
{
  "_index": "entities-new",
  "_type": "entity",
  "_id": "AWByRrSPIGshPfnDk4hN",
  "matched": true,
  "explanation": {
    "value": 22.941677,
    "description": "weight(sessionIds:1234-1234 in 1400) [PerFieldSimilarity], result of:",
    "details": [
      {
        "value": 22.941677,
        "description": "score from ScriptedSimilarity(weightScript=[Script{type=inline, lang='painless', idOrCode='return query.boost * Math.log((field.docCount+1.0)/(term.docFreq+0.5)) / Math.log(2);', options={}, params={}}], script=[Script{type=inline, lang='painless', idOrCode='return weight;', options={}, params={}}]) computed from:",
        "details": [
          {
            "value": 22.941677,
            "description": "weight",
            "details": []
          },
          {
            "value": 1.0,
            "description": "query.boost",
            "details": []
          },
          {
            "value": 12084378,
            "description": "field.docCount",
            "details": []
          },
          {
            "value": 4.730932E+7,
            "description": "field.sumDocFreq",
            "details": []
          },
          {
            "value": -1.0,
            "description": "field.sumTotalTermFreq",
            "details": []
          },
          {
            "value": 1.0,
            "description": "term.docFreq",
            "details": []
          },
          {
            "value": -1.0,
            "description": "term.totalTermFreq",
            "details": []
          },
          {
            "value": 1.0,
            "description": "doc.freq",
            "details": []
          },
          {
            "value": 1.0,
            "description": "doc.length",
            "details": []
          }
        ]
      }
    ]
  }
}{code}
 

Old index explain:
{code:java}
{
  "_index" : "entities-old",
  "_type" : "entity",
  "_id" : "AWByRrSPIGshPfnDk4hN",
  "matched" : true,
  "explanation" : {
    "value" : 21.23644,
    "description" : "weight(sessionIds:1234-1234 in 527154) [PerFieldSimilarity], result of:",
    "details" : [
      {
        "value" : 21.23644,
        "description" : "score(DFRSimilarity, doc=527154, freq=1.0), computed from:",
        "details" : [
          {
            "value" : 1.0,
            "description" : "no normalization",
            "details" : [ ]
          },
          {
            "value" : 21.23644,
            "description" : "BasicModelIn, computed from: ",
            "details" : [
              {
                "value" : 1.605901E7,
                "description" : "numberOfDocuments",
                "details" : [ ]
              },
              {
                "value" : 6.0,
                "description" : "docFreq",
                "details" : [ ]
              }
            ]
          },
          {
            "value" : 1.0,
            "description" : "no aftereffect",
            "details" : [ ]
          }
        ]
      }
    ]
  }
}{code}

Does this make sense? I need the scores to stay the same.

Thanks
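
The two explain outputs above can be cross-checked with a short sketch. It assumes that both scores reduce to log2((N + 1) / (n + 0.5)), which is what the weightScript in the new explain computes with query.boost = 1; the function name `idf_in` is hypothetical, and the statistics are copied from the explain outputs.

```python
import math


def idf_in(num_docs: float, doc_freq: float) -> float:
    """log2((N + 1) / (n + 0.5)), as in the weightScript of the new explain output."""
    return math.log((num_docs + 1.0) / (doc_freq + 0.5)) / math.log(2)


# Statistics copied from the explain outputs above.
new_score = idf_in(12084378, 1.0)    # field.docCount, term.docFreq
old_score = idf_in(1.605901e7, 6.0)  # numberOfDocuments, docFreq

print(f"new index: {new_score:.4f} (explain reports 22.941677)")
print(f"old index: {old_score:.4f} (explain reports 21.23644)")
```

If both values line up with the explain outputs, the gap would come from the differing index statistics (document count and document frequency) rather than from the formula itself.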

 


[jira] [Resolved] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-21 Thread Nathan Meisels (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Meisels resolved LUCENE-10650.
-
Resolution: Done

> "after_effect": "no" was removed what replaces it?
> --
>
> Key: LUCENE-10650
> URL: https://issues.apache.org/jira/browse/LUCENE-10650
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Nathan Meisels
>Priority: Major
>
> Hi!
> We have been using an old version of elasticsearch with the following 
> settings:
>  
> {code:java}
>         "default": {
>           "queryNorm": "1",
>           "type": "DFR",
>           "basic_model": "in",
>           "after_effect": "no",
>           "normalization": "no"
>         }{code}
>  
> I see [here|https://issues.apache.org/jira/browse/LUCENE-8015] that 
> "after_effect": "no" was removed.
> In 
> [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
>  version score was:
> {code:java}
> return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}
> In 
> [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
>  version it's:
> {code:java}
> long N = stats.getNumberOfDocuments();
> long n = stats.getDocFreq();
> double A = log2((N + 1) / (n + 0.5));
> // basic model I should return A * tfn
> // which we rewrite to A * (1 + tfn) - A
> // so that it can be combined with the after effect while still guaranteeing
> // that the result is non-decreasing with tfn
> return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
> {code}
> I tried changing after_effect to "l", but the scoring is 
> different from what we are used to. (We depend heavily on the exact scoring.)
> Do you have any advice on how we can keep the same scoring as before?
> Thanks
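
The two formulas quoted above can be compared side by side with a minimal sketch. It transcribes only the quoted lines; the function names and the sample values of N, n and tfn are illustrative assumptions, and `ae_times_1p_tfn` stands in for whatever the configured after effect supplies as `aeTimes1pTfn`.

```python
import math


def log2(x: float) -> float:
    return math.log(x) / math.log(2)


def basic_model_in_old(tfn: float, N: int, n: int) -> float:
    """Old (5.5.0) form quoted above: tfn * log2((N + 1) / (n + 0.5))."""
    return tfn * log2((N + 1) / (n + 0.5))


def basic_model_in_new(tfn: float, N: int, n: int, ae_times_1p_tfn: float) -> float:
    """New (8.11.2) form quoted above: A * aeTimes1pTfn * (1 - 1 / (1 + tfn))."""
    A = log2((N + 1) / (n + 0.5))
    return A * ae_times_1p_tfn * (1 - 1 / (1 + tfn))


# Illustrative values only.
N, n, tfn = 1000, 10, 1.0
old = basic_model_in_old(tfn, N, n)
print(old, basic_model_in_new(tfn, N, n, 1.0))        # factor 1.0: new is old / (1 + tfn)
print(old, basic_model_in_new(tfn, N, n, 1.0 + tfn))  # factor 1 + tfn: matches old
```

Algebraically the new form equals A * aeTimes1pTfn * tfn / (1 + tfn), so under these quoted formulas it reproduces the old score only when aeTimes1pTfn equals 1 + tfn.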






[GitHub] [lucene] mayya-sharipova commented on pull request #992: LUCENE-10592 Build HNSW Graph on indexing

2022-07-21 Thread GitBox


mayya-sharipova commented on PR #992:
URL: https://github.com/apache/lucene/pull/992#issuecomment-1191493127

   @jtibshirani Thanks for the review. 
   
   > It's a bit confusing that the baseline slows down so much from 533s to 
654s, which is almost 2 minutes slower. Do you have a sense for why this is? I 
wonder if graph building time can vary a lot based on what order the vectors 
are processed.
   
   I did not do a detailed analysis and can only speculate that this could be 
the reason, but `SortingVectorValues` may also contribute to the slowdown, since 
it needs to do extra lookups. 
   
   > I just realized that we're doing a cast which is pretty tricky/ fragile. 
The check visited.length() < capacity is only true if we are building the graph 
(not searching), and HnswGraphBuilder happens to always use FixedBitSet.
   As a follow-up maybe we should consider 
[LUCENE-10404](https://issues.apache.org/jira/browse/LUCENE-10404) or something 
similar, which chooses a better 'visited' data structure and doesn't require us 
to do this cast + resize.
   
   Good point, I agree that the solution is fragile, and +1 for investigating a 
better data structure for `visited`.





[GitHub] [lucene] mayya-sharipova commented on pull request #992: LUCENE-10592 Build HNSW Graph on indexing

2022-07-21 Thread GitBox


mayya-sharipova commented on PR #992:
URL: https://github.com/apache/lucene/pull/992#issuecomment-1191496667

   @jpountz @jtibshirani Thanks for your review.
   
   It looks like we are removing Lucene93Hnsw* codecs in the `main` and 
`branch_9_3` branches. So once this removal is done, my plan for this PR:
   - Introduce Lucene94Hnsw* codecs
   - Refactor this PR to use Lucene94Hnsw* codecs





[jira] [Commented] (LUCENE-10404) Use hash set for visited nodes in HNSW search?

2022-07-21 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569420#comment-17569420
 ] 

Michael Sokolov commented on LUCENE-10404:
--

I tried using IntIntHashMap (mapping to 1 for visited nodes) and indeed there does 
seem to be a small speedup. I haven't had a chance to run luceneutil or look 
at profiler output, but here are some numbers from KnnGraphTester for an 
internal dataset. The numbers can be a bit noisy, but are consistently better 
for the hash map version.
h3. IntIntHashMap

{{recall  latency nDoc    fanout  maxConn beamWidth       visited index ms}}
{{0.935    0.37   1   0       16      32      100     1566}}
{{0.965    0.49   1   50      16      32      150     0}}
{{0.962    0.41   1   0       16      64      100     2655}}
{{0.982    0.57   1   50      16      64      150     0}}
{{0.941    0.38   1   0       32      32      100     1473}}
{{0.969    0.51   1   50      32      32      150     0}}
{{0.966    0.45   1   0       32      64      100     2611}}
{{0.985    0.59   1   50      32      64      150     0}}
{{0.907    0.52   10  0       16      32      100     19850}}
{{0.940    0.72   10  50      16      32      150     0}}
{{0.941    0.60   10  0       16      64      100     38614}}
{{0.966    0.84   10  50      16      64      150     0}}
{{0.916    0.55   10  0       32      32      100     19243}}
{{0.949    0.75   10  50      32      32      150     0}}
{{0.952    0.66   10  0       32      64      100     38205}}
{{0.973    0.93   10  50      32      64      150     0}}
{{0.859    0.66   100 0       16      32      100     273112}}
{{0.897    0.92   100 50      16      32      150     0}}
{{0.917    0.85   100 0       16      64      100     523325}}
{{0.946    1.06   100 50      16      64      150     0}}
h3. baseline

{{recall  latency nDoc    fanout  maxConn beamWidth       visited index ms}}
{{0.935    0.38   1   0       16      32      100     1614}}
{{0.965    0.50   1   50      16      32      150     0}}
{{0.962    0.45   1   0       16      64      100     2687}}
{{0.982    0.57   1   50      16      64      150     0}}
{{0.941    0.40   1   0       32      32      100     1504}}
{{0.969    0.51   1   50      32      32      150     0}}
{{0.966    0.44   1   0       32      64      100     2652}}
{{0.985    0.58   1   50      32      64      150     0}}
{{0.907    0.54   10  0       16      32      100     21449}}
{{0.940    0.74   10  50      16      32      150     0}}
{{0.941    0.64   10  0       16      64      100     39962}}
{{0.966    0.88   10  50      16      64      150     0}}
{{0.916    0.59   10  0       32      32      100     20554}}
{{0.949    0.80   10  50      32      32      150     0}}
{{0.952    0.72   10  0       32      64      100     40980}}
{{0.973    1.04   10  50      32      64      150     0}}
{{0.859    0.75   100 0       16      32      100     300514}}
{{0.897    0.96   100 50      16      32      150     0}}
{{0.917    0.84   100 0       16      64      100     563259}}
{{0.946    1.12   100 50      16      64      150     0}}
{{0.874    0.86   100 0       32      32      100     303186}}
{{0.913    1.09   100 50      32      32      150     0}}
{{0.929    1.04   100 0       32      64      100     580725}}
{{0.958    1.38   100 50      32      64      150     0}}

> Use hash set for visited nodes in HNSW search?
> --
>
> Key: LUCENE-10404
> URL: https://issues.apache.org/jira/browse/LUCENE-10404
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Julie Tibshirani
>Priority: Minor
>
> While searching each layer, HNSW tracks the nodes it has already visited 
> using a BitSet. We could look into using something like IntHashSet instead. I 
> tried out the idea quickly by switching to IntIntHashMap (which has already 
> been copied from hppc) and saw an improvement in index performance. 
> *Baseline:* 760896 msec to write vectors
> *Using IntIntHashMap:* 733017 msec to write vectors
> I noticed search performance actually got a little bit worse with the change 
> -- that is something to look into.
> For background, it's good to be aware that HNSW can visit a lot of nodes. For 
> example, on the glove-100-angular dataset with ~1.2 million docs, HNSW search 
> visits ~1000 - 15,000 docs depending on the recall. This number can increase 
> when searching with deleted docs, especially if you hit a "pathological" case 
> where the deleted docs happen to be closest to the query vector.
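
The trade-off described above can be illustrated with a minimal sketch. Neither class mirrors Lucene's FixedBitSet or IntIntHashMap; `DenseVisited`, `HashVisited`, `explore` and the toy graph are hypothetical names chosen for illustration, and the traversal is a plain graph walk rather than HNSW search.

```python
from typing import Dict, List


class DenseVisited:
    """Dense tracker (stand-in for a bitset): one flag per node in the graph."""

    def __init__(self, num_nodes: int) -> None:
        self.flags = bytearray(num_nodes)

    def mark(self, node: int) -> bool:
        """Mark node as visited; return True only on the first visit."""
        if self.flags[node]:
            return False
        self.flags[node] = 1
        return True

    def clear(self) -> None:
        # Resetting costs time proportional to the total number of nodes.
        self.flags = bytearray(len(self.flags))


class HashVisited:
    """Sparse tracker (stand-in for a hash set): stores only the visited nodes."""

    def __init__(self) -> None:
        self.nodes = set()

    def mark(self, node: int) -> bool:
        if node in self.nodes:
            return False
        self.nodes.add(node)
        return True

    def clear(self) -> None:
        # Resetting depends on how many nodes were stored, not on graph size.
        self.nodes.clear()


def explore(neighbors: Dict[int, List[int]], entry: int, visited, max_visits: int) -> List[int]:
    """Visit nodes reachable from entry, stopping after max_visits nodes."""
    visited.clear()
    visited.mark(entry)
    order, frontier = [entry], [entry]
    while frontier and len(order) < max_visits:
        node = frontier.pop()
        for nxt in neighbors.get(node, []):
            if len(order) < max_visits and visited.mark(nxt):
                order.append(nxt)
                frontier.append(nxt)
    return order


graph = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}
print(explore(graph, 0, DenseVisited(len(graph)), 10))
print(explore(graph, 0, HashVisited(), 10))
```

Both trackers yield the same traversal; they differ only in memory footprint and reset cost, which is why the number of nodes visited per search relative to the graph size matters for the choice.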




[jira] [Comment Edited] (LUCENE-10404) Use hash set for visited nodes in HNSW search?

2022-07-21 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569420#comment-17569420
 ] 

Michael Sokolov edited comment on LUCENE-10404 at 7/21/22 1:39 PM:
---

I tried using IntIntHashMap (mapping to 1 for visited nodes) and indeed there does 
seem to be a small speedup. I haven't had a chance to run luceneutil or look 
at profiler output, but here are some numbers from KnnGraphTester for an 
internal dataset. The numbers can be a bit noisy, but are consistently better 
for the hash map version.
h3. IntIntHashMap

recall  latency nDoc    fanout  maxConn beamWidth       visited index ms
0.935    0.37   1   0       16      32      100     1566
0.965    0.49   1   50      16      32      150     0
0.962    0.41   1   0       16      64      100     2655
0.982    0.57   1   50      16      64      150     0
0.941    0.38   1   0       32      32      100     1473
0.969    0.51   1   50      32      32      150     0
0.966    0.45   1   0       32      64      100     2611
0.985    0.59   1   50      32      64      150     0
0.907    0.52   10  0       16      32      100     19850
0.940    0.72   10  50      16      32      150     0
0.941    0.60   10  0       16      64      100     38614
0.966    0.84   10  50      16      64      150     0
0.916    0.55   10  0       32      32      100     19243
0.949    0.75   10  50      32      32      150     0
0.952    0.66   10  0       32      64      100     38205
0.973    0.93   10  50      32      64      150     0
0.859    0.66   100 0       16      32      100     273112
0.897    0.92   100 50      16      32      150     0

{{0.917    0.85   100 0       16      64      100     523325}}
{{0.946    1.06   100 50      16      64      150     0}}



more to come – pushed ctrl-enter instead of enter ...
h3. baseline

{{recall  latency nDoc    fanout  maxConn beamWidth       visited index ms}}
{{0.935    0.38   1   0       16      32      100     1614}}
{{0.965    0.50   1   50      16      32      150     0}}
{{0.962    0.45   1   0       16      64      100     2687}}
{{0.982    0.57   1   50      16      64      150     0}}
{{0.941    0.40   1   0       32      32      100     1504}}
{{0.969    0.51   1   50      32      32      150     0}}
{{0.966    0.44   1   0       32      64      100     2652}}
{{0.985    0.58   1   50      32      64      150     0}}
{{0.907    0.54   10  0       16      32      100     21449}}
{{0.940    0.74   10  50      16      32      150     0}}
{{0.941    0.64   10  0       16      64      100     39962}}
{{0.966    0.88   10  50      16      64      150     0}}
{{0.916    0.59   10  0       32      32      100     20554}}
{{0.949    0.80   10  50      32      32      150     0}}
{{0.952    0.72   10  0       32      64      100     40980}}
{{0.973    1.04   10  50      32      64      150     0}}
{{0.859    0.75   100 0       16      32      100     300514}}
{{0.897    0.96   100 50      16      32      150     0}}
{{0.917    0.84   100 0       16      64      100     563259}}
{{0.946    1.12   100 50      16      64      150     0}}
{{0.874    0.86   100 0       32      32      100     303186}}
{{0.913    1.09   100 50      32      32      150     0}}
{{0.929    1.04   100 0       32      64      100     580725}}
{{0.958    1.38   100 50      32      64      150     0}}


was (Author: sokolov):
I tried using IntIntHashMap (mapping to 1 for visited nodes) and indeed there does 
seem to be a small speedup. I haven't had a chance to run luceneutil or look 
at profiler output, but here are some numbers from KnnGraphTester for an 
internal dataset. The numbers can be a bit noisy, but are consistently better 
for the hash map version.
h3. IntIntHashMap

{{recall  latency nDoc    fanout  maxConn beamWidth       visited index ms}}
{{0.935    0.37   1   0       16      32      100     1566}}
{{0.965    0.49   1   50      16      32      150     0}}
{{0.962    0.41   1   0       16      64      100     2655}}
{{0.982    0.57   1   50      16      64      150     0}}
{{0.941    0.38   1   0       32      32      100     1473}}
{{0.969    0.51   1   50      32      32      150     0}}
{{0.966    0.45   1   0       32      64      100     2611}}
{{0.985    0.59   1   50      32      64      150     0}}
{{0.907    0.52   10  0       16      32      100     19850}}
{{0.940    0.72   10  50      16      32      150     0}}
{{0.941    0.60   10  0       16      64      100     38614}}
{{0.966    0.84   10  50      16      64      150  

[jira] (LUCENE-10404) Use hash set for visited nodes in HNSW search?

2022-07-21 Thread Michael Sokolov (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-10404 ]


Michael Sokolov deleted comment on LUCENE-10404:
--

was (Author: sokolov):
I tried using IntIntHashMap (mapping to 1 for visited nodes) and indeed there does 
seem to be a small speedup. I haven't had a chance to run luceneutil or look 
at profiler output, but here are some numbers from KnnGraphTester for an 
internal dataset. The numbers can be a bit noisy, but are consistently better 
for the hash map version.
h3. IntIntHashMap

recall  latency nDoc    fanout  maxConn beamWidth       visited index ms
0.935    0.37   1   0       16      32      100     1566
0.965    0.49   1   50      16      32      150     0
0.962    0.41   1   0       16      64      100     2655
0.982    0.57   1   50      16      64      150     0
0.941    0.38   1   0       32      32      100     1473
0.969    0.51   1   50      32      32      150     0
0.966    0.45   1   0       32      64      100     2611
0.985    0.59   1   50      32      64      150     0
0.907    0.52   10  0       16      32      100     19850
0.940    0.72   10  50      16      32      150     0
0.941    0.60   10  0       16      64      100     38614
0.966    0.84   10  50      16      64      150     0
0.916    0.55   10  0       32      32      100     19243
0.949    0.75   10  50      32      32      150     0
0.952    0.66   10  0       32      64      100     38205
0.973    0.93   10  50      32      64      150     0
0.859    0.66   100 0       16      32      100     273112
0.897    0.92   100 50      16      32      150     0

{{0.917    0.85   100 0       16      64      100     523325}}
{{0.946    1.06   100 50      16      64      150     0}}



more to come – pushed ctrl-enter instead of enter ...
h3. baseline

{{recall  latency nDoc    fanout  maxConn beamWidth       visited index ms}}
{{0.935    0.38   1   0       16      32      100     1614}}
{{0.965    0.50   1   50      16      32      150     0}}
{{0.962    0.45   1   0       16      64      100     2687}}
{{0.982    0.57   1   50      16      64      150     0}}
{{0.941    0.40   1   0       32      32      100     1504}}
{{0.969    0.51   1   50      32      32      150     0}}
{{0.966    0.44   1   0       32      64      100     2652}}
{{0.985    0.58   1   50      32      64      150     0}}
{{0.907    0.54   10  0       16      32      100     21449}}
{{0.940    0.74   10  50      16      32      150     0}}
{{0.941    0.64   10  0       16      64      100     39962}}
{{0.966    0.88   10  50      16      64      150     0}}
{{0.916    0.59   10  0       32      32      100     20554}}
{{0.949    0.80   10  50      32      32      150     0}}
{{0.952    0.72   10  0       32      64      100     40980}}
{{0.973    1.04   10  50      32      64      150     0}}
{{0.859    0.75   100 0       16      32      100     300514}}
{{0.897    0.96   100 50      16      32      150     0}}
{{0.917    0.84   100 0       16      64      100     563259}}
{{0.946    1.12   100 50      16      64      150     0}}
{{0.874    0.86   100 0       32      32      100     303186}}
{{0.913    1.09   100 50      32      32      150     0}}
{{0.929    1.04   100 0       32      64      100     580725}}
{{0.958    1.38   100 50      32      64      150     0}}





[jira] [Commented] (LUCENE-10404) Use hash set for visited nodes in HNSW search?

2022-07-21 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569429#comment-17569429
 ] 

Michael Sokolov commented on LUCENE-10404:
--

I tried using IntIntHashMap (mapping to 1 for visited nodes) and indeed there does 
seem to be a small speedup. I haven't had a chance to run luceneutil or look 
at profiler output, but here are some numbers from KnnGraphTester for an 
internal dataset. The numbers can be a bit noisy, but are consistently better 
for the hash map version.
h3. IntIntHashMap

{{recall  latency nDoc    fanout  maxConn beamWidth       visited index ms}}
{{0.935    0.37   1   0       16      32      100     1566}}
{{0.965    0.49   1   50      16      32      150     0}}
{{0.962    0.41   1   0       16      64      100     2655}}
{{0.982    0.57   1   50      16      64      150     0}}
{{0.941    0.38   1   0       32      32      100     1473}}
{{0.969    0.51   1   50      32      32      150     0}}
{{0.966    0.45   1   0       32      64      100     2611}}
{{0.985    0.59   1   50      32      64      150     0}}
{{0.907    0.52   10  0       16      32      100     19850}}
{{0.940    0.72   10  50      16      32      150     0}}
{{0.941    0.60   10  0       16      64      100     38614}}
{{0.966    0.84   10  50      16      64      150     0}}
{{0.916    0.55   10  0       32      32      100     19243}}
{{0.949    0.75   10  50      32      32      150     0}}
{{0.952    0.66   10  0       32      64      100     38205}}
{{0.973    0.93   10  50      32      64      150     0}}
{{0.859    0.66   100 0       16      32      100     273112}}
{{0.897    0.92   100 50      16      32      150     0}}
{{0.917    0.85   100 0       16      64      100     523325}}
{{0.946    1.06   100 50      16      64      150     0}}
{{0.874    0.80   100 0       32      32      100     274816}}
{{0.913    1.05   100 50      32      32      150     0}}
{{0.929    0.98   100 0       32      64      100     564762}}
h3. baseline

{{recall  latency nDoc    fanout  maxConn beamWidth       visited index ms}}
{{0.935    0.38   1   0       16      32      100     1614}}
{{0.965    0.50   1   50      16      32      150     0}}
{{0.962    0.45   1   0       16      64      100     2687}}
{{0.982    0.57   1   50      16      64      150     0}}
{{0.941    0.40   1   0       32      32      100     1504}}
{{0.969    0.51   1   50      32      32      150     0}}
{{0.966    0.44   1   0       32      64      100     2652}}
{{0.985    0.58   1   50      32      64      150     0}}
{{0.907    0.54   10  0       16      32      100     21449}}
{{0.940    0.74   10  50      16      32      150     0}}
{{0.941    0.64   10  0       16      64      100     39962}}
{{0.966    0.88   10  50      16      64      150     0}}
{{0.916    0.59   10  0       32      32      100     20554}}
{{0.949    0.80   10  50      32      32      150     0}}
{{0.952    0.72   10  0       32      64      100     40980}}
{{0.973    1.04   10  50      32      64      150     0}}
{{0.859    0.75   100 0       16      32      100     300514}}
{{0.897    0.96   100 50      16      32      150     0}}
{{0.917    0.84   100 0       16      64      100     563259}}
{{0.946    1.12   100 50      16      64      150     0}}
{{0.874    0.86   100 0       32      32      100     303186}}
{{0.913    1.09   100 50      32      32      150     0}}
{{0.929    1.04   100 0       32      64      100     580725}}
{{0.958    1.38   100 50      32      64      150     0}}
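
To put the two tables side by side, here is a small sketch that computes relative differences for the rows with nDoc = 100 and fanout = 0. The numbers are copied from the tables above; the dictionary layout and the helper `pct_change` are illustrative assumptions, and since the numbers are described as noisy, the percentages are only a rough guide.

```python
# (maxConn, beamWidth) -> (latency, index ms); rows with nDoc = 100 and fanout = 0.
HASH_MAP = {
    (16, 32): (0.66, 273112),
    (16, 64): (0.85, 523325),
    (32, 32): (0.80, 274816),
    (32, 64): (0.98, 564762),
}
BASELINE = {
    (16, 32): (0.75, 300514),
    (16, 64): (0.84, 563259),
    (32, 32): (0.86, 303186),
    (32, 64): (1.04, 580725),
}


def pct_change(new: float, old: float) -> float:
    """Relative change in percent; negative means the new value is smaller."""
    return 100.0 * (new - old) / old


for key in sorted(HASH_MAP):
    lat_new, idx_new = HASH_MAP[key]
    lat_old, idx_old = BASELINE[key]
    print(
        f"maxConn={key[0]} beamWidth={key[1]}: "
        f"latency {pct_change(lat_new, lat_old):+.1f}%, "
        f"index time {pct_change(idx_new, idx_old):+.1f}%"
    )
```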

 


[jira] [Resolved] (LUCENE-10655) can we optimize visited bitset usage in HNSW graph search/indexing?

2022-07-21 Thread Michael Sokolov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Sokolov resolved LUCENE-10655.
--
Resolution: Fixed

> can we optimize visited bitset usage in HNSW graph search/indexing?
> ---
>
> Key: LUCENE-10655
> URL: https://issues.apache.org/jira/browse/LUCENE-10655
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/hnsw
>Reporter: Michael Sokolov
>Priority: Major
>
> When running {{luceneutil}}  I noticed that {{FixedBitSet.clear()}} dominates 
> the CPU profiler output. I had a few ideas:
>  # In upper graph layers, the occupied nodes are very sparse - maybe 
> {{SparseFixedBitSet}} would be a better fit for those
>  # We are caching these bitsets, but they are only used for a single search 
> (single document insert, during indexing). Should we cache across searches? 
> We would need to pool them though, and they would vary by field since fields 
> can have different numbers of vector nodes. This starts to get complex
>  # Are we sure that clearing a bitset is more efficient than allocating a new 
> one? Maybe the JDK maintains a pool of already-zeroed memory for us
> I think we could try specializing the bitset type by graph level, and then I 
> think we ought to measure the performance of allocation vs the limited reuse 
> that we currently have.
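
The "clear versus allocate" question in item 3 above could be explored with a small harness along these lines. This is only a sketch under stated assumptions: `java.util.BitSet` stands in for Lucene's `FixedBitSet`, and the names `VisitedReuseSketch`, `Reusing` and `searchWithFreshBitSet` are hypothetical, not taken from the Lucene code base.

```java
import java.util.BitSet;

/** Stand-in sketch: java.util.BitSet plays the role of FixedBitSet (an assumption). */
final class VisitedReuseSketch {

  /** Strategy 1: keep one bitset per searcher and clear it before every search. */
  static final class Reusing {
    private final BitSet visited;

    Reusing(int numNodes) {
      this.visited = new BitSet(numNodes);
    }

    int search(int[] candidates) {
      visited.clear(); // reset every bit left over from the previous search
      return countFirstVisits(visited, candidates);
    }
  }

  /** Strategy 2: allocate a fresh, already-zeroed bitset for every search. */
  static int searchWithFreshBitSet(int numNodes, int[] candidates) {
    return countFirstVisits(new BitSet(numNodes), candidates);
  }

  /** Marks each candidate as visited and returns how many were seen for the first time. */
  private static int countFirstVisits(BitSet visited, int[] candidates) {
    int firstVisits = 0;
    for (int node : candidates) {
      if (!visited.get(node)) {
        visited.set(node);
        firstVisits++;
      }
    }
    return firstVisits;
  }
}
```

Both strategies return the same result; only the cost of preparing the visited set differs, which is what a benchmark (for example with JMH) would have to measure on realistic graph sizes.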



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10655) can we optimize visited bitset usage in HNSW graph search/indexing?

2022-07-21 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569457#comment-17569457
 ] 

Michael Sokolov commented on LUCENE-10655:
--

Ah, I see - I hadn't followed your investigations there closely, [~julietibs] . 
Well at least we can confirm what you had found. I'll close this now - it 
doesn't seem fruitful, and I think the hash set idea has legs.




[GitHub] [lucene] gsmiller merged pull request #1038: Fix TestDisiPriorityQueue test bug

2022-07-21 Thread GitBox


gsmiller merged PR #1038:
URL: https://github.com/apache/lucene/pull/1038


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-10659) Fix random TestDisiPriorityQueue bug

2022-07-21 Thread Greg Miller (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Miller resolved LUCENE-10659.
--
Fix Version/s: 9.3
   Resolution: Fixed

> Fix random TestDisiPriorityQueue bug
> 
>
> Key: LUCENE-10659
> URL: https://issues.apache.org/jira/browse/LUCENE-10659
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 9.3
>Reporter: Greg Miller
>Priority: Blocker
> Fix For: 9.3
>
>
> A recently added test ({{TestDisiPriorityQueue}}) has a bug that can randomly 
> trip (my fault). I fixed this on {{main}} and {{branch_9x}}, but I think we 
> should roll it into the 9.3 release. I'll prepare a PR, but raising it here 
> for visibility.






[GitHub] [lucene] nknize commented on a diff in pull request #1017: LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape

2022-07-21 Thread GitBox


nknize commented on code in PR #1017:
URL: https://github.com/apache/lucene/pull/1017#discussion_r926815794


##
lucene/core/src/java/org/apache/lucene/document/ShapeDocValuesField.java:
##
@@ -0,0 +1,896 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.document;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.List;
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.document.ShapeField.DecodedTriangle.TYPE;
+import org.apache.lucene.document.ShapeField.QueryRelation;
+import org.apache.lucene.document.SpatialQuery.EncodedRectangle;
+import org.apache.lucene.index.DocValuesType;
+import org.apache.lucene.index.IndexableFieldType;
+import org.apache.lucene.index.PointValues.Relation;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.store.ByteArrayDataInput;
+import org.apache.lucene.store.ByteBuffersDataOutput;
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.ArrayUtil;
+import org.apache.lucene.util.BytesRef;
+
+/** A doc values field representation for {@link LatLonShape} and {@link XYShape} */
+public final class ShapeDocValuesField extends Field {
+  private final ShapeComparator shapeComparator;
+
+  private static final FieldType FIELD_TYPE = new FieldType();
+
+  static {
+FIELD_TYPE.setDocValuesType(DocValuesType.BINARY);
+FIELD_TYPE.setOmitNorms(true);
+FIELD_TYPE.freeze();
+  }
+
+  /**
+   * Creates a {@ShapeDocValueField} instance from a shape tessellation
+   *
+   * @param name The Field Name (must not be null)
+   * @param tessellation The tessellation (must not be null)
+   */
+  ShapeDocValuesField(String name, List<ShapeField.DecodedTriangle> tessellation) {
+super(name, FIELD_TYPE);
+BytesRef b = computeBinaryValue(tessellation);
+this.fieldsData = b;
+try {
+  this.shapeComparator = new ShapeComparator(b);
+} catch (IOException e) {
+  throw new IllegalArgumentException("unable to read binary shape doc value field. ", e);
+}
+  }
+
+  /** Creates a {@code ShapeDocValue} field from a given serialized value */
+  ShapeDocValuesField(String name, BytesRef binaryValue) {
+super(name, FIELD_TYPE);
+this.fieldsData = binaryValue;
+try {
+  this.shapeComparator = new ShapeComparator(binaryValue);
+} catch (IOException e) {
+  throw new IllegalArgumentException("unable to read binary shape doc value field. ", e);
+}
+  }
+
+  /** The name of the field */
+  @Override
+  public String name() {
+return name;
+  }
+
+  /** Gets the {@code IndexableFieldType} for this ShapeDocValue field */
+  @Override
+  public IndexableFieldType fieldType() {
+return FIELD_TYPE;
+  }
+
+  /** Currently there is no string representation for the ShapeDocValueField */
+  @Override
+  public String stringValue() {
+return null;
+  }
+
+  /** TokenStreams are not yet supported */
+  @Override
+  public TokenStream tokenStream(Analyzer analyzer, TokenStream reuse) {
+return null;
+  }
+
+  /** create a shape docvalue field from indexable fields */
+  public static ShapeDocValuesField createDocValueField(String fieldName, Field[] indexableFields) {
+ArrayList<ShapeField.DecodedTriangle> tess = new ArrayList<>(indexableFields.length);
+final byte[] scratch = new byte[7 * Integer.BYTES];
+for (Field f : indexableFields) {
+  BytesRef br = f.binaryValue();
+  assert br.length == 7 * ShapeField.BYTES;
+  System.arraycopy(br.bytes, br.offset, scratch, 0, 7 * ShapeField.BYTES);
+  ShapeField.DecodedTriangle t = new ShapeField.DecodedTriangle();
+  ShapeField.decodeTriangle(scratch, t);
+  tess.add(t);
+}
+return new ShapeDocValuesField(fieldName, tess);
+  }
+
+  /** Returns the number of terms (tessellated triangles) for this shape */
+  public int numberOfTerms() {
+return shapeComparator.numberOfTerms();
+  }
+
+  /** Creates a geometry query for shape docvalues */
+  public static Query newGeometryQuery(
+  final String field, final QueryRelation relation, Object... geometries) {
+return null;
+// TODO
+//  return new ShapeDocValuesQuery(field, relation, 

[GitHub] [lucene] JoeHF commented on a diff in pull request #1003: LUCENE-10616: optimizing decompress when only retrieving some fields

2022-07-21 Thread GitBox


JoeHF commented on code in PR #1003:
URL: https://github.com/apache/lucene/pull/1003#discussion_r926864645


##
lucene/core/src/java/org/apache/lucene/document/DocumentStoredFieldVisitor.java:
##
@@ -98,6 +100,16 @@ public void doubleField(FieldInfo fieldInfo, double value) {
 
   @Override
   public Status needsField(FieldInfo fieldInfo) throws IOException {
+// return stop after collected all needed fields
+if (fieldsToAdd != null
+&& !fieldsToAdd.contains(fieldInfo.name)
+&& fieldsToAdd.size()
+== doc.getFields().stream()
+.map(IndexableField::name)
+.collect(Collectors.toSet())
+.size()) {
+  return Status.STOP;

Review Comment:
   Removed this in https://github.com/apache/lucene/pull/1003/commits/4b9086fc1bbb31f0ca36986f3adaa770665215e1 and found another way to optimize.
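
For readers following the thread, the early-exit idea in the hunk above can be sketched in isolation. This is a simplified illustration, not the PR's code: `EarlyStopVisitorSketch`, its `Status` enum and `inspectedCount` are hypothetical stand-ins for a stored-field visitor, and field names replace `FieldInfo` objects.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class EarlyStopVisitorSketch {
  /** Hypothetical stand-in for the YES / NO / STOP answers a stored-field visitor can give. */
  enum Status { YES, NO, STOP }

  private final Set<String> fieldsToAdd;
  private final Set<String> collected = new HashSet<>();

  EarlyStopVisitorSketch(Set<String> fieldsToAdd) {
    this.fieldsToAdd = fieldsToAdd;
  }

  Status needsField(String fieldName) {
    if (fieldsToAdd.contains(fieldName)) {
      collected.add(fieldName);
      return Status.YES;
    }
    // An unwanted field after every wanted name has been seen: nothing left to load.
    return collected.size() == fieldsToAdd.size() ? Status.STOP : Status.NO;
  }

  /** Walks the field names in stored order and returns how many were inspected. */
  static int inspectedCount(Set<String> wanted, List<String> storedOrder) {
    EarlyStopVisitorSketch visitor = new EarlyStopVisitorSketch(wanted);
    int inspected = 0;
    for (String name : storedOrder) {
      inspected++;
      if (visitor.needsField(name) == Status.STOP) {
        break;
      }
    }
    return inspected;
  }
}
```

One limitation of this sketch: if values of a wanted multi-valued field are not stored contiguously, stopping at the first unwanted field after all names were seen would drop the later values.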






[GitHub] [lucene] JoeHF commented on pull request #1003: LUCENE-10616: optimizing decompress when only retrieving some fields

2022-07-21 Thread GitBox


JoeHF commented on PR #1003:
URL: https://github.com/apache/lucene/pull/1003#issuecomment-1191678569

   
   In https://github.com/apache/lucene/pull/1003/commits/4b9086fc1bbb31f0ca36986f3adaa770665215e1
   I found an alternative: we can skip unneeded compressed bytes by reading the
   compressed length. This will significantly decrease decompression time when we
   only want a few fields.
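
The general technique of skipping by length can be illustrated with a toy layout. Everything here is an assumption made for illustration: the `[nameLength][name][payloadLength][payload]` layout and the class `LengthPrefixedSkipSketch` are hypothetical and are not Lucene's stored-fields format; the payload stands in for a compressed block.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class LengthPrefixedSkipSketch {

  /** Writes each entry as [nameLength][name][payloadLength][payload]. */
  static ByteBuffer write(Map<String, byte[]> fields) {
    int size = 0;
    for (Map.Entry<String, byte[]> e : fields.entrySet()) {
      size += 2 * Integer.BYTES
          + e.getKey().getBytes(StandardCharsets.UTF_8).length
          + e.getValue().length;
    }
    ByteBuffer out = ByteBuffer.allocate(size);
    for (Map.Entry<String, byte[]> e : fields.entrySet()) {
      byte[] name = e.getKey().getBytes(StandardCharsets.UTF_8);
      out.putInt(name.length).put(name).putInt(e.getValue().length).put(e.getValue());
    }
    out.flip();
    return out;
  }

  /** Reads only the wanted payloads; unwanted ones are skipped by advancing the position. */
  static Map<String, byte[]> readOnly(ByteBuffer in, Set<String> wanted) {
    Map<String, byte[]> result = new LinkedHashMap<>();
    while (in.hasRemaining()) {
      byte[] name = new byte[in.getInt()];
      in.get(name);
      int payloadLength = in.getInt();
      String fieldName = new String(name, StandardCharsets.UTF_8);
      if (wanted.contains(fieldName)) {
        byte[] payload = new byte[payloadLength];
        in.get(payload); // a real reader would decompress here
        result.put(fieldName, payload);
      } else {
        in.position(in.position() + payloadLength); // skip without reading the bytes
      }
    }
    return result;
  }
}
```

Because the length is stored ahead of each payload, the reader never has to decode a block it does not need.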





[GitHub] [lucene] nknize commented on a diff in pull request #1017: LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape

2022-07-21 Thread GitBox


nknize commented on code in PR #1017:
URL: https://github.com/apache/lucene/pull/1017#discussion_r926982395


##
lucene/core/src/java/org/apache/lucene/document/ShapeDocValuesField.java:
##
@@ -0,0 +1,896 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.document;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.List;
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.document.ShapeField.DecodedTriangle.TYPE;
+import org.apache.lucene.document.ShapeField.QueryRelation;
+import org.apache.lucene.document.SpatialQuery.EncodedRectangle;
+import org.apache.lucene.index.DocValuesType;
+import org.apache.lucene.index.IndexableFieldType;
+import org.apache.lucene.index.PointValues.Relation;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.store.ByteArrayDataInput;
+import org.apache.lucene.store.ByteBuffersDataOutput;
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.ArrayUtil;
+import org.apache.lucene.util.BytesRef;
+
+/** A doc values field representation for {@link LatLonShape} and {@link XYShape} */
+public final class ShapeDocValuesField extends Field {
+  private final ShapeComparator shapeComparator;
+
+  private static final FieldType FIELD_TYPE = new FieldType();
+
+  static {
+FIELD_TYPE.setDocValuesType(DocValuesType.BINARY);
+FIELD_TYPE.setOmitNorms(true);
+FIELD_TYPE.freeze();
+  }
+
+  /**
+   * Creates a {@ShapeDocValueField} instance from a shape tessellation
+   *
+   * @param name The Field Name (must not be null)
+   * @param tessellation The tessellation (must not be null)
+   */
+  ShapeDocValuesField(String name, List<ShapeField.DecodedTriangle> tessellation) {
+super(name, FIELD_TYPE);
+BytesRef b = computeBinaryValue(tessellation);
+this.fieldsData = b;
+try {
+  this.shapeComparator = new ShapeComparator(b);
+} catch (IOException e) {
+  throw new IllegalArgumentException("unable to read binary shape doc value field. ", e);
+}
+  }
+
+  /** Creates a {@code ShapeDocValue} field from a given serialized value */
+  ShapeDocValuesField(String name, BytesRef binaryValue) {
+super(name, FIELD_TYPE);
+this.fieldsData = binaryValue;
+try {
+  this.shapeComparator = new ShapeComparator(binaryValue);
+} catch (IOException e) {
+  throw new IllegalArgumentException("unable to read binary shape doc value field. ", e);
+}
+  }
+
+  /** The name of the field */
+  @Override
+  public String name() {
+return name;
+  }
+
+  /** Gets the {@code IndexableFieldType} for this ShapeDocValue field */
+  @Override
+  public IndexableFieldType fieldType() {
+return FIELD_TYPE;
+  }
+
+  /** Currently there is no string representation for the ShapeDocValueField */
+  @Override
+  public String stringValue() {
+return null;
+  }
+
+  /** TokenStreams are not yet supported */
+  @Override
+  public TokenStream tokenStream(Analyzer analyzer, TokenStream reuse) {
+return null;
+  }
+
+  /** create a shape docvalue field from indexable fields */
+  public static ShapeDocValuesField createDocValueField(String fieldName, Field[] indexableFields) {
+ArrayList<ShapeField.DecodedTriangle> tess = new ArrayList<>(indexableFields.length);
+final byte[] scratch = new byte[7 * Integer.BYTES];
+for (Field f : indexableFields) {
+  BytesRef br = f.binaryValue();
+  assert br.length == 7 * ShapeField.BYTES;
+  System.arraycopy(br.bytes, br.offset, scratch, 0, 7 * ShapeField.BYTES);
+  ShapeField.DecodedTriangle t = new ShapeField.DecodedTriangle();
+  ShapeField.decodeTriangle(scratch, t);
+  tess.add(t);
+}
+return new ShapeDocValuesField(fieldName, tess);
+  }
+
+  /** Returns the number of terms (tessellated triangles) for this shape */
+  public int numberOfTerms() {
+return shapeComparator.numberOfTerms();
+  }
+
+  /** Creates a geometry query for shape docvalues */
+  public static Query newGeometryQuery(
+  final String field, final QueryRelation relation, Object... geometries) {
+return null;
+// TODO
+//  return new ShapeDocValuesQuery(field, relation, 

[GitHub] [lucene-jira-archive] mikemccand commented on issue #53: Remove "module" for core components?

2022-07-21 Thread GitBox


mikemccand commented on issue #53:
URL: https://github.com/apache/lucene-jira-archive/issues/53#issuecomment-1192045923

   Can this be closed now?





[GitHub] [lucene-jira-archive] mikemccand opened a new issue, #58: Errors setting assignee when running `import_github_issues.py`

2022-07-21 Thread GitBox


mikemccand opened a new issue, #58:
URL: https://github.com/apache/lucene-jira-archive/issues/58

   Is this expected?  Am I doing something wrong in running the tool?
   
   ```
   > python3 src/import_github_issues.py --min 8000 -max 9000
   [2022-07-21 19:38:46,024] WARNING:github_issues_util: Assignee ErickErickson cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
   [2022-07-21 19:40:21,583] WARNING:github_issues_util: Assignee romseygeek cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
   [2022-07-21 19:40:45,250] WARNING:github_issues_util: Assignee romseygeek cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
   [2022-07-21 19:40:57,013] WARNING:github_issues_util: Assignee jpountz cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
   [2022-07-21 19:42:46,487] WARNING:github_issues_util: Assignee uschindler cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
   [2022-07-21 19:43:00,638] WARNING:github_issues_util: Assignee romseygeek cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
   [2022-07-21 19:43:16,720] WARNING:github_issues_util: Assignee dsmiley cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
   [2022-07-21 19:43:32,841] WARNING:github_issues_util: Assignee romseygeek cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
   [2022-07-21 19:43:41,458] WARNING:github_issues_util: Assignee s1monw cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
   [2022-07-21 19:43:57,607] WARNING:github_issues_util: Assignee romseygeek cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
   [2022-07-21 19:44:21,227] WARNING:github_issues_util: Assignee ErickErickson cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
   [2022-07-21 19:44:29,831] WARNING:github_issues_util: Assignee dsmiley cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
   [2022-07-21 19:44:38,428] WARNING:github_issues_util: Assignee dsmiley cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
   [2022-07-21 19:45:13,826] WARNING:github_issues_util: Assignee s1monw cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
   [2022-07-21 19:45:44,846] WARNING:github_issues_util: Assignee jpountz cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
   [2022-07-21 19:48:21,738] WARNING:github_issues_util: Assignee s1monw cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
   [2022-07-21 19:48:51,086] WARNING:github_issues_util: Assignee dsmiley cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
   [2022-07-21 19:49:16,090] WARNING:github_issues_util: Assignee s1monw cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
   [2022-07-21 19:49:39,645] WARNING:github_issues_util: Assignee romseygeek cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
   [2022-07-21 19:50:08,770] WARNING:github_issues_util: Assignee romsey

[GitHub] [lucene-jira-archive] mikemccand opened a new issue, #59: Module label is sometimes missing?

2022-07-21 Thread GitBox


mikemccand opened a new issue, #59:
URL: https://github.com/apache/lucene-jira-archive/issues/59

   I am test importing all Jira issues from 8000 to 9000, and spot checking.
   
   I noticed [this issue](https://github.com/mikemccand/stargazers-migration-test/issues/161),
   which in Jira is under `modules/highlighter`, but for some reason that label
   did not carry over to the GitHub issue.





[GitHub] [lucene-jira-archive] mikemccand opened a new issue, #60: Invalid unicode character in conversion of comment

2022-07-21 Thread GitBox


mikemccand opened a new issue, #60:
URL: https://github.com/apache/lucene-jira-archive/issues/60

   Spot checking a few converted issues, I noticed an invalid Unicode character
   (U+FFDD, I think) in [this comment](https://github.com/mikemccand/stargazers-migration-test/issues/329#issuecomment-1192052095),
   but the [corresponding Jira issue comment](https://issues.apache.org/jira/browse/LUCENE-8329?focusedCommentId=16487052&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16487052)
   seems to have just a whitespace character.
   
   Not sure how widespread this issue is.
   





[GitHub] [lucene-jira-archive] mikemccand opened a new issue, #61: Should we carry over Jira "labels"?

2022-07-21 Thread GitBox


mikemccand opened a new issue, #61:
URL: https://github.com/apache/lucene-jira-archive/issues/61

   Some Jira issues have labels, like [this one](https://issues.apache.org/jira/browse/LUCENE-8213)
   with `labels: performance`. But we don't seem to carry over the label to the
   [GitHub issue](https://github.com/mikemccand/stargazers-migration-test/issues/213) ...
   should we?





[GitHub] [lucene-jira-archive] mikemccand closed issue #54: Hyperlinks are sometimes not actual links on import

2022-07-21 Thread GitBox


mikemccand closed issue #54: Hyperlinks are sometimes not actual links on import
URL: https://github.com/apache/lucene-jira-archive/issues/54





[GitHub] [lucene-jira-archive] mikemccand merged pull request #57: Enable hyperlinks to a commit in commitbots' comments

2022-07-21 Thread GitBox


mikemccand merged PR #57:
URL: https://github.com/apache/lucene-jira-archive/pull/57





[GitHub] [lucene-jira-archive] mikemccand opened a new issue, #62: Missing closing paren in conversion

2022-07-21 Thread GitBox


mikemccand opened a new issue, #62:
URL: https://github.com/apache/lucene-jira-archive/issues/62

   I noticed that [this 
comment](https://github.com/mikemccand/stargazers-migration-test/issues/213#issuecomment-1192043447)
 is missing the closing paren after the link to the GitHub PR, but in the 
[corresponding Jira 
comment](https://issues.apache.org/jira/browse/LUCENE-8213?focusedCommentId=16942788&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16942788)
 there was a closing `)`.
   
   Not sure how often this happens.





[GitHub] [lucene-jira-archive] mikemccand opened a new issue, #63: Jira username mentions are not converted?

2022-07-21 Thread GitBox


mikemccand opened a new issue, #63:
URL: https://github.com/apache/lucene-jira-archive/issues/63

   I noticed [this 
comment](https://github.com/mikemccand/stargazers-migration-test/issues/213#issuecomment-1192043461)
 refers to a Jira user as `[~ben.manes]`.  Should we replace it with the 
display name of the user (Ben Manes), since the Jira username won't 
necessarily be so recognizable?
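   
   For what it's worth, the replacement proposed above could look roughly like 
the sketch below. It is written in Java purely for illustration and is not the 
migration script's code; `JiraMentionSketch`, `replaceMentions` and the 
username-to-display-name map are hypothetical, and usernames missing from the 
map are left untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: rewrites Jira mention markup using a lookup table. */
final class JiraMentionSketch {
  // Matches Jira mention markup such as [~ben.manes]; group 1 is the username.
  private static final Pattern MENTION = Pattern.compile("\\[~([^\\]\\s]+)\\]");

  /**
   * Replaces each [~user] with the display name found in displayNames
   * (a hypothetical mapping). Unknown usernames keep their original markup.
   */
  static String replaceMentions(String text, Map<String, String> displayNames) {
    Matcher m = MENTION.matcher(text);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      String replacement = displayNames.getOrDefault(m.group(1), m.group());
      // quoteReplacement guards against '$' or '\' in a display name.
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }
}
```

   Falling back to the original markup for unknown users is just one possible 
choice here; it keeps the text intact when no mapping exists.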
   
   Not sure how often this is happening!  In general, all these little issues 
I'm opening are minor and should not block migration.  Net/net, the quality 
of migrated issues looks great overall!





[GitHub] [lucene-jira-archive] mocobeta closed issue #53: Remove "module" for core components?

2022-07-21 Thread GitBox


mocobeta closed issue #53: Remove "module" for core components?
URL: https://github.com/apache/lucene-jira-archive/issues/53





[GitHub] [lucene-jira-archive] mocobeta commented on issue #53: Remove "module" for core components?

2022-07-21 Thread GitBox


mocobeta commented on issue #53:
URL: 
https://github.com/apache/lucene-jira-archive/issues/53#issuecomment-1192109634

   Yes, I think so. I'm closing this.





[GitHub] [lucene] wuwm opened a new pull request, #1042: Cache decoded length bytes for TFIDFSimilarity scorer.

2022-07-21 Thread GitBox


wuwm opened a new pull request, #1042:
URL: https://github.com/apache/lucene/pull/1042

   ### Description
   
   When doing A/B testing between TF-IDF and BM25 similarity, we found that the 
scorer() method in TFIDFSimilarity is somewhat slower than the one in 
BM25Similarity. After reading the code and profiling, we found that 
[BM25Similarity caches decoded length 
bytes](https://github.com/apache/lucene/blob/8ac26737913d0c1555019e93bc6bf7db1ab9047e/lucene/core/src/java/org/apache/lucene/search/similarities/BM25Similarity.java#L122-L129)
 while [TFIDFSimilarity 
doesn't](https://github.com/apache/lucene/blob/8ac26737913d0c1555019e93bc6bf7db1ab9047e/lucene/core/src/java/org/apache/lucene/search/similarities/TFIDFSimilarity.java#L468-L472).
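   
   As a rough illustration of the caching idea (this is not the code from the 
PR): assuming an encoded length fits in a single byte, all 256 possible values 
can be decoded once and then read from a table. In the sketch below, 
`LengthTableSketch`, `decodeLength` and `cachedLength` are hypothetical names, 
and `decodeLength` is an arbitrary stand-in for the real decoder.

```java
/** Illustrative only: a one-time lookup table for decoded length bytes. */
final class LengthTableSketch {
  /** Hypothetical stand-in for a comparatively costly byte-to-length decoder. */
  static int decodeLength(byte encoded) {
    int unsigned = Byte.toUnsignedInt(encoded);
    // Arbitrary mapping, chosen only so that the sketch is runnable.
    return unsigned < 16 ? unsigned : (unsigned - 15) * 16;
  }

  /** Decoded lengths for all 256 byte values, filled once at class load. */
  private static final int[] LENGTH_TABLE = new int[256];

  static {
    for (int i = 0; i < 256; i++) {
      LENGTH_TABLE[i] = decodeLength((byte) i);
    }
  }

  /** Later lookups are a plain array read instead of a decode call. */
  static int cachedLength(byte encoded) {
    return LENGTH_TABLE[Byte.toUnsignedInt(encoded)];
  }
}
```

   The table costs 256 entries of memory and is built only once, so whatever 
creates a scorer no longer has to repeat the decoding for every byte value.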
   
   Btw, I corrected one comment typo in TermInSetQuery.
   
   ### Tests
   ```
   ./gradlew check
   
   ```
   
   
   





[GitHub] [lucene-jira-archive] mocobeta commented on issue #58: Errors setting assignee when running `import_github_issues.py`

2022-07-21 Thread GitBox


mocobeta commented on issue #58:
URL: 
https://github.com/apache/lucene-jira-archive/issues/58#issuecomment-1192171388

   You cannot assign accounts that have no push access to the repository.
   This is the reason I invited you to my test repository in #8. 





[GitHub] [lucene-jira-archive] mocobeta commented on issue #59: Module label is sometimes missing?

2022-07-21 Thread GitBox


mocobeta commented on issue #59:
URL: 
https://github.com/apache/lucene-jira-archive/issues/59#issuecomment-1192173312

   This is a bug (typo) in the label mapping; I'll fix this.
   
https://github.com/apache/lucene-jira-archive/blob/75e70ce3abad1b070a44a0b75e0df96afd3eae65/migration/src/common.py#L193





[GitHub] [lucene-jira-archive] mocobeta commented on issue #61: Should we carry over Jira "labels"?

2022-07-21 Thread GitBox


mocobeta commented on issue #61:
URL: 
https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1192175427

   It was an intentional omission by me. Personally, I don't think we should 
bloat issue labels in GitHub... should we port all Jira "Labels" to GitHub 
labels?





[GitHub] [lucene-jira-archive] mocobeta commented on issue #63: Jira username mentions are not converted?

2022-07-21 Thread GitBox


mocobeta commented on issue #63:
URL: 
https://github.com/apache/lucene-jira-archive/issues/63#issuecomment-1192179474

   I recognize this issue. I think it'd be great if we could handle `[~user]` as 
well as `@user`. 





[GitHub] [lucene] iverase commented on a diff in pull request #1017: LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape

2022-07-21 Thread GitBox


iverase commented on code in PR #1017:
URL: https://github.com/apache/lucene/pull/1017#discussion_r927333254


##
lucene/core/src/java/org/apache/lucene/document/ShapeDocValuesField.java:
##
@@ -0,0 +1,896 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.document;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.List;
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.document.ShapeField.DecodedTriangle.TYPE;
+import org.apache.lucene.document.ShapeField.QueryRelation;
+import org.apache.lucene.document.SpatialQuery.EncodedRectangle;
+import org.apache.lucene.index.DocValuesType;
+import org.apache.lucene.index.IndexableFieldType;
+import org.apache.lucene.index.PointValues.Relation;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.store.ByteArrayDataInput;
+import org.apache.lucene.store.ByteBuffersDataOutput;
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.ArrayUtil;
+import org.apache.lucene.util.BytesRef;
+
+/** A doc values field representation for {@link LatLonShape} and {@link XYShape} */
+public final class ShapeDocValuesField extends Field {
+  private final ShapeComparator shapeComparator;
+
+  private static final FieldType FIELD_TYPE = new FieldType();
+
+  static {
+    FIELD_TYPE.setDocValuesType(DocValuesType.BINARY);
+    FIELD_TYPE.setOmitNorms(true);
+    FIELD_TYPE.freeze();
+  }
+
+  /**
+   * Creates a {@ShapeDocValueField} instance from a shape tessellation
+   *
+   * @param name The Field Name (must not be null)
+   * @param tessellation The tessellation (must not be null)
+   */
+  ShapeDocValuesField(String name, List<ShapeField.DecodedTriangle> tessellation) {
+    super(name, FIELD_TYPE);
+    BytesRef b = computeBinaryValue(tessellation);
+    this.fieldsData = b;
+    try {
+      this.shapeComparator = new ShapeComparator(b);
+    } catch (IOException e) {
+      throw new IllegalArgumentException("unable to read binary shape doc value field. ", e);
+    }
+  }
+
+  /** Creates a {@code ShapeDocValue} field from a given serialized value */
+  ShapeDocValuesField(String name, BytesRef binaryValue) {
+    super(name, FIELD_TYPE);
+    this.fieldsData = binaryValue;
+    try {
+      this.shapeComparator = new ShapeComparator(binaryValue);
+    } catch (IOException e) {
+      throw new IllegalArgumentException("unable to read binary shape doc value field. ", e);
+    }
+  }
+
+  /** The name of the field */
+  @Override
+  public String name() {
+    return name;
+  }
+
+  /** Gets the {@code IndexableFieldType} for this ShapeDocValue field */
+  @Override
+  public IndexableFieldType fieldType() {
+    return FIELD_TYPE;
+  }
+
+  /** Currently there is no string representation for the ShapeDocValueField */
+  @Override
+  public String stringValue() {
+    return null;
+  }
+
+  /** TokenStreams are not yet supported */
+  @Override
+  public TokenStream tokenStream(Analyzer analyzer, TokenStream reuse) {
+    return null;
+  }
+
+  /** create a shape docvalue field from indexable fields */
+  public static ShapeDocValuesField createDocValueField(String fieldName, Field[] indexableFields) {
+    ArrayList<ShapeField.DecodedTriangle> tess = new ArrayList<>(indexableFields.length);
+    final byte[] scratch = new byte[7 * Integer.BYTES];
+    for (Field f : indexableFields) {
+      BytesRef br = f.binaryValue();
+      assert br.length == 7 * ShapeField.BYTES;
+      System.arraycopy(br.bytes, br.offset, scratch, 0, 7 * ShapeField.BYTES);
+      ShapeField.DecodedTriangle t = new ShapeField.DecodedTriangle();
+      ShapeField.decodeTriangle(scratch, t);
+      tess.add(t);
+    }
+    return new ShapeDocValuesField(fieldName, tess);
+  }
+
+  /** Returns the number of terms (tessellated triangles) for this shape */
+  public int numberOfTerms() {
+    return shapeComparator.numberOfTerms();
+  }
+
+  /** Creates a geometry query for shape docvalues */
+  public static Query newGeometryQuery(
+      final String field, final QueryRelation relation, Object... geometries) {
+    return null;
+    // TODO
+    //  return new ShapeDocValuesQuery(field, relation,
