[jira] [Commented] (LUCENE-10650) "after_effect": "no" was removed what replaces it?
[ https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569289#comment-17569289 ]

Nathan Meisels commented on LUCENE-10650:

Hi [~jpountz]! I appreciate your help so far! Another question: I reindexed and now I get different scores.

The query is:
{code:java}
{
  "query": {
    "term": {
      "sessionIds": "1234-1234"
    }
  }
}{code}

New index explain:
{code:java}
{
  "_index": "entities-new",
  "_type": "entity",
  "_id": "AWByRrSPIGshPfnDk4hN",
  "matched": true,
  "explanation": {
    "value": 22.941677,
    "description": "weight(sessionIds:1234-1234 in 1400) [PerFieldSimilarity], result of:",
    "details": [
      {
        "value": 22.941677,
        "description": "score from ScriptedSimilarity(weightScript=[Script{type=inline, lang='painless', idOrCode='return query.boost * Math.log((field.docCount+1.0)/(term.docFreq+0.5)) / Math.log(2);', options={}, params={}}], script=[Script{type=inline, lang='painless', idOrCode='return weight;', options={}, params={}}]) computed from:",
        "details": [
          { "value": 22.941677, "description": "weight", "details": [] },
          { "value": 1.0, "description": "query.boost", "details": [] },
          { "value": 12084378, "description": "field.docCount", "details": [] },
          { "value": 4.730932E+7, "description": "field.sumDocFreq", "details": [] },
          { "value": -1.0, "description": "field.sumTotalTermFreq", "details": [] },
          { "value": 1.0, "description": "term.docFreq", "details": [] },
          { "value": -1.0, "description": "term.totalTermFreq", "details": [] },
          { "value": 1.0, "description": "doc.freq", "details": [] },
          { "value": 1.0, "description": "doc.length", "details": [] }
        ]
      }
    ]
  }
}{code}

Old index explain:
{code:java}
{
  "_index" : "entities-old",
  "_type" : "entity",
  "_id" : "AWByRrSPIGshPfnDk4hN",
  "matched" : true,
  "explanation" : {
    "value" : 21.23644,
    "description" : "weight(sessionIds:1234-1234 in 527154) [PerFieldSimilarity], result of:",
    "details" : [
      {
        "value" : 21.23644,
        "description" : "score(DFRSimilarity, doc=527154, freq=1.0), computed from:",
        "details" : [
          { "value" : 1.0, "description" : "no normalization", "details" : [ ] },
          {
            "value" : 21.23644,
            "description" : "BasicModelIn, computed from: ",
            "details" : [
              { "value" : 1.605901E7, "description" : "numberOfDocuments", "details" : [ ] },
              { "value" : 6.0, "description" : "docFreq", "details" : [ ] }
            ]
          },
          { "value" : 1.0, "description" : "no aftereffect", "details" : [ ] }
        ]
      }
    ]
  }
}{code}

Does this make sense? I need the scores to stay the same. Thanks

> "after_effect": "no" was removed what replaces it?
> --
>
> Key: LUCENE-10650
> URL: https://issues.apache.org/jira/browse/LUCENE-10650
> Project: Lucene - Core
> Issue Type: Wish
> Reporter: Nathan Meisels
> Priority: Major
>
> Hi!
> We have been using an old version of Elasticsearch with the following
> settings:
>
> {code:java}
> "default": {
>   "queryNorm": "1",
>   "type": "DFR",
>   "basic_model": "in",
>   "after_effect": "no",
>   "normalization": "no"
> }{code}
>
> I see [here|https://issues.apache.org/jira/browse/LUCENE-8015] that
> "after_effect": "no" was removed.
> In the
> [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
> version the score was:
> {code:java}
> return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}
> In the
> [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
> version it's:
> {code:java}
> long N = stats.getNumberOfDocuments();
> long n = stats.getDocFreq();
> double A = log2((N + 1) / (n + 0.5));
> // basic model I should return A * tfn
> // which we rewrite to A * (1 + tfn) - A
> /
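Plugging each explain's own inputs into the same BasicModelIn formula reproduces both scores. This suggests the formula itself carried over and the gap comes from the statistics: the old explain used numberOfDocuments = 1.605901E7 (16,059,010) and docFreq = 6, while the new one used field.docCount = 12,084,378 and term.docFreq = 1. A plausible but unconfirmed reading is that the old counts included documents the reindex no longer has (for example deleted documents still sitting in segments, or documents without the field); the thread does not establish this. The class and method names below are hypothetical. One further observation, also hedged: with "no" normalization the old score scaled with the raw term frequency (tfn = freq), whereas a score script of `return weight;` ignores `doc.freq`, so `return weight * doc.freq;` would track the old behavior when freq > 1 (here freq = 1, so the two agree).

```java
/**
 * Hypothetical helper that recomputes both explain values from this thread.
 * Formula (from the issue): score = tfn * log2((N + 1) / (n + 0.5)),
 * where tfn == freq under "no" normalization and no after-effect is applied.
 */
final class DfrInScores {
    private DfrInScores() {}

    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    /** Old DFRSimilarity: BasicModelIn, "no" normalization, "no" after effect. */
    static double oldBasicModelIn(double freq, long numberOfDocuments, long docFreq) {
        return freq * log2((numberOfDocuments + 1.0) / (docFreq + 0.5));
    }

    /** The new index's weight script; the score script "return weight;" ignores doc.freq. */
    static double scriptedWeight(double boost, long docCount, long docFreq) {
        return boost * log2((docCount + 1.0) / (docFreq + 0.5));
    }
}
```

For example, `oldBasicModelIn(1.0, 16_059_010L, 6L)` comes out at about 21.2364 and `scriptedWeight(1.0, 12_084_378L, 1L)` at about 22.9417, matching the two explains, while `scriptedWeight(1.0, 16_059_010L, 6L)` returns exactly the old value. Identical scores would therefore require identical statistics, not just an identical formula.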
[GitHub] [lucene] iverase commented on a diff in pull request #1017: LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape
iverase commented on code in PR #1017:
URL: https://github.com/apache/lucene/pull/1017#discussion_r926470931

## lucene/core/src/java/org/apache/lucene/document/ShapeDocValuesField.java: ##
@@ -0,0 +1,896 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.document;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.List;
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.document.ShapeField.DecodedTriangle.TYPE;
+import org.apache.lucene.document.ShapeField.QueryRelation;
+import org.apache.lucene.document.SpatialQuery.EncodedRectangle;
+import org.apache.lucene.index.DocValuesType;
+import org.apache.lucene.index.IndexableFieldType;
+import org.apache.lucene.index.PointValues.Relation;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.store.ByteArrayDataInput;
+import org.apache.lucene.store.ByteBuffersDataOutput;
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.ArrayUtil;
+import org.apache.lucene.util.BytesRef;
+
+/** A doc values field representation for {@link LatLonShape} and {@link XYShape} */
+public final class ShapeDocValuesField extends Field {
+  private final ShapeComparator shapeComparator;
+
+  private static final FieldType FIELD_TYPE = new FieldType();
+
+  static {
+    FIELD_TYPE.setDocValuesType(DocValuesType.BINARY);
+    FIELD_TYPE.setOmitNorms(true);
+    FIELD_TYPE.freeze();
+  }
+
+  /**
+   * Creates a {@ShapeDocValueField} instance from a shape tessellation
+   *
+   * @param name The Field Name (must not be null)
+   * @param tessellation The tessellation (must not be null)
+   */
+  ShapeDocValuesField(String name, List tessellation) {
+    super(name, FIELD_TYPE);
+    BytesRef b = computeBinaryValue(tessellation);
+    this.fieldsData = b;
+    try {
+      this.shapeComparator = new ShapeComparator(b);
+    } catch (IOException e) {
+      throw new IllegalArgumentException("unable to read binary shape doc value field. ", e);
+    }
+  }
+
+  /** Creates a {@code ShapeDocValue} field from a given serialized value */
+  ShapeDocValuesField(String name, BytesRef binaryValue) {
+    super(name, FIELD_TYPE);
+    this.fieldsData = binaryValue;
+    try {
+      this.shapeComparator = new ShapeComparator(binaryValue);
+    } catch (IOException e) {
+      throw new IllegalArgumentException("unable to read binary shape doc value field. ", e);
+    }
+  }
+
+  /** The name of the field */
+  @Override
+  public String name() {
+    return name;
+  }
+
+  /** Gets the {@code IndexableFieldType} for this ShapeDocValue field */
+  @Override
+  public IndexableFieldType fieldType() {
+    return FIELD_TYPE;
+  }
+
+  /** Currently there is no string representation for the ShapeDocValueField */
+  @Override
+  public String stringValue() {
+    return null;
+  }
+
+  /** TokenStreams are not yet supported */
+  @Override
+  public TokenStream tokenStream(Analyzer analyzer, TokenStream reuse) {
+    return null;
+  }
+
+  /** create a shape docvalue field from indexable fields */
+  public static ShapeDocValuesField createDocValueField(String fieldName, Field[] indexableFields) {
+    ArrayList tess = new ArrayList<>(indexableFields.length);
+    final byte[] scratch = new byte[7 * Integer.BYTES];
+    for (Field f : indexableFields) {
+      BytesRef br = f.binaryValue();
+      assert br.length == 7 * ShapeField.BYTES;
+      System.arraycopy(br.bytes, br.offset, scratch, 0, 7 * ShapeField.BYTES);
+      ShapeField.DecodedTriangle t = new ShapeField.DecodedTriangle();
+      ShapeField.decodeTriangle(scratch, t);
+      tess.add(t);
+    }
+    return new ShapeDocValuesField(fieldName, tess);
+  }
+
+  /** Returns the number of terms (tessellated triangles) for this shape */
+  public int numberOfTerms() {
+    return shapeComparator.numberOfTerms();
+  }
+
+  /** Creates a geometry query for shape docvalues */
+  public static Query newGeometryQuery(
+      final String field, final QueryRelation relation, Object... geometries) {
+    return null;
+    // TODO
+    // return new ShapeDocValuesQuery(field, relation,
[GitHub] [lucene] iverase commented on a diff in pull request #1017: LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape
iverase commented on code in PR #1017:
URL: https://github.com/apache/lucene/pull/1017#discussion_r926482543

## lucene/core/src/java/org/apache/lucene/document/ShapeDocValuesField.java: ##
@@ -0,0 +1,896 @@
[GitHub] [lucene] mikemccand merged pull request #963: LUCENE-10583: Add docstring warning to not lock on Lucene objects
mikemccand merged PR #963: URL: https://github.com/apache/lucene/pull/963 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10583) Deadlock with MMapDirectory while waitForMerges
[ https://issues.apache.org/jira/browse/LUCENE-10583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569344#comment-17569344 ]

ASF subversion and git services commented on LUCENE-10583:

Commit 25a842d87198af7b930d890a93b63093d9ca93c3 in lucene's branch refs/heads/main from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=25a842d8719 ]

LUCENE-10583: Add docstring warning to not lock on Lucene objects (#963)

* add locking warning to docstring
* git tidy

> Deadlock with MMapDirectory while waitForMerges
> ---
>
> Key: LUCENE-10583
> URL: https://issues.apache.org/jira/browse/LUCENE-10583
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Affects Versions: 8.11.1
> Environment: Java 17
> OS: Windows 2016
> Reporter: Thomas Hoffmann
> Priority: Minor
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Hello,
> a deadlock occurred in our application. We are using MMapDirectory
> on Windows 2016 and got the following stacktrace:
> {code:java}
> "https-openssl-nio-443-exec-30" #166 daemon prio=5 os_prio=0 cpu=78703.13ms elapsed=81248.18s tid=0x2860af10 nid=0x237c in Object.wait() [0x413fc000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>   at java.lang.Object.wait(java.base@17.0.2/Native Method)
>   - waiting on
>   at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4983)
>   - locked <0x0006ef1fc020> (a org.apache.lucene.index.IndexWriter)
>   at org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2697)
>   - locked <0x0006ef1fc020> (a org.apache.lucene.index.IndexWriter)
>   at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1236)
>   at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1278)
>   at com.speed4trade.ebs.module.search.SearchService.updateSearchIndex(SearchService.java:1723)
>   - locked <0x0006d5c00208> (a org.apache.lucene.store.MMapDirectory)
>   at com.speed4trade.ebs.module.businessrelations.ticket.TicketChangedListener.postUpdate(TicketChangedListener.java:142)
>   ...{code}
> All threads were waiting to lock <0x0006d5c00208>, which was never released.
> A Lucene thread was also blocked; I don't know if this is relevant:
> {code:java}
> "Lucene Merge Thread #0" #18466 daemon prio=5 os_prio=0 cpu=15.63ms elapsed=3499.07s tid=0x459453e0 nid=0x1f8 waiting for monitor entry [0x5da9e000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>   at org.apache.lucene.store.FSDirectory.deletePendingFiles(FSDirectory.java:346)
>   - waiting to lock <0x0006d5c00208> (a org.apache.lucene.store.MMapDirectory)
>   at org.apache.lucene.store.FSDirectory.maybeDeletePendingFiles(FSDirectory.java:363)
>   at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:248)
>   at org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44)
>   at org.apache.lucene.index.ConcurrentMergeScheduler$1.createOutput(ConcurrentMergeScheduler.java:289)
>   at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
>   at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:121)
>   at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:130)
>   at org.apache.lucene.codecs.lucene87.Lucene87StoredFieldsFormat.fieldsWriter(Lucene87StoredFieldsFormat.java:141)
>   at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:227)
>   at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
>   at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4757)
>   at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361)
>   at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920)
>   at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
>   at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684){code}
> It looks like the merge operation never finished and released the lock.
> Is there any option to prevent this deadlock, or how can we investigate it further?
> Unfortunately, a load test didn't reproduce the problem.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issue
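The two stack traces form a cycle: the application thread holds the MMapDirectory monitor (taken in `SearchService.updateSearchIndex`) while `IndexWriter.close()` waits for merges, and the merge thread is blocked in `FSDirectory.deletePendingFiles` waiting for that same monitor. The sketch below reproduces the shape of that cycle with plain JDK classes, under stated simplifications: `LibraryResource` is a hypothetical stand-in for the Directory, and the timed `Future.get` is a bounded stand-in for `waitForMerges`, which in the real trace keeps waiting. Guarding application state with a private lock object, as in the second test call, is one way to follow the warning that PR #963 adds; it is not taken from the PR itself.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical demo of why application code should not synchronize on library objects. */
final class LockOnLibraryObjectDemo {
    private LockOnLibraryObjectDemo() {}

    /** Stand-in for a Directory whose housekeeping synchronizes on the instance itself. */
    static final class LibraryResource {
        synchronized void deletePendingFilesLike() {
            // Work that needs this object's monitor, like the merge thread in the trace.
        }
    }

    /**
     * Holds appLock, starts a background "merge" that calls into resource, and waits up to
     * timeoutMillis for it. Returns true if the merge finished, false if it stayed blocked.
     */
    static boolean closeWhileHolding(Object appLock, LibraryResource resource, long timeoutMillis) {
        ExecutorService mergeThread = Executors.newSingleThreadExecutor();
        try {
            synchronized (appLock) {
                Future<?> merge = mergeThread.submit(resource::deletePendingFilesLike);
                try {
                    merge.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    return false; // the merge is waiting for a monitor this thread holds
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
        } finally {
            // appLock is already released here, so a blocked task can now finish and exit.
            mergeThread.shutdown();
        }
    }
}
```

Passing the resource itself as `appLock` makes the wait time out, mirroring the reported trace; passing a private `new Object()` lets the merge complete immediately.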
[GitHub] [lucene] mikemccand commented on pull request #963: LUCENE-10583: Add docstring warning to not lock on Lucene objects
mikemccand commented on PR #963: URL: https://github.com/apache/lucene/pull/963#issuecomment-1191325044 I backported to 9.x as well.
[jira] [Commented] (LUCENE-10583) Deadlock with MMapDirectory while waitForMerges
[ https://issues.apache.org/jira/browse/LUCENE-10583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569345#comment-17569345 ]

ASF subversion and git services commented on LUCENE-10583:

Commit 1884a8730a315e1e51e6ad0b43774e6714a3b9d1 in lucene's branch refs/heads/branch_9x from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=1884a8730a3 ]

LUCENE-10583: Add docstring warning to not lock on Lucene objects (#963)

* add locking warning to docstring
* git tidy
[GitHub] [lucene-jira-archive] mocobeta opened a new pull request, #57: Enable hyperlinks to a commit in commitbots' comments
mocobeta opened a new pull request, #57: URL: https://github.com/apache/lucene-jira-archive/pull/57

Close #11

Removes `[` and `]` if and only if the bracketed line is a URL-like string. We could perhaps apply this to all comments, but I applied it only to jira-bot's comments so as not to accidentally break comments written by humans.

An imported issue for testing: https://github.com/mocobeta/migration-test-3/issues/442
[GitHub] [lucene-jira-archive] mocobeta commented on pull request #57: Enable hyperlinks to a commit in commitbots' comments
mocobeta commented on PR #57: URL: https://github.com/apache/lucene-jira-archive/pull/57#issuecomment-1191402838 Fortunately, this also works for old issues (from 2013): https://github.com/mocobeta/migration-test-3/issues/445
[GitHub] [lucene-jira-archive] mikemccand commented on a diff in pull request #57: Enable hyperlinks to a commit in commitbots' comments
mikemccand commented on code in PR #57:
URL: https://github.com/apache/lucene-jira-archive/pull/57#discussion_r926614459

## migration/src/jira2github_import.py: ##
@@ -123,6 +123,17 @@ def comment_author(author_name, author_dispname):
     author_gh = account_map.get(author_name)
     return f"{author_dispname} (@{author_gh})" if author_gh else author_dispname
+def enable_hyperlink_to_commit(comment_body: str):
+    lines = []
+    for line in comment_body.split("\n"):
+        # remove '[' and ']' iff it contains a URL (i.e. link to a commit in ASF GitBox repo).
+        m = re.match(r"^\[\s?(https?://\S+)\s?\]$", line.strip())

Review Comment:
Maybe `\s*` instead of `\s?` after the opening `[` and before the closing `]` for better robustness? Or are we sure it's always exactly 0 or 1 space?
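The review question can be checked directly. The sketch below ports the PR's pattern to `java.util.regex`, whose syntax is the same for these constructs, next to the suggested `\s*` variant. `CommitLinkUnwrapper` and its members are hypothetical, and the line-by-line loop only approximates the Python function, whose remaining lines are truncated in the diff above. The sample link is the GitBox URL from the LUCENE-10583 bot comment, which has exactly one space inside each bracket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical Java port of the bracket-stripping step, for comparing \s? with \s*. */
final class CommitLinkUnwrapper {
    private CommitLinkUnwrapper() {}

    // As written in the PR: at most one whitespace character inside each bracket.
    static final Pattern AT_MOST_ONE_SPACE = Pattern.compile("^\\[\\s?(https?://\\S+)\\s?\\]$");
    // Suggested in review: any amount of whitespace inside each bracket.
    static final Pattern ANY_SPACES = Pattern.compile("^\\[\\s*(https?://\\S+)\\s*\\]$");

    /** Replaces every line that is just "[ url ]" with the bare URL; other lines pass through. */
    static String unwrap(String commentBody, Pattern pattern) {
        List<String> lines = new ArrayList<>();
        // limit -1 keeps trailing empty strings, like Python's str.split("\n")
        for (String line : commentBody.split("\n", -1)) {
            Matcher m = pattern.matcher(line.strip());
            lines.add(m.matches() ? m.group(1) : line);
        }
        return String.join("\n", lines);
    }
}
```

With one space on each side both patterns unwrap the link; with two spaces only the `\s*` variant does, and ordinary bracketed text such as `see [docs] here` is left alone by both because the line is not just a bracketed URL.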
[jira] (LUCENE-10650) "after_effect": "no" was removed what replaces it?
[ https://issues.apache.org/jira/browse/LUCENE-10650 ] Nathan Meisels deleted comment on LUCENE-10650: - was (Author: JIRAUSER292626): Hi [~jpountz]! Appreciate your help until now! Another question. I did a reindex and I get different scores. query is: {code:java} { "query": { "term": { "sessionIds": "1234-1234" } } }{code} New index explain: {code:java} { "_index": "entities-new", "_type": "entity", "_id": "AWByRrSPIGshPfnDk4hN", "matched": true, "explanation": { "value": 22.941677, "description": "weight(sessionIds:1234-1234 in 1400) [PerFieldSimilarity], result of:", "details": [ { "value": 22.941677, "description": "score from ScriptedSimilarity(weightScript=[Script{type=inline, lang='painless', idOrCode='return query.boost * Math.log((field.docCount+1.0)/(term.docFreq+0.5)) / Math.log(2);', options={}, params={}}], script=[Script{type=inline, lang='painless', idOrCode='return weight;', options={}, params={}}]) computed from:", "details": [ { "value": 22.941677, "description": "weight", "details": [] }, { "value": 1.0, "description": "query.boost", "details": [] }, { "value": 12084378, "description": "field.docCount", "details": [] }, { "value": 4.730932E+7, "description": "field.sumDocFreq", "details": [] }, { "value": -1.0, "description": "field.sumTotalTermFreq", "details": [] }, { "value": 1.0, "description": "term.docFreq", "details": [] }, { "value": -1.0, "description": "term.totalTermFreq", "details": [] }, { "value": 1.0, "description": "doc.freq", "details": [] }, { "value": 1.0, "description": "doc.length", "details": [] } ] } ] } }{code} Old index explain: {code:java} { "_index" : "entities-old", "_type" : "entity", "_id" : "AWByRrSPIGshPfnDk4hN", "matched" : true, "explanation" : { "value" : 21.23644, "description" : "weight(sessionIds:1234-1234 in 527154) [PerFieldSimilarity], result of:", "details" : [ { "value" : 21.23644, "description" : "score(DFRSimilarity, doc=527154, freq=1.0), computed from:", "details" : [ { "value" : 1.0, "description" 
: "no normalization", "details" : [ ] }, { "value" : 21.23644, "description" : "BasicModelIn, computed from: ", "details" : [ { "value" : 1.605901E7, "description" : "numberOfDocuments", "details" : [ ] }, { "value" : 6.0, "description" : "docFreq", "details" : [ ] } ] }, { "value" : 1.0, "description" : "no aftereffect", "details" : [ ] } ] } ] } }{code} Does this make sense? I need the scores to stay the same. Thanks > "after_effect": "no" was removed what replaces it? > -- > > Key: LUCENE-10650 > URL: https://issues.apache.org/jira/browse/LUCENE-10650 > Project: Lucene - Core > Issue Type: Wish >Reporter: Nathan Meisels >Priority: Major > > Hi! > We have been using an old version of elasticsearch with the following > settings: > > {code:java} > "default": { > "queryNorm": "1", > "type": "DFR", > "basic_model": "in", > "after_effect": "no", > "normalization": "no" > }{code} > > I see [here|https://issues.apache.org/jira/browse/LUCENE-8015] that > "after_effect": "no" was removed. > In > [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33] > version score was: > {code:java} > return tfn * (float)(log2((N + 1) / (n + 0.5)));{code} > In > [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43] > version it's: > {code:java} > long N = stats.getNumberOfDocuments(); > long n = stats.getDocFreq(); > double A = log2((N + 1) / (n + 0.5)); > // basic model I should return A * tfn > // which we rewrite to A * (1 + tfn) - A > // so that it can be combined with the after effect while still guarante
[jira] [Resolved] (LUCENE-10650) "after_effect": "no" was removed what replaces it?
[ https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Meisels resolved LUCENE-10650. - Resolution: Done > "after_effect": "no" was removed what replaces it? > -- > > Key: LUCENE-10650 > URL: https://issues.apache.org/jira/browse/LUCENE-10650 > Project: Lucene - Core > Issue Type: Wish >Reporter: Nathan Meisels >Priority: Major > > Hi! > We have been using an old version of elasticsearch with the following > settings: > > {code:java} > "default": { > "queryNorm": "1", > "type": "DFR", > "basic_model": "in", > "after_effect": "no", > "normalization": "no" > }{code} > > I see [here|https://issues.apache.org/jira/browse/LUCENE-8015] that > "after_effect": "no" was removed. > In > [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33] > version score was: > {code:java} > return tfn * (float)(log2((N + 1) / (n + 0.5)));{code} > In > [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43] > version it's: > {code:java} > long N = stats.getNumberOfDocuments(); > long n = stats.getDocFreq(); > double A = log2((N + 1) / (n + 0.5)); > // basic model I should return A * tfn > // which we rewrite to A * (1 + tfn) - A > // so that it can be combined with the after effect while still guaranteeing > // that the result is non-decreasing with tfn > return A * aeTimes1pTfn * (1 - 1 / (1 + tfn)); > {code} > I tried changing {color:#172b4d}after_effect{color} to "l" but the scoring is > different than what we are used to. (We depend heavily on the exact scoring). > Do you have any advice how we can keep the same scoring as before? 
> Thanks -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
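The two BasicModelIn formulas quoted above, together with the statistics in the two explain outputs from the deleted comment, can be checked with a few lines of arithmetic. The sketch below is a hypothetical helper, not Lucene code. `oldScore` transcribes the 5.5 formula, which with tfn = 1 is also what the ScriptedSimilarity weight script computes. `newScore` transcribes the 8.11 formula with `aeTimes1pTfn` passed in explicitly, because the thread does not show which value each after effect supplies; the 1.0 used below is an assumption for illustration.

```java
/**
 * Hypothetical helper (not part of Lucene) that re-computes the BasicModelIn
 * formulas quoted in LUCENE-10650.
 */
public class BasicModelInComparison {

    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    /** Lucene 5.5 form: tfn * log2((N + 1) / (n + 0.5)), no after effect applied. */
    static double oldScore(long N, long n, double tfn) {
        return tfn * log2((N + 1) / (n + 0.5));
    }

    /** Lucene 8.11 form: A * aeTimes1pTfn * (1 - 1 / (1 + tfn)). */
    static double newScore(long N, long n, double tfn, double aeTimes1pTfn) {
        double A = log2((N + 1) / (n + 0.5));
        return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
    }

    public static void main(String[] args) {
        // freq = 1.0 and the "no normalization" entry is 1.0 in the old explain output
        double tfn = 1.0;
        // Old index: numberOfDocuments = 1.605901E7, docFreq = 6 (reported score 21.23644)
        System.out.println(oldScore(16_059_010L, 6L, tfn));
        // New index, weight script: field.docCount = 12084378, term.docFreq = 1 (reported 22.941677)
        System.out.println(oldScore(12_084_378L, 1L, tfn));
        // 8.11 formula with the same statistics; aeTimes1pTfn = 1.0 is assumed
        System.out.println(newScore(12_084_378L, 1L, tfn, 1.0));
    }
}
```

Plugging in the explain statistics gives about 21.236 for the old index and about 22.942 for the new one, matching the reported scores. Under these assumptions, the weight script appears to reproduce the old formula, and the difference after reindexing seems to come from the collection statistics (N and n) rather than from the scoring function. Also, with tfn = 1 and aeTimes1pTfn = 1, the 8.11 form yields exactly half of the old score, since 1 - 1/(1 + 1) = 0.5.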
[GitHub] [lucene] mayya-sharipova commented on pull request #992: LUCENE-10592 Build HNSW Graph on indexing
mayya-sharipova commented on PR #992: URL: https://github.com/apache/lucene/pull/992#issuecomment-1191493127 @jtibshirani Thanks for the review. > It's a bit confusing that the baseline slows down so much, from 533s to 654s, which is just over 2 minutes slower. Do you have a sense for why this is? I wonder if graph building time can vary a lot based on the order in which the vectors are processed. I haven't done a detailed analysis, so I can only speculate that this could be the reason; `SortingVectorValues` may also contribute to the slowdown, since it needs to do extra lookups. > I just realized that we're doing a cast which is pretty tricky/fragile. The check visited.length() < capacity is only true if we are building the graph (not searching), and HnswGraphBuilder happens to always use FixedBitSet. As a follow-up maybe we should consider [LUCENE-10404](https://issues.apache.org/jira/browse/LUCENE-10404) or something similar, which chooses a better 'visited' data structure and doesn't require us to do this cast + resize. Good point, I agree the solution is fragile, and +1 to investigating a better data structure for `visited`.
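One way to avoid the cast + resize discussed above is to put "reset, growing if needed" behind an interface, so the caller never needs to know the concrete type. The sketch below is hypothetical: the `Visited` interface and `DenseVisited` class are not Lucene APIs, and a plain `long[]` stands in for FixedBitSet so the example runs without Lucene on the classpath.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Lucene code): the caller only sees the Visited
 * interface, and each implementation decides how to grow, so no cast is needed.
 */
public class VisitedSketch {

    /** Tracks which graph nodes a search has already seen. */
    interface Visited {
        /** Marks node as visited; returns true if it had not been visited before. */
        boolean visit(int node);

        /** Clears all marks and ensures nodes in [0, capacity) can be recorded. */
        void reset(int capacity);
    }

    /** Dense implementation backed by a long[] bitmap (one bit per node). */
    static final class DenseVisited implements Visited {
        private long[] words = new long[0];

        @Override
        public boolean visit(int node) {
            int word = node >>> 6;       // which 64-bit word holds this node
            long mask = 1L << node;      // Java masks a long shift count to its low 6 bits
            boolean firstVisit = (words[word] & mask) == 0;
            words[word] |= mask;
            return firstVisit;
        }

        @Override
        public void reset(int capacity) {
            int needed = (capacity + 63) >>> 6;
            if (words.length < needed) {
                words = new long[needed]; // new arrays are already zeroed
            } else {
                Arrays.fill(words, 0L);   // reuse the existing array
            }
        }
    }

    public static void main(String[] args) {
        Visited visited = new DenseVisited();
        visited.reset(1_000);  // e.g. sized for the graph built so far
        visited.visit(42);
        visited.reset(5_000);  // graph grew: the implementation resizes itself
        System.out.println(visited.visit(4_999));
    }
}
```

A hash-based implementation could sit behind the same interface, which is one way the LUCENE-10404 experiment could be swapped in without touching the search loop.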
[GitHub] [lucene] mayya-sharipova commented on pull request #992: LUCENE-10592 Build HNSW Graph on indexing
mayya-sharipova commented on PR #992: URL: https://github.com/apache/lucene/pull/992#issuecomment-1191496667 @jpountz @jtibshirani Thanks for your review. It looks like we are removing Lucene93Hnsw* codecs in the `main` and `branch_9_3` branches. So once this removal is done, my plan for this PR: - Introduce Lucene94Hnsw* codecs - Refactor this PR to use Lucene94Hnsw* codecs
[jira] [Commented] (LUCENE-10404) Use hash set for visited nodes in HNSW search?
[ https://issues.apache.org/jira/browse/LUCENE-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569420#comment-17569420 ] Michael Sokolov commented on LUCENE-10404: -- I tried using IntIntHashMap (mapping to 1 for visited nodes), and there does indeed seem to be a small speedup. I haven't had a chance to run luceneutil or look at profiler output, but here are some numbers from KnnGraphTester for an internal dataset. The numbers can be a bit noisy, but they are consistently better for the hash map version. h3. IntIntHashMap {{recall latency nDoc fanout maxConn beamWidth visited index ms}} {{0.935 0.37 1 0 16 32 100 1566}} {{0.965 0.49 1 50 16 32 150 0}} {{0.962 0.41 1 0 16 64 100 2655}} {{0.982 0.57 1 50 16 64 150 0}} {{0.941 0.38 1 0 32 32 100 1473}} {{0.969 0.51 1 50 32 32 150 0}} {{0.966 0.45 1 0 32 64 100 2611}} {{0.985 0.59 1 50 32 64 150 0}} {{0.907 0.52 10 0 16 32 100 19850}} {{0.940 0.72 10 50 16 32 150 0}} {{0.941 0.60 10 0 16 64 100 38614}} {{0.966 0.84 10 50 16 64 150 0}} {{0.916 0.55 10 0 32 32 100 19243}} {{0.949 0.75 10 50 32 32 150 0}} {{0.952 0.66 10 0 32 64 100 38205}} {{0.973 0.93 10 50 32 64 150 0}} {{0.859 0.66 100 0 16 32 100 273112}} {{0.897 0.92 100 50 16 32 150 0}} {{0.917 0.85 100 0 16 64 100 523325}} {{0.946 1.06 100 50 16 64 150 0}} h3.
baseline {{recall latency nDoc fanout maxConn beamWidth visited index ms}} {{0.935 0.38 1 0 16 32 100 1614}} {{0.965 0.50 1 50 16 32 150 0}} {{0.962 0.45 1 0 16 64 100 2687}} {{0.982 0.57 1 50 16 64 150 0}} {{0.941 0.40 1 0 32 32 100 1504}} {{0.969 0.51 1 50 32 32 150 0}} {{0.966 0.44 1 0 32 64 100 2652}} {{0.985 0.58 1 50 32 64 150 0}} {{0.907 0.54 10 0 16 32 100 21449}} {{0.940 0.74 10 50 16 32 150 0}} {{0.941 0.64 10 0 16 64 100 39962}} {{0.966 0.88 10 50 16 64 150 0}} {{0.916 0.59 10 0 32 32 100 20554}} {{0.949 0.80 10 50 32 32 150 0}} {{0.952 0.72 10 0 32 64 100 40980}} {{0.973 1.04 10 50 32 64 150 0}} {{0.859 0.75 100 0 16 32 100 300514}} {{0.897 0.96 100 50 16 32 150 0}} {{0.917 0.84 100 0 16 64 100 563259}} {{0.946 1.12 100 50 16 64 150 0}} {{0.874 0.86 100 0 32 32 100 303186}} {{0.913 1.09 100 50 32 32 150 0}} {{0.929 1.04 100 0 32 64 100 580725}} {{0.958 1.38 100 50 32 64 150 0}} > Use hash set for visited nodes in HNSW search? > -- > > Key: LUCENE-10404 > URL: https://issues.apache.org/jira/browse/LUCENE-10404 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Julie Tibshirani >Priority: Minor > > While searching each layer, HNSW tracks the nodes it has already visited > using a BitSet. We could look into using something like IntHashSet instead. I > tried out the idea quickly by switching to IntIntHashMap (which has already > been copied from hppc) and saw an improvement in index performance. > *Baseline:* 760896 msec to write vectors > *Using IntIntHashMap:* 733017 msec to write vectors > I noticed search performance actually got a little bit worse with the change > -- that is something to look into. > For background, it's good to be aware that HNSW can visit a lot of nodes. For > example, on the glove-100-angular dataset with ~1.2 million docs, HNSW search > visits ~1000 - 15,000 docs depending on the recall. 
This number can increase > when searching with deleted docs, especially if you hit a "pathological" case > where the deleted docs happen to be closest to the query vector.
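To put a number on "consistently better", the following small calculation runs over the `index ms` column of the two KnnGraphTester tables in the comment (the fanout 0 rows, which are the only ones with a non-zero `index ms`). It is plain arithmetic on the posted figures. The class name is illustrative, and the figures carry the noise the comment warns about.

```java
/** Relative "index ms" reduction, computed from the two KnnGraphTester tables. */
public class IndexTimeDelta {

    /** Fraction of baseline time saved: (baseline - candidate) / baseline. */
    static double reduction(long baselineMs, long candidateMs) {
        return (baselineMs - candidateMs) / (double) baselineMs;
    }

    public static void main(String[] args) {
        // {nDoc, maxConn, beamWidth, baseline index ms, IntIntHashMap index ms}, fanout = 0 rows
        long[][] rows = {
            {1, 16, 32, 1614, 1566},
            {1, 16, 64, 2687, 2655},
            {1, 32, 32, 1504, 1473},
            {1, 32, 64, 2652, 2611},
            {10, 16, 32, 21449, 19850},
            {10, 16, 64, 39962, 38614},
            {10, 32, 32, 20554, 19243},
            {10, 32, 64, 40980, 38205},
            {100, 16, 32, 300514, 273112},
            {100, 16, 64, 563259, 523325},
        };
        for (long[] r : rows) {
            System.out.printf("nDoc=%d maxConn=%d beamWidth=%d saved=%.1f%%%n",
                    r[0], r[1], r[2], 100 * reduction(r[3], r[4]));
        }
    }
}
```

Every row shows a reduction, ranging from about 1.2% (nDoc 1, maxConn 16, beamWidth 64) to about 9.1% (nDoc 100, maxConn 16, beamWidth 32, where indexing drops from 300514 ms to 273112 ms).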
[jira] [Comment Edited] (LUCENE-10404) Use hash set for visited nodes in HNSW search?
[ https://issues.apache.org/jira/browse/LUCENE-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569420#comment-17569420 ] Michael Sokolov edited comment on LUCENE-10404 at 7/21/22 1:39 PM: --- I tried using IntIntHashMap (mapping to 1 for visited nodes) and indeed does seem to be a small speedup. I haven't had a chance to run luceneutil nor look at profiler output, but here are some numbers from KnnGraphTester for an internal dataset. The numbers can be a bit noisy, but are consistently better for the hash map version. h3. IntIntHashMap recall latency nDoc fanout maxConn beamWidth visited index ms 0.935 0.37 1 0 16 32 100 1566 0.965 0.49 1 50 16 32 150 0 0.962 0.41 1 0 16 64 100 2655 0.982 0.57 1 50 16 64 150 0 0.941 0.38 1 0 32 32 100 1473 0.969 0.51 1 50 32 32 150 0 0.966 0.45 1 0 32 64 100 2611 0.985 0.59 1 50 32 64 150 0 0.907 0.52 10 0 16 32 100 19850 0.940 0.72 10 50 16 32 150 0 0.941 0.60 10 0 16 64 100 38614 0.966 0.84 10 50 16 64 150 0 0.916 0.55 10 0 32 32 100 19243 0.949 0.75 10 50 32 32 150 0 0.952 0.66 10 0 32 64 100 38205 0.973 0.93 10 50 32 64 150 0 0.859 0.66 100 0 16 32 100 273112 0.897 0.92 100 50 16 32 150 0 {{0.917 0.85 100 0 16 64 100 523325}} {{0.946 1.06 100 50 16 64 150 0}} more to come – pushed ctrl-enter instead of enter ... h3. 
baseline {{recall latency nDoc fanout maxConn beamWidth visited index ms}} {{0.935 0.38 1 0 16 32 100 1614}} {{0.965 0.50 1 50 16 32 150 0}} {{0.962 0.45 1 0 16 64 100 2687}} {{0.982 0.57 1 50 16 64 150 0}} {{0.941 0.40 1 0 32 32 100 1504}} {{0.969 0.51 1 50 32 32 150 0}} {{0.966 0.44 1 0 32 64 100 2652}} {{0.985 0.58 1 50 32 64 150 0}} {{0.907 0.54 10 0 16 32 100 21449}} {{0.940 0.74 10 50 16 32 150 0}} {{0.941 0.64 10 0 16 64 100 39962}} {{0.966 0.88 10 50 16 64 150 0}} {{0.916 0.59 10 0 32 32 100 20554}} {{0.949 0.80 10 50 32 32 150 0}} {{0.952 0.72 10 0 32 64 100 40980}} {{0.973 1.04 10 50 32 64 150 0}} {{0.859 0.75 100 0 16 32 100 300514}} {{0.897 0.96 100 50 16 32 150 0}} {{0.917 0.84 100 0 16 64 100 563259}} {{0.946 1.12 100 50 16 64 150 0}} {{0.874 0.86 100 0 32 32 100 303186}} {{0.913 1.09 100 50 32 32 150 0}} {{0.929 1.04 100 0 32 64 100 580725}} {{0.958 1.38 100 50 32 64 150 0}} was (Author: sokolov): I tried using IntIntHashMap (mapping to 1 for visited nodes) and indeed does seem to be a small speedup. I haven't had a chance to run luceneutil nor look at profiler output, but here are some numbers from KnnGraphTester for an internal dataset. The numbers can be a bit noisy, but are consistently better for the hash map version. h3. IntIntHashMap {{recall latency nDoc fanout maxConn beamWidth visited index ms}} {{0.935 0.37 1 0 16 32 100 1566}} {{0.965 0.49 1 50 16 32 150 0}} {{0.962 0.41 1 0 16 64 100 2655}} {{0.982 0.57 1 50 16 64 150 0}} {{0.941 0.38 1 0 32 32 100 1473}} {{0.969 0.51 1 50 32 32 150 0}} {{0.966 0.45 1 0 32 64 100 2611}} {{0.985 0.59 1 50 32 64 150 0}} {{0.907 0.52 10 0 16 32 100 19850}} {{0.940 0.72 10 50 16 32 150 0}} {{0.941 0.60 10 0 16 64 100 38614}} {{0.966 0.84 10 50 16 64 150
[jira] (LUCENE-10404) Use hash set for visited nodes in HNSW search?
[ https://issues.apache.org/jira/browse/LUCENE-10404 ] Michael Sokolov deleted comment on LUCENE-10404: -- was (Author: sokolov): I tried using IntIntHashMap (mapping to 1 for visited nodes) and indeed does seem to be a small speedup. I haven't had a chance to run luceneutil nor look at profiler output, but here are some numbers from KnnGraphTester for an internal dataset. The numbers can be a bit noisy, but are consistently better for the hash map version. h3. IntIntHashMap recall latency nDoc fanout maxConn beamWidth visited index ms 0.935 0.37 1 0 16 32 100 1566 0.965 0.49 1 50 16 32 150 0 0.962 0.41 1 0 16 64 100 2655 0.982 0.57 1 50 16 64 150 0 0.941 0.38 1 0 32 32 100 1473 0.969 0.51 1 50 32 32 150 0 0.966 0.45 1 0 32 64 100 2611 0.985 0.59 1 50 32 64 150 0 0.907 0.52 10 0 16 32 100 19850 0.940 0.72 10 50 16 32 150 0 0.941 0.60 10 0 16 64 100 38614 0.966 0.84 10 50 16 64 150 0 0.916 0.55 10 0 32 32 100 19243 0.949 0.75 10 50 32 32 150 0 0.952 0.66 10 0 32 64 100 38205 0.973 0.93 10 50 32 64 150 0 0.859 0.66 100 0 16 32 100 273112 0.897 0.92 100 50 16 32 150 0 {{0.917 0.85 100 0 16 64 100 523325}} {{0.946 1.06 100 50 16 64 150 0}} more to come – pushed ctrl-enter instead of enter ... h3. 
baseline {{recall latency nDoc fanout maxConn beamWidth visited index ms}} {{0.935 0.38 1 0 16 32 100 1614}} {{0.965 0.50 1 50 16 32 150 0}} {{0.962 0.45 1 0 16 64 100 2687}} {{0.982 0.57 1 50 16 64 150 0}} {{0.941 0.40 1 0 32 32 100 1504}} {{0.969 0.51 1 50 32 32 150 0}} {{0.966 0.44 1 0 32 64 100 2652}} {{0.985 0.58 1 50 32 64 150 0}} {{0.907 0.54 10 0 16 32 100 21449}} {{0.940 0.74 10 50 16 32 150 0}} {{0.941 0.64 10 0 16 64 100 39962}} {{0.966 0.88 10 50 16 64 150 0}} {{0.916 0.59 10 0 32 32 100 20554}} {{0.949 0.80 10 50 32 32 150 0}} {{0.952 0.72 10 0 32 64 100 40980}} {{0.973 1.04 10 50 32 64 150 0}} {{0.859 0.75 100 0 16 32 100 300514}} {{0.897 0.96 100 50 16 32 150 0}} {{0.917 0.84 100 0 16 64 100 563259}} {{0.946 1.12 100 50 16 64 150 0}} {{0.874 0.86 100 0 32 32 100 303186}} {{0.913 1.09 100 50 32 32 150 0}} {{0.929 1.04 100 0 32 64 100 580725}} {{0.958 1.38 100 50 32 64 150 0}} > Use hash set for visited nodes in HNSW search? > -- > > Key: LUCENE-10404 > URL: https://issues.apache.org/jira/browse/LUCENE-10404 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Julie Tibshirani >Priority: Minor > > While searching each layer, HNSW tracks the nodes it has already visited > using a BitSet. We could look into using something like IntHashSet instead. I > tried out the idea quickly by switching to IntIntHashMap (which has already > been copied from hppc) and saw an improvement in index performance. > *Baseline:* 760896 msec to write vectors > *Using IntIntHashMap:* 733017 msec to write vectors > I noticed search performance actually got a little bit worse with the change > -- that is something to look into. > For background, it's good to be aware that HNSW can visit a lot of nodes. For > example, on the glove-100-angular dataset with ~1.2 million docs, HNSW search > visits ~1000 - 15,000 docs depending on the recall. 
This number can increase > when searching with deleted docs, especially if you hit a "pathological" case > where the deleted docs happen to be closest to the query vector.
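The issue description suggests "something like IntHashSet". As a rough illustration of what that could look like, here is a minimal open-addressing int set with linear probing. It is a hypothetical sketch, not hppc's or Lucene's IntIntHashMap, and it assumes node ids are non-negative so that -1 can mark an empty slot.

```java
import java.util.Arrays;

/**
 * Hypothetical open-addressing int set (linear probing) for visited HNSW nodes.
 * Not hppc's or Lucene's IntIntHashMap. Assumes node ids are >= 0, so -1 marks an empty slot.
 */
public class IntVisitedSet {
    private static final int EMPTY = -1;
    private int[] keys;
    private int mask; // keys.length - 1; keys.length is always a power of two
    private int size;

    public IntVisitedSet(int expectedSize) {
        // Smallest power of two >= max(8, 2 * expectedSize), keeping load factor <= 0.5
        int target = Math.max(8, 2 * expectedSize);
        int capacity = Integer.highestOneBit(target - 1) << 1;
        keys = new int[capacity];
        Arrays.fill(keys, EMPTY);
        mask = capacity - 1;
    }

    /** Adds node; returns true if it was not already present. */
    public boolean add(int node) {
        if (contains(node)) {
            return false;
        }
        if (2 * (size + 1) > keys.length) {
            grow();
        }
        insert(node);
        size++;
        return true;
    }

    public boolean contains(int node) {
        int slot = mix(node) & mask;
        while (keys[slot] != EMPTY) {
            if (keys[slot] == node) {
                return true;
            }
            slot = (slot + 1) & mask; // linear probing
        }
        return false;
    }

    public int size() {
        return size;
    }

    /** Cost is proportional to the table size, not to the number of nodes in the graph. */
    public void clear() {
        Arrays.fill(keys, EMPTY);
        size = 0;
    }

    private void insert(int node) {
        int slot = mix(node) & mask;
        while (keys[slot] != EMPTY) {
            slot = (slot + 1) & mask;
        }
        keys[slot] = node;
    }

    private void grow() {
        int[] old = keys;
        keys = new int[old.length * 2];
        Arrays.fill(keys, EMPTY);
        mask = keys.length - 1;
        for (int k : old) {
            if (k != EMPTY) {
                insert(k);
            }
        }
    }

    /** Multiplicative hashing spreads consecutive node ids across the table. */
    private static int mix(int x) {
        int h = x * 0x9E3779B9;
        return h ^ (h >>> 16);
    }
}
```

The property that matters for HNSW is that `clear()` scales with the largest number of nodes visited so far rather than with the total number of nodes in the graph; the price is a hash and a probe on every `add`/`contains`, which may be why search latency behaved differently from indexing in the original experiment.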
[jira] [Commented] (LUCENE-10404) Use hash set for visited nodes in HNSW search?
[ https://issues.apache.org/jira/browse/LUCENE-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569429#comment-17569429 ] Michael Sokolov commented on LUCENE-10404: -- I tried using IntIntHashMap (mapping to 1 for visited nodes), and there does indeed seem to be a small speedup. I haven't had a chance to run luceneutil or look at profiler output, but here are some numbers from KnnGraphTester for an internal dataset. The numbers can be a bit noisy, but they are consistently better for the hash map version. h3. IntIntHashMap {{recall latency nDoc fanout maxConn beamWidth visited index ms}} {{0.935 0.37 1 0 16 32 100 1566}} {{0.965 0.49 1 50 16 32 150 0}} {{0.962 0.41 1 0 16 64 100 2655}} {{0.982 0.57 1 50 16 64 150 0}} {{0.941 0.38 1 0 32 32 100 1473}} {{0.969 0.51 1 50 32 32 150 0}} {{0.966 0.45 1 0 32 64 100 2611}} {{0.985 0.59 1 50 32 64 150 0}} {{0.907 0.52 10 0 16 32 100 19850}} {{0.940 0.72 10 50 16 32 150 0}} {{0.941 0.60 10 0 16 64 100 38614}} {{0.966 0.84 10 50 16 64 150 0}} {{0.916 0.55 10 0 32 32 100 19243}} {{0.949 0.75 10 50 32 32 150 0}} {{0.952 0.66 10 0 32 64 100 38205}} {{0.973 0.93 10 50 32 64 150 0}} {{0.859 0.66 100 0 16 32 100 273112}} {{0.897 0.92 100 50 16 32 150 0}} {{0.917 0.85 100 0 16 64 100 523325}} {{0.946 1.06 100 50 16 64 150 0}} {{0.874 0.80 100 0 32 32 100 274816}} {{0.913 1.05 100 50 32 32 150 0}} {{0.929 0.98 100 0 32 64 100 564762}} h3.
baseline {{recall latency nDoc fanout maxConn beamWidth visited index ms}} {{0.935 0.38 1 0 16 32 100 1614}} {{0.965 0.50 1 50 16 32 150 0}} {{0.962 0.45 1 0 16 64 100 2687}} {{0.982 0.57 1 50 16 64 150 0}} {{0.941 0.40 1 0 32 32 100 1504}} {{0.969 0.51 1 50 32 32 150 0}} {{0.966 0.44 1 0 32 64 100 2652}} {{0.985 0.58 1 50 32 64 150 0}} {{0.907 0.54 10 0 16 32 100 21449}} {{0.940 0.74 10 50 16 32 150 0}} {{0.941 0.64 10 0 16 64 100 39962}} {{0.966 0.88 10 50 16 64 150 0}} {{0.916 0.59 10 0 32 32 100 20554}} {{0.949 0.80 10 50 32 32 150 0}} {{0.952 0.72 10 0 32 64 100 40980}} {{0.973 1.04 10 50 32 64 150 0}} {{0.859 0.75 100 0 16 32 100 300514}} {{0.897 0.96 100 50 16 32 150 0}} {{0.917 0.84 100 0 16 64 100 563259}} {{0.946 1.12 100 50 16 64 150 0}} {{0.874 0.86 100 0 32 32 100 303186}} {{0.913 1.09 100 50 32 32 150 0}} {{0.929 1.04 100 0 32 64 100 580725}} {{0.958 1.38 100 50 32 64 150 0}} > Use hash set for visited nodes in HNSW search? > -- > > Key: LUCENE-10404 > URL: https://issues.apache.org/jira/browse/LUCENE-10404 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Julie Tibshirani >Priority: Minor > > While searching each layer, HNSW tracks the nodes it has already visited > using a BitSet. We could look into using something like IntHashSet instead. I > tried out the idea quickly by switching to IntIntHashMap (which has already > been copied from hppc) and saw an improvement in index performance. > *Baseline:* 760896 msec to write vectors > *Using IntIntHashMap:* 733017 msec to write vectors > I noticed search performance actually got a little bit worse with the change > -- that is something to look into. > For background, it's good to be aware that HNSW can visit a lot of nodes. For > example, on the glove-100-angular dataset with ~1.2 million docs, HNSW search > visits ~1000 - 15,000 docs depending on the recall. This number can increase > when searching with deleted docs, especially if you hit a "pathological" case
[jira] [Resolved] (LUCENE-10655) can we optimize visited bitset usage in HNSW graph search/indexing?
[ https://issues.apache.org/jira/browse/LUCENE-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Sokolov resolved LUCENE-10655. -- Resolution: Fixed > can we optimize visited bitset usage in HNSW graph search/indexing? > --- > > Key: LUCENE-10655 > URL: https://issues.apache.org/jira/browse/LUCENE-10655 > Project: Lucene - Core > Issue Type: Improvement > Components: core/hnsw >Reporter: Michael Sokolov >Priority: Major > > When running {{luceneutil}} I noticed that {{FixedBitSet.clear()}} dominates > the CPU profiler output. I had a few ideas: > # In upper graph layers, the occupied nodes are very sparse - maybe > {{SparseFixedBitSet}} would be a better fit for those > # We are caching these bitsets, but they are only used for a single search > (single document insert, during indexing). Should we cache across searches? > We would need to pool them though, and they would vary by field since fields > can have different numbers of vector nodes. This starts to get complex > # Are we sure that clearing a bitset is more efficient than allocating a new > one? Maybe the JDK maintains a pool of already-zeroed memory for us > I think we could try specializing the bitset type by graph level, and then I > think we ought to measure the performance of allocation vs the limited reuse > that we currently have.
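The first and third ideas in this issue both target the cost of `FixedBitSet.clear()`, which zeroes every word no matter how few bits were set. For comparison, here is a different technique, swapped in purely as an illustration (it is not how Lucene's `FixedBitSet` or `SparseFixedBitSet` work): remember which 64-bit words became non-zero and zero only those. `TouchedWordsBitSet` is a hypothetical name, and whether this beats allocation or a hash set would still need measuring.

```java
/**
 * Hypothetical sketch (not Lucene code): a bitset that records which 64-bit words
 * it touched, so clear() costs O(words touched) instead of O(numBits / 64).
 * Bits can only be set individually; clear() resets everything.
 */
public class TouchedWordsBitSet {
    private final long[] words;
    private final int[] touched; // indices of words that went from zero to non-zero
    private int numTouched;

    public TouchedWordsBitSet(int numBits) {
        words = new long[(numBits + 63) >>> 6];
        touched = new int[words.length]; // each word is recorded at most once per clear cycle
    }

    /** Sets the bit; returns true if it was previously clear. */
    public boolean getAndSet(int index) {
        int w = index >>> 6;
        long mask = 1L << index; // shift count is masked to its low 6 bits
        long old = words[w];
        if (old == 0) {
            touched[numTouched++] = w;
        }
        words[w] = old | mask;
        return (old & mask) == 0;
    }

    public boolean get(int index) {
        return (words[index >>> 6] & (1L << index)) != 0;
    }

    /** Zeroes only the words recorded since the last clear. */
    public void clear() {
        for (int i = 0; i < numTouched; i++) {
            words[touched[i]] = 0L;
        }
        numTouched = 0;
    }
}
```

The trade-off is one extra int write per newly touched word and an `int[]` as long as the word array. When a search touches only a small fraction of the graph, as the issue suggests happens in upper layers, `clear()` becomes proportional to that fraction.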
[jira] [Commented] (LUCENE-10655) can we optimize visited bitset usage in HNSW graph search/indexing?
[ https://issues.apache.org/jira/browse/LUCENE-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569457#comment-17569457 ] Michael Sokolov commented on LUCENE-10655: -- Ah, I see - I hadn't followed your investigations there closely, [~julietibs]. Well, at least we can confirm what you had found. I'll close this now - it doesn't seem fruitful, and I think the hash set idea has legs. > can we optimize visited bitset usage in HNSW graph search/indexing? > --- > > Key: LUCENE-10655 > URL: https://issues.apache.org/jira/browse/LUCENE-10655 > Project: Lucene - Core > Issue Type: Improvement > Components: core/hnsw >Reporter: Michael Sokolov >Priority: Major > > When running {{luceneutil}} I noticed that {{FixedBitSet.clear()}} dominates > the CPU profiler output. I had a few ideas: > # In upper graph layers, the occupied nodes are very sparse - maybe > {{SparseFixedBitSet}} would be a better fit for those > # We are caching these bitsets, but they are only used for a single search > (single document insert, during indexing). Should we cache across searches? > We would need to pool them though, and they would vary by field since fields > can have different numbers of vector nodes. This starts to get complex > # Are we sure that clearing a bitset is more efficient than allocating a new > one? Maybe the JDK maintains a pool of already-zeroed memory for us > I think we could try specializing the bitset type by graph level, and then I > think we ought to measure the performance of allocation vs the limited reuse > that we currently have.
[GitHub] [lucene] gsmiller merged pull request #1038: Fix TestDisiPriorityQueue test bug
gsmiller merged PR #1038: URL: https://github.com/apache/lucene/pull/1038
[jira] [Resolved] (LUCENE-10659) Fix random TestDisiPriorityQueue bug
[ https://issues.apache.org/jira/browse/LUCENE-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Miller resolved LUCENE-10659. -- Fix Version/s: 9.3 Resolution: Fixed > Fix random TestDisiPriorityQueue bug > > > Key: LUCENE-10659 > URL: https://issues.apache.org/jira/browse/LUCENE-10659 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 9.3 >Reporter: Greg Miller >Priority: Blocker > Fix For: 9.3 > > > A recently added test ({{TestDisiPriorityQueue}}) has a bug that can randomly > trip (my fault). I fixed this on {{main}} and {{branch_9x}}, but I think we > should roll it into the 9.3 release. I'll prepare a PR, but raising it here > for visibility.
[GitHub] [lucene] nknize commented on a diff in pull request #1017: LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape
nknize commented on code in PR #1017: URL: https://github.com/apache/lucene/pull/1017#discussion_r926815794 ## lucene/core/src/java/org/apache/lucene/document/ShapeDocValuesField.java: ## @@ -0,0 +1,896 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.document; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Comparator; +import java.util.List; +import org.apache.lucene.analysis.Analyzer; +import org.apache.lucene.analysis.TokenStream; +import org.apache.lucene.document.ShapeField.DecodedTriangle.TYPE; +import org.apache.lucene.document.ShapeField.QueryRelation; +import org.apache.lucene.document.SpatialQuery.EncodedRectangle; +import org.apache.lucene.index.DocValuesType; +import org.apache.lucene.index.IndexableFieldType; +import org.apache.lucene.index.PointValues.Relation; +import org.apache.lucene.search.Query; +import org.apache.lucene.store.ByteArrayDataInput; +import org.apache.lucene.store.ByteBuffersDataOutput; +import org.apache.lucene.store.DataInput; +import org.apache.lucene.util.ArrayUtil; +import org.apache.lucene.util.BytesRef; + +/** A doc values field representation for {@link LatLonShape} and {@link XYShape} */ +public final class ShapeDocValuesField extends Field { + private final ShapeComparator shapeComparator; + + private static final FieldType FIELD_TYPE = new FieldType(); + + static { +FIELD_TYPE.setDocValuesType(DocValuesType.BINARY); +FIELD_TYPE.setOmitNorms(true); +FIELD_TYPE.freeze(); + } + + /** + * Creates a {@code ShapeDocValuesField} instance from a shape tessellation + * + * @param name The Field Name (must not be null) + * @param tessellation The tessellation (must not be null) + */ + ShapeDocValuesField(String name, List<ShapeField.DecodedTriangle> tessellation) { +super(name, FIELD_TYPE); +BytesRef b = computeBinaryValue(tessellation); +this.fieldsData = b; +try { + this.shapeComparator = new ShapeComparator(b); +} catch (IOException e) { + throw new IllegalArgumentException("unable to read binary shape doc value field. 
", e); +} + } + + /** Creates a {@code ShapeDocValue} field from a given serialized value */ + ShapeDocValuesField(String name, BytesRef binaryValue) { +super(name, FIELD_TYPE); +this.fieldsData = binaryValue; +try { + this.shapeComparator = new ShapeComparator(binaryValue); +} catch (IOException e) { + throw new IllegalArgumentException("unable to read binary shape doc value field. ", e); +} + } + + /** The name of the field */ + @Override + public String name() { +return name; + } + + /** Gets the {@code IndexableFieldType} for this ShapeDocValue field */ + @Override + public IndexableFieldType fieldType() { +return FIELD_TYPE; + } + + /** Currently there is no string representation for the ShapeDocValueField */ + @Override + public String stringValue() { +return null; + } + + /** TokenStreams are not yet supported */ + @Override + public TokenStream tokenStream(Analyzer analyzer, TokenStream reuse) { +return null; + } + + /** create a shape docvalue field from indexable fields */ + public static ShapeDocValuesField createDocValueField(String fieldName, Field[] indexableFields) { +ArrayList tess = new ArrayList<>(indexableFields.length); +final byte[] scratch = new byte[7 * Integer.BYTES]; +for (Field f : indexableFields) { + BytesRef br = f.binaryValue(); + assert br.length == 7 * ShapeField.BYTES; + System.arraycopy(br.bytes, br.offset, scratch, 0, 7 * ShapeField.BYTES); + ShapeField.DecodedTriangle t = new ShapeField.DecodedTriangle(); + ShapeField.decodeTriangle(scratch, t); + tess.add(t); +} +return new ShapeDocValuesField(fieldName, tess); + } + + /** Returns the number of terms (tessellated triangles) for this shape */ + public int numberOfTerms() { +return shapeComparator.numberOfTerms(); + } + + /** Creates a geometry query for shape docvalues */ + public static Query newGeometryQuery( + final String field, final QueryRelation relation, Object... geometries) { +return null; +// TODO +// return new ShapeDocValuesQuery(field, relation,
[GitHub] [lucene] nknize commented on a diff in pull request #1017: LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape
nknize commented on code in PR #1017: URL: https://github.com/apache/lucene/pull/1017#discussion_r926815794 ## lucene/core/src/java/org/apache/lucene/document/ShapeDocValuesField.java: ## @@ -0,0 +1,896 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.document; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Comparator; +import java.util.List; +import org.apache.lucene.analysis.Analyzer; +import org.apache.lucene.analysis.TokenStream; +import org.apache.lucene.document.ShapeField.DecodedTriangle.TYPE; +import org.apache.lucene.document.ShapeField.QueryRelation; +import org.apache.lucene.document.SpatialQuery.EncodedRectangle; +import org.apache.lucene.index.DocValuesType; +import org.apache.lucene.index.IndexableFieldType; +import org.apache.lucene.index.PointValues.Relation; +import org.apache.lucene.search.Query; +import org.apache.lucene.store.ByteArrayDataInput; +import org.apache.lucene.store.ByteBuffersDataOutput; +import org.apache.lucene.store.DataInput; +import org.apache.lucene.util.ArrayUtil; +import org.apache.lucene.util.BytesRef; + +/** A doc values field representation for {@link LatLonShape} and {@link XYShape} */ +public final class ShapeDocValuesField extends Field { + private final ShapeComparator shapeComparator; + + private static final FieldType FIELD_TYPE = new FieldType(); + + static { +FIELD_TYPE.setDocValuesType(DocValuesType.BINARY); +FIELD_TYPE.setOmitNorms(true); +FIELD_TYPE.freeze(); + } + + /** + * Creates a {@ShapeDocValueField} instance from a shape tessellation + * + * @param name The Field Name (must not be null) + * @param tessellation The tessellation (must not be null) + */ + ShapeDocValuesField(String name, List<ShapeField.DecodedTriangle> tessellation) { +super(name, FIELD_TYPE); +BytesRef b = computeBinaryValue(tessellation); +this.fieldsData = b; +try { + this.shapeComparator = new ShapeComparator(b); +} catch (IOException e) { + throw new IllegalArgumentException("unable to read binary shape doc value field. 
", e); +} + } + + /** Creates a {@code ShapeDocValue} field from a given serialized value */ + ShapeDocValuesField(String name, BytesRef binaryValue) { +super(name, FIELD_TYPE); +this.fieldsData = binaryValue; +try { + this.shapeComparator = new ShapeComparator(binaryValue); +} catch (IOException e) { + throw new IllegalArgumentException("unable to read binary shape doc value field. ", e); +} + } + + /** The name of the field */ + @Override + public String name() { +return name; + } + + /** Gets the {@code IndexableFieldType} for this ShapeDocValue field */ + @Override + public IndexableFieldType fieldType() { +return FIELD_TYPE; + } + + /** Currently there is no string representation for the ShapeDocValueField */ + @Override + public String stringValue() { +return null; + } + + /** TokenStreams are not yet supported */ + @Override + public TokenStream tokenStream(Analyzer analyzer, TokenStream reuse) { +return null; + } + + /** create a shape docvalue field from indexable fields */ + public static ShapeDocValuesField createDocValueField(String fieldName, Field[] indexableFields) { +ArrayList<ShapeField.DecodedTriangle> tess = new ArrayList<>(indexableFields.length); +final byte[] scratch = new byte[7 * Integer.BYTES]; +for (Field f : indexableFields) { + BytesRef br = f.binaryValue(); + assert br.length == 7 * ShapeField.BYTES; + System.arraycopy(br.bytes, br.offset, scratch, 0, 7 * ShapeField.BYTES); + ShapeField.DecodedTriangle t = new ShapeField.DecodedTriangle(); + ShapeField.decodeTriangle(scratch, t); + tess.add(t); +} +return new ShapeDocValuesField(fieldName, tess); + } + + /** Returns the number of terms (tessellated triangles) for this shape */ + public int numberOfTerms() { +return shapeComparator.numberOfTerms(); + } + + /** Creates a geometry query for shape docvalues */ + public static Query newGeometryQuery( + final String field, final QueryRelation relation, Object... geometries) { +return null; +// TODO +// return new ShapeDocValuesQuery(field, relation,
[GitHub] [lucene] JoeHF commented on a diff in pull request #1003: LUCENE-10616: optimizing decompress when only retrieving some fields
JoeHF commented on code in PR #1003: URL: https://github.com/apache/lucene/pull/1003#discussion_r926864645 ## lucene/core/src/java/org/apache/lucene/document/DocumentStoredFieldVisitor.java: ## @@ -98,6 +100,16 @@ public void doubleField(FieldInfo fieldInfo, double value) { @Override public Status needsField(FieldInfo fieldInfo) throws IOException { +// return stop after collected all needed fields +if (fieldsToAdd != null +&& !fieldsToAdd.contains(fieldInfo.name) +&& fieldsToAdd.size() +== doc.getFields().stream() +.map(IndexableField::name) +.collect(Collectors.toSet()) +.size()) { + return Status.STOP; Review Comment: Removed this in https://github.com/apache/lucene/pull/1003/commits/4b9086fc1bbb31f0ca36986f3adaa770665215e1; found another way to optimize. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] JoeHF commented on pull request #1003: LUCENE-10616: optimizing decompress when only retrieving some fields
JoeHF commented on PR #1003: URL: https://github.com/apache/lucene/pull/1003#issuecomment-1191678569 In https://github.com/apache/lucene/pull/1003/commits/4b9086fc1bbb31f0ca36986f3adaa770665215e1 I found an alternative: we can skip compressed bytes that are not needed by reading the compressed length. This will significantly decrease decompression time when we only want a few fields.
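The idea in this comment, skipping compressed data that is not needed by first reading its compressed length, can be illustrated in plain Java. This is a minimal sketch under assumed conditions, not the layout used by Lucene's stored fields format or by the linked commit: it assumes a hypothetical record layout where each field is written as its name, an int compressed length, an int raw length, and its own Deflate-compressed bytes, and it uses `java.util.zip` instead of Lucene's codecs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/**
 * Hypothetical layout, one record per field:
 * [name (modified UTF-8)][compressed length (int)][raw length (int)][Deflate bytes].
 */
final class LengthPrefixedFields {

  /** Writes every field as an independently compressed, length-prefixed record. */
  static byte[] write(Map<String, String> fields) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    for (Map.Entry<String, String> e : fields.entrySet()) {
      byte[] raw = e.getValue().getBytes(StandardCharsets.UTF_8);
      byte[] compressed = deflate(raw);
      out.writeUTF(e.getKey());
      out.writeInt(compressed.length); // lets a reader skip the record without inflating it
      out.writeInt(raw.length);
      out.write(compressed);
    }
    out.flush();
    return bytes.toByteArray();
  }

  /** Decompresses only the wanted fields; every other record is skipped by its length. */
  static Map<String, String> read(byte[] data, Set<String> wanted)
      throws IOException, DataFormatException {
    Map<String, String> result = new LinkedHashMap<>();
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    while (in.available() > 0) {
      String name = in.readUTF();
      int compressedLength = in.readInt();
      int rawLength = in.readInt();
      if (!wanted.contains(name)) {
        in.skipNBytes(compressedLength); // Java 12+; no decompression work for unwanted fields
        continue;
      }
      byte[] compressed = new byte[compressedLength];
      in.readFully(compressed);
      result.put(name, new String(inflate(compressed, rawLength), StandardCharsets.UTF_8));
    }
    return result;
  }

  private static byte[] deflate(byte[] raw) {
    Deflater deflater = new Deflater();
    try {
      deflater.setInput(raw);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[1024];
      while (!deflater.finished()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } finally {
      deflater.end();
    }
  }

  private static byte[] inflate(byte[] compressed, int rawLength) throws DataFormatException {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      byte[] raw = new byte[rawLength];
      int off = 0;
      while (off < rawLength) {
        int n = inflater.inflate(raw, off, rawLength - off);
        if (n == 0 && (inflater.finished() || inflater.needsInput())) {
          throw new DataFormatException("truncated or corrupt record");
        }
        off += n;
      }
      return raw;
    } finally {
      inflater.end();
    }
  }
}
```

The design point is that each record is self-delimiting, so the cost of an unwanted field is one skip instead of a full decompression. With a single compressed block shared by all fields (as a real stored-fields format may use), skipping is only possible at whatever granularity the lengths are recorded.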
[GitHub] [lucene] nknize commented on a diff in pull request #1017: LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape
nknize commented on code in PR #1017: URL: https://github.com/apache/lucene/pull/1017#discussion_r926982395 ## lucene/core/src/java/org/apache/lucene/document/ShapeDocValuesField.java: ## @@ -0,0 +1,896 @@
[GitHub] [lucene-jira-archive] mikemccand commented on issue #53: Remove "module" for core components?
mikemccand commented on issue #53: URL: https://github.com/apache/lucene-jira-archive/issues/53#issuecomment-1192045923 Can this be closed now?
[GitHub] [lucene-jira-archive] mikemccand opened a new issue, #58: Errors setting assignee when running `import_github_issues.py`
mikemccand opened a new issue, #58: URL: https://github.com/apache/lucene-jira-archive/issues/58 Is this expected? Am I doing something wrong in running the tool?
```
> python3 src/import_github_issues.py --min 8000 -max 9000
[2022-07-21 19:38:46,024] WARNING:github_issues_util: Assignee ErickErickson cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
[2022-07-21 19:40:21,583] WARNING:github_issues_util: Assignee romseygeek cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
[2022-07-21 19:40:45,250] WARNING:github_issues_util: Assignee romseygeek cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
[2022-07-21 19:40:57,013] WARNING:github_issues_util: Assignee jpountz cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
[2022-07-21 19:42:46,487] WARNING:github_issues_util: Assignee uschindler cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
[2022-07-21 19:43:00,638] WARNING:github_issues_util: Assignee romseygeek cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
[2022-07-21 19:43:16,720] WARNING:github_issues_util: Assignee dsmiley cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
[2022-07-21 19:43:32,841] WARNING:github_issues_util: Assignee romseygeek cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
[2022-07-21 19:43:41,458] WARNING:github_issues_util: Assignee s1monw cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
[2022-07-21 19:43:57,607] WARNING:github_issues_util: Assignee romseygeek cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
[2022-07-21 19:44:21,227] WARNING:github_issues_util: Assignee ErickErickson cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
[2022-07-21 19:44:29,831] WARNING:github_issues_util: Assignee dsmiley cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
[2022-07-21 19:44:38,428] WARNING:github_issues_util: Assignee dsmiley cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
[2022-07-21 19:45:13,826] WARNING:github_issues_util: Assignee s1monw cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
[2022-07-21 19:45:44,846] WARNING:github_issues_util: Assignee jpountz cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
[2022-07-21 19:48:21,738] WARNING:github_issues_util: Assignee s1monw cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
[2022-07-21 19:48:51,086] WARNING:github_issues_util: Assignee dsmiley cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
[2022-07-21 19:49:16,090] WARNING:github_issues_util: Assignee s1monw cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
[2022-07-21 19:49:39,645] WARNING:github_issues_util: Assignee romseygeek cannot be assigned; status code=404, message={"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#check-if-a-user-can-be-assigned"}
[2022-07-21 19:50:08,770] WARNING:github_issues_util: Assignee romsey
[GitHub] [lucene-jira-archive] mikemccand opened a new issue, #59: Module label is sometimes missing?
mikemccand opened a new issue, #59: URL: https://github.com/apache/lucene-jira-archive/issues/59 I am test-importing all Jira issues from 8000 to 9000 and spot-checking them. I noticed [this issue](https://github.com/mikemccand/stargazers-migration-test/issues/161), which is under `modules/highlighter` in Jira, but for some reason that label did not carry over to the GitHub issue.
[GitHub] [lucene-jira-archive] mikemccand opened a new issue, #60: Invalid unicode character in conversion of comment
mikemccand opened a new issue, #60: URL: https://github.com/apache/lucene-jira-archive/issues/60 Spot-checking a few converted issues, I noticed what I think is an invalid Unicode character (U+FFDD) in [this comment](https://github.com/mikemccand/stargazers-migration-test/issues/329#issuecomment-1192052095), but the [corresponding Jira issue comment](https://issues.apache.org/jira/browse/LUCENE-8329?focusedCommentId=16487052&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16487052) seems to have just a whitespace character. Not sure how widespread this issue is.
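To get a sense of how widespread this is, the converted text could be scanned for suspicious code points. The following is an assumed approach, not anything in the migration tool: it flags U+FFFD (the Unicode replacement character, which commonly signals a decoding problem) and any code point that `Character.getType` reports as unassigned, a category that U+FFDD falls into.

```java
import java.util.ArrayList;
import java.util.List;

/** Finds code points in converted text that usually indicate an encoding problem. */
final class SuspiciousCodePoints {

  /** Returns entries like "U+FFFD@12" (code point and UTF-16 index) for each hit. */
  static List<String> find(String text) {
    List<String> hits = new ArrayList<>();
    for (int i = 0; i < text.length(); ) {
      int cp = text.codePointAt(i);
      // U+FFFD is what decoders substitute for bytes they could not decode;
      // unassigned code points should not appear in normal prose at all.
      if (cp == 0xFFFD || Character.getType(cp) == Character.UNASSIGNED) {
        hits.add(String.format("U+%04X@%d", cp, i));
      }
      i += Character.charCount(cp); // advance by code point, not by char
    }
    return hits;
  }
}
```

Running this over every migrated comment body and counting non-empty results would give a rough measure of how many comments are affected.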
[GitHub] [lucene-jira-archive] mikemccand opened a new issue, #61: Should we carry over Jira "labels"?
mikemccand opened a new issue, #61: URL: https://github.com/apache/lucene-jira-archive/issues/61 Some Jira issues have labels, like [this one](https://issues.apache.org/jira/browse/LUCENE-8213) with `labels: performance`. But we don't seem to carry over the label to the [GitHub issue](https://github.com/mikemccand/stargazers-migration-test/issues/213) ... should we?
[GitHub] [lucene-jira-archive] mikemccand closed issue #54: Hyperlinks are sometimes not actual links on import
mikemccand closed issue #54: Hyperlinks are sometimes not actual links on import URL: https://github.com/apache/lucene-jira-archive/issues/54
[GitHub] [lucene-jira-archive] mikemccand merged pull request #57: Enable hyperlinks to a commit in commitbots' comments
mikemccand merged PR #57: URL: https://github.com/apache/lucene-jira-archive/pull/57
[GitHub] [lucene-jira-archive] mikemccand opened a new issue, #62: Missing closing paren in conversion
mikemccand opened a new issue, #62: URL: https://github.com/apache/lucene-jira-archive/issues/62 I noticed that [this comment](https://github.com/mikemccand/stargazers-migration-test/issues/213#issuecomment-1192043447) is missing the closing paren after the link to the GitHub PR, but in the [corresponding Jira comment](https://issues.apache.org/jira/browse/LUCENE-8213?focusedCommentId=16942788&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16942788) there was a closing `)`. Not sure how often this happens.
[GitHub] [lucene-jira-archive] mikemccand opened a new issue, #63: Jira username mentions are not converted?
mikemccand opened a new issue, #63: URL: https://github.com/apache/lucene-jira-archive/issues/63 I noticed [this comment](https://github.com/mikemccand/stargazers-migration-test/issues/213#issuecomment-1192043461) mentions Jira user `[~ben.manes]`. Should we replace it with the user's display name (Ben Manes), since the Jira username won't necessarily be recognizable? Not sure how often this is happening! In general, all these little issues I'm opening are minor and should not block migration. Net/net, the quality of the migrated issues looks great overall!
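Here is a minimal sketch of such a conversion. It assumes the migration has some lookup from Jira usernames to display names; the `displayNames` map is hypothetical, and the actual tool is written in Python. The sketch matches Jira's `[~username]` mention syntax with a regular expression, substitutes the display name, and leaves unknown users untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Rewrites Jira mentions like [~ben.manes] into display names. */
final class JiraMentions {
  // Jira mention markup: "[~" + username + "]"; usernames may contain dots and dashes.
  private static final Pattern MENTION = Pattern.compile("\\[~([^\\]\\s]+)\\]");

  static String replace(String text, Map<String, String> displayNames) {
    Matcher m = MENTION.matcher(text);
    StringBuilder sb = new StringBuilder();
    while (m.find()) {
      String name = displayNames.get(m.group(1));
      // Unknown users keep the original markup so nothing is silently lost.
      String replacement = name != null ? name : m.group();
      // quoteReplacement guards against '$' or '\' in display names.
      m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(sb);
    return sb.toString();
  }
}
```

Keeping the raw markup for unmapped users makes them easy to find in a later pass, at the cost of some Jira syntax surviving in the converted text.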
[GitHub] [lucene-jira-archive] mocobeta closed issue #53: Remove "module" for core components?
mocobeta closed issue #53: Remove "module" for core components? URL: https://github.com/apache/lucene-jira-archive/issues/53
[GitHub] [lucene-jira-archive] mocobeta commented on issue #53: Remove "module" for core components?
mocobeta commented on issue #53: URL: https://github.com/apache/lucene-jira-archive/issues/53#issuecomment-1192109634 Yes, I think so. I'm closing this.
[GitHub] [lucene] wuwm opened a new pull request, #1042: Cache decoded length bytes for TFIDFSimilarity scorer.
wuwm opened a new pull request, #1042: URL: https://github.com/apache/lucene/pull/1042 ### Description When doing A/B testing between TF-IDF and BM25 similarity, we found that the scorer() method in TFIDFSimilarity is somewhat slower than the one in BM25Similarity. After reading the code and profiling, we found that [BM25Similarity caches decoded length bytes](https://github.com/apache/lucene/blob/8ac26737913d0c1555019e93bc6bf7db1ab9047e/lucene/core/src/java/org/apache/lucene/search/similarities/BM25Similarity.java#L122-L129) while [TFIDFSimilarity doesn't](https://github.com/apache/lucene/blob/8ac26737913d0c1555019e93bc6bf7db1ab9047e/lucene/core/src/java/org/apache/lucene/search/similarities/TFIDFSimilarity.java#L468-L472). Btw, I also corrected a typo in a comment in TermInSetQuery. ### Tests ``` ./gradlew check ```
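Because the encoded length is a single byte, there are only 256 possible values, so a scorer can compute the length-dependent factor once per possible byte and then do one array lookup per document. That is the kind of caching the description points to in BM25Similarity. The sketch below shows the idea in isolation and is not Lucene code: `decodeLength` is a hypothetical stand-in for Lucene's actual length decoding (`SmallFloat.byte4ToInt`), and `1 / sqrt(length)` is used as an example length norm in the style of classic TF-IDF.

```java
/** Precomputes a per-norm-byte factor so scoring does a lookup instead of decode + math. */
final class NormCache {
  private final float[] cache = new float[256];

  NormCache() {
    // One entry per possible unsigned byte value, computed once per scorer.
    for (int i = 0; i < 256; i++) {
      cache[i] = lengthNorm(decodeLength((byte) i));
    }
  }

  /** Hypothetical decoding: NOT Lucene's SmallFloat encoding, just a monotonic stand-in. */
  static int decodeLength(byte norm) {
    return Byte.toUnsignedInt(norm) + 1;
  }

  /** Example length normalization in classic TF-IDF style: 1 / sqrt(number of terms). */
  static float lengthNorm(int length) {
    return (float) (1.0 / Math.sqrt(length));
  }

  /** Uncached path: decode and compute for every scored document. */
  static float uncached(byte norm) {
    return lengthNorm(decodeLength(norm));
  }

  /** Cached path: a single array lookup per scored document. */
  float cached(byte norm) {
    return cache[Byte.toUnsignedInt(norm)];
  }
}
```

Both paths return identical values; the cached one simply moves the decode and the square root out of the per-document loop. BM25Similarity goes further and folds `k1`, `b`, and the average field length into its per-byte table, which is only possible because those values are fixed for the lifetime of a scorer.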
[GitHub] [lucene-jira-archive] mocobeta commented on issue #58: Errors setting assignee when running `import_github_issues.py`
mocobeta commented on issue #58: URL: https://github.com/apache/lucene-jira-archive/issues/58#issuecomment-1192171388 You cannot assign accounts that have no push access to the repository. This is the reason I invited you to my test repository in #8.
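Since assignment fails for such accounts, an import could check assignability up front and drop the assignee instead of logging a warning per issue. GitHub's REST API exposes this as `GET /repos/{owner}/{repo}/assignees/{assignee}` (the "check if a user can be assigned" endpoint linked in the log above), which returns 204 when the user can be assigned and 404 otherwise. The Java sketch below is an illustration only, since the actual tool (`import_github_issues.py`) is Python; the owner, repository, and token handling are assumptions, and results are cached so each username is checked once.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.HashMap;
import java.util.Map;

/** Checks (and caches) whether GitHub users can be assigned issues in one repository. */
final class AssigneeChecker {
  private final HttpClient client = HttpClient.newHttpClient();
  private final Map<String, Boolean> cache = new HashMap<>();
  private final String owner;
  private final String repo;
  private final String token; // assumption: a token that can read the target repository

  AssigneeChecker(String owner, String repo, String token) {
    this.owner = owner;
    this.repo = repo;
    this.token = token;
  }

  /** Builds GET /repos/{owner}/{repo}/assignees/{assignee}. */
  static URI checkUri(String owner, String repo, String user) {
    return URI.create(
        "https://api.github.com/repos/" + owner + "/" + repo + "/assignees/" + user);
  }

  /** 204 means assignable, 404 means not assignable; anything else is unexpected. */
  static boolean interpret(int status) {
    if (status == 204) {
      return true;
    }
    if (status == 404) {
      return false;
    }
    throw new IllegalStateException("unexpected status " + status);
  }

  /** Performs the check once per user and remembers the answer. */
  boolean canAssign(String user) throws IOException, InterruptedException {
    Boolean known = cache.get(user);
    if (known != null) {
      return known;
    }
    HttpRequest request =
        HttpRequest.newBuilder(checkUri(owner, repo, user))
            .header("Accept", "application/vnd.github+json")
            .header("Authorization", "Bearer " + token)
            .GET()
            .build();
    HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
    boolean assignable = interpret(response.statusCode());
    cache.put(user, assignable);
    return assignable;
  }
}
```

With the cache, the log above would shrink to one request per distinct user (ErickErickson, romseygeek, jpountz, and so on) rather than one failed assignment per issue.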
[GitHub] [lucene-jira-archive] mocobeta commented on issue #59: Module label is sometimes missing?
mocobeta commented on issue #59: URL: https://github.com/apache/lucene-jira-archive/issues/59#issuecomment-1192173312 This is a bug (typo) in the label mapping; I'll fix this. https://github.com/apache/lucene-jira-archive/blob/75e70ce3abad1b070a44a0b75e0df96afd3eae65/migration/src/common.py#L193
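A typo in a component-to-label mapping fails silently: the lookup misses and the issue just gets no label. A small guard that reports unmapped components would surface such typos during test imports. This is a hypothetical Java sketch; the real mapping lives in the Python `common.py` linked above, and the misspelled key used in the usage note is invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Maps Jira components to GitHub labels and reports components that have no mapping. */
final class ComponentLabels {

  /** Labels to apply, plus the components for which no label was found. */
  static final class Mapped {
    final Set<String> labels = new LinkedHashSet<>();
    final List<String> unmapped = new ArrayList<>();
  }

  static Mapped map(List<String> components, Map<String, String> mapping) {
    Mapped result = new Mapped();
    for (String component : components) {
      String label = mapping.get(component);
      if (label == null) {
        // A misspelled key in the mapping lands here instead of being silently dropped.
        result.unmapped.add(component);
      } else {
        result.labels.add(label);
      }
    }
    return result;
  }
}
```

For example, if the mapping contained the (hypothetical) misspelled key `modules/higlighter`, an issue with component `modules/highlighter` would come back with no labels and `unmapped = [modules/highlighter]`; logging that list during a test import points straight at the typo.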
[GitHub] [lucene-jira-archive] mocobeta commented on issue #61: Should we carry over Jira "labels"?
mocobeta commented on issue #61: URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1192175427 It was an intentional omission on my part. Personally, I don't think we should bloat issue labels in GitHub... should we port all Jira "Labels" to GitHub labels?
[GitHub] [lucene-jira-archive] mocobeta commented on issue #63: Jira username mentions are not converted?
mocobeta commented on issue #63: URL: https://github.com/apache/lucene-jira-archive/issues/63#issuecomment-1192179474 I'm aware of this issue. I think it'd be great if we could handle `[~user]` as well as `@user`.
[GitHub] [lucene] iverase commented on a diff in pull request #1017: LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape
iverase commented on code in PR #1017: URL: https://github.com/apache/lucene/pull/1017#discussion_r927333254 ## lucene/core/src/java/org/apache/lucene/document/ShapeDocValuesField.java: ## @@ -0,0 +1,896 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.lucene.document;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.List;
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.document.ShapeField.DecodedTriangle.TYPE;
+import org.apache.lucene.document.ShapeField.QueryRelation;
+import org.apache.lucene.document.SpatialQuery.EncodedRectangle;
+import org.apache.lucene.index.DocValuesType;
+import org.apache.lucene.index.IndexableFieldType;
+import org.apache.lucene.index.PointValues.Relation;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.store.ByteArrayDataInput;
+import org.apache.lucene.store.ByteBuffersDataOutput;
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.ArrayUtil;
+import org.apache.lucene.util.BytesRef;
+
+/** A doc values field representation for {@link LatLonShape} and {@link XYShape} */
+public final class ShapeDocValuesField extends Field {
+  private final ShapeComparator shapeComparator;
+
+  private static final FieldType FIELD_TYPE = new FieldType();
+
+  static {
+    FIELD_TYPE.setDocValuesType(DocValuesType.BINARY);
+    FIELD_TYPE.setOmitNorms(true);
+    FIELD_TYPE.freeze();
+  }
+
+  /**
+   * Creates a {@link ShapeDocValuesField} instance from a shape tessellation
+   *
+   * @param name The Field Name (must not be null)
+   * @param tessellation The tessellation (must not be null)
+   */
+  ShapeDocValuesField(String name, List<ShapeField.DecodedTriangle> tessellation) {
+    super(name, FIELD_TYPE);
+    BytesRef b = computeBinaryValue(tessellation);
+    this.fieldsData = b;
+    try {
+      this.shapeComparator = new ShapeComparator(b);
+    } catch (IOException e) {
+      throw new IllegalArgumentException("unable to read binary shape doc value field. ", e);
+    }
+  }
+
+  /** Creates a {@code ShapeDocValue} field from a given serialized value */
+  ShapeDocValuesField(String name, BytesRef binaryValue) {
+    super(name, FIELD_TYPE);
+    this.fieldsData = binaryValue;
+    try {
+      this.shapeComparator = new ShapeComparator(binaryValue);
+    } catch (IOException e) {
+      throw new IllegalArgumentException("unable to read binary shape doc value field. ", e);
+    }
+  }
+
+  /** The name of the field */
+  @Override
+  public String name() {
+    return name;
+  }
+
+  /** Gets the {@code IndexableFieldType} for this ShapeDocValue field */
+  @Override
+  public IndexableFieldType fieldType() {
+    return FIELD_TYPE;
+  }
+
+  /** Currently there is no string representation for the ShapeDocValueField */
+  @Override
+  public String stringValue() {
+    return null;
+  }
+
+  /** TokenStreams are not yet supported */
+  @Override
+  public TokenStream tokenStream(Analyzer analyzer, TokenStream reuse) {
+    return null;
+  }
+
+  /** create a shape docvalue field from indexable fields */
+  public static ShapeDocValuesField createDocValueField(String fieldName, Field[] indexableFields) {
+    ArrayList<ShapeField.DecodedTriangle> tess = new ArrayList<>(indexableFields.length);
+    final byte[] scratch = new byte[7 * Integer.BYTES];
+    for (Field f : indexableFields) {
+      BytesRef br = f.binaryValue();
+      assert br.length == 7 * ShapeField.BYTES;
+      System.arraycopy(br.bytes, br.offset, scratch, 0, 7 * ShapeField.BYTES);
+      ShapeField.DecodedTriangle t = new ShapeField.DecodedTriangle();
+      ShapeField.decodeTriangle(scratch, t);
+      tess.add(t);
+    }
+    return new ShapeDocValuesField(fieldName, tess);
+  }
+
+  /** Returns the number of terms (tessellated triangles) for this shape */
+  public int numberOfTerms() {
+    return shapeComparator.numberOfTerms();
+  }
+
+  /** Creates a geometry query for shape docvalues */
+  public static Query newGeometryQuery(
+      final String field, final QueryRelation relation, Object... geometries) {
+    return null;
+    // TODO
+    // return new ShapeDocValuesQuery(field, relation,
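For readers following the `createDocValueField` loop: each indexed triangle is treated as a fixed-width record of 7 ints, and it is copied out of `br.bytes` starting at `br.offset` because a `BytesRef` may be a slice of a larger shared array. Below is a minimal, Lucene-free sketch of just that copy step. It assumes `ShapeField.BYTES == Integer.BYTES` (so `7 * Integer.BYTES` and `7 * ShapeField.BYTES` are both 28 bytes). `TriangleSliceSketch`, `Slice` (a stand-in for `BytesRef`), `copyTriangle`, `readInts` and `decodeAll` are hypothetical names. Reading raw big-endian ints is an illustrative simplification and is not what `ShapeField.decodeTriangle` does.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, Lucene-free sketch of the fixed-width copy in createDocValueField.
final class TriangleSliceSketch {
  // One encoded triangle is 7 ints; assumes ShapeField.BYTES == Integer.BYTES, so 28 bytes.
  static final int TRIANGLE_BYTES = 7 * Integer.BYTES;

  // Hypothetical stand-in for BytesRef: a view of `length` bytes starting at `offset`.
  static final class Slice {
    final byte[] bytes;
    final int offset;
    final int length;

    Slice(byte[] bytes, int offset, int length) {
      this.bytes = bytes;
      this.offset = offset;
      this.length = length;
    }
  }

  // Copies one triangle into the reusable scratch buffer, starting at the slice offset
  // (the backing array may be shared and larger than a single triangle).
  static void copyTriangle(Slice s, byte[] scratch) {
    if (s.length != TRIANGLE_BYTES) {
      throw new IllegalArgumentException("expected " + TRIANGLE_BYTES + " bytes, got " + s.length);
    }
    System.arraycopy(s.bytes, s.offset, scratch, 0, TRIANGLE_BYTES);
  }

  // Reads the 7 raw big-endian ints. Illustrative only: this is NOT ShapeField.decodeTriangle.
  static int[] readInts(byte[] scratch) {
    ByteBuffer buf = ByteBuffer.wrap(scratch); // ByteBuffer defaults to big-endian
    int[] out = new int[7];
    for (int i = 0; i < out.length; i++) {
      out[i] = buf.getInt();
    }
    return out;
  }

  // Decodes every slice with a single scratch buffer, mirroring the loop in the diff.
  static List<int[]> decodeAll(Slice[] slices) {
    List<int[]> result = new ArrayList<>(slices.length);
    final byte[] scratch = new byte[TRIANGLE_BYTES];
    for (Slice s : slices) {
      copyTriangle(s, scratch);
      result.add(readInts(scratch)); // readInts returns a fresh array, so reusing scratch is safe
    }
    return result;
  }

  public static void main(String[] args) {
    // Two triangles packed back to back after a 3-byte prefix in one shared array.
    ByteBuffer shared = ByteBuffer.allocate(3 + 2 * TRIANGLE_BYTES);
    shared.put(new byte[] {9, 9, 9});
    for (int i = 1; i <= 14; i++) {
      shared.putInt(i);
    }
    byte[] backing = shared.array();
    Slice[] slices = {
      new Slice(backing, 3, TRIANGLE_BYTES), new Slice(backing, 3 + TRIANGLE_BYTES, TRIANGLE_BYTES)
    };
    for (int[] ints : decodeAll(slices)) {
      System.out.println(Arrays.toString(ints));
    }
    // prints [1, 2, 3, 4, 5, 6, 7] then [8, 9, 10, 11, 12, 13, 14]
  }
}
```

The sketch uses an explicit length check instead of the `assert` in the diff so that the check also runs when assertions are disabled. Copying into one reused scratch array keeps the loop allocation-light, which matches the intent of the original code.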
[GitHub] [lucene] iverase commented on a diff in pull request #1017: LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape