david-sitsky opened a new issue, #13611: URL: https://github.com/apache/lucene/issues/13611
### Description I work on a program which supports using both KnnFloatVectorQuery and FloatVectorSimilarityQuery for querying an index. Each item in an index can have multiple embeddings/vectors associated with it, so I use Lucene's parent/child documents, as you can see in the query code below. A user has found an unusual situation where using KnnFloatVectorQuery has missed all top-ranking results which FloatVectorSimilarityQuery found. I do realise these queries work differently, and that HNSW is approximate, however I still found the results strange and worth reporting here in case this is seen as unexpected behaviour. I have a reproduction here in this rough test program. FWIW in my index this runs against, all parent items in this instance only have one image embedding child document. ``` package org.testme; import java.io.IOException; import java.nio.file.Paths; import org.apache.lucene.document.Document; import org.apache.lucene.index.DirectoryReader; import org.apache.lucene.search.FieldExistsQuery; import org.apache.lucene.search.FloatVectorSimilarityQuery; import org.apache.lucene.search.IndexSearcher; import org.apache.lucene.search.KnnFloatVectorQuery; import org.apache.lucene.search.Query; import org.apache.lucene.search.ScoreDoc; import org.apache.lucene.search.TopDocs; import org.apache.lucene.search.join.BitSetProducer; import org.apache.lucene.search.join.DiversifyingChildrenFloatKnnVectorQuery; import org.apache.lucene.search.join.QueryBitSetProducer; import org.apache.lucene.search.join.ScoreMode; import org.apache.lucene.search.join.ToParentBlockJoinQuery; import org.apache.lucene.store.Directory; import org.apache.lucene.store.FSDirectory; import org.junit.jupiter.api.Test; public class TestLuceneVectorSearch { @Test public void testMe() throws Exception { try (Directory directory = FSDirectory.open(Paths.get("/path/to/TextIndex")); DirectoryReader directoryReader = DirectoryReader.open(directory)) { IndexSearcher indexSearcher = new IndexSearcher(directoryReader); // Matches all "parent" documents in the index, which have the "store-item-id" field. // A parent document can have multiple image embeddings, by creating multiple child documents which use // the "image-embeddings" field. BitSetProducer parentDocsFilter = new QueryBitSetProducer(new FieldExistsQuery("store-item-id")); // Embeddings which represent the image search query "sports car". float[] queryVector = new float[] { 0.0044572074f, -0.023088364f, -0.0036703937f, -0.009964277f, -0.00916677f, 0.024540152f, 0.0070927753f, -0.09274965f, -0.005670112f, -1.6179009E-4f, 0.015744649f, -0.020366224f, 0.0016601892f, -0.010853463f, 0.033737496f, 0.0045916773f, 0.058148578f, 9.6350856E-4f, -0.014317f, 0.016393812f, 0.020966044f, 0.018872999f, -7.3815876E-4f, 0.014115239f, -0.031016221f, 0.03167156f, -0.005343557f, 6.839247E-4f, -0.0042909216f, 0.016312996f, -0.004735699f, -0.012280128f, 0.019533258f, 0.017941408f, -0.045601472f, -0.008958914f, -0.002675118f, -9.5679046E-4f, 0.010099755f, 0.0017631804f, 0.01860111f, 0.037259832f, -0.012140477f, -0.0030949886f, -0.0058084503f, 0.0124143725f, 2.2733907E-4f, -0.012101588f, 0.0108692385f, -0.018079244f, -0.011381395f, -0.017509837f, 0.0012023754f, -0.014764546f, -0.022564514f, -0.03368696f, 0.029053459f, -0.001420365f, -0.050087184f, -0.018782789f, 0.024201525f, -0.027481187f, -0.0052858875f, 0.010976288f, 1.2510385E-4f, 0.014984402f, 0.034309097f, 0.017706916f, 0.018962046f, -0.008943878f, 0.0020767413f, -0.0373546f, -0.005417921f, 0.025909677f, 0.0024755383f, -0.0110565685f, 0.019262856f, 0.01675921f, -0.009826738f, -0.020427415f, -0.01732683f, 0.002310435f, -0.004211249f, 0.023289248f, 0.018740546f, -0.006772653f, -0.006745499f, -0.024986906f, 0.008061354f, -0.015194572f, 0.011593046f, -0.05354502f, -0.13884935f, 0.033540063f, -0.0027567758f, 0.023013994f, -0.014023355f, 0.015025772f, 0.016264228f, -0.01661682f, 0.0035698174f, -0.016123693f, 0.034136593f, 0.004460381f, -0.018264858f, -0.006348571f, -0.011179938f, -0.010155596f, -5.2316306E-4f, -0.012665312f, 0.0061210217f, 0.016656024f, -0.004163066f, -3.4165586E-4f, 0.016313404f, -0.015221417f, 0.008724262f, 0.037221473f, 0.0038612096f, -0.016207112f, 0.022699108f, -0.019367028f, -0.019470742f, -0.0063872794f, -0.0015510321f, -0.04432974f, -0.020032836f, -0.011690585f, 0.03094845f, 0.0067724036f, 0.012485819f, -0.009259412f, 0.008886298f, 0.61861366f, -0.0045251283f, -0.0077076033f, -0.043705218f, -0.025324458f, 0.021726586f, -0.026047882f, -0.009551234f, -0.0071868496f, -0.0036969658f, 0.020132458f, -0.006516556f, -0.0070351847f, 0.017480128f, -0.0035662379f, -5.9353746E-4f, -0.02526074f, 0.022630077f, -0.011171131f, -0.005936157f, 0.040870648f, -0.018079637f, 0.026608476f, 0.009430965f, -0.027296953f, -0.014650009f, 0.006681159f, -2.6793202E-4f, 0.0054786704f, -0.013636381f, 0.016031679f, -0.028956043f, -8.7672856E-4f, 0.013015282f, -0.013950296f, -0.012305141f, 0.007395428f, 0.0032757986f, 0.011180955f, 0.018238183f, 0.012033082f, -0.036541812f, 0.0057558473f, -0.0071096867f, -0.008386121f, 0.012468599f, 0.022702914f, -0.0073613483f, 0.028406166f, -0.016778922f, -0.017091695f, -0.033710238f, -0.016843721f, 0.015285634f, -0.019003538f, -0.00687855f, -7.775667E-4f, -0.024790084f, 0.016236953f, -0.006595245f, -0.015513008f, -0.03021261f, 0.0030078986f, -0.026664777f, 0.008451913f, 0.004026551f, -0.011371533f, -0.015816687f, -0.0026805112f, 0.017776044f, 0.017499488f, 0.0044229627f, 0.017531231f, -0.033204503f, -0.038329072f, -0.011035979f, 0.008958172f, 0.07328921f, 0.0038306648f, 0.03270265f, 0.015056664f, -0.006860551f, -0.004933787f, 0.016191917f, -0.006549873f, -0.015812844f, -0.0099520385f, -0.019040879f, -0.037397895f, 0.015847206f, -0.0016991902f, 0.003470394f, -0.0069604022f, 0.0123413615f, -0.009023129f, -0.007122265f, -0.011230118f, -0.007362384f, 0.0020543125f, 0.0024772482f, -0.0076109925f, 0.03498191f, -0.011076619f, -0.011154479f, -0.01450519f, -0.01843803f, -0.017011909f, 0.0018331372f, 0.0151024535f, 0.016623776f, -0.027112132f, -0.030555645f, -0.011304468f, 0.0251135f, 0.006708286f, 0.00846858f, -0.010242636f, -0.00698456f, 0.019706938f, 0.013477113f, 0.048511542f, -0.005879136f, 0.009369399f, 0.004999097f, -0.004784924f, 0.016561827f, 0.0036518855f, -0.005227837f, 0.0037853734f, -0.009837364f, 0.012072863f, 0.03813349f, 0.0040256353f, 0.0013520177f, -0.01447286f, 0.008837758f, -0.0066623543f, 0.0029706238f, 0.018294264f, -0.01446418f, -0.0021699388f, 9.294378E-4f, -0.009523726f, 0.005299897f, -0.012993116f, 0.025575459f, -0.016830947f, 0.011483546f, -0.0011682257f, 0.005689315f, -0.01871892f, -0.017454233f, -0.0015068237f, 0.04453382f, 0.0029374026f, 0.038485717f, 0.0019930135f, -0.004014516f, -0.016176851f, 0.0055262805f, 0.008696258f, -0.021886224f, 0.025037047f, -0.038151f, 0.006943026f, 0.017139055f, 0.013372888f, 0.023437364f, -0.0054156454f, -0.0014752378f, 0.0046605296f, -0.0044771726f, -0.011856738f, 0.0010809092f, 0.010216948f, -0.012713817f, -0.0031348357f, 0.009013894f, 0.0011253358f, 0.61798275f, 0.007944298f, 0.0085330885f, 0.016979003f, 4.995474E-4f, -0.027207121f, 0.04165457f, 0.0020099438f, 0.008510639f, 0.019254327f, 0.013971549f, 0.0073373774f, -0.0055961516f, 4.949079E-4f, 0.02810051f, 0.0060176505f, -0.008400483f, -0.19501963f, 0.016252134f, 0.012292523f, 0.0018070682f, 0.008999078f, 0.022372805f, -0.016504897f, -0.028100906f, 0.007098479f, -0.009990526f, -0.0017882327f, 0.0050334823f, -0.0068439087f, 0.0026650713f, -0.03168618f, -0.0012727041f, 0.008549434f, -0.0067351903f, 4.8637684E-4f, -0.007929317f, -0.004617511f, -0.03894391f, 0.013047643f, 0.036115382f, -0.0026169834f, -0.02540212f, -4.6752606E-4f, -8.121685E-4f, 0.022683896f, 0.00134045f, 0.042805973f, -0.0041396986f, -0.008076729f, 4.813038E-4f, -0.026571859f, -0.002208052f, -0.030623492f, 0.0071517443f, 0.0060770884f, 8.646011E-4f, 0.006398815f, -0.007452149f, 0.018887492f, -0.0148247555f, 0.016297784f, -0.015059465f, 0.015252803f, -0.0042130435f, -0.002824615f, 0.029199244f, 0.009138435f, -0.015550282f, 0.019079657f, -6.981265E-4f, 1.9067482E-4f, 0.01982623f, 0.0011727469f, 0.0057251197f, -0.0015611411f, 0.004203257f, -0.008882021f, -0.050709292f, 0.036732737f, -0.0016383937f, -0.0052129203f, 5.78685E-4f, 0.01028424f, 0.0071797483f, -0.020324964f, 0.003225342f, 0.054530565f, 0.006593899f, 0.005106005f, -0.014254335f, 0.0025621254f, -0.037771065f, -0.010182639f, 0.004708179f, 9.6000374E-5f, 0.014761056f, -0.012892494f, -0.0025439663f, 0.009076798f, 0.0032978996f, 0.00796419f, 0.0025830409f, 0.0055782637f, -0.008025513f, -0.016867429f, -0.0023941789f, -0.008508283f, -0.008827625f, -0.012730328f, -0.006827924f, -0.03513044f, -0.019266263f, -0.011573588f, -0.0035062141f, 0.0052483953f, -3.5721017E-4f, -0.0021933548f, -0.015921012f, -0.011550315f, -0.008281973f, -0.0033136331f, -0.015491238f, -0.007224302f, -0.028960207f, 0.031132156f, -0.005436975f, 0.00838252f, -0.013607596f, -0.0048204553f, -0.010242622f, -0.030366635f, -0.0072604655f, 7.622423E-4f, 0.0013710709f, -0.035052024f, -0.013582093f, 0.005741299f, 0.008179583f, 0.02272927f, -0.0040672733f, 0.017910969f, -0.006078158f, -0.04835871f, 0.025611773f, 0.02066559f, -0.0017394141f, 1.7129006E-4f, -0.00600073f, 0.011923645f, 0.02351016f, 0.006471754f, 0.00868545f, 0.0075797923f, 0.023683062f, -0.015859105f, -0.0062999893f, -0.0094027f, -0.018763369f, -0.02838345f, -0.004544819f, -0.03608459f, 0.016126886f, 0.005982367f, 0.0012092822f, 0.020421034f, 0.027935015f, 0.011481908f, 0.029014295f, -0.06716036f, -0.011798545f, -0.0021892604f, 0.0022094583f, -0.007288418f, 0.002441089f, 0.015705219f, 0.0016868426f, -0.016558398f, -0.0013452561f, 0.014902193f, -0.023527546f, 0.0833602f, -0.010013801f, -0.012113727f, 0.022079771f, 0.0064695985f, -0.020935113f, 6.643729E-4f, -0.016690062f, -6.999961E-4f, -0.002155845f, 0.0222167f, -0.0024071531f, -0.011394607f, -0.0042578937f, -0.015400263f, -0.006934272f, 0.025316682f, -0.03549049f, -0.0050169053f }; // Perform vector similarity query using a threshold of 0.62. float resultSimilarity = 0.62f; float traversalSimilarity = resultSimilarity - 0.05f; System.out.println("Similarity query results: \n"); Query similarityQuery = new ToParentBlockJoinQuery( new FloatVectorSimilarityQuery("image-embeddings", queryVector, traversalSimilarity, resultSimilarity), parentDocsFilter, ScoreMode.Max); TopDocs topDocs = indexSearcher.search(similarityQuery, 5); printResults(indexSearcher, topDocs); // Perform "top k" vector search using DiversifyingChildrenFloatKnnVectorQuery. System.out.println(); System.out.println("Top k query results using DiversifyingChildrenFloatKnnVectorQuery: \n"); Query diversifyingChildrenFloatKnnVectorQuery = new DiversifyingChildrenFloatKnnVectorQuery( "image-embeddings", queryVector, null, 5, parentDocsFilter); Query rewrittenKnnQuery = indexSearcher.rewrite(diversifyingChildrenFloatKnnVectorQuery); Query finalQuery = new ToParentBlockJoinQuery(rewrittenKnnQuery, parentDocsFilter, ScoreMode.Max); topDocs = indexSearcher.search(finalQuery, 5); printResults(indexSearcher, topDocs); // Perform regular "top k" vector search. System.out.println(); System.out.println("Top k query results: \n"); Query knnQuery = new ToParentBlockJoinQuery( new KnnFloatVectorQuery("image-embeddings", queryVector, 5), parentDocsFilter, ScoreMode.Max); topDocs = indexSearcher.search(knnQuery, 5); printResults(indexSearcher, topDocs); } } private static void printResults(IndexSearcher indexSearcher, TopDocs topDocs) throws IOException { for (ScoreDoc scoreDoc : topDocs.scoreDocs) { Document document = indexSearcher.doc(scoreDoc.doc); System.out.println("Name: " + document.get("name") + " store-item-id: " + document.get("store-item-id") + " score: " + scoreDoc.score); } } } ``` The output from running this is as follows: ``` Similarity query results: Name: ferrari.02.jpg store-item-id: 13 score: 0.6431116 Name: ferrari.02.jpg store-item-id: 23 score: 0.6431116 Name: lambo.01.jpg store-item-id: 11 score: 0.63762814 Name: lambo.01.jpg store-item-id: 21 score: 0.63762814 Name: ferrari.01.jpg store-item-id: 5 score: 0.6363953 Top k query results using DiversifyingChildrenFloatKnnVectorQuery: Name: store-item-id: 6093 score: 0.62397873 Name: store-item-id: 6095 score: 0.62397873 Name: store-item-id: 1368 score: 0.6209105 Name: store-item-id: 142 score: 0.6206993 Name: store-item-id: 5611 score: 0.62044996 Top k query results: Name: store-item-id: 6093 score: 0.62397873 Name: store-item-id: 6095 score: 0.62397873 Name: store-item-id: 1368 score: 0.6209105 Name: store-item-id: 142 score: 0.6206993 Name: store-item-id: 5611 score: 0.62044996 ``` FloatVectorSimilarityQuery finds the best results, and the "top k" queries (with/without using DiversifyingChildrenFloatKnnVectorQuery) return the same results, but all missing the top results. If it helps, I can provide the index to run against this program (it is 37M compressed), but wanted to check first if what I am reporting is expected or not. ### Version and environment details This is using Lucene 9.11.1 on Linux. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org