david-sitsky opened a new issue, #13611:
URL: https://github.com/apache/lucene/issues/13611

   ### Description
   
   I work on a program which supports using both KnnFloatVectorQuery and 
FloatVectorSimilarityQuery for querying an index.  Each item in an index can 
have multiple embeddings/vectors associated with it, so I use Lucene's 
parent/child documents, as you can see in the query code below.
   
   A user has found an unusual situation where KnnFloatVectorQuery missed all 
of the top-ranking results that FloatVectorSimilarityQuery found.  I realise 
these queries work differently and that HNSW is approximate.  However, I still 
found the results strange and worth reporting here in case this is unexpected 
behaviour.
   
   The rough test program below reproduces it.  FWIW, in the index this runs 
against, every parent item has only one image-embedding child document.
   ```
   package org.testme;
   
   import java.io.IOException;
   import java.nio.file.Paths;
   
   import org.apache.lucene.document.Document;
   import org.apache.lucene.index.DirectoryReader;
   import org.apache.lucene.search.FieldExistsQuery;
   import org.apache.lucene.search.FloatVectorSimilarityQuery;
   import org.apache.lucene.search.IndexSearcher;
   import org.apache.lucene.search.KnnFloatVectorQuery;
   import org.apache.lucene.search.Query;
   import org.apache.lucene.search.ScoreDoc;
   import org.apache.lucene.search.TopDocs;
   import org.apache.lucene.search.join.BitSetProducer;
   import org.apache.lucene.search.join.DiversifyingChildrenFloatKnnVectorQuery;
   import org.apache.lucene.search.join.QueryBitSetProducer;
   import org.apache.lucene.search.join.ScoreMode;
   import org.apache.lucene.search.join.ToParentBlockJoinQuery;
   import org.apache.lucene.store.Directory;
   import org.apache.lucene.store.FSDirectory;
   import org.junit.jupiter.api.Test;
   
   public class TestLuceneVectorSearch
   {
       @Test
       public void testMe() throws Exception
       {
           try (Directory directory = FSDirectory.open(Paths.get("/path/to/TextIndex"));
                DirectoryReader directoryReader = DirectoryReader.open(directory))
           {
               IndexSearcher indexSearcher = new IndexSearcher(directoryReader);
   
               // Matches all "parent" documents in the index, which have the
               // "store-item-id" field. A parent document can have multiple image
               // embeddings, by creating multiple child documents which use the
               // "image-embeddings" field.
               BitSetProducer parentDocsFilter =
                   new QueryBitSetProducer(new FieldExistsQuery("store-item-id"));
   
               // Embeddings which represent the image search query "sports car".
               float[] queryVector = new float[] {
                   0.0044572074f, -0.023088364f, -0.0036703937f, -0.009964277f,
                   -0.00916677f, 0.024540152f, 0.0070927753f, -0.09274965f, 
-0.005670112f,
                   -1.6179009E-4f, 0.015744649f, -0.020366224f, 0.0016601892f, 
-0.010853463f,
                   0.033737496f, 0.0045916773f, 0.058148578f, 9.6350856E-4f, 
-0.014317f,
                   0.016393812f, 0.020966044f, 0.018872999f, -7.3815876E-4f, 
0.014115239f,
                   -0.031016221f, 0.03167156f, -0.005343557f, 6.839247E-4f, 
-0.0042909216f,
                   0.016312996f, -0.004735699f, -0.012280128f, 0.019533258f, 
0.017941408f,
                   -0.045601472f, -0.008958914f, -0.002675118f, -9.5679046E-4f, 
0.010099755f,
                   0.0017631804f, 0.01860111f, 0.037259832f, -0.012140477f, 
-0.0030949886f,
                   -0.0058084503f, 0.0124143725f, 2.2733907E-4f, -0.012101588f,
                   0.0108692385f, -0.018079244f, -0.011381395f, -0.017509837f, 
0.0012023754f,
                   -0.014764546f, -0.022564514f, -0.03368696f, 0.029053459f, 
-0.001420365f,
                   -0.050087184f, -0.018782789f, 0.024201525f, -0.027481187f, 
-0.0052858875f,
                   0.010976288f, 1.2510385E-4f, 0.014984402f, 0.034309097f, 
0.017706916f,
                   0.018962046f, -0.008943878f, 0.0020767413f, -0.0373546f, 
-0.005417921f,
                   0.025909677f, 0.0024755383f, -0.0110565685f, 0.019262856f, 
0.01675921f,
                   -0.009826738f, -0.020427415f, -0.01732683f, 0.002310435f, 
-0.004211249f,
                   0.023289248f, 0.018740546f, -0.006772653f, -0.006745499f, 
-0.024986906f,
                   0.008061354f, -0.015194572f, 0.011593046f, -0.05354502f, 
-0.13884935f,
                   0.033540063f, -0.0027567758f, 0.023013994f, -0.014023355f, 
0.015025772f,
                   0.016264228f, -0.01661682f, 0.0035698174f, -0.016123693f, 
0.034136593f,
                   0.004460381f, -0.018264858f, -0.006348571f, -0.011179938f, 
-0.010155596f,
                   -5.2316306E-4f, -0.012665312f, 0.0061210217f, 0.016656024f, 
-0.004163066f,
                   -3.4165586E-4f, 0.016313404f, -0.015221417f, 0.008724262f, 
0.037221473f,
                   0.0038612096f, -0.016207112f, 0.022699108f, -0.019367028f, 
-0.019470742f,
                   -0.0063872794f, -0.0015510321f, -0.04432974f, -0.020032836f,
                   -0.011690585f, 0.03094845f, 0.0067724036f, 0.012485819f, 
-0.009259412f,
                   0.008886298f, 0.61861366f, -0.0045251283f, -0.0077076033f, 
-0.043705218f,
                   -0.025324458f, 0.021726586f, -0.026047882f, -0.009551234f, 
-0.0071868496f,
                   -0.0036969658f, 0.020132458f, -0.006516556f, -0.0070351847f, 
0.017480128f,
                   -0.0035662379f, -5.9353746E-4f, -0.02526074f, 0.022630077f, 
-0.011171131f,
                   -0.005936157f, 0.040870648f, -0.018079637f, 0.026608476f, 
0.009430965f,
                   -0.027296953f, -0.014650009f, 0.006681159f, -2.6793202E-4f, 
0.0054786704f,
                   -0.013636381f, 0.016031679f, -0.028956043f, -8.7672856E-4f, 
0.013015282f,
                   -0.013950296f, -0.012305141f, 0.007395428f, 0.0032757986f, 
0.011180955f,
                   0.018238183f, 0.012033082f, -0.036541812f, 0.0057558473f, 
-0.0071096867f,
                   -0.008386121f, 0.012468599f, 0.022702914f, -0.0073613483f, 
0.028406166f,
                   -0.016778922f, -0.017091695f, -0.033710238f, -0.016843721f, 
0.015285634f,
                   -0.019003538f, -0.00687855f, -7.775667E-4f, -0.024790084f, 
0.016236953f,
                   -0.006595245f, -0.015513008f, -0.03021261f, 0.0030078986f, 
-0.026664777f,
                   0.008451913f, 0.004026551f, -0.011371533f, -0.015816687f, 
-0.0026805112f,
                   0.017776044f, 0.017499488f, 0.0044229627f, 0.017531231f, 
-0.033204503f,
                   -0.038329072f, -0.011035979f, 0.008958172f, 0.07328921f, 
0.0038306648f,
                   0.03270265f, 0.015056664f, -0.006860551f, -0.004933787f, 
0.016191917f,
                   -0.006549873f, -0.015812844f, -0.0099520385f, -0.019040879f,
                   -0.037397895f, 0.015847206f, -0.0016991902f, 0.003470394f, 
-0.0069604022f,
                   0.0123413615f, -0.009023129f, -0.007122265f, -0.011230118f, 
-0.007362384f,
                   0.0020543125f, 0.0024772482f, -0.0076109925f, 0.03498191f, 
-0.011076619f,
                   -0.011154479f, -0.01450519f, -0.01843803f, -0.017011909f, 
0.0018331372f,
                   0.0151024535f, 0.016623776f, -0.027112132f, -0.030555645f, 
-0.011304468f,
                   0.0251135f, 0.006708286f, 0.00846858f, -0.010242636f, 
-0.00698456f,
                   0.019706938f, 0.013477113f, 0.048511542f, -0.005879136f, 
0.009369399f,
                   0.004999097f, -0.004784924f, 0.016561827f, 0.0036518855f, 
-0.005227837f,
                   0.0037853734f, -0.009837364f, 0.012072863f, 0.03813349f, 
0.0040256353f,
                   0.0013520177f, -0.01447286f, 0.008837758f, -0.0066623543f, 
0.0029706238f,
                   0.018294264f, -0.01446418f, -0.0021699388f, 9.294378E-4f, 
-0.009523726f,
                   0.005299897f, -0.012993116f, 0.025575459f, -0.016830947f, 
0.011483546f,
                   -0.0011682257f, 0.005689315f, -0.01871892f, -0.017454233f, 
-0.0015068237f,
                   0.04453382f, 0.0029374026f, 0.038485717f, 0.0019930135f, 
-0.004014516f,
                   -0.016176851f, 0.0055262805f, 0.008696258f, -0.021886224f, 
0.025037047f,
                   -0.038151f, 0.006943026f, 0.017139055f, 0.013372888f, 
0.023437364f,
                   -0.0054156454f, -0.0014752378f, 0.0046605296f, 
-0.0044771726f,
                   -0.011856738f, 0.0010809092f, 0.010216948f, -0.012713817f, 
-0.0031348357f,
                   0.009013894f, 0.0011253358f, 0.61798275f, 0.007944298f, 
0.0085330885f,
                   0.016979003f, 4.995474E-4f, -0.027207121f, 0.04165457f, 
0.0020099438f,
                   0.008510639f, 0.019254327f, 0.013971549f, 0.0073373774f, 
-0.0055961516f,
                   4.949079E-4f, 0.02810051f, 0.0060176505f, -0.008400483f, 
-0.19501963f,
                   0.016252134f, 0.012292523f, 0.0018070682f, 0.008999078f, 
0.022372805f,
                   -0.016504897f, -0.028100906f, 0.007098479f, -0.009990526f, 
-0.0017882327f,
                   0.0050334823f, -0.0068439087f, 0.0026650713f, -0.03168618f,
                   -0.0012727041f, 0.008549434f, -0.0067351903f, 4.8637684E-4f,
                   -0.007929317f, -0.004617511f, -0.03894391f, 0.013047643f, 
0.036115382f,
                   -0.0026169834f, -0.02540212f, -4.6752606E-4f, -8.121685E-4f, 
0.022683896f,
                   0.00134045f, 0.042805973f, -0.0041396986f, -0.008076729f, 
4.813038E-4f,
                   -0.026571859f, -0.002208052f, -0.030623492f, 0.0071517443f, 
0.0060770884f,
                   8.646011E-4f, 0.006398815f, -0.007452149f, 0.018887492f, 
-0.0148247555f,
                   0.016297784f, -0.015059465f, 0.015252803f, -0.0042130435f, 
-0.002824615f,
                   0.029199244f, 0.009138435f, -0.015550282f, 0.019079657f, 
-6.981265E-4f,
                   1.9067482E-4f, 0.01982623f, 0.0011727469f, 0.0057251197f, 
-0.0015611411f,
                   0.004203257f, -0.008882021f, -0.050709292f, 0.036732737f, 
-0.0016383937f,
                   -0.0052129203f, 5.78685E-4f, 0.01028424f, 0.0071797483f, 
-0.020324964f,
                   0.003225342f, 0.054530565f, 0.006593899f, 0.005106005f, 
-0.014254335f,
                   0.0025621254f, -0.037771065f, -0.010182639f, 0.004708179f, 
9.6000374E-5f,
                   0.014761056f, -0.012892494f, -0.0025439663f, 0.009076798f, 
0.0032978996f,
                   0.00796419f, 0.0025830409f, 0.0055782637f, -0.008025513f, 
-0.016867429f,
                   -0.0023941789f, -0.008508283f, -0.008827625f, -0.012730328f,
                   -0.006827924f, -0.03513044f, -0.019266263f, -0.011573588f, 
-0.0035062141f,
                   0.0052483953f, -3.5721017E-4f, -0.0021933548f, -0.015921012f,
                   -0.011550315f, -0.008281973f, -0.0033136331f, -0.015491238f,
                   -0.007224302f, -0.028960207f, 0.031132156f, -0.005436975f, 
0.00838252f,
                   -0.013607596f, -0.0048204553f, -0.010242622f, -0.030366635f,
                   -0.0072604655f, 7.622423E-4f, 0.0013710709f, -0.035052024f, 
-0.013582093f,
                   0.005741299f, 0.008179583f, 0.02272927f, -0.0040672733f, 
0.017910969f,
                   -0.006078158f, -0.04835871f, 0.025611773f, 0.02066559f, 
-0.0017394141f,
                   1.7129006E-4f, -0.00600073f, 0.011923645f, 0.02351016f, 
0.006471754f,
                   0.00868545f, 0.0075797923f, 0.023683062f, -0.015859105f, 
-0.0062999893f,
                   -0.0094027f, -0.018763369f, -0.02838345f, -0.004544819f, 
-0.03608459f,
                   0.016126886f, 0.005982367f, 0.0012092822f, 0.020421034f, 
0.027935015f,
                   0.011481908f, 0.029014295f, -0.06716036f, -0.011798545f, 
-0.0021892604f,
                   0.0022094583f, -0.007288418f, 0.002441089f, 0.015705219f, 
0.0016868426f,
                   -0.016558398f, -0.0013452561f, 0.014902193f, -0.023527546f, 
0.0833602f,
                   -0.010013801f, -0.012113727f, 0.022079771f, 0.0064695985f, 
-0.020935113f,
                   6.643729E-4f, -0.016690062f, -6.999961E-4f, -0.002155845f, 
0.0222167f,
                   -0.0024071531f, -0.011394607f, -0.0042578937f, -0.015400263f,
                   -0.006934272f, 0.025316682f, -0.03549049f, -0.0050169053f };
   
               // Perform vector similarity query using a threshold of 0.62.
               float resultSimilarity = 0.62f;
               float traversalSimilarity = resultSimilarity - 0.05f;
   
               System.out.println("Similarity query results: \n");
               Query similarityQuery =
                   new ToParentBlockJoinQuery(
                       new FloatVectorSimilarityQuery("image-embeddings",
                                                      queryVector,
                                                      traversalSimilarity,
                                                      resultSimilarity),
                       parentDocsFilter,
                       ScoreMode.Max);
               TopDocs topDocs = indexSearcher.search(similarityQuery, 5);
               printResults(indexSearcher, topDocs);
   
               // Perform "top k" vector search using
               // DiversifyingChildrenFloatKnnVectorQuery.
               System.out.println();
               System.out.println(
                   "Top k query results using DiversifyingChildrenFloatKnnVectorQuery: \n");
               Query diversifyingChildrenFloatKnnVectorQuery =
                   new DiversifyingChildrenFloatKnnVectorQuery(
                       "image-embeddings",
                       queryVector,
                       null,
                       5,
                       parentDocsFilter);
               Query rewrittenKnnQuery =
                   indexSearcher.rewrite(diversifyingChildrenFloatKnnVectorQuery);
               Query finalQuery =
                   new ToParentBlockJoinQuery(rewrittenKnnQuery, parentDocsFilter, ScoreMode.Max);
               topDocs = indexSearcher.search(finalQuery, 5);
               printResults(indexSearcher, topDocs);
   
               // Perform regular "top k" vector search.
               System.out.println();
               System.out.println("Top k query results: \n");
               Query knnQuery =
                   new ToParentBlockJoinQuery(
                       new KnnFloatVectorQuery("image-embeddings", queryVector, 5),
                       parentDocsFilter,
                       ScoreMode.Max);
               topDocs = indexSearcher.search(knnQuery, 5);
               printResults(indexSearcher, topDocs);
           }
       }
   
       private static void printResults(IndexSearcher indexSearcher, TopDocs topDocs)
           throws IOException
       {
           for (ScoreDoc scoreDoc : topDocs.scoreDocs)
           {
               Document document = indexSearcher.doc(scoreDoc.doc);
               System.out.println("Name: " + document.get("name") +
                                  " store-item-id: " + document.get("store-item-id") +
                                  " score: " + scoreDoc.score);
           }
       }
   }
   ```
   The output from running this is as follows:
   
   ```
   Similarity query results: 
   
   Name: ferrari.02.jpg store-item-id: 13 score: 0.6431116
   Name: ferrari.02.jpg store-item-id: 23 score: 0.6431116
   Name: lambo.01.jpg store-item-id: 11 score: 0.63762814
   Name: lambo.01.jpg store-item-id: 21 score: 0.63762814
   Name: ferrari.01.jpg store-item-id: 5 score: 0.6363953
   
   Top k query results using DiversifyingChildrenFloatKnnVectorQuery: 
   
   Name:  store-item-id: 6093 score: 0.62397873
   Name:  store-item-id: 6095 score: 0.62397873
   Name:  store-item-id: 1368 score: 0.6209105
   Name:  store-item-id: 142 score: 0.6206993
   Name:  store-item-id: 5611 score: 0.62044996
   
   Top k query results: 
   
   Name:  store-item-id: 6093 score: 0.62397873
   Name:  store-item-id: 6095 score: 0.62397873
   Name:  store-item-id: 1368 score: 0.6209105
   Name:  store-item-id: 142 score: 0.6206993
   Name:  store-item-id: 5611 score: 0.62044996
   ```
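   FloatVectorSimilarityQuery finds the best results.  The two "top k" queries 
(with and without DiversifyingChildrenFloatKnnVectorQuery) return the same 
results as each other, but both miss all of the top results: their best score 
(0.62397873) is below every score shown for the similarity query (0.6363953 
and up).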
   
   If it helps, I can provide the index this program runs against (it is 37 MB 
compressed), but I wanted to check first whether what I am reporting is 
expected.
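
   One way to separate HNSW approximation error from a real bug would be to 
compare the "top k" results against an exact brute-force scan of the same 
vectors.  The sketch below is only an illustration of that idea and not part of 
the reproduction: `ExactTopKCheck`, `score` and `exactTopK` are hypothetical 
names, reading the vectors out of the index is not shown, and it assumes the 
field uses cosine similarity scored as (1 + cos) / 2, which the issue does not 
state.

   ```java
   import java.util.Arrays;
   import java.util.Comparator;

   /** Brute-force ground truth for a vector search (hypothetical helper, not a Lucene class). */
   final class ExactTopKCheck {

       /**
        * Cosine similarity mapped to [0, 1] as (1 + cos) / 2.
        * Assumption: the indexed field uses cosine similarity; vectors must be non-zero.
        */
       static float score(float[] a, float[] b) {
           double dot = 0, normA = 0, normB = 0;
           for (int i = 0; i < a.length; i++) {
               dot += a[i] * b[i];
               normA += a[i] * a[i];
               normB += b[i] * b[i];
           }
           double cosine = dot / Math.sqrt(normA * normB);
           return (float) ((1 + cosine) / 2);
       }

       /** Scores every vector against the query and returns the indices of the k best, highest first. */
       static int[] exactTopK(float[][] vectors, float[] query, int k) {
           float[] scores = new float[vectors.length];
           Integer[] order = new Integer[vectors.length];
           for (int i = 0; i < vectors.length; i++) {
               scores[i] = score(vectors[i], query);
               order[i] = i;
           }
           // Sort indices by descending score; the sort is stable, so ties keep index order.
           Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -scores[i]));
           int[] top = new int[Math.min(k, order.length)];
           for (int i = 0; i < top.length; i++) {
               top[i] = order[i];
           }
           return top;
       }

       public static void main(String[] args) {
           // Toy data standing in for the child-document embeddings.
           float[][] vectors = { {1f, 0f}, {0f, 1f}, {1f, 1f}, {-1f, 0f} };
           float[] query = {1f, 0.1f};
           System.out.println(Arrays.toString(exactTopK(vectors, query, 2)));
       }
   }
   ```

   If the exact scan ranks the ferrari/lambo documents first while the "top k" 
queries still omit them, the gap comes from the graph search rather than from 
the scoring.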
   
   
   ### Version and environment details
   
   This is using Lucene 9.11.1 on Linux.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

