david-sitsky commented on issue #13611:
URL: https://github.com/apache/lucene/issues/13611#issuecomment-3316715132
@benwtrent - apologies for reviving this old issue, which I originally reported.
I am happy to create a new one, but all of the background context is the same
for the problem I am about to describe. I've recently gone back to using
"top k" queries alongside FloatVectorSimilarityQuery and have found another
discrepancy, where the similarity query finds a more relevant match which the
"top k" query misses. This is using Lucene 9.12.2.
Here is the test source which reproduces it:
```java
package testme;
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.FieldExistsQuery;
import org.apache.lucene.search.FloatVectorSimilarityQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.KnnFloatVectorQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.join.BitSetProducer;
import org.apache.lucene.search.join.DiversifyingChildrenFloatKnnVectorQuery;
import org.apache.lucene.search.join.QueryBitSetProducer;
import org.apache.lucene.search.join.ScoreMode;
import org.apache.lucene.search.join.ToParentBlockJoinQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
public class Main
{
public static void main(String args[]) throws Exception
{
try (Directory directory =
FSDirectory.open(Paths.get("/path/to/TextIndex"));
DirectoryReader directoryReader =
DirectoryReader.open(directory))
{
IndexSearcher indexSearcher = new IndexSearcher(directoryReader);
// Matches all "parent" documents in the index, which have the "store-item-id" field.
// A parent document can have multiple text embeddings, by creating multiple child
// documents which use the "text-embeddings" field.
BitSetProducer parentDocsFilter = new QueryBitSetProducer(new
FieldExistsQuery("store-item-id"));
// Embeddings which represent the text search query "recounting a foreign trip".
float[] queryVector = new float[]{
0.01027684f, 0.00628602f, -0.02823237f, -0.05088264f,
0.02140638f,
0.00190021f, -0.01387214f, 0.03947051f, 0.05346841f,
-0.01635768f,
0.03185865f, 0.05483901f, -0.04269207f, -0.00006817f,
-0.02283853f,
-0.01186552f, -0.0444189f, 0.03178197f, -0.01415246f,
-0.0197319f,
0.03628024f, -0.03238474f, -0.0223659f, -0.05228625f,
-0.03574177f,
-0.0259234f, -0.03735204f, -0.02635597f, -0.03167967f,
-0.04350322f,
0.00352342f, 0.01317282f, -0.00408797f, 0.00335579f,
-0.04690048f,
0.04221549f, 0.04383847f, 0.04812804f, -0.05127876f,
0.03940346f,
-0.01105701f, 0.02793047f, 0.01769191f, -0.01656313f,
-0.01801754f,
0.02300375f, 0.02924846f, -0.00549838f, -0.02706757f,
0.02918416f,
0.02152329f, 0.04873563f, -0.01788644f, -0.02927562f,
-0.06649011f,
0.01991419f, -0.02805632f, 0.0174007f, -0.06473303f,
0.00145654f,
-0.02835324f, -0.01369905f, -0.0100678f, -0.01015685f,
-0.03844858f,
0.0495478f, 0.03632185f, 0.02048692f, -0.03830176f,
0.01396586f,
-0.05888891f, 0.01742701f, -0.02744306f, -0.03928311f,
-0.03628024f,
-0.01255987f, 0.03196765f, 0.01364787f, 0.04790041f,
-0.00822623f,
0.09903441f, 0.04002756f, 0.01214318f, 0.00994572f,
0.05001545f,
0.03934466f, 0.03697965f, -0.00268431f, 0.02740248f,
0.04566777f,
-0.02895034f, 0.03853317f, 0.05431979f, -0.0310487f,
-0.00808215f,
-0.01828308f, 0.01317085f, 0.01710409f, -0.01124782f,
-0.01765452f,
0.0014975f, -0.04522351f, 0.03408887f, -0.02297048f,
-0.01038513f,
0.02204956f, 0.04183106f, 0.05121136f, -0.02423165f,
0.02165885f,
0.03074817f, 0.06137497f, 0.00549754f, -0.04322573f,
-0.05147338f,
0.02677693f, 0.04088272f, -0.00809619f, -0.02834069f,
0.01832873f,
0.03912735f, 0.02473006f, 0.0006513f, -0.01830964f,
0.04132526f,
0.00172348f, 0.02728678f, 0.04469157f, -0.00558199f,
0.02187385f,
0.02949431f, 0.02974051f, -0.0221185f, -0.02398089f,
-0.0392324f,
-0.01045734f, 0.00631069f, -0.02823822f, -0.00771661f,
-0.00134273f,
-0.05211776f, 0.01958258f, 0.04470911f, -0.03439662f,
-0.04499175f,
-0.03461806f, -0.03215057f, -0.0094563f, -0.01631495f,
-0.0161858f,
-0.01235264f, -0.02141257f, 0.00163138f, 0.02153086f,
0.04344648f,
-0.03202679f, 0.02854812f, 0.06429703f, 0.00741143f,
0.03731009f,
-0.02565606f, -0.03307141f, -0.04286881f, -0.04593047f,
0.02404408f,
-0.00518453f, 0.02937122f, 0.01359327f, 0.02717881f,
-0.03486374f,
-0.00886027f, -0.05559617f, 0.03279977f, -0.04488585f,
-0.00965267f,
0.03490277f, 0.05169413f, 0.01460244f, 0.05085994f,
-0.01863269f,
-0.00152995f, 0.03258486f, 0.04756927f, -0.01463953f,
-0.0364226f,
0.00775375f, 0.02201277f, 0.0086249f, 0.03305835f,
0.03560973f, 0.01925068f,
0.00034379f, 0.00080798f, 0.04065027f, -0.02702494f,
0.04331788f,
0.05248775f, 0.02112004f, -0.01921441f, -0.04337634f,
-0.0387161f,
-0.00393325f, -0.03022397f, 0.01982191f, 0.00864878f,
-0.03857065f,
-0.03298717f, -0.01877556f, 0.02891148f, -0.05808154f,
-0.03950593f,
0.01260633f, 0.00727577f, 0.00051243f, -0.01526981f,
-0.04950586f,
-0.02654199f, -0.00065191f, 0.03350776f, 0.05086751f,
0.03033641f,
0.02927322f, 0.0043207f, 0.02891011f, 0.00568328f,
0.03854589f, 0.03604557f,
0.01453362f, 0.0023853f, 0.00669792f, 0.01538687f,
0.05086063f,
-0.02950875f, -0.01252858f, -0.0218754f, -0.02375713f,
0.00463582f,
-0.01039772f, 0.02684218f, 0.05329924f, -0.02904047f,
0.02044789f,
-0.01769776f, -0.01550458f, -0.02138059f, -0.04099241f,
0.01849257f,
-0.01582675f, 0.03129369f, -0.03587519f, 0.00060391f,
0.04805754f,
-0.03528823f, -0.02065657f, 0.01118188f, 0.01126019f,
0.02747452f,
-0.02339866f, -0.0188672f, -0.03334426f, -0.02903974f,
-0.00860439f,
-0.02756668f, 0.04445535f, 0.03931165f, 0.00851445f,
-0.03409713f,
-0.00134982f, 0.0140662f, -0.01910592f, -0.02375386f,
-0.01754302f,
-0.02489356f, -0.01691201f, -0.02308679f, 0.01642533f,
-0.0201637f,
-0.02082647f, 0.00950929f, -0.0237365f, 0.0313659f,
-0.01777254f,
0.02077782f, -0.06618477f, -0.03615577f, -0.05745367f,
0.03453648f,
0.00878784f, 0.01984941f, -0.03072152f, 0.03971912f,
-0.02552391f,
0.0776282f, 0.02572534f, 0.00572648f, -0.04653222f,
-0.00505951f,
0.01255866f, -0.02663552f, 0.02471098f, -0.04086965f,
-0.03136438f,
0.03724338f, 0.04734887f, -0.01308618f, -0.02979467f,
-0.06534578f,
0.00415249f, -0.0358188f, -0.01326163f, -0.00125706f,
0.03324162f,
0.00064602f, -0.04253424f, -0.03455307f, -0.00149872f,
-0.01481743f,
0.06944794f, -0.01368267f, 0.02329961f, 0.03496742f,
-0.01011851f,
-0.02024626f, 0.03486735f, 0.02404562f, 0.00336852f,
0.01544827f,
-0.0033943f, -0.00586205f, -0.01395784f, 0.01800361f,
-0.01832529f,
0.00985097f, 0.04526959f, -0.01513167f, 0.01721954f,
0.00167489f,
0.00465945f, 0.02693089f, -0.02152123f, 0.05090259f,
-0.03363602f,
-0.05681892f, -0.01707978f, 0.05179591f, -0.03987832f,
0.00857893f,
-0.04021358f, 0.03467927f, 0.00605426f, 0.00997358f,
-0.02831997f,
0.03344243f, 0.0283651f, -0.0403047f, 0.06705678f,
0.01467989f,
-0.02581646f, 0.03345549f, -0.03773681f, 0.03287112f,
0.02869623f,
0.04872015f, -0.0257348f, -0.00882429f, -0.02138953f,
0.03128991f,
-0.01493073f, -0.01883518f, -0.06210256f, 0.02090143f,
0.04738428f,
-0.01565207f, 0.01702123f, 0.01980162f, -0.02266626f,
-0.02721019f,
-0.04012727f, 0.01758343f, -0.01642009f, -0.02785895f,
-0.03942169f,
0.01332481f, -0.02889687f, -0.01590852f, -0.0323531f,
0.15605412f,
0.02384292f, 0.05468153f, -0.02590174f, -0.02078607f,
0.02655575f,
-0.001859f, 0.04504471f, 0.02361752f, 0.02883136f,
-0.0141216f, 0.01921423f,
0.03275644f, -0.03427421f, 0.04020601f, 0.07054414f,
0.03708142f,
0.03716395f, 0.01807721f, -0.03209522f, 0.03699615f,
-0.02894961f,
0.03067067f, 0.00958268f, -0.02135583f, -0.01955405f,
-0.02002787f,
0.00296257f, -0.00199303f, 0.04691527f, -0.01339521f,
0.03348731f,
-0.01949215f, -0.02129291f, 0.00866053f, 0.04181971f,
0.01336142f,
-0.03757141f, 0.03741668f, -0.03884401f, -0.01581581f,
-0.03499836f,
-0.01320947f, 0.02842184f, -0.02385272f, -0.00306953f,
0.00514276f,
-0.02522538f, -0.02088458f, 0.02261863f, 0.03821408f,
0.05431223f,
-0.00022568f, -0.03365699f, 0.01186652f, -0.0440245f,
-0.01759116f,
-0.03060083f, 0.00321774f, 0.02350724f, 0.04970804f,
0.032622f, -0.0154008f,
-0.05176978f, 0.06078835f, 0.04342379f, -0.01879084f,
-0.04407986f,
-0.04018641f, -0.01856504f, -0.04731689f, -0.03529339f,
-0.04144801f,
0.02652755f, 0.02136649f, -0.014213f, -0.05022863f,
0.02043844f,
0.02124907f, -0.02047329f, 0.05679553f, -0.05172164f,
-0.02698281f,
0.0363167f, 0.05104288f, -0.01453448f, 0.04820197f,
0.03768042f,
-0.03150551f, 0.03995999f, 0.01110195f, 0.02118958f,
0.03683557f,
0.02500678f, -0.00406999f, -0.04070426f, -0.01392612f,
0.02338096f,
-0.06358732f, -0.04049313f, -0.05172198f, -0.01138423f,
-0.00932451f,
-0.00875079f, -0.02911977f, -0.01853739f, 0.04042092f,
0.01372583f,
0.01208322f, 0.00689814f, 0.01021423f, -0.01454862f,
0.01801281f,
-0.00994749f, 0.03018752f, 0.02478834f, 0.0390266f,
-0.02454713f,
-0.00310209f, 0.05166456f, -0.01144295f, 0.01971995f,
0.01922584f,
0.03020781f, -0.02225192f, 0.04873563f, -0.03418343f,
-0.00693985f,
0.02779224f, -0.00503884f, 0.01033054f, -0.00341245f,
-0.04211233f,
-0.00295577f, 0.01432112f, -0.00283895f, -0.01278072f,
-0.02725291f,
-0.02054194f, 0.03588447f, -0.0265824f, 0.07609461f,
-0.06353161f,
0.0251236f, 0.00368404f, 0.02793184f, -0.03767629f,
-0.05013167f,
-0.03427662f, -0.04577505f, 0.07156882f, -0.00503187f,
0.00063016f,
0.01426099f, -0.00942605f, -0.01763607f, 0.01063399f,
0.02168817f,
0.04786568f, -0.00997126f, -0.03153817f, 0.00063639f,
-0.00240898f,
0.08298542f, 0.02046783f, 0.04565746f, -0.03188134f,
0.02135687f,
0.0361121f, 0.02950463f, -0.0217284f, -0.01738829f,
0.02631058f, 0.03588f,
0.01258492f, -0.01613563f, -0.00756988f, 0.03388084f,
-0.0249668f,
0.00536928f, 0.02154798f, 0.05079324f, 0.05181036f,
-0.02417921f,
-0.04027651f, -0.00070837f, 0.00337591f, 0.00035914f,
-0.01727344f,
0.01840523f, -0.03169497f, 0.09204526f, -0.01332331f,
0.01516889f,
-0.04286468f, -0.01341989f, 0.03866487f, 0.01114607f,
-0.01805805f,
-0.05865647f, -0.02863915f, 0.0085304f, -0.0234805f,
0.01565807f,
-0.01658294f, -0.0074289f, 0.03082141f, 0.02436665f,
0.00779369f,
0.00639465f, -0.03202266f, -0.02417955f, 0.04700467f,
0.0186253f,
0.01145389f, 0.00882695f, 0.0411829f, 0.02592375f,
0.03207252f,
-0.06531002f, 0.0134137f, 0.02501872f, 0.05302278f,
-0.01685323f,
-0.03955991f, -0.0231516f, 0.02059437f, 0.04894228f,
-0.00503833f,
-0.04607283f, 0.00709518f, -0.00105615f, -0.02611183f,
-0.04404307f,
-0.00627989f, -0.03007783f, 0.04012727f, 0.04633003f,
0.0185273f,
-0.0281667f, 0.02289956f, -0.0394602f, 0.00642096f,
-0.03099041f,
-0.01432499f, 0.01858378f, -0.04810431f, -0.04512826f,
0.00484278f,
0.04436113f, 0.00715612f, -0.01726046f, 0.04310229f,
0.02808486f,
-0.01584697f, -0.03278155f, -0.01018345f, -0.01570137f,
-0.04975343f,
-0.04898698f, 0.04423666f, -0.01532724f, -0.0178059f,
-0.03535287f,
-0.01760028f, 0.01135857f, 0.03046054f, -0.03586384f,
-0.03169429f,
-0.00868819f, 0.01779967f, -0.00656741f, -0.02448318f,
-0.0329882f,
0.03185211f, 0.04352247f, -0.02142392f, -0.01511688f,
-0.01013997f,
-0.01356976f, -0.01684896f, 0.0060451f, 0.04427792f,
-0.01952043f,
0.03301984f, 0.01919747f, -0.07573631f, 0.03639888f,
0.02191305f,
-0.04537103f, -0.02020805f, 0.02989628f, -0.02763854f,
0.01508718f,
-0.02368836f, -0.07501216f, -0.04219899f, -0.011666f,
-0.03410469f,
-0.02371862f, 0.03246829f, -0.03481406f, -0.0509184f,
-0.01898351f,
-0.00936194f, -0.00187071f, -0.03509568f, 0.02675759f,
0.05895768f,
0.02942159f, 0.02613728f, -0.03394755f, -0.01120109f,
0.0178298f,
-0.0215499f, 0.03233969f, -0.00541283f, -0.02570712f,
-0.03569845f,
0.02218607f, -0.01393743f, -0.02423233f, -0.0404443f,
0.01366207f,
0.03717822f, 0.00285752f, 0.01067531f, 0.01338937f,
-0.0039914f, 0.0042137f,
0.02527765f, 0.00570071f, -0.03005479f, -0.0106908f,
-0.04020808f,
-0.02106227f, -0.02191528f, -0.00807879f, -0.02749275f,
-0.02673799f,
-0.04974655f, 0.03883129f, 0.02455366f, -0.06913573f,
0.04138955f,
0.00080887f, 0.00212912f, 0.04027925f, -0.03892069f,
-0.03420682f,
-0.02323551f, 0.05796257f, -0.03750849f, 0.00667946f,
-0.03036013f,
0.02840482f, -0.039955f, 0.06304609f, -0.01626217f,
-0.02711176f,
0.01290643f, -0.03695936f, 0.05259365f, -0.02309642f,
-0.02184737f,
0.01282882f, -0.03104302f, 0.03205911f, -0.05133997f,
-0.00875932f,
-0.02028998f, -0.01560533f, -0.02997949f, -0.03226095f,
-0.02307132f,
0.01840145f, -0.04772539f, 0.01581637f, 0.02813678f,
0.00196045f,
0.02321393f, -0.06910134f, -0.01907781f, 0.0065231f,
0.03835506f,
-0.02615241f, -0.00623578f, 0.0387498f, -0.03476609f,
0.01163542f,
-0.00453145f, 0.0337433f, -0.04233309f, 0.01945308f,
-0.07245115f,
-0.01838615f, 0.01881889f, -0.01215137f, -0.01985887f,
-0.02440976f,
-0.08796441f, 0.00825268f, -0.01459462f, -0.00594833f,
0.00860212f,
0.02728746f, 0.01357101f, 0.00210741f, -0.05818332f,
0.04375664f,
0.02436506f, -0.02106236f, 0.01722393f, -0.01175414f,
0.01398286f,
-0.00249037f, 0.01784974f, -0.02055603f, 0.01552977f,
-0.01426228f,
0.01551483f, 0.02264201f, -0.03158975f, -0.03215917f,
-0.00893651f,
-0.06029527f, 0.01281909f, 0.0404866f, -0.02686865f,
0.04830684f,
0.00096769f, 0.00195593f, -0.02758387f, -0.04052717f,
-0.05251801f,
-0.02535381f, -0.03872126f, 0.00161251f, -0.02407115f,
0.00913572f,
0.02893418f, -0.04106737f, 0.03164099f, -0.03089414f,
-0.07185353f,
-0.04921358f, 0.0268965f, 0.00917947f, 0.01907884f,
0.01032689f,
0.01830019f, -0.01722092f, 0.015609f, 0.03660244f,
0.03172213f, -0.0328039f,
-0.01189777f, -0.01111258f, -0.01145507f, -0.00735958f,
-0.03312368f,
0.00122242f, 0.00987834f, 0.02902891f, 0.01680993f,
0.00138047f,
0.01253759f, 0.02423062f, 0.05942119f, -0.01931902f,
-0.00785042f,
0.01157649f, 0.03180053f, 0.02252717f, -0.04868886f,
-0.02989387f,
0.05730546f, -0.00987012f, -0.06648462f, 0.02559434f,
0.03571186f,
-0.02741022f, -0.04011042f, 0.03760408f, -0.0213873f,
0.01456631f,
0.01259924f, -0.01798129f, 0.02751888f, 0.00205967f,
-0.04371194f,
0.01048153f, 0.02404012f, 0.03939452f, -0.02392218f,
-0.02260385f,
0.02636502f, 0.04661405f, 0.00970898f, 0.01249967f,
0.00092008f,
-0.00603257f, 0.01858352f, -0.03073648f, -0.02119577f,
0.00205588f,
-0.0466003f, -0.00554847f, -0.0648802f, 0.00642551f,
0.0010819f,
0.01979981f, -0.00702552f, 0.01244128f, -0.02546763f,
-0.02729692f,
0.02396826f, -0.04750704f, -0.01717373f, -0.02157315f,
0.03349263f,
-0.0401957f, 0.02839983f, -0.04444847f, 0.04992673f,
-0.02232241f,
0.02027425f, -0.00590554f, -0.01163877f, -0.03985288f,
0.01274439f,
0.05926577f, 0.01126887f, 0.02774135f, -0.01624756f,
-0.02907069f,
-0.02971713f, -0.01226422f, -0.02083266f, 0.02567067f,
-0.01430747f,
-0.04322366f, -0.0494247f, -0.02600386f, 0.02547459f,
-0.02610582f,
-0.01884322f, 0.05933591f, 0.00150254f, 0.00345098f,
0.03040724f,
-0.04789284f, 0.04158245f, 0.00119923f, -0.03210643f,
0.02472301f,
0.01233812f, 0.01544488f, -0.02220842f, -0.03512593f,
0.03067063f,
0.01220052f, -0.02166195f, -0.03889043f, 0.02450501f,
0.0090927f,
-0.03153061f, 0.02894415f, 0.03141817f, -0.002749f,
0.02073759f,
0.00635402f, 0.01055465f, 0.01173356f, 0.00482761f,
-0.01088883f,
-0.02805735f, 0.03595393f, 0.04259579f, 0.06299313f,
0.03603474f,
0.02637566f, -0.02975427f, 0.01598915f, 0.00745132f,
0.00531482f,
-0.01436347f, 0.05383908f, 0.02996238f, -0.02073054f,
-0.0207314f,
0.01192913f, -0.01873499f, 0.01398776f, 0.02481001f,
-0.03487183f,
0.03915486f, -0.05042738f, -0.02397943f, 0.04169077f,
-0.00344578f,
0.05264489f, -0.01692168f, 0.0251401f, 0.03025629f,
-0.04537756f,
-0.00937919f, 0.00718579f, 0.01522116f, 0.02089352f,
0.01974951f,
-0.05802309f, 0.03657287f, 0.01948476f, 0.01420482f,
0.06098641f,
0.00897072f, -0.0126598f, -0.05006633f, 0.04033393f,
-0.03716636f,
0.01775845f, -0.03111248f, -0.02425262f, 0.02931723f,
0.02382246f,
0.02057099f, 0.03314672f, -0.04362185f, -0.02148753f,
0.01488616f,
0.00172719f, 0.00187579f, 0.01627275f, -0.00164652f,
-0.01386414f,
-0.07553963f, 0.0114396f
};
// Perform vector similarity query using a threshold of 0.90.
float resultSimilarity = 0.90f;
float traversalSimilarity = resultSimilarity - 0.05f;
System.out.println("Similarity query results: \n");
Query similarityQuery =
new ToParentBlockJoinQuery(
new FloatVectorSimilarityQuery("text-embeddings",
queryVector,
traversalSimilarity,
resultSimilarity),
parentDocsFilter,
ScoreMode.Max);
TopDocs topDocs = indexSearcher.search(similarityQuery, 5);
printResults(indexSearcher, topDocs);
// Perform "top k" vector search using DiversifyingChildrenFloatKnnVectorQuery.
System.out.println();
System.out.println("Top k query results using DiversifyingChildrenFloatKnnVectorQuery: \n");
Query diversifyingChildrenFloatKnnVectorQuery = new
DiversifyingChildrenFloatKnnVectorQuery(
"text-embeddings",
queryVector,
null,
200,
parentDocsFilter);
Query rewrittenKnnQuery =
indexSearcher.rewrite(diversifyingChildrenFloatKnnVectorQuery);
Query finalQuery = new ToParentBlockJoinQuery(rewrittenKnnQuery,
parentDocsFilter, ScoreMode.Max);
topDocs = indexSearcher.search(finalQuery, 5);
printResults(indexSearcher, topDocs);
}
}
private static void printResults(IndexSearcher indexSearcher, TopDocs
topDocs)
throws IOException
{
for (ScoreDoc scoreDoc : topDocs.scoreDocs)
{
Document document = indexSearcher.doc(scoreDoc.doc);
System.out.println("Name: " + document.get("name") +
" store-item-id: " + document.get("store-item-id") +
" score: " + scoreDoc.score);
}
}
}
```
With the program above, I see the following output:
```
Similarity query results:
Name: Paris store-item-id: 643 score: 0.9144889
Name: Re: hows it goin? store-item-id: 2785 score: 0.91406804
Name: hows it goin? store-item-id: 2751 score: 0.91001636
Name: ASAP: FIJI, plane ticket store-item-id: 2144 score: 0.9088812
Name: Re: The waterfall i was talking about... store-item-id: 121 score: 0.9052492
Top k query results using DiversifyingChildrenFloatKnnVectorQuery:
Name: Re: hows it goin? store-item-id: 2785 score: 0.91406804
Name: hows it goin? store-item-id: 2751 score: 0.91001636
Name: Paris store-item-id: 643 score: 0.909234
Name: ASAP: FIJI, plane ticket store-item-id: 2144 score: 0.9088812
Name: Re: The waterfall i was talking about... store-item-id: 121 score: 0.9052492
```
Both methods return the same items, except that the "Paris" item has a
noticeably higher score via the similarity query (0.9144889) than via the
"top k" query (0.909234). If I change the `k` parameter from 200 to 201, the
output of both methods matches exactly. It looks like a child/nested document
under the "Paris" item is being missed when using "top k" retrieval.
Could it be there is some subtle graph connectivity issue here?
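To make the suspected mechanism concrete, here is a minimal sketch in plain
Java (no Lucene involved). It assumes the block join simply takes the max over
whichever child hits were collected, as with `ScoreMode.Max`. `ChildHit` and
`parentScores` are hypothetical names used only for illustration, and assigning
the two observed "Paris" scores to two distinct child documents is an
assumption which I have not verified against the index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MaxScoreJoinSketch
{
    /** A collected child hit: its parent's name and its vector score. Hypothetical type, not a Lucene class. */
    public static final class ChildHit
    {
        final String parent;
        final float score;

        public ChildHit(String parent, float score)
        {
            this.parent = parent;
            this.score = score;
        }
    }

    /** Models ScoreMode.Max: each parent takes the best score among the children that were collected. */
    public static Map<String, Float> parentScores(List<ChildHit> collectedChildren)
    {
        Map<String, Float> best = new HashMap<>();
        for (ChildHit hit : collectedChildren)
        {
            best.merge(hit.parent, hit.score, Float::max);
        }
        return best;
    }

    public static void main(String[] args)
    {
        // Assumption: the two observed "Paris" scores come from two different child documents.
        ChildHit parisBestChild = new ChildHit("Paris", 0.9144889f);
        ChildHit parisOtherChild = new ChildHit("Paris", 0.909234f);

        // Case 1: both children are collected (what the similarity query appears to do).
        List<ChildHit> allChildren = new ArrayList<>();
        allChildren.add(parisBestChild);
        allChildren.add(parisOtherChild);
        System.out.println("All children collected: " + parentScores(allChildren).get("Paris"));

        // Case 2: the best child is never visited, so only the other child is collected.
        List<ChildHit> bestChildMissed = new ArrayList<>();
        bestChildMissed.add(parisOtherChild);
        System.out.println("Best child missed: " + parentScores(bestChildMissed).get("Paris"));
    }
}
```

Under this assumption, missing the best child leaves the parent in the result
set but with the lower score, which is consistent with the difference between
the two outputs above.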
For reference, this index uses a custom codec, based on your earlier
recommendations on this ticket:
```java
package testme;
import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.KnnVectorsFormat;
import org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsFormat;
import org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat;
public class Lucene99VectorCodec extends FilterCodec
{
/**
* Creates a new codec using
* {@link org.apache.lucene.backward_codecs.lucene99.Lucene99Codec} with a
* custom per-field HNSW vectors format (see {@link #knnVectorsFormat()}).
*/
public Lucene99VectorCodec()
{
super("Lucene99VectorCodec", new
org.apache.lucene.backward_codecs.lucene99.Lucene99Codec());
}
@Override
public KnnVectorsFormat knnVectorsFormat()
{
// PerField is required as per https://github.com/apache/lucene/issues/13626.
return new PerFieldKnnVectorsFormat()
{
@Override
public KnnVectorsFormat getKnnVectorsFormatForField(String field)
{
return new Lucene99HnswVectorsFormat(16, 250);
}
};
}
}
```
Here is a link to the index which reproduces this:
https://drive.google.com/file/d/1m0BsPBRgZf1r7gzbB2a0QfcVABjm7fTL/view?usp=sharing.
Many thanks in advance for any insights you can provide.