Hello,

we are currently developing a combined index for book metadata and fulltexts. Our primary core contains metadata for about 12 million books. About 0.5 million of them have fulltexts, which are indexed in a secondary core with one index document per fulltext page. We join all matching fulltext pages with the per-book metadata in the primary core.

Our problem is that scores for books with matches from the secondary core are not comparable with scores for books that match on metadata only. We are therefore trying to normalize the fulltext scores to the same order of magnitude as the metadata scores of non-digitized books.
This is a basic query without a join, using only the primary core (metadata):

    http://server/solr/live/select?&q=+geschichte&fl=id,score

Top 10 result scores range from 2.0 to 1.7.

For fulltexts, the query is extended with a join:

    http://server/solr/live/select?q=%28%28+geschichte%29%20OR%20_query_:{!join%20from=expandtype%20fromIndex=pages%20to=id%20score=max%20v=%27pageno_content:%28+geschichte%29%27}%29&fl=id,score

Top 10 result scores range from 5.4 to 4.8. The joined secondary core contributes 4.7 score points to the first hit, and we would like to reduce this value; see the explain output below [1].

This difference effectively hides any books without fulltexts from the hit lists, which is not our goal.

We tried to add Lucene boosts to the join subquery, but they have no effect on the final scores. For example, we "down-boost" the fulltext results by a factor of 0.1:

    q=((+geschichte) OR _query_:{!join from=expandtype fromIndex=pages to=id score=max v='pageno_content:(+geschichte)^0.1'})

But the resulting scores are the same as in the join example above. Is this the correct query syntax, or should the boost for the join query be put somewhere else?

Thanks for any suggestions.

Best regards,
Alena

[1] Explain output for the first hit of the join example query:

5.398742 = sum of:
  4.816505 = sum of:
    0.07251295 = max of:
      0.07251295 = weight(title:geschichte in 10585926) [ClassicSimilarity], result of:
        0.07251295 = score(doc=10585926,freq=1.0), product of:
          0.037440736 = queryWeight, product of:
            5.1646385 = idf(docFreq=197504, maxDocs=12713278)
            0.00724944 = queryNorm
          1.9367394 = fieldWeight in 10585926, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            5.1646385 = idf(docFreq=197504, maxDocs=12713278)
            0.375 = fieldNorm(doc=10585926)
      0.005904072 = weight(free_search:geschichte in 10585926) [ClassicSimilarity], result of:
        0.005904072 = score(doc=10585926,freq=2.0), product of:
          0.022005465 = queryWeight, product of:
            3.035471 = idf(docFreq=1660594, maxDocs=12713278)
            0.00724944 = queryNorm
          0.26830027 = fieldWeight in 10585926, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.035471 = idf(docFreq=1660594, maxDocs=12713278)
            0.0625 = fieldNorm(doc=10585926)
    4.743992 = Score based on join value 957245
  0.58188105 = weight(statusband:F in 10585926) [ClassicSimilarity], result of:
    0.58188105 = score(doc=10585926,freq=1.0), product of:
      0.4592555 = queryWeight, product of:
        50.0 = boost
        1.2670095 = idf(docFreq=9734121, maxDocs=12713278)
        0.00724944 = queryNorm
      1.2670095 = fieldWeight in 10585926, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.2670095 = idf(docFreq=9734121, maxDocs=12713278)
        1.0 = fieldNorm(doc=10585926)
  3.5596997E-4 = FunctionQuery(1.0/(3.16E-11*float(ms(const(1458638802405),date(freshness)))+1.0)), product of:
    0.00491031 = 1.0/(3.16E-11*float(ms(const(1458638802405),date(freshness)=1813-01-01T00:00:01Z))+1.0)
    0.0724944 = boost
    1.0 = queryNorm
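
For anyone who wants to reproduce the requests or check the numbers, here is a minimal Python sketch. It is only an illustration: the helper names `join_query` and `select_url` are made up for this post and are not part of Solr, and the host, core and field names are simply the ones from the examples above. It rebuilds the join query string (optionally with the boost we tried), percent-encodes it, and re-adds the four top-level components of the explain output to show how much of the final score comes from the join.

```python
from urllib.parse import urlencode

# Host and core name taken from the example URLs in this post.
BASE = "http://server/solr/live/select"


def join_query(term, boost=None):
    """Build the q value: metadata match OR join against the pages core.

    Hypothetical helper; it only assembles the string shown in the post.
    """
    sub = f"pageno_content:(+{term})"
    if boost is not None:
        sub += f"^{boost}"  # the Lucene boost we tried on the subquery
    join = f"{{!join from=expandtype fromIndex=pages to=id score=max v='{sub}'}}"
    return f"((+{term}) OR _query_:{join})"


def select_url(q):
    # urlencode percent-encodes braces, quotes and '+' (as %2B); a literal
    # '+' in a URL query string would otherwise be decoded as a space.
    return BASE + "?" + urlencode({"q": q, "fl": "id,score"})


# Top-level components of the explain output [1] for the first hit.
components = {
    "metadata (max of title / free_search)": 0.07251295,
    "join on pages core (score=max)": 4.743992,
    "statusband:F boost": 0.58188105,
    "freshness function": 3.5596997e-4,
}
total = sum(components.values())

if __name__ == "__main__":
    print(select_url(join_query("geschichte", boost=0.1)))
    for name, value in components.items():
        print(f"{name:40s} {value:10.6f}  {value / total:6.1%}")
    print(f"{'total':40s} {total:10.6f}")
```

Re-adding the components gives the reported 5.398742 up to rounding, and the join contributes close to nine tenths of it, which is why the metadata-only hits drop out of the top of the list.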