I think joining against the other collection to get scores works, but I’m not sure. As I recall, that approach was suggested for a fairly static collection of documents with rapidly changing scoring inputs.
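
For what it's worth, here is a minimal sketch of such a join, reusing the collection and field names from the quoted message below (generic, scores, B1_name_ss, B1_s, B1_f). It swaps the `B1_s:Boost3{!func}B1_f` clause of the original query for the `B1_s:Boost3 AND _val_:B1_f` form that the quoted message reports as scoring as expected when run directly against the scores collection. The idea is that an explicit AND keeps the function clause from matching every document in scores. Whether that is what actually went wrong is a guess, the helper name build_join_url is made up for illustration, and none of this has been run against Solr 5.5.1.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_join_url(base_url: str, term: str) -> str:
    """Build a select URL that joins the main collection to 'scores'.

    Assumption: requiring both clauses with AND restricts the joined
    side to score documents for `term`, so that score=max only sees
    those documents. This is not confirmed anywhere in the thread.
    """
    # Inner query runs on the 'scores' collection: match the term and
    # add the stored B1_f value to the score via a function query.
    inner = f"B1_s:{term} AND _val_:B1_f"
    q = "{!join from=id to=B1_name_ss fromIndex=scores score=max}" + inner
    params = {"q": q, "fl": "*,score", "debugQuery": "true"}
    # urlencode takes care of the spaces, braces and '!' in the q value.
    return f"{base_url}/select?{urlencode(params)}"


url = build_join_url("http://localhost:8983/solr/generic", "Boost3")
print(url)

# Decode the query string again to confirm q survived URL encoding.
decoded_q = parse_qs(urlsplit(url).query)["q"][0]
print(decoded_q)
```

Comparing the debugQuery explain output for this form with the original one should show whether the function clause was matching documents outside the requested term.
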
Personally, I would try a straight popularity boost to see if it got you 80% of the way there.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Jul 7, 2016, at 2:46 PM, Mark T. Trembley <mark.tremb...@etrailer.com>
> wrote:
>
> Yes, the spam issue is something I'm aware of. I plan on having some sanity
> checks in place to make sure that the boosts are in line with expectations
> either at query time or while indexing the scores into Solr.
>
> I just read through that document along with some of the more recent posts
> about signals, and it appears that I'm going down the same path as
> Lucidworks. I'm storing the aggregated search term and product id in an
> alternate index. It seems that the piece that I'm missing is getting the
> boost per document. In the following post, it appears to me that Fusion is
> applying a boost to the main query by obtaining the scores from a set number
> of documents from the aggregate collection. I'm going to assume that part of
> its query processing pipeline is to run a query on the aggregation
> collection to obtain the scores from that query and return them for use on
> the main query.
>
> https://lucidworks.com/blog/2015/09/01/better-search-fusion-signals/
>
> I think I could possibly hack something together on my side that mimics what
> I think is happening in Fusion, but with my tinkering, it seems to me that
> using a !join query (with scoring) like I've been trying could handle the job
> if I could only understand how the query executes on the joined collection
> and how I can pass a calculated score back to the main query for use in
> calculating a final score on the main collection.
>
>
> On 7/7/2016 1:34 PM, Walter Underwood wrote:
>> If it is running in an environment protected from spammers, you might want
>> to start with the work that LucidWorks did on click scoring.
>>
>> https://lucidworks.com/blog/2015/03/23/mixed-signals-using-lucidworks-fusions-signals-api/
>>
>> Of course, there are no environments free of spammers. I’ve seen them in
>> enterprise search, too. But they are easier to deal with there. Call them up
>> and tell them they need to stop immediately or their pages disappear from
>> the search engine.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>
>>
>>> On Jul 7, 2016, at 11:29 AM, Walter Underwood <wun...@wunderwood.org> wrote:
>>>
>>> You understand that you are making your site extremely easy to spam, right?
>>> This is how Microsoft became the top hit for “evil empire” on Google.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/ (my blog)
>>>
>>>
>>>> On Jul 7, 2016, at 11:25 AM, Mark T. Trembley <mark.tremb...@etrailer.com>
>>>> wrote:
>>>>
>>>> I've found that it is definitely complicated!
>>>>
>>>> Essentially what I am attempting to do is boost products based on the
>>>> number of times that particular product has been selected via historical
>>>> searches using the same search term or phrase.
>>>>
>>>>
>>>> On 7/7/2016 11:55 AM, Walter Underwood wrote:
>>>>> That is a very complicated design. What are you trying to achieve? Maybe
>>>>> there is a different approach that is simpler.
>>>>>
>>>>> wunder
>>>>> Walter Underwood
>>>>> wun...@wunderwood.org
>>>>> http://observer.wunderwood.org/ (my blog)
>>>>>
>>>>>
>>>>>> On Jul 7, 2016, at 9:26 AM, Mark T. Trembley
>>>>>> <mark.tremb...@etrailer.com> wrote:
>>>>>>
>>>>>> That works with static boosts based on documents matching the query
>>>>>> "Boost2". I want to apply a different boost to documents based on the
>>>>>> value assigned to Boost2 within the document.
>>>>>>
>>>>>> From my sample documents, when running a query with "Boost2," I want
>>>>>> Document2 boosted by 20.0 and Document6 boosted by 15.0:
>>>>>>
>>>>>> {
>>>>>>   "id" : "Document2_Boost2",
>>>>>>   "B1_s" : "Boost2",
>>>>>>   "B1_f" : 20
>>>>>> }
>>>>>> {
>>>>>>   "id" : "Document6_Boost2",
>>>>>>   "B1_s" : "Boost2",
>>>>>>   "B1_f" : 15
>>>>>> }
>>>>>>
>>>>>>
>>>>>> On 7/7/2016 10:21 AM, Walter Underwood wrote:
>>>>>>> This looks like a job for “bq”, the boost query parameter. I used this
>>>>>>> to boost textbooks which were used at the student’s school. bq does not
>>>>>>> force documents to be included in the result set. It does affect the
>>>>>>> ranking of the included documents.
>>>>>>>
>>>>>>> bq=B1_ss:Boost2 will boost documents that match that. You can use
>>>>>>> weights, like bq=B1_ss:Boost2^10
>>>>>>>
>>>>>>> Here is the relationship between fq, q, and bq:
>>>>>>>
>>>>>>> fq: selection, does not affect ranking
>>>>>>> q: selection and ranking
>>>>>>> bq: does not affect selection, affects ranking
>>>>>>>
>>>>>>> wunder
>>>>>>> Walter Underwood
>>>>>>> wun...@wunderwood.org
>>>>>>> http://observer.wunderwood.org/ (my blog)
>>>>>>>
>>>>>>>
>>>>>>>> On Jul 7, 2016, at 7:30 AM, Mark T. Trembley
>>>>>>>> <mark.tremb...@etrailer.com> wrote:
>>>>>>>>
>>>>>>>> I have a question about the best way to rank my results based on a
>>>>>>>> score field that can have different values per document and where each
>>>>>>>> document can have different scores based on which term is queried.
>>>>>>>>
>>>>>>>> Essentially what I'm wanting to have happen is provide a list of terms
>>>>>>>> that when matched via a query it returns a corresponding score to help
>>>>>>>> boost the original document. So if I had a document with a
>>>>>>>> multi-valued field named B1_ss with terms [Boost1|10], [Boost2|20],
>>>>>>>> [Boost3|100] and my search query is "Boost2", I want that document's
>>>>>>>> result to be boosted by 20.
>>>>>>>> Also note that "Boost2" can boost
>>>>>>>> different documents at different levels. The query to select the
>>>>>>>> actual documents will select against other fields in the document and
>>>>>>>> could possibly return documents with any combination of B1 terms.
>>>>>>>>
>>>>>>>> I'm still trying to figure out how best to model this in my index,
>>>>>>>> either as child documents, or in another collection, or if it would
>>>>>>>> make more sense to figure out how to make it work via payloads or by
>>>>>>>> boosting the terms at index time.
>>>>>>>>
>>>>>>>> I'm running Solr 5.5.1 in cloud mode. Each server has a complete
>>>>>>>> replica of all collections.
>>>>>>>>
>>>>>>>> The document structure I've been toying with the most is to put the
>>>>>>>> boosts into a separate index and join them using !join syntax and
>>>>>>>> returning the scores, but I've not had any luck getting quality
>>>>>>>> results from those tests. The extra "scores" index is structured like
>>>>>>>> this (I'll add the json for my test collections at the end of the
>>>>>>>> email):
>>>>>>>>
>>>>>>>> id:Document1_Boost1
>>>>>>>> B1_s:Boost1
>>>>>>>> B1_f:10
>>>>>>>>
>>>>>>>> id:Document1_Boost3
>>>>>>>> B1_s:Boost3
>>>>>>>> B1_f:100
>>>>>>>>
>>>>>>>> Using this structure, I get close, but the scores are not what I'm
>>>>>>>> expecting. If I use the following query, the explain says it's using
>>>>>>>> the score from Document6_Boost2 even though my query is specifying
>>>>>>>> B1_s:Boost3
>>>>>>>>
>>>>>>>> http://localhost:8983/solr/generic/select?q={!join from=id to=B1_name_ss fromIndex=scores score=max}B1_s:Boost3{!func}B1_f&fl=*,score&debugQuery=true
>>>>>>>>
>>>>>>>> <lst name="explain">
>>>>>>>>   <str name="Document6">
>>>>>>>>     *3.379996* = Score based on join value Document6_Boost2
>>>>>>>>   </str>
>>>>>>>>   <str name="Document1">
>>>>>>>>     *2.2533307* = Score based on join value Document1_Boost1
>>>>>>>>   </str>
>>>>>>>>   <str name="Document7">
>>>>>>>>     *0.24786638* = Score based on join value Document7_Boost333
>>>>>>>>   </str>
>>>>>>>>   <str name="Document3">
>>>>>>>>     *0.0* = Score based on join value Document3_NoBoost
>>>>>>>>   </str>
>>>>>>>> </lst>
>>>>>>>>
>>>>>>>> My guess is that it's now doing an all document query on the "scores"
>>>>>>>> collection to return the scores in addition to the B1_s query I've
>>>>>>>> passed in. I can't figure out where it's getting those scores from as
>>>>>>>> a simple query against the "scores" collection returns scores like I'd
>>>>>>>> expect to see them based on a similar query:
>>>>>>>>
>>>>>>>> http://192.168.1.194:8983/solr/scores/select?q=B1_s:Boost3 AND _val_:B1_f&fl=score,*&debugQuery=true
>>>>>>>>
>>>>>>>> <lst name="explain">
>>>>>>>>   <str name="Document1_Boost3">
>>>>>>>>     *46.834885* = sum of:
>>>>>>>>       1.7682717 = weight(B1_s:Boost3 in 1) [ClassicSimilarity], result of:
>>>>>>>>         1.7682717 = score(doc=1,freq=1.0), product of:
>>>>>>>>           0.8926926 = queryWeight, product of:
>>>>>>>>             1.9808292 = idf(docFreq=2, maxDocs=8)
>>>>>>>>             0.45066613 = queryNorm
>>>>>>>>           1.9808292 = fieldWeight in 1, product of:
>>>>>>>>             1.0 = tf(freq=1.0), with freq of:
>>>>>>>>               1.0 = termFreq=1.0
>>>>>>>>             1.9808292 = idf(docFreq=2, maxDocs=8)
>>>>>>>>             1.0 = fieldNorm(doc=1)
>>>>>>>>       45.066612 = FunctionQuery(float(B1_f)), product of:
>>>>>>>>         100.0 = float(B1_f)=100.0
>>>>>>>>         1.0 = boost
>>>>>>>>         0.45066613 = queryNorm
>>>>>>>>   </str>
>>>>>>>>   <str name="Document6_Boost3">
>>>>>>>>     *15.288256* = sum of:
>>>>>>>>       1.7682717 = weight(B1_s:Boost3 in 5) [ClassicSimilarity], result of:
>>>>>>>>         1.7682717 = score(doc=5,freq=1.0), product of:
>>>>>>>>           0.8926926 = queryWeight, product of:
>>>>>>>>             1.9808292 = idf(docFreq=2, maxDocs=8)
>>>>>>>>             0.45066613 = queryNorm
>>>>>>>>           1.9808292 = fieldWeight in 5, product of:
>>>>>>>>             1.0 = tf(freq=1.0), with freq of:
>>>>>>>>               1.0 = termFreq=1.0
>>>>>>>>             1.9808292 = idf(docFreq=2, maxDocs=8)
>>>>>>>>             1.0 = fieldNorm(doc=5)
>>>>>>>>       13.519984 = FunctionQuery(float(B1_f)), product of:
>>>>>>>>         30.0 = float(B1_f)=30.0
>>>>>>>>         1.0 = boost
>>>>>>>>         0.45066613 = queryNorm
>>>>>>>>   </str>
>>>>>>>> </lst>
>>>>>>>>
>>>>>>>> I feel like I'm getting close to what I need, but it's just not clear
>>>>>>>> to me what I'm missing at this point.
>>>>>>>>
>>>>>>>> The other option I've been toying with is using payloads, but actually
>>>>>>>> utilizing the payloads as part of the scoring process is beyond me at
>>>>>>>> this time.
>>>>>>>>
>>>>>>>> Any thoughts or hints on the best way to boost the relevancy of these
>>>>>>>> scores would be appreciated.
>>>>>>>> Thanks
>>>>>>>> Mark
>>>>>>>>
>>>>>>>>
>>>>>>>> GENERIC:
>>>>>>>> {
>>>>>>>>   "id" : "Document1",
>>>>>>>>   "B1_ss" : ["Boost1|10","Boost3|100"],
>>>>>>>>   "title_s" : "Title1"
>>>>>>>>   ,"otherstuff_ss" : ["stuff1","suggestion"]
>>>>>>>>   ,"B1_name_ss" : ["Document1_Boost1","Document1_Boost3"]
>>>>>>>> },
>>>>>>>> {
>>>>>>>>   "id" : "Document2",
>>>>>>>>   "B1_ss" : ["Boost2|20"],
>>>>>>>>   "name_s" : "Product2",
>>>>>>>>   "title_s" : "Title2"
>>>>>>>>   ,"otherstuff_ss" : ["stuff2","recommendation"]
>>>>>>>>   ,"B1_name_ss" : ["Document2_Boost1"]
>>>>>>>> },
>>>>>>>> {
>>>>>>>>   "id" : "Document3",
>>>>>>>>   "name_s" : "Product3",
>>>>>>>>   "B1_ss" : ["NoBoost"],
>>>>>>>>   "title_s" : "Title3"
>>>>>>>>   ,"otherstuff_ss" : ["stuff3","new","suggestion"]
>>>>>>>>   ,"B1_name_ss" : ["Document3_NoBoost"]
>>>>>>>> },
>>>>>>>> {
>>>>>>>>   "id" : "Document4",
>>>>>>>>   "name_s" : "Product4",
>>>>>>>>   "title_s" : "Title4"
>>>>>>>>   ,"otherstuff_ss" : ["stuff4","old","suggestion"]
>>>>>>>> },
>>>>>>>> {
>>>>>>>>   "id" : "Document5",
>>>>>>>>   "name_s" : "Product5",
>>>>>>>>   "title_s" : "Title5"
>>>>>>>>   ,"otherstuff_ss" : ["stuff5","recommendation"]
>>>>>>>> },
>>>>>>>> {
>>>>>>>>   "id" : "Document6",
>>>>>>>>   "name_s" : "Product6",
>>>>>>>>   "B1_ss" : ["Boost2|15","Boost3|30"],
>>>>>>>>   "title_s" : "Title6"
>>>>>>>>   ,"B1_name_ss" : ["Document6_Boost2","Document6_Boost3"]
>>>>>>>> },
>>>>>>>> {
>>>>>>>>   "id" : "Document7",
>>>>>>>>   "name_s" : "Product7",
>>>>>>>>   "B1_ss" : ["NoBoost","Boost333|1.1"],
>>>>>>>>   "title_s" : "Title7"
>>>>>>>>   ,"B1_name_ss" : ["Document7_NoBoost","Document7_Boost333"]
>>>>>>>> }
>>>>>>>>
>>>>>>>> SCORES:
>>>>>>>> {
>>>>>>>>   "id" : "Document1_Boost1",
>>>>>>>>   "B1_s" : "Boost1",
>>>>>>>>   "B1_f" : 10
>>>>>>>> },
>>>>>>>> {
>>>>>>>>   "id" : "Document1_Boost3",
>>>>>>>>   "B1_s" : "Boost3",
>>>>>>>>   "B1_f" : 100
>>>>>>>> },
>>>>>>>> {
>>>>>>>>   "id" : "Document2_Boost2",
>>>>>>>>   "B1_s" : "Boost2",
>>>>>>>>   "B1_f" : 20
>>>>>>>> },
>>>>>>>> {
>>>>>>>>   "id" : "Document3_NoBoost",
>>>>>>>>   "B1_s" : "NoBoost"
>>>>>>>> },
>>>>>>>> {
>>>>>>>>   "id" : "Document6_Boost2",
>>>>>>>>   "B1_s" : "Boost2",
>>>>>>>>   "B1_f" : 15
>>>>>>>> },
>>>>>>>> {
>>>>>>>>   "id" : "Document6_Boost3",
>>>>>>>>   "B1_s" : "Boost3",
>>>>>>>>   "B1_f" : 30
>>>>>>>> },
>>>>>>>> {
>>>>>>>>   "id" : "Document7_NoBoost",
>>>>>>>>   "B1_s" : "NoBoost"
>>>>>>>> },
>>>>>>>> {
>>>>>>>>   "id" : "Document7_Boost333",
>>>>>>>>   "B1_s" : "Boost333",
>>>>>>>>   "B1_f" : 1.1
>>>>>>>> }