bq: I am also trying to figure out if I can place extra dimensions to the solr score which takes other attributes into consideration
Have you looked at function queries? The whole point of them
is to do something that influences score, which may be quite
complex. There are ways to, say, multiply the value of a field
into the score calculations, include the value of an external
file field, etc...

See: http://wiki.apache.org/solr/FunctionQuery

Best
Erick

On Thu, Jul 25, 2013 at 2:55 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
> I agree with your comment on separating noise with the actual relevant
> result.
> My approach to separate relevant result with noise is not algorithmic but
> an absolute measure, i.e. top 5 or top 10 results will always be relevant
> (at-least the probability is higher).
> But again, that kind of simple sort can be done by the client too.
>
> The current relevant results are purely based off PMIs which is calculated
> using the clickstream data. I am also trying to figure out if I can place
> extra dimensions to the solr score which takes other attributes into
> consideration.
> i.e. extending the way solr computes the score with attachment_count (more
> attachments, more important), confidence (stronger source has higher
> confidence) etc.
>
> Is there a way I can have my custom scoring function which extends (and not
> overwrites) solr's scores?
>
> Thanks,
> -Utkarsh
>
> On Wed, Jul 24, 2013 at 7:35 PM, Erick Erickson
> <erickerick...@gmail.com> wrote:
>
>> You can certainly just include the attachment count in the
>> response and have the app apply the secondary sort. But....
>> that doesn't separate the "noise" as you say.
>>
>> How would you identify "noise"? If you don't have an algorithmic
>> way to do that, I don't know how you'd manage to separate
>> the signal from the noise....
>>
>> Best
>> Erick
>>
>> On Wed, Jul 24, 2013 at 4:37 PM, Utkarsh Sengar <utkarsh2...@gmail.com>
>> wrote:
>> > I have a solr query which has a bunch of boost params for relevancy. This
>> > search works fine and returns the most relevant documents as per the user
>> > query. For example, if user searches for: "iphone 5", keywords like
>> > "apple", "wifi" etc are boosted. I get these keywords from external
>> > training. The top 10-20 results are iphone 5 phones and then it follows
>> > iphone cases and other noise.
>> >
>> > But I also have a field in the schema called: attachment_count. I need to
>> > sort the top N result I get after boost based on this field.
>> >
>> > Example:
>> > I want to sort the top 5 documents based on attachment_count on the boosted
>> > result (which are relevant for the user).
>> >
>> > 1. iphone 5 32gb, attachment_count=0
>> > 2. iphone 5 16gb, attachment_count=5
>> > 3. iphone 5 32gb, attachment_count=10
>> > 4. iphone 4gs, attachment_count=3
>> > 5. iphone 4, attachment_count=1
>> > ...
>> > 11. iphone 5 case, attachment_count=100
>> >
>> > Expected result:
>> > 1. iphone 5 32gb, attachment_count=10
>> > 2. iphone 5 16gb, attachment_count=5
>> > 3. iphone 4gs, attachment_count=3
>> > 4. iphone 4, attachment_count=1
>> > 5. iphone 5 32gb, attachment_count=0
>> > ...
>> > 11. iphone 5 case, attachment_count=100
>> >
>> > Is this possible using a function query? I am not sure how the results will
>> > look like but I want to try it out.
>> >
>> > --
>> > Thanks,
>> > -Utkarsh
>
> --
> Thanks,
> -Utkarsh
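
To make the two options discussed in this thread concrete, here is a rough sketch in Python rather than anything definitive. The first function builds request parameters that use the edismax `boost` parameter to multiply a function of attachment_count into the score, so the existing relevancy score is extended rather than replaced. The second does the client-side secondary sort of the top N results. Assumptions that are not from the thread: the edismax parser is in use, attachment_count is a numeric field that function queries can read, the `id` and `name` fields in `fl` are hypothetical, and the exact shape of the function (1 + log of the count) is only one possible choice that would need tuning. A confidence field could presumably be folded in the same way.

```python
from urllib.parse import urlencode


def build_boosted_params(user_query, rows=20):
    """Build Solr request parameters that multiply a function into the score.

    Assumes the edismax query parser; any existing boost params (bq, qf, ...)
    would be added alongside these.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        # Multiplicative boost. sum(attachment_count,1) avoids log(0), and the
        # outer sum(1, ...) means a document with no attachments gets a factor
        # of 1, i.e. keeps its original score.
        "boost": "sum(1,log(sum(attachment_count,1)))",
        # "id" and "name" are hypothetical field names for illustration.
        "fl": "id,name,attachment_count,score",
        "rows": rows,
    }


def resort_top_n(docs, n, field="attachment_count"):
    """Re-sort only the first n docs by `field`, descending.

    Everything after position n keeps the order Solr returned it in.
    """
    head = sorted(docs[:n], key=lambda d: d[field], reverse=True)
    return head + docs[n:]


# The example from the original question (the elided results are left out).
ranked = [
    {"name": "iphone 5 32gb", "attachment_count": 0},
    {"name": "iphone 5 16gb", "attachment_count": 5},
    {"name": "iphone 5 32gb", "attachment_count": 10},
    {"name": "iphone 4gs", "attachment_count": 3},
    {"name": "iphone 4", "attachment_count": 1},
    {"name": "iphone 5 case", "attachment_count": 100},
]

reordered = resort_top_n(ranked, 5)

if __name__ == "__main__":
    # Query string to append to a select handler URL (host/core not shown).
    print(urlencode(build_boosted_params("iphone 5")))
    for doc in reordered:
        print(doc["name"], doc["attachment_count"])
```

The trade-off between the two: the boost changes the ranking of every matching document, so a noisy document with a very large attachment_count can still climb, whereas the client-side re-sort only reorders the first N and leaves the rest alone, at the cost of N being a fixed cutoff rather than a relevance measure.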