Hi, I have a Solr schema with fields related to Indian legal judgments, and I want to provide a search engine on top of them. I came across a problem on which I would like the group's advice.
For the sake of discussion, let us assume there are only two fields, "assessee" and "itat_order", both text fields; the latter holds the entire judgment of the court in text form. I search against these two fields using dismax with a query like the one below:

http://localhost:8983/solr/itat/select?q=additional+depreciation&start=20&rows=30&fl=assessee%2C+itat_order%2C+score&wt=xml&indent=true&defType=dismax&qf=assessee^0.3+itat_order^0.2

For such a dismax query on the words additional depreciation (2 words, without quotes), results in which additional and depreciation occur separately get a higher score than results in which the words additional depreciation occur immediately together.

Why does this happen? Shouldn't we ideally get exact matches of additional depreciation first, followed by matches that contain both words but apart from each other? (In general, when I search for A B, shouldn't I get matches with A B adjacent first, and then matches with A and B separated by some distance or occurring singly?)

Below I have pasted the score and number of occurrences for three results; if you want, I can share the text fields for these cases too.

(Also, for what it's worth, the Solr index uses only whitespace tokenization and a LowerCaseFilterFactory for querying and indexing.)

thanks
Vulcanoid

"""
decision of Heatshrink Technologies
score : 0.083743244
additional depreciation : 0 occurrences
additional : 2 occurrences
depreciation : 27 occurrences

decision of Srinivasa Raju
score : 0.08313061
additional depreciation : 0 occurrences
additional : 5 occurrences
depreciation : 30 occurrences

decision of Nani Agro Foods
score : 0.08217349
additional depreciation : 5 occurrences
additional : 5 occurrences
depreciation : 5 occurrences
"""
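
For anyone who wants to reproduce this, here is a minimal sketch in Python of how the request above can be assembled. The helper name build_dismax_url is hypothetical; the parameters and values are the ones from the URL in this post. The optional pf argument is an assumption and is not part of the original query: pf is the dismax "phrase fields" parameter, which is documented as boosting documents where the query terms occur close together, and it has not been tested against this index.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_dismax_url(base, q, qf, pf=None, start=20, rows=30):
    """Assemble a Solr select URL for a dismax query (hypothetical helper)."""
    params = {
        "q": q,
        "start": start,
        "rows": rows,
        "fl": "assessee, itat_order, score",
        "wt": "xml",
        "indent": "true",
        "defType": "dismax",
        "qf": qf,  # per-field boosts, as in the query above
    }
    if pf is not None:
        # Assumption: not in the original query. "pf" is the dismax
        # phrase-fields parameter.
        params["pf"] = pf
    # urlencode percent-encodes values and turns spaces into "+".
    return base + "?" + urlencode(params)


# Same request as in the post (host and core name taken from the URL above).
url = build_dismax_url(
    "http://localhost:8983/solr/itat/select",
    "additional depreciation",
    "assessee^0.3 itat_order^0.2",
)
print(url)

# Round-trip check: the decoded qf value matches what was passed in.
print(parse_qs(urlsplit(url).query)["qf"])
```

Calling build_dismax_url with, for example, pf="itat_order^0.5" (an illustrative boost value, not a recommendation) adds the phrase-fields parameter to the same request, which makes it easy to compare scores with and without it.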