After a lot of debugging, I finally found why the order of the collapsed results does not match the uncollapsed results. I can't say whether this is a bug in the field collapse implementation or not.
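To make the explanation below easier to follow, here is a minimal, self-contained Java sketch of the code path I suspect. It is not the Solr or patch source: ScoreLossSketch, ScoreCollector, searchNoCache and the toy INDEX are hypothetical names made up for illustration, and the uncached branch that forwards the collector is only my assumption about the intended behaviour.

```java
import java.util.*;

public class ScoreLossSketch {
    /** Hypothetical stand-in for a collector that records one score per document. */
    static class ScoreCollector {
        final Map<Integer, Float> scores = new HashMap<>();
        void collect(int doc, float score) { scores.put(doc, score); }
    }

    // Toy index (made-up data): doc id -> score for the main query.
    static final Map<Integer, Float> INDEX = new LinkedHashMap<>();
    static {
        INDEX.put(1, 0.2f);
        INDEX.put(2, 0.9f);
        INDEX.put(3, 0.5f);
    }

    /** Uncached search: visits every matching doc and feeds the collector if one is given. */
    static Set<Integer> searchNoCache(ScoreCollector collector) {
        Set<Integer> docs = new LinkedHashSet<>();
        for (Map.Entry<Integer, Float> e : INDEX.entrySet()) {
            docs.add(e.getKey());
            if (collector != null) {
                collector.collect(e.getKey(), e.getValue());
            }
        }
        return docs;
    }

    /** Mimics the suspected path: when the filter cache is enabled, the collector is dropped. */
    static Set<Integer> getDocSet(Set<Integer> filter, ScoreCollector collector,
                                  boolean filterCacheEnabled) {
        Set<Integer> first = filterCacheEnabled
                ? searchNoCache(null)        // analogous to getDocSetNC(absQ, null)
                : searchNoCache(collector);  // assumption: the collector is forwarded here
        first.retainAll(filter);             // apply the fq restriction
        return first;
    }

    /** Sorts by descending score; docs without a recorded score count as 0. */
    static List<Integer> sortByScore(Set<Integer> docs, ScoreCollector collector) {
        List<Integer> sorted = new ArrayList<>(docs);
        sorted.sort((a, b) -> Float.compare(
                collector.scores.getOrDefault(b, 0f),
                collector.scores.getOrDefault(a, 0f)));
        return sorted;
    }

    public static void main(String[] args) {
        Set<Integer> filter = new HashSet<>(Arrays.asList(1, 2, 3));
        ScoreCollector cached = new ScoreCollector();
        ScoreCollector direct = new ScoreCollector();
        System.out.println(sortByScore(getDocSet(filter, cached, true), cached));
        System.out.println(sortByScore(getDocSet(filter, direct, false), direct));
    }
}
```

Running main prints [1, 2, 3] for the cached path (plain index order, because no scores were collected) and [2, 3, 1] for the path that forwards the collector. That is the same kind of ordering difference I am seeing.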
*Explanation:* I am querying with field collapsing plus a filter that restricts the collapsing to some particular categories, by appending the parameter fq=ctype:(1+2+8+6+3).

In NonAdjacentDocumentCollapser.doQuery(), at the line:
    DocSet filter = searcher.getDocSet(filterQueries);
the filter DocSet is fetched without any scores (since I have a filter in my query, this line actually gets executed) and is also stored in the filter cache. On the next line of the code, the actual uncollapsed DocSet is fetched by passing the DocSetScoreCollector.

Now, in SolrIndexSearcher.getDocSet(Query query, DocSet filter, DocSetAwareCollector collector), at the line:
    if (filterCache != null)
the filter cache is not null and there is no result for the query in the cache, so the line:
    first = getDocSetNC(absQ,null);
gets executed. Notice that the DocSetScoreCollector is not passed here, so the results are collected without any scores. This leaves the uncollapsed DocSet without scores, and hence the sorting is not done based on score.

@Martijn: Is my analysis right, or should I be using field collapsing in some other way? Otherwise, what is the ideal fix for this problem? (I am not an active developer, so I can't say whether a fix I make will break anything.)

--
Thanks,
Varun Gupta

On Mon, Dec 14, 2009 at 10:35 AM, Varun Gupta <varun.vgu...@gmail.com> wrote:

> When I used collapse.threshold=1, out of the 5 categories 4 had the same
> top result, but 1 category had a different result (it was the 3rd result
> coming for that category when I used threshold as 3).
>
> --
> Thanks,
> Varun Gupta
>
>
> On Mon, Dec 14, 2009 at 2:56 AM, Martijn v Groningen <
> martijn.is.h...@gmail.com> wrote:
>
>> I would not expect that Solr 1.4 build is the cause of the problem.
>> Just out of curiosity does the same happen when collapse.threshold=1?
>>
>> 2009/12/11 Varun Gupta <varun.vgu...@gmail.com>:
>> > Here is the field type configuration of ctype:
>> >     <field name="ctype" type="integer" indexed="true" stored="true"
>> > omitNorms="true" />
>> >
>> > In solrconfig.xml, this is how I am enabling field collapsing:
>> >     <searchComponent name="query"
>> > class="org.apache.solr.handler.component.CollapseComponent"/>
>> >
>> > Apart from this, I made no changes in solrconfig.xml for field collapse. I
>> > am currently not using the field collapse cache.
>> >
>> > I have applied the patch on the Solr 1.4 build. I am not using the latest
>> > solr nightly build. Can that cause any problem?
>> >
>> > --
>> > Thanks
>> > Varun Gupta
>> >
>> >
>> > On Fri, Dec 11, 2009 at 3:44 AM, Martijn v Groningen <
>> > martijn.is.h...@gmail.com> wrote:
>> >
>> >> I tried to reproduce a similar situation here, but I got the expected
>> >> and correct results. Those three documents that you saw in your first
>> >> search result should be the first in your second search result (unless
>> >> the index changes or the sort changes ) when fq on that specific
>> >> category. I'm not sure what is causing this problem. Can you give me
>> >> some more information like the field type configuration for the ctype
>> >> field and how have configured field collapsing?
>> >>
>> >> I did find another problem to do with field collapse caching. The
>> >> collapse.threshold or collapse.maxdocs parameters are not taken into
>> >> account when caching, which is off course wrong because they do matter
>> >> when collapsing. Based on the information you have given me this
>> >> caching problem is not the cause of the situation you have. I will
>> >> update the patch that fixes this problem shortly.
>> >>
>> >> Martijn
>> >>
>> >> 2009/12/10 Varun Gupta <varun.vgu...@gmail.com>:
>> >> > Hi Martijn,
>> >> >
>> >> > I am not sending the collapse parameters for the second query.
>> >> > Here are the queries I am using:
>> >> >
>> >> > *When using field collapsing (searching over all categories):*
>> >> >
>> >> > spellcheck=true&collapse.info.doc=true&facet=true&collapse.threshold=3&facet.mincount=1&spellcheck.q=weight+loss&collapse.facet=before&wt=xml&f.content.hl.snippets=2&hl=true&version=2.2&rows=20&collapse.field=ctype&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&collapse.info.count=false&facet.field=ctype&qt=contentsearch
>> >> >
>> >> > categories is represented as the field "ctype" above.
>> >> >
>> >> > *Without using field collapsing:*
>> >> >
>> >> > spellcheck=true&facet=true&facet.mincount=1&spellcheck.q=weight+loss&wt=xml&hl=true&rows=10&version=2.2&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&facet.field=ctype&qt=contentsearch
>> >> >
>> >> > I append "&fq=ctype:1" to the above queries when trying to get results for a
>> >> > particular category.
>> >> >
>> >> > --
>> >> > Thanks
>> >> > Varun Gupta
>> >> >
>> >> >
>> >> > On Thu, Dec 10, 2009 at 5:58 PM, Martijn v Groningen <
>> >> > martijn.is.h...@gmail.com> wrote:
>> >> >
>> >> >> Hi Varun,
>> >> >>
>> >> >> Can you send the whole requests (with params), that you send to Solr
>> >> >> for both queries?
>> >> >> In your situation the collapse parameters only have to be used for the
>> >> >> first query and not the second query.
>> >> >>
>> >> >> Martijn
>> >> >>
>> >> >> 2009/12/10 Varun Gupta <varun.vgu...@gmail.com>:
>> >> >> > Hi,
>> >> >> >
>> >> >> > I have documents under 6 different categories. While searching, I want to
>> >> >> > show 3 documents from each category along with a link to see all the
>> >> >> > documents under a single category. I decided to use field collapsing so that
>> >> >> > I don't have to make 6 queries (one for each category). Currently I am using
>> >> >> > the field collapsing patch uploaded on 29th Nov.
>> >> >> >
>> >> >> > Now, the results that are coming after using field collapsing are not
>> >> >> > matching the results for a single category. For example, for category C1, I
>> >> >> > am getting results R1, R2 and R3 using field collapsing, but after I see
>> >> >> > results only from the category C1 (without using field collapsing) these
>> >> >> > results are nowhere in the first 10 results.
>> >> >> >
>> >> >> > Am I doing something wrong or using the field collapsing for the wrong
>> >> >> > feature?
>> >> >> >
>> >> >> > I am using the following field collapsing parameters while querying:
>> >> >> >    collapse.field=category
>> >> >> >    collapse.facet=before
>> >> >> >    collapse.threshold=3
>> >> >> >
>> >> >> > --
>> >> >> > Thanks
>> >> >> > Varun Gupta
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Met vriendelijke groet,
>> >> >>
>> >> >> Martijn van Groningen
>> >> >>
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Met vriendelijke groet,
>> >>
>> >> Martijn van Groningen
>> >>
>> >
>>
>>
>>
>> --
>> Met vriendelijke groet,
>>
>> Martijn van Groningen
>>
>