Thanks Hoss, that's a really good explanation. I don't care about the sort order; I just want all of the docs. And yes, score values may be duplicated, which would hurt my search performance. Before going to the Lucene doc id: I have a creationDate datetime field in my index which I can use to define pages with a filter query. I have learned that exposing the Lucene docid isn't a clever idea, as it is relative to the index instance, whereas my index date field will be unique, and I can definitely create ranges with that.
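A rough sketch of the date-range chunking I have in mind, in Python. It only builds the request parameters and does not talk to Solr. The helper names (`solr_date`, `date_windows`, `window_params`), the `id` field in `fl`, and the one-day window size are my own assumptions for illustration; `creationDate` is the field from my index. It also assumes the dates are stored in UTC with millisecond precision, so ending each window 1 ms before the next one starts keeps the windows from overlapping.

```python
from datetime import datetime, timedelta

ONE_MS = timedelta(milliseconds=1)


def solr_date(dt):
    """Format a naive datetime (assumed UTC) as a Solr date with milliseconds."""
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def date_windows(start, end, step):
    """Yield (lo, hi) pairs covering [start, end), both bounds inclusive.

    hi is 1 ms before the next window's lo, so no instant falls in two windows.
    """
    lo = start
    while lo < end:
        hi = min(lo + step, end) - ONE_MS
        yield lo, hi
        lo = lo + step


def window_params(lo, hi, field="creationDate", rows=10000):
    """Build query parameters for one chunk; start stays at 0 for every chunk."""
    fq = f"{field}:[{solr_date(lo)} TO {solr_date(hi)}]"
    return {"q": "*:*", "fq": fq, "fl": "id", "start": 0, "rows": rows}


if __name__ == "__main__":
    windows = date_windows(datetime(2011, 8, 1), datetime(2011, 8, 3), timedelta(days=1))
    for lo, hi in windows:
        print(window_params(lo, hi))
```

If a single window can hold more than `rows` docs, it would still need paging inside that window, so the window size has to be chosen small enough for the data.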
I have one more doubt: if I use a filter query each time, will it result in memory problems like the ones we see with deep paging?

On 23 August 2011 01:05, Chris Hostetter <hossman_luc...@fucit.org> wrote:

> : retrieving 1 million docids from solr through paging is resulting in deep
> : paging issues.. so i wonder if i can use filter queries to fetch all the 1
> : million docids chunk by chunk .. so for me the best filter would be score...
> : if i can find the maximum score i can filter out other docs ..
> :
> : what is the minimum value of solr score? i don't think it will have negative
> : values.. so if it's always above 0.. my first chunk would be score [0 TO *] &
> : rows=10000, my next chunk will start from the max score from first chunk to
> : * with rows=10000 .. this will ensure that while fetching the 1000th chunk
> : solr doesn't have to get all the previous doc ids into memory ..
>
> a) given an arbitrary query, there is no min/max score (think about
> function queries, you could write a math based query that results in
> -100000 being the highest score)
>
> b) you could use an frange query on score to partition your docs like
> this. you'd need to start with an unfiltered query, record the docid and
> score for all of "page #1" and then use the score of the last docid on
> page #1 as the min for your filter when asking for "page #2" (still with
> start=0 though) .. but you'd have to manually ignore any docs you'd
> already seen because of duplicate scores.
>
> I'm not sure if this would really gain you much though -- yes this would
> work around some of the memory issues inherent in "deep paging" but it
> would still require a lot of rescoring of documents again and again.
>
> If that type of approach works for you, then you'd probably be better off
> using your own ID field as the sort/filter instead of score (since there
> would be no duplicates)
>
> Based on your problem description though, it sounds like you don't
> actually care about the scores -- and i don't see anything in your writeup
> that suggests that the order actually matters to you -- you just want them
> "all" ... correct?
>
> in that case, have you considered just using "sort=_docid_ asc" ?
>
> that gives you the internal lucene doc id "sorting" which actually means
> no sorting work is needed, which i *think* means there is no in memory
> buffering needed for the deep paging situation.
>
> -Hoss

--
-JAME