Thanks Hoss, that's a really good explanation.
Well, I don't care about the sort order; I just want all of the docs. And
yes, score values may be duplicated, which would hurt my search
performance.
Before going down the Lucene doc id route: I have a creationDate datetime
field in my index which I can use to define pages via a filter query.
I've learned that exposing the Lucene docid wouldn't be a clever idea, as
it's again relative to the index instance, whereas my date field is
unique .. and I can definitely create ranges with that.
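
As a rough sketch of that date-range chunking idea (the creationDate field
name comes from above; the rows size and example dates are made up), each
chunk is just a fresh query with start=0 and a range filter query, so Solr
never has to buffer a deep result window:

```python
from urllib.parse import urlencode

def chunk_params(field, lower, upper, rows=10000):
    """Query params for one date-range chunk.

    The upper bound is exclusive (Solr's mixed [a TO b} bound syntax)
    so consecutive chunks never overlap at the boundary.
    """
    return {
        "q": "*:*",
        "fq": f"{field}:[{lower} TO {upper}}}",
        "fl": "id",
        "start": 0,            # always 0: the fq does the paging, not start
        "rows": rows,
        "sort": f"{field} asc",
    }

params = chunk_params("creationDate",
                      "2011-01-01T00:00:00Z", "2011-02-01T00:00:00Z")
print(urlencode(params))
```

Each range filter is independent, so fetching the 1000th chunk costs the
same as the first; the only bookkeeping is advancing the date bounds.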

I have one more doubt: if I use a filter query for each chunk, will it
result in memory problems like the ones we see with deep paging?

On 23 August 2011 01:05, Chris Hostetter <hossman_luc...@fucit.org> wrote:

>
> : retrieving 1 million docids from solr through paging is resulting in
> : deep paging issues .. so i wonder if i can use filter queries to fetch
> : all the 1 million docids chunk by chunk .. so for me the best filter
> : would be score ... if i can find the maximum score i can filter out
> : other docs ..
> :
> : what is the minimum value of the solr score? i don't think it will
> : have negative values .. so if it's always above 0, my first chunk
> : would be score:[0 TO *] & rows=10000, and my next chunk would start
> : from the max score of the first chunk to * with rows=10000 .. this
> : will ensure that while fetching the 1000th chunk solr doesn't have to
> : get all the previous doc ids into memory ..
>
> a) given an arbitrary query, there is no min/max score (think about
> function queries, you could write a math based query that results in
> -100000 being the highest score)
>
> b) you could use a frange query on score to partition your docs like
> this.  you'd need to start with an unfiltered query, record the docid and
> score for all of "page #1" and then use the score of the last docid on
> page #1 as the min for your filter when asking for "page #2" (still with
> start=0 though) .. but you'd have to manually ignore any docs you'd
> already seen because of duplicate scores.
>
> I'm not sure if this would really gain you much though -- yes this would
> work around some of the memory issues inherent in "deep paging" but it
> would still require a lot of rescoring of documents again and again.
>
> If that type of approach works for you, then you'd probably be better off
> using your own ID field as the sort/filter instead of score (since there
> would be no duplicates)
>
> Based on your problem description though, it sounds like you don't
> actually care about the scores -- and i don't see anything in your writeup
> that suggests that the order actually matters to you -- you just want them
> "all" ... correct?
>
> in that case, have you considered just using "sort=_docid_ asc" ?
>
> that gives you the internal lucene doc id "sorting" which actually means
> no sorting work is needed, which i *think* means there is no in memory
> buffering needed for the deep paging situation.
>
>
> -Hoss
>
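
For reference, the frange partitioning Hoss describes above can be
sketched like this (one reading of his "min for your filter": with an
ascending score sort, the last score on a page becomes the lower bound
for the next one). The frange parser and the query($q) function are
standard Solr; the helper names and field list here are made up:

```python
def score_page_params(q, rows=10000, min_score=None):
    """Params for one score-partitioned page (start is always 0).

    min_score is None for page #1; for later pages, pass the score of
    the last doc returned on the previous page.
    """
    params = {"q": q, "fl": "id,score", "start": 0,
              "rows": rows, "sort": "score asc"}
    if min_score is not None:
        # frange with an inclusive lower bound: docs tied at the boundary
        # score come back again, so they must be deduped client-side.
        params["fq"] = f"{{!frange l={min_score} incl=true}}query($q)"
    return params

seen = set()

def dedupe(docs):
    """Drop docs whose ids were already returned by an earlier page."""
    fresh = [d for d in docs if d["id"] not in seen]
    seen.update(d["id"] for d in docs)
    return fresh
```

As Hoss notes, this sidesteps the deep-paging buffer but still rescores
the full result set on every request, so the id- or date-field variant is
usually the better trade.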



-- 

-JAME
