Do you want to page through all items, or through the results of a
query (like all hits for "civil war" in call number order)?
If you want the former, then a text search engine is really
the wrong tool. This problem only requires an indexed sequential
file (for example, one built on a B-tree), something that worked quite well
30 or 40 years ago, even before relational databases were invented.
Text search engines, like Lucene/Solr, have sorting and traversal
as a secondary feature. Their primary feature is relevance ranking.
With only 8M items, I'd be inclined to put them in a big array
sorted by call number, and use binary search. Sounds dumb, but
it is really fast. The entries would be a simple pair, call
number and key.
wunder
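A minimal Python sketch of the sorted-array idea above. This is an illustration, not code from this thread. It assumes the call numbers have already been normalized into a key whose plain string order is shelf order (which is what the `shelfkey` field later in the thread is meant to provide). `build_index` and `page_from` are hypothetical names. Each page costs one binary search plus a slice, so the work per page does not depend on where in the 8M entries the user starts.

```python
from bisect import bisect_left
from typing import List, Tuple

# One entry per item: (normalized call number, document key).
# Assumption: the call number is already a sort key whose plain string
# order matches shelf order.
Entry = Tuple[str, str]


def build_index(entries: List[Entry]) -> List[Entry]:
    """Sort once, up front. Tuples compare element-wise, so this orders by call number."""
    return sorted(entries)


def page_from(index: List[Entry], start: str, rows: int = 10) -> List[Entry]:
    """Return up to `rows` entries whose call number is >= start, in order."""
    # (start, "") sorts no later than any real entry whose call number equals
    # start, so bisect_left lands on the first entry with call number >= start.
    pos = bisect_left(index, (start, ""))
    return index[pos:pos + rows]  # slicing past the end just yields fewer rows


if __name__ == "__main__":
    index = build_index([
        ("A123 B34 1970", "doc3"),
        ("A100 C2", "doc1"),
        ("B5 X1", "doc9"),
        ("A123 B40", "doc4"),
    ])
    print(page_from(index, "A123", rows=2))
    # → [('A123 B34 1970', 'doc3'), ('A123 B40', 'doc4')]
```

Running off the end of the array simply returns a short (or empty) page, which gives the natural "end" state discussed further down the thread.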
On 11/28/08 4:41 PM, "Naomi Dushay" <[EMAIL PROTECTED]> wrote:
> Gosh, I'm sorry to be so unclear. Hmm. Trying to clarify below:
>
> On Nov 28, 2008, at 3:52 PM, Chris Hostetter wrote:
>
>> Having read through this thread, I'm not sure I understand what exactly
>> the problem is. My naive understanding is...
>>
>> 1) you want to sort by a field
>> 2) you want to be able to "paginate" through all docs in order of this field.
>> 3) you want to be able to start your pagination at any arbitrary value for this field.
>>
>> So (assuming the field is a simple number for now) you could use something
>> like
>>
>> q=yourField:[42 TO *]&sort=yourField+asc&rows=10&start=0
>>
>> where "42" is the arbitrary ID someone wants to start at.
>>
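For reference, a small Python sketch that builds the request quoted above with the standard library's `urllib.parse.urlencode`. The Solr URL is an assumption (a local instance at its default `/solr/select` path), and `page_query_url` is a hypothetical helper. The start value is inserted verbatim, so a value containing spaces or Lucene special characters (as real call numbers do) would need quoting or escaping first.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance at its default select URL; adjust as needed.
SOLR_SELECT = "http://localhost:8983/solr/select"


def page_query_url(field: str, start_value: str, rows: int = 10) -> str:
    """Build a request for the first `rows` docs with field >= start_value, sorted ascending."""
    params = {
        # Open-ended range query: everything from start_value to the end of the index.
        "q": f"{field}:[{start_value} TO *]",
        "sort": f"{field} asc",
        "rows": rows,
        "start": 0,
    }
    # urlencode percent-encodes ':', '[', ']' and '*' and turns spaces into '+'.
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(page_query_url("yourField", "42"))
    # → http://localhost:8983/solr/select?q=yourField%3A%5B42+TO+%2A%5D&sort=yourField+asc&rows=10&start=0
```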
>
> perfect. This is the query I'm using.
>
> The results are correct. But the response time sucks.
>
> Reading the docs about caches, I thought I could populate the query result
> cache with an autowarming query and the response time would be okay. But
> that hasn't worked. (See the excerpts from my solrconfig.xml file below.)
>
> A repeated query is very fast, implying caching happens for a
> particular starting point ("42" above).
>
> Is there a way to populate the cache with the ENTIRE sorted list of
> values for the field, so any arbitrary starting point will get results
> from the cache, rather than grabbing all results from (x) to the end,
> then sorting all these results, then returning the first 10?
>
>
>> This sentence below seems to imply that you have a solution which produces
>> correct results, but doesn't produce results quickly...
>
> right.
>
>> : I have a performance problem and I haven't thought of a clever way around it.
>>
>> ...however this line seems to suggest that you're having trouble getting
>> at least 10 results from any query (?)
>>
>> : Call numbers are squirrelly, so we can't predict the string that will
>> : appropriately grab at least 10 subsequent documents. They are certainly
>> : not consecutive!
>> :
>> : so from
>> : A123 B34 1970
>> :
>> : we're unable to predict if any of these will return at least 10 results:
>
> I was trying to express that I couldn't do this:
>
> myfield:[X TO Y]
>
> because I can't algorithmically compute Y.
>
> Glen Newton suggested a workaround, whereby I represent my squirrelly but
> sortable field values as floating point numbers, and then I can compute Y.
>
>> ...but I'm not sure what exactly that means. For any given field, there
>> are always going to be some values X such that myField:[X TO *] won't
>> return at least 10 docs: those are the last values in the index, in
>> order. Surely it's okay for your app to have an "end" state when you run
>> out of data? :)
>
> yes. Understood. This is not an issue.
>
>> Oh, and BTW...
>>
>> : numbers in sort order". I have also mucked about with the cache
>> : initialization, but that's not working either:
>> :
>> : <listener event="firstSearcher" class="solr.QuerySenderListener">
>>
>> ...make sure you also add a newSearcher listener that does the same thing;
>> otherwise your FieldCache (used for sorting) may not be warmed when
>> commits happen.
>
> Yup yup yup.
>
> from solrconfig:
>
> <filterCache
>   class="solr.LRUCache"
>   size="20000000"
>   initialSize="10000000"
>   autowarmCount="500000"/>
>
> <queryResultCache
>   class="solr.LRUCache"
>   size="10000000"
>   initialSize="5000000"
>   autowarmCount="5000000"/>
>
> <listener event="newSearcher" class="solr.QuerySenderListener">
>   <arr name="queries">
>     <!-- populate query result cache for sorted queries -->
>     <lst>
>       <str name="q">shelfkey:[0 TO *]</str>
>       <str name="sort">shelfkey asc</str>
>     </lst>
>   </arr>
> </listener>
>
> <listener event="firstSearcher" class="solr.QuerySenderListener">
>   <arr name="queries">
>     <!-- populate query result cache for sorted queries -->
>     <lst>
>       <str name="q">shelfkey:[0 TO *]</str>
>       <str name="sort">shelfkey asc</str>
>     </lst>
>   </arr>
> </listener>