On 10/31/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> the biggest factor to worry about is the number of "sources" ... the key
> to understanding the performance risks is to understand that:
> 1) no matter how many documents do or don't have a value for a given
> field, when you sort on that field, a (cached) array containing one
> element for *every* doc in your index is used for that field.

Thanks very much for your helpful reply ... a few follow-up questions:

Each element of the cached array is a ... what? The ID of the document?
(I'll be happy to answer this myself by reading the source code, but I'm
not quite sure where to start looking.)

What happens if there are more sort operations on those fields than there
is memory to hold the cached arrays? OOM exceptions? Failed searches? Or
simply cache evictions and degraded performance? Something else?

> 2) sorting dynamic fields is no different than sorting regular fields.

Good to know, thanks.

> ...so if you've got three sources, and from each source you get a
> "userRatingAvg" and a "userRatingSum" and you want to sort on them, it
> doesn't matter if you create 6 distinct fields, or two dynamic fields; and
> it doesn't matter if only 5 of your 400K docs have values for any one of
> those fields -- An array of 400K entries is going to be created for each
> of those fields the first time you sort on it with each "newSearcher"

Is the (max? min?) number of newSearchers something you control in
solrconfig.xml?

Also, it seems a bit inefficient to allocate an array containing an entry
for each document when only a small percentage of the documents actually
contain values for the field. Would it be worth investigating whether this
could be avoided to save some RAM?

Thanks again,
Charlie
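
P.S. To get a rough sense of scale, here is a back-of-envelope sketch of the memory implied by "one array entry per document, per sorted field". The function name and the 4-bytes-per-entry default are my own assumptions for illustration (what each entry actually holds is exactly what I'm asking about); only the 400K documents and the 6 fields come from the example quoted above.

```python
def sort_cache_bytes(num_docs, num_sort_fields, bytes_per_entry=4):
    """Estimate memory for per-field sort arrays (hypothetical helper).

    Each sorted field gets one entry per document in the index,
    regardless of how many documents actually have a value for it.
    bytes_per_entry is an assumed size, not a confirmed figure.
    """
    return num_docs * num_sort_fields * bytes_per_entry


# Numbers from the quoted example: 400K docs, 3 sources x 2 rating fields.
total = sort_cache_bytes(num_docs=400_000, num_sort_fields=6)
print(f"{total / (1024 * 1024):.1f} MiB per searcher")
```

Note that the number of documents that actually have a value never enters the calculation, which is the point of the quoted explanation. If the entries turn out to be larger than 4 bytes, or if several searchers are alive at once, the total scales up accordingly.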