Hi there,

I'm evaluating Solr as a replacement for our current search server and am trying to determine the best strategy for implementing our business needs.

Our problem is that we have a catalog schema with products and skus, in a one-to-many relationship. The most relevant content being indexed is at the product level, in the name and description fields. However, we are interested in filtering by sku attributes, and in particular in making multiple filters apply to a single sku. For example: find a product that contains a sku that is both blue and on sale.

No approach I've tried for collapsing the sku data into the product document works for this. If we put the data in separate fields, there's no way to apply multiple filters to the same sku. And if we concatenate all of the relevant sku data into a single multivalued field then, as I understand it, this is just indexed as one large field with extra whitespace between the individual entries, so there's still no way to enforce that an AND filter query applies to the same sku.
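
To make the separate-fields failure concrete, here is a minimal Python sketch. It is not Solr code; it only simulates flattening sku attributes into multivalued product fields and then ANDing two filters. The product data, the field names (sku_color, sku_on_sale) and the helper functions are all made up for illustration.

```python
# Hypothetical catalog entry: one product with two skus.
# Note that NO single sku is both blue and on sale.
product = {
    "id": "p1",
    "name": "Canvas Jacket",
    "skus": [
        {"color": "blue", "on_sale": False},
        {"color": "red", "on_sale": True},
    ],
}


def flatten(product):
    """Collapse sku attributes into separate multivalued product fields."""
    return {
        "id": product["id"],
        "sku_color": [s["color"] for s in product["skus"]],
        "sku_on_sale": [s["on_sale"] for s in product["skus"]],
    }


def matches_flattened(doc, color, on_sale):
    """AND of two filters, each checked against its own multivalued field."""
    return color in doc["sku_color"] and on_sale in doc["sku_on_sale"]


def matches_per_sku(product, color, on_sale):
    """The semantics we actually want: both conditions hold on the same sku."""
    return any(
        s["color"] == color and s["on_sale"] == on_sale
        for s in product["skus"]
    )


doc = flatten(product)
flattened_hit = matches_flattened(doc, "blue", True)
per_sku_hit = matches_per_sku(product, "blue", True)

print("flattened document matches:", flattened_hit)
print("a single sku matches:      ", per_sku_hit)
```

The flattened document satisfies both filters because the blue value and the on-sale value come from different skus, which is exactly the false positive we need to avoid.
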
One approach I was considering was to create separate indexes for products and skus, and store the product IDs in the sku documents. Then we could apply our own filters to the initially generated list, based on unique query parameters. I thought a component between query and facet would be a good place to add such a filter, but further research seems to indicate that this would break paging and sorting.

The only other thing I can think of would be to subclass QueryComponent itself, which looks rather daunting: the process() method has no hooks for this sort of thing, so it seems I would have to copy the entire existing implementation and add them myself. That looks to be a fair chunk of work, and brittle to changes in the trunk code. Ideally it would be nice to be able to handle certain fq parameters in a completely different way, perhaps using a custom query parser, but I haven't wrapped my head around how those work. Does any of this sound remotely doable? Any advice?

The other suggestion we are looking at was given to us by our current search provider, which is to index the skus themselves. It looks as if we may be able to make this work using the field collapsing patch from SOLR-236. I have some concerns about this approach, though:

1) It will make for a much larger index and longer indexing times (products can have 10 or more skus in our catalog).
2) Because the indexing will copy the description and name from the product, it will index the same content more than once, and the number of times per product will vary with the number of skus. I'm concerned that this may skew the scoring algorithm, in particular the inverse document frequency part.
3) I'm not sure about the performance of the field collapsing patch; I've read contradictory reports on the web.

I apologize if this is a bit rambling. If anyone has any advice for our situation it would be very helpful.

Thanks,
Eric