Kelly,

This is a good question, and it illustrates a challenge with Solr's limited schema. I don't see how dedup will help. I would continue with the SKU-based approach and use the field collapsing patch:
https://issues.apache.org/jira/browse/SOLR-236
You'll collapse on the product id. My book (p. 192) describes this component as it existed when I wrote it, but the patch has been updated since then.
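
To make "collapse on the product id" concrete, here is a minimal plain-Python sketch of the idea. It does not call Solr or the SOLR-236 patch; the field names (product_id, color, size, price), the sample documents, and the helper functions are all hypothetical, and keeping the first hit per product is just one possible collapsing rule.

```python
from collections import OrderedDict

# Hypothetical SKU-level documents: one document per purchasable SKU,
# each carrying the id of the product it belongs to.
sku_docs = [
    {"id": "SKU:1", "product_id": "Product:7", "color": "Red", "size": "Small", "price": 10.00},
    {"id": "SKU:2", "product_id": "Product:7", "color": "Green", "size": "Medium", "price": 10.00},
    {"id": "SKU:3", "product_id": "Product:7", "color": "Blue", "size": "Large", "price": 7.99},
    {"id": "SKU:4", "product_id": "Product:8", "color": "Green", "size": "Medium", "price": 8.50},
    {"id": "SKU:5", "product_id": "Product:8", "color": "Green", "size": "Large", "price": 9.00},
]


def search(docs, color, max_price):
    """Return the SKU documents matching a color and a price ceiling."""
    return [d for d in docs if d["color"] == color and d["price"] <= max_price]


def collapse(docs, field):
    """Keep only the first document seen for each distinct value of `field`."""
    groups = OrderedDict()
    for doc in docs:
        groups.setdefault(doc[field], doc)
    return list(groups.values())


hits = search(sku_docs, "Green", 10.00)   # three matching SKUs
products = collapse(hits, "product_id")   # one representative SKU per product
```

Paging and sorting would then operate on the collapsed list, so each product appears once even when several of its SKUs match.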

A recent separate question by you on this list suggests you're going down this path. I would grab the attached SOLR-236.patch file and attempt to apply it to the 1.4 source.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

On Jan 11, 2010, at 5:27 PM, Kelly Taylor wrote:

>
> I am in the process of building a Solr search solution for my application and
> have run into a roadblock with the schema design. Trying to match criteria
> in one multi-valued field with corresponding criteria in another
> multi-valued field. Any advice would be greatly appreciated.
>
> BACKGROUND:
> My RDBMS data model is such that for every one of my "Product" entities,
> there are one-to-many "SKU" entities available for purchase. Each SKU entity
> can have its own price, as well as one-to-many options, etc. The web
> frontend displays available "Product" entities on both directory and detail
> pages.
>
> In order to take advantage of Solr's facet count, paging, and sorting
> functionality, I decided to base the Solr schema on "Product" documents; so
> none of my documents currently contain duplicate "Product" data, and all
> "SKU" related data is denormalized as necessary, but into multi-valued
> fields. For example, I have a document with an "id" field set to
> "Product:7," a "docType" field is set to "Product" as well as multi-valued
> "SKU" related fields and data like, "sku_color" {Red | Green | Blue},
> "sku_size" {Small | Medium | Large}, "sku_price" {10.00 | 10.00 | 7.99}
>
> I hit the roadblock when I tried to answer the question, "Which products are
> available that contain skus with color Green, size M, and a price of $9.99
> or less?"...and have now begun the switch to "SKU" level indexing. This
> also gives me what I need for faceted browsing/navigation, and search
> refinement...leading the user to "Product" entities having purchasable "SKU"
> entities.
> But this also means I now have documents which are mostly
> duplicates for each "Product," and all facet counts, paging, and sorting are
> then inaccurate; so it appears I need to do this myself, with multiple Solr
> requests.
>
> Is this really the best approach; and if so, should I use the Solr
> Deduplication update processor when indexing and querying?
>
> Thanks in advance,
> Kelly
> --
> View this message in context:
> http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27118977.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
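
The roadblock described in the quoted message can be reproduced in a few lines of plain Python. This is only a sketch, not Solr code: it assumes the values at the same position in sku_color, sku_size, and sku_price belong to the same SKU (as the example document suggests), and both helper functions are hypothetical. The first mimics how each query clause is checked against any value of a multi-valued field; the second checks all three criteria against one SKU at a time.

```python
# The "Product:7" document from the message, with multi-valued SKU fields.
product_doc = {
    "id": "Product:7",
    "docType": "Product",
    "sku_color": ["Red", "Green", "Blue"],
    "sku_size": ["Small", "Medium", "Large"],
    "sku_price": [10.00, 10.00, 7.99],
}


def matches_multivalued(doc, color, size, max_price):
    """Each clause is satisfied if ANY value in its field matches,
    so the clauses are not tied to the same SKU."""
    return (
        color in doc["sku_color"]
        and size in doc["sku_size"]
        and any(price <= max_price for price in doc["sku_price"])
    )


def matches_per_sku(doc, color, size, max_price):
    """All three criteria must hold for one and the same SKU
    (assumes the three lists are aligned by position)."""
    skus = zip(doc["sku_color"], doc["sku_size"], doc["sku_price"])
    return any(c == color and s == size and p <= max_price for c, s, p in skus)


# "Green, Medium, $9.99 or less": the Green/Medium SKU costs 10.00,
# but the Blue/Large SKU at 7.99 satisfies the price clause on its own.
product_level = matches_multivalued(product_doc, "Green", "Medium", 9.99)
sku_level = matches_per_sku(product_doc, "Green", "Medium", 9.99)
```

The product-level check reports a match that no single SKU actually satisfies, which is why indexing one document per SKU is needed to answer this kind of question.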