On Mon, 2012-10-08 at 08:42 +0200, Torben Honigbaum wrote:
> sorry, my fault. This was one of my first ideas. My problem is that
> I have 1,000,000 documents, each with about 20 attributes. Additionally,
> each document has between 200 and 500 option-value pairs. So if I
> denormalize the data, it means that I have 1,000,000 x 350 ((200 + 500) /
> 2) = 350,000,000 documents, each with 20 attributes.
If you have a few hundred or fewer distinct primary attributes (the A, B, C's in your example), you could create a new field for each of them:

<doc>
  <str name="id">3</str>
  <str name="options">A B C D</str>
  <str name="option_A">200</str>
  <str name="option_B">400</str>
  <str name="option_C">240</str>
  <str name="option_D">310</str>
  ...
  ...
</doc>

Query for "options:A" and facet on the field "option_A" to get facets for that specific field.

This normalization does increase the index size due to duplicated secondary values between the option fields, but since we assume a relatively small number of primary values, it should not be too much.

Alternatively, if you have many distinct primary attributes, index the pairs as Jack suggests:

<doc>
  <str name="id">3</str>
  <str name="options">A B C D</str>
  <str name="option">A=200</str>
  <str name="option">B=400</str>
  <str name="option">C=240</str>
  <str name="option">D=310</str>
  ...
  ...
</doc>

Query for "options:A" and facet on the field "option" with facet.prefix="A=". Your result will be A=200 (2), A=450 (1)..., so you will have to strip "<whatever>=" before display.

This normalization is potentially a lot heavier than the previous one, as we have distinct_primaries * distinct_secondaries distinct values. In the worst case, where every document contains only distinct combinations of primary/secondary values, we have 350M distinct option values, which is quite heavy for a single box to facet on. Whether that is better or worse than 350M documents, I don't know.

> Is denormalization the only way to handle this problem? I

What you are trying to do does look quite a lot like hierarchical faceting, which Solr does not support directly. But even if you apply one of the experimental patches, it does not mitigate the potential combinatorial explosion of your primary and secondary values.

So that leaves the question: how many distinct combinations of primary and secondary values do you have?

Regards,
Toke Eskildsen
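To make the two index layouts above concrete, here is a minimal sketch that turns one document's option-value pairs into either shape, plus a count of the distinct terms the pair field would hold (the number that drives faceting cost in the second layout). The function names and the plain-dict document representation are hypothetical, chosen for illustration; this is not Solr's API and does not talk to Solr.

```python
# Hypothetical helpers: names and the dict-based document shape are
# illustrative assumptions, not part of Solr's API.


def per_field_doc(doc_id, options):
    """Layout 1: one field per primary attribute (option_A, option_B, ...)."""
    doc = {"id": doc_id, "options": " ".join(sorted(options))}
    for primary, secondary in options.items():
        doc["option_" + primary] = secondary
    return doc


def pair_doc(doc_id, options):
    """Layout 2: one multi-valued "option" field holding "primary=secondary" terms."""
    return {
        "id": doc_id,
        "options": " ".join(sorted(options)),
        "option": [f"{p}={s}" for p, s in sorted(options.items())],
    }


def distinct_terms_in_pair_field(docs):
    """Count distinct terms the "option" field would contain across all docs.

    Bounded above by distinct_primaries * distinct_secondaries; in the worst
    case it equals the total number of pairs in the corpus.
    """
    terms = set()
    for doc_id, options in docs.items():
        terms.update(pair_doc(doc_id, options)["option"])
    return len(terms)


if __name__ == "__main__":
    sample = {
        "3": {"A": "200", "B": "400", "C": "240", "D": "310"},
        "4": {"A": "200", "B": "450"},
    }
    print(per_field_doc("3", sample["3"]))
    print(pair_doc("4", sample["4"]))
    print(distinct_terms_in_pair_field(sample))  # A=200 is shared, so 5
```

Running the distinct-term count over a representative sample of real documents is one way to answer the closing question before committing to either layout.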
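For the pair-field layout, a rough sketch of the request parameters (q, facet, facet.field, facet.prefix) and of the prefix stripping needed before display might look like this. It assumes the JSON response uses Solr's default flat facet layout (json.nl=flat), i.e. facet_fields maps each field to an alternating [term, count, term, count, ...] list; the helper names are hypothetical and no request is actually sent.

```python
from urllib.parse import urlencode


def facet_params(primary):
    """Parameters for layout 2: match docs that have `primary` and facet on
    the "option" field, keeping only terms that start with "<primary>="."""
    return {
        "q": f"options:{primary}",
        "facet": "true",
        "facet.field": "option",
        "facet.prefix": f"{primary}=",
    }


def strip_prefix(flat_counts, primary):
    """Convert a flat facet list [term, count, term, count, ...] into
    (secondary, count) pairs with the "<primary>=" prefix removed.

    Assumes the flat (json.nl=flat) layout; terms without the prefix are skipped.
    """
    prefix = f"{primary}="
    pairs = []
    for term, count in zip(flat_counts[0::2], flat_counts[1::2]):
        if term.startswith(prefix):
            pairs.append((term[len(prefix):], count))
    return pairs


if __name__ == "__main__":
    print(urlencode(facet_params("A")))
    # Example facet list shaped like the "A=200 (2), A=450 (1)" result above.
    print(strip_prefix(["A=200", 2, "A=450", 1], "A"))  # [('200', 2), ('450', 1)]
```

The prefix is applied on the Solr side so that only the requested primary's terms are counted; the client-side stripping is purely cosmetic.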