I see now that the TermsComponent <http://wiki.apache.org/solr/TermsComponent> supplies a lot of the data I was looking for.
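
For anyone following along, here is a minimal sketch of how such a request could be put together. It assumes a request handler registered at /terms (as in the example solrconfig.xml) and a Solr instance at http://localhost:8983/solr; both are assumptions about the setup, and terms_url is just a hypothetical helper name. As far as I understand, the counts returned are document frequencies over the whole index, not over the current result set.

```python
from urllib.parse import urlencode


def terms_url(base_url, field, prefix=None, limit=20):
    """Build a TermsComponent request URL for a single field.

    base_url and the /terms handler path are assumptions about the local
    Solr configuration, not something taken from the original message.
    """
    params = [
        ("terms", "true"),           # switch the TermsComponent on
        ("terms.fl", field),         # field whose indexed terms we want
        ("terms.limit", str(limit)), # maximum number of terms returned
        ("terms.sort", "count"),     # most frequent terms first
        ("wt", "json"),              # ask for a JSON response
    ]
    if prefix:
        # Restrict to terms starting with this prefix (e.g. for autocomplete).
        params.append(("terms.prefix", prefix))
    return base_url.rstrip("/") + "/terms?" + urlencode(params)


url = terms_url("http://localhost:8983/solr", "p_Material", prefix="Co")
print(url)
```

The URL can then be fetched with any HTTP client; the helper only builds the query string so the parameters are easy to inspect.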
// Per

2012/11/20 Per Fredelius <per.fredel...@gmail.com>

> Hello Solr users,
>
> I'm new at using Solr, working with it for my thesis. I have a
> configuration up and running, doing the basic stuff, data import, running
> queries from a web front end and some faceting. I may still be a bit off on
> the faceting terminology but here goes.
>
> *What my set up is doing at the moment:*
> In addition to a small number of static fields that are common to all
> articles there is a large variety of dynamic fields with names such as
> "p_Material" or "p_Secondary_color_scheme". This is neatly dealt with in
> the schema using dynamic fields with a "p_*" wildcard. And while each
> article may have a small number of such properties, say 0 to 20, the total
> number of unique properties are quite large, say >1000. For a single result
> set of ~20 I get sometimes 100 different fields or more. Each field can in
> turn have +100 possible values throughout the database.
>
> *What I'm looking to accomplish:*
> I want the user to be able to select from relevant properties
> and property values, adding them iteratively/interactively to the query to
> refine the result set.
>
> *How I do this at the moment:*
> I scrape field names from the result set and display them in a side bar.
> The user may click a field name to 'expand it'. When expansion happens, a
> new request is sent to solr, asking for facets of that particular field (or
> is it 'values of that particular facet' in IR-speak?), and so the field UI
> component is expanded to show the applicable field values.
>
> """
> p_some_property_1
> p_Material [expanded]
>    > Concrete
>    > Glass
>    > Wood
>    > Cotton
> p_Secondary_color_scheme
> p_SomeProperty_31
> p_Battery_type
> p_length
> p_...
> ...
> """
>
> *Problems with my current approach:*
> 1. I don't have any good idea on how to apply *relevancy sorting on my
> list of field names*. Currently the user has to comb through a large
> number of field names in a plain list format.
> I only have 'frequency in result set' as a metric at the moment. There
> may be better metrics that take the whole document database into account.
> Also, I haven't found a reasonable way to query Solr for field names
> relevant to a query. Perhaps I'm overlooking some obvious feature for this
> use case?
>
> 2. It would be nice to *apply clustering to the field names*, so that I
> may order them into sub directories in the UI and so that I can retrieve
> field names that are relevant but not in the result set.
> I have a vague idea how this could be done and it seems to me that
> field names would be a very good candidate set for clustering. I could
> cluster them according to what documents they appear in. Field names
> appearing in the same document would be closely connected. Although I don't
> know where to begin in practical terms. What would be the best approach?
> Should I make a plugin replacing the default clustering component? Should I
> create a separate index, or separate core? I'm thinking creating documents
> for each field name with article identifiers as document content.
> Has this been done before? Am I heading into a dead end?
>
> *Late edit: *
> Another perhaps obvious addition that I could make would be to store all
> field names of each article in a separate 'field names' field, allowing
> facet queries "one level up". I'm at the moment uncertain what
> possibilities that would allow though.
>
> // Thanks for any feedback
> Per
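
To make the clustering idea from point 2 a bit more concrete, here is a rough sketch of the "field names that appear in the same documents are closely connected" measure, done outside Solr in plain Python. It uses Jaccard similarity over the sets of articles each field name occurs in; that particular measure is my own choice for illustration, not something Solr's clustering component does, and field_similarity and the sample article ids are hypothetical.

```python
from itertools import combinations


def field_similarity(docs):
    """Jaccard similarity between field names, based on shared articles.

    docs maps an article id to the set of field names present on that
    article. Returns {(name_a, name_b): similarity} for every pair of
    field names that co-occur in at least one article.
    """
    # Invert the mapping: field name -> set of article ids containing it.
    postings = {}
    for doc_id, fields in docs.items():
        for name in fields:
            postings.setdefault(name, set()).add(doc_id)

    sims = {}
    for a, b in combinations(sorted(postings), 2):
        shared = postings[a] & postings[b]
        if shared:
            union = postings[a] | postings[b]
            sims[(a, b)] = len(shared) / len(union)
    return sims


# Tiny made-up data set, only to show the shape of the input.
docs = {
    "a1": {"p_Material", "p_length"},
    "a2": {"p_Material", "p_length", "p_Battery_type"},
    "a3": {"p_Battery_type"},
}
sims = field_similarity(docs)
for pair, score in sorted(sims.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

The resulting pairwise scores could be fed into any off-the-shelf clustering routine, or used directly to suggest related field names that are not in the current result set. With >1000 field names the all-pairs loop is still small enough to run offline.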