(Sorry for spamming) It does not solve the whole issue though. I'm still looking for a way to "cluster the terms of a field".
2012/11/20 Per Fredelius <per.fredel...@gmail.com>

> I see now that the TermsComponent<http://wiki.apache.org/solr/TermsComponent>
> supplies a lot of the data I was looking for.
>
> // Per
>
>
> 2012/11/20 Per Fredelius <per.fredel...@gmail.com>
>
>> Hello Solr users,
>>
>> I'm new to Solr and am working with it for my thesis. I have a
>> configuration up and running, doing the basic stuff: data import, running
>> queries from a web front end, and some faceting. I may still be a bit off on
>> the faceting terminology, but here goes.
>>
>> *What my setup is doing at the moment:*
>> In addition to a small number of static fields that are common to all
>> articles, there is a large variety of dynamic fields with names such as
>> "p_Material" or "p_Secondary_color_scheme". This is neatly dealt with in
>> the schema using dynamic fields with a "p_*" wildcard. While each
>> article may have a small number of such properties, say 0 to 20, the total
>> number of unique properties is quite large, say >1000. For a single result
>> set of ~20 I sometimes get 100 different fields or more. Each field can in
>> turn have 100+ possible values throughout the database.
>>
>> *What I'm looking to accomplish:*
>> I want the user to be able to select from relevant properties
>> and property values, adding them iteratively/interactively to the query to
>> refine the result set.
>>
>> *How I do this at the moment:*
>> I scrape field names from the result set and display them in a sidebar.
>> The user may click a field name to 'expand it'. When expansion happens, a
>> new request is sent to Solr, asking for facets of that particular field (or
>> is it 'values of that particular facet' in IR-speak?), and the field UI
>> component is expanded to show the applicable field values.
>>
>> """
>> p_some_property_1
>> p_Material [expanded]
>>   > Concrete
>>   > Glass
>>   > Wood
>>   > Cotton
>> p_Secondary_color_scheme
>> p_SomeProperty_31
>> p_Battery_type
>> p_length
>> p_...
>> ...
>> """
>>
>> *Problems with my current approach:*
>> 1. I don't have a good idea of how to apply *relevancy sorting on my
>> list of field names*. Currently the user has to comb through a large
>> number of field names in a plain list format.
>> I only have 'frequency in result set' as a metric at the moment.
>> There may be better metrics that take the whole document database into
>> account. Also, I haven't found a reasonable way to query Solr for field
>> names relevant to a query. Perhaps I'm overlooking some obvious feature for
>> this use case?
>>
>> 2. It would be nice to *apply clustering to the field names*, so that I
>> can order them into subdirectories in the UI and retrieve
>> field names that are relevant but not in the result set.
>> I have a vague idea of how this could be done, and it seems to me that
>> field names would be a very good candidate set for clustering. I could
>> cluster them according to which documents they appear in. Field names
>> appearing in the same document would be closely connected. However, I don't
>> know where to begin in practical terms. What would be the best approach?
>> Should I make a plugin replacing the default clustering component? Should I
>> create a separate index, or a separate core? I'm thinking of creating documents
>> for each field name with article identifiers as document content.
>> Has this been done before? Am I heading into a dead end?
>>
>> *Late edit:*
>> Another, perhaps obvious, addition that I could make would be to store all
>> field names of each article in a separate 'field names' field, allowing
>> facet queries "one level up". At the moment I'm uncertain what
>> possibilities that would open up, though.
>>
>> // Thanks for any feedback
>> Per
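
To make the idea in point 2 of the quoted message concrete (cluster field names according to which documents they appear in), here is a rough sketch outside Solr. It assumes the field names of each article have already been exported as plain lists, for example from the separate 'field names' field mentioned in the late edit. The function names, the Jaccard measure, the 0.5 threshold and the sample data are all illustrative assumptions, not anything Solr provides.

```python
from collections import defaultdict
from itertools import combinations


def field_postings(docs):
    """Map each field name to the set of document ids it appears in."""
    postings = defaultdict(set)
    for doc_id, field_names in docs.items():
        for name in field_names:
            postings[name].add(doc_id)
    return postings


def jaccard(a, b):
    """Overlap of two document-id sets: |a & b| / |a | b|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cluster_fields(docs, threshold=0.5):
    """Single-link grouping: join two field names when their
    document sets have Jaccard similarity >= threshold."""
    postings = field_postings(docs)
    parent = {name: name for name in postings}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(sorted(postings), 2):
        if jaccard(postings[a], postings[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for name in sorted(postings):
        groups[find(name)].append(name)
    return sorted(groups.values())


# Hypothetical sample: article id -> field names present on that article.
docs = {
    "a1": ["p_Material", "p_Secondary_color_scheme"],
    "a2": ["p_Material", "p_Secondary_color_scheme", "p_length"],
    "a3": ["p_Battery_type", "p_length"],
    "a4": ["p_Battery_type"],
}
clusters = cluster_fields(docs)
```

With this sample, "p_Material" and "p_Secondary_color_scheme" end up in one group because they occur in exactly the same articles, while the other two names stay on their own. The pairwise loop is quadratic in the number of field names, which should still be manageable for roughly a thousand names. Whether a fixed threshold gives useful groups on real data is an open question.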