Re: Apply clustering to field names?

Per Fredelius Tue, 20 Nov 2012 06:16:50 -0800

I see now that the
TermsComponent <http://wiki.apache.org/solr/TermsComponent> supplies
a lot of the data I was looking for.
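
A minimal sketch of what a TermsComponent request could look like, for
readers who have not used it. The host URL and the presence of a "/terms"
request handler in solrconfig.xml are assumptions, and terms_url is a
hypothetical helper name; "p_Material" is one of the field names from the
original message below.

```python
from urllib.parse import urlencode

def terms_url(base_url, field, prefix=None, limit=50):
    """Build a TermsComponent request URL for a single field.

    Assumes a "/terms" handler is registered; adjust the path otherwise.
    """
    params = [
        ("terms", "true"),
        ("terms.fl", field),        # field whose indexed terms are listed
        ("terms.limit", limit),     # maximum number of terms returned
        ("terms.sort", "count"),    # most frequent terms first
        ("wt", "json"),
    ]
    if prefix:
        # Restrict the listing to terms starting with this prefix.
        params.append(("terms.prefix", prefix))
    return base_url.rstrip("/") + "/terms?" + urlencode(params)

# Assumed local Solr instance; replace with the real base URL.
url = terms_url("http://localhost:8983/solr", "p_Material")
```

Note that TermsComponent reports term counts for the whole index and is
not restricted by a query, so it gives global rather than per-result-set
frequencies.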


// Per

2012/11/20 Per Fredelius <per.fredel...@gmail.com>

> Hello Solr users,
>
> I'm new to Solr and am using it for my thesis. I have a
> configuration up and running that does the basic stuff: data import, running
> queries from a web front end, and some faceting. I may still be a bit off on
> the faceting terminology, but here goes.
>
> *What my setup is doing at the moment:*
> In addition to a small number of static fields that are common to all
> articles, there is a large variety of dynamic fields with names such as
> "p_Material" or "p_Secondary_color_scheme". This is neatly dealt with in
> the schema using dynamic fields with a "p_*" wildcard. While each
> article may have a small number of such properties, say 0 to 20, the total
> number of unique properties is quite large, say >1000. For a single result
> set of ~20 I sometimes get 100 different fields or more. Each field can in
> turn have 100+ possible values throughout the database.
>
> *What I'm looking to accomplish:*
> I want the user to be able to select from relevant properties
> and property values, adding them iteratively/interactively to the query to
> refine the result set.
>
> *How I do this at the moment:*
> I scrape field names from the result set and display them in a side bar.
> The user may click a field name to 'expand it'. When expansion happens, a
> new request is sent to Solr, asking for facets of that particular field (or
> is it 'values of that particular facet' in IR-speak?), and so the field UI
> component is expanded to show the applicable field values.
>
> """
> p_some_property_1
> p_Material [expanded]
>   >  Concrete
>   >  Glass
>   >  Wood
>   >  Cotton
> p_Secondary_color_scheme
> p_SomeProperty_31
> p_Battery_type
> p_length
> p_...
> ...
> """
>
> *Problems with my current approach:*
> 1. I don't have a good idea of how to apply *relevancy sorting to my
> list of field names*. Currently the user has to comb through a large
> number of field names in a plain list format.
>     I only have 'frequency in result set' as a metric at the moment. There
> may be better metrics that take the whole document database into account.
> Also, I haven't found a reasonable way to query Solr for field names
> relevant to a query. Perhaps I'm overlooking some obvious feature for this
> use case?
>
> 2. It would be nice to *apply clustering to the field names*, so that I
> can order them into subdirectories in the UI and retrieve
> field names that are relevant but not in the result set.
>     I have a vague idea how this could be done and it seems to me that
> field names would be a very good candidate set for clustering. I could
> cluster them according to what documents they appear in. Field names
> appearing in the same document would be closely connected. I don't know
> where to begin in practical terms, though. What would be the best approach?
> Should I make a plugin replacing the default clustering component? Should I
> create a separate index, or separate core? I'm thinking of creating documents
> for each field name with article identifiers as document content.
>     Has this been done before? Am I heading into a dead end?
>
> *Late edit:*
> Another perhaps obvious addition that I could make would be to store all
> field names of each article in a separate 'field names' field, allowing
> facet queries "one level up". At the moment I'm uncertain what
> possibilities that would open up, though.
>
> // Thanks for any feedback
> Per
>
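
The co-occurrence idea in point 2 of the quoted message can be sketched
outside Solr first. The block below is only an illustration under stated
assumptions: documents are plain dicts as a client might receive them, the
sample data is made up, and Jaccard similarity is a swapped-in choice of
metric (the message does not name one). The resulting pairwise scores are
what a clustering step could consume; the clustering itself is not shown.

```python
from collections import Counter
from itertools import combinations

def field_stats(docs, prefix="p_"):
    """Count how often each dynamic field occurs and how often pairs co-occur."""
    freq = Counter()   # field name -> number of documents containing it
    pair = Counter()   # (name_a, name_b), sorted -> number of shared documents
    for doc in docs:
        names = sorted(k for k in doc if k.startswith(prefix))
        freq.update(names)
        pair.update(combinations(names, 2))
    return freq, pair

def jaccard(a, b, freq, pair):
    """Shared documents divided by documents containing either field."""
    key = (a, b) if a < b else (b, a)   # pairs are stored in sorted order
    both = pair[key]
    union = freq[a] + freq[b] - both
    return both / union if union else 0.0

# Made-up sample documents, using field names from the message above.
docs = [
    {"id": 1, "p_Material": "Wood", "p_length": 20},
    {"id": 2, "p_Material": "Glass", "p_length": 5, "p_Battery_type": "AA"},
    {"id": 3, "p_Battery_type": "AAA"},
]
freq, pair = field_stats(docs)
similarity = jaccard("p_Material", "p_length", freq, pair)
```

Here "p_Material" and "p_length" always appear together, so their
similarity is 1.0, while "p_Battery_type" shares only one of its two
documents with either of them.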

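The "late edit" idea of a separate field listing each article's field
names can be combined with ordinary faceting. The sketch below builds such
a request; the field name "field_names", the host URL, the "/select" path
and the helper name are all assumptions for illustration, and the field
would have to be populated at index time.

```python
from urllib.parse import urlencode

def field_name_facet_url(base_url, query, names_field="field_names", limit=100):
    """Build a facet request that counts field names over all matches of a query."""
    params = [
        ("q", query),
        ("rows", 0),                   # only facet counts are needed, no documents
        ("facet", "true"),
        ("facet.field", names_field),  # hypothetical multi-valued field of names
        ("facet.mincount", 1),         # skip field names absent from the matches
        ("facet.limit", limit),
        ("wt", "json"),
    ]
    return base_url.rstrip("/") + "/select?" + urlencode(params)

# Assumed local Solr instance and an example query term.
facet_url = field_name_facet_url("http://localhost:8983/solr", "chair")
```

Because facet counts are computed over every document matching the query,
not only the returned page, this yields the 'frequency in result set'
metric for the full match set without scraping individual results.
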