(Sorry for spamming) It does not solve the whole issue though. I'm still
looking for a way to "cluster the terms of a field".


2012/11/20 Per Fredelius <per.fredel...@gmail.com>

> I see now that the TermsComponent <http://wiki.apache.org/solr/TermsComponent>
> supplies a lot of the data I was looking for.
>
> // Per
>
>
> 2012/11/20 Per Fredelius <per.fredel...@gmail.com>
>
>> Hello Solr users,
>>
>> I'm new at using Solr, working with it for my thesis. I have a
>> configuration up and running, doing the basic stuff, data import, running
>> queries from a web front end and some faceting. I may still be a bit off on
>> the faceting terminology but here goes.
>>
>> *What my set up is doing at the moment:*
>> In addition to a small number of static fields that are common to all
>> articles there is a large variety of dynamic fields with names such as
>> "p_Material" or "p_Secondary_color_scheme". This is neatly dealt with in
>> the schema using dynamic fields with a "p_*" wildcard. And while each
>> article may have a small number of such properties, say 0 to 20, the total
>> number of unique properties is quite large, say >1000. For a single result
>> set of ~20 I sometimes get 100 different fields or more. Each field can in
>> turn have 100+ possible values throughout the database.
>>
>> *What I'm looking to accomplish:*
>> I want the user to be able to select from relevant properties
>> and property values, adding them iteratively/interactively to the query to
>> refine the result set.
>>
>> *How I do this at the moment:*
>> I scrape field names from the result set and display them in a side bar.
>> The user may click a field name to 'expand it'. When expansion happens, a
>> new request is sent to Solr, asking for facets of that particular field (or
>> is it 'values of that particular facet' in IR-speak?), and so the field UI
>> component is expanded to show the applicable field values.
>>
>> """
>> p_some_property_1
>> p_Material [expanded]
>>   >  Concrete
>>   >  Glass
>>   >  Wood
>>   >  Cotton
>> p_Secondary_color_scheme
>> p_SomeProperty_31
>> p_Battery_type
>> p_length
>> p_...
>> ...
>> """
>>
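A minimal sketch of the expansion request described above, assuming a Solr instance reachable at the hypothetical URL http://localhost:8983/solr/select. The parameters (rows, facet, facet.field, facet.mincount, fq) are standard Solr request parameters; the helper name facet_url and the example values are made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust to the actual Solr host and core.
SOLR_SELECT = "http://localhost:8983/solr/select"

def facet_url(user_query, field, filters=()):
    """Build a request that returns only facet counts for one field."""
    params = [
        ("q", user_query),
        ("rows", 0),              # no documents needed, only the counts
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", field),   # e.g. "p_Material"
        ("facet.mincount", 1),    # skip values absent from the result set
    ]
    # Values the user already picked narrow the result set via fq.
    params += [("fq", '%s:"%s"' % (f, v)) for f, v in filters]
    return SOLR_SELECT + "?" + urlencode(params)

print(facet_url("chair", "p_Material", [("p_Battery_type", "AA")]))
```

Setting rows to 0 keeps the response small, since the side bar only needs the value counts for the expanded field.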
>> *Problems with my current approach:*
>> 1. I don't have any good idea on how to apply *relevancy sorting on my
>> list of field names*. Currently the user has to comb through a large
>> number of field names in a plain list format.
>>     I only have 'frequency in result set' as a metric at the moment.
>> There may be better metrics that take the whole document database into
>> account. Also, I haven't found a reasonable way to query Solr for field
>> names relevant to a query. Perhaps I'm overlooking some obvious feature for
>> this use case?
>>
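One possible metric that takes the whole database into account, sketched under assumptions: compare the share of result documents carrying a field name with the share of all documents carrying it, and sort by that ratio. This is a simple heuristic brought in for illustration, not a Solr feature, and the function name and counts below are invented. How the counts are obtained is left open.

```python
def rank_fields(result_counts, total_counts, n_results, n_total):
    """Order field names by how over-represented they are in the result set."""
    scores = {}
    for field, in_results in result_counts.items():
        share_in_results = in_results / n_results
        share_overall = total_counts[field] / n_total
        # A ratio above 1 means the field is more common here than overall.
        scores[field] = share_in_results / share_overall
    return sorted(scores, key=scores.get, reverse=True)

# Made-up counts, purely for illustration.
result_counts = {"p_Material": 15, "p_length": 18, "p_Battery_type": 4}
total_counts = {"p_Material": 300, "p_length": 9000, "p_Battery_type": 50}
print(rank_fields(result_counts, total_counts, n_results=20, n_total=10000))
```

With plain frequency, p_length would come first even though it appears on almost every article; the ratio pushes such ubiquitous fields down the list.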
>> 2. It would be nice to *apply clustering to the field names*, so that I
>> may order them into sub directories in the UI and so that I can retrieve
>> field names that are relevant but not in the result set.
>>     I have a vague idea how this could be done and it seems to me that
>> field names would be a very good candidate set for clustering. I could
>> cluster them according to what documents they appear in. Field names
>> appearing in the same document would be closely connected. Although I don't
>> know where to begin in practical terms. What would be the best approach?
>> Should I make a plugin replacing the default clustering component? Should I
>> create a separate index, or separate core? I'm thinking of creating a
>> document for each field name, with article identifiers as document content.
>>     Has this been done before? Am I heading into a dead end?
>>
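A rough sketch of the co-occurrence idea above, done outside Solr: treat each field name as the set of article ids it appears in, measure overlap with the Jaccard index, and group field names that are connected by sufficiently similar pairs. The function names, the 0.5 threshold and the sample data are assumptions for illustration; this is a simple single-link grouping, not the Solr clustering component.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap of two non-empty document-id sets, from 0.0 to 1.0."""
    return len(a & b) / len(a | b)

def cluster_fields(docs_by_field, threshold=0.5):
    """Group field names linked by a chain of similar-enough pairs."""
    cluster_of = {f: {f} for f in docs_by_field}
    for f, g in combinations(docs_by_field, 2):
        if jaccard(docs_by_field[f], docs_by_field[g]) >= threshold:
            merged = cluster_of[f] | cluster_of[g]
            for member in merged:
                cluster_of[member] = merged
    # Each cluster is shared by its members; keep one copy of each.
    unique = {frozenset(c) for c in cluster_of.values()}
    return sorted(sorted(c) for c in unique)

# Made-up mapping from field name to the article ids that carry it.
docs_by_field = {
    "p_Material": {1, 2, 3, 4},
    "p_Secondary_color_scheme": {2, 3, 4},
    "p_Battery_type": {7, 8},
    "p_length": {7, 8, 9},
}
print(cluster_fields(docs_by_field))
```

Comparing every pair is quadratic in the number of field names, which should still be manageable for a few thousand names but would need a smarter approach beyond that.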
>> *Late edit: *
>> Another perhaps obvious addition that I could make would be to store all
>> field names of each article in a separate 'field names' field, allowing
>> facet queries "one level up". At the moment I'm uncertain what
>> possibilities that would open up, though.
>>
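A small sketch of how the separate 'field names' field could be filled before indexing, assuming documents are prepared as plain dictionaries. The target name field_names is hypothetical and would have to be declared as a multi-valued field in the schema.

```python
def with_field_names(doc, prefix="p_", target="field_names"):
    """Return a copy of doc that also lists its dynamic field names."""
    enriched = dict(doc)
    enriched[target] = sorted(k for k in doc if k.startswith(prefix))
    return enriched

# Made-up article, purely for illustration.
article = {"id": "42", "title": "Desk lamp",
           "p_Material": "Glass", "p_Battery_type": "AA"}
print(with_field_names(article))
```

Faceting on such a field would return, for any query, how many matching articles carry each field name, which is the 'frequency in result set' metric without scraping the returned documents.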
>> // Thanks for any feedback
>> Per
>>
>
>
