Michael,

Thanks for the update! I definitely need to get a 1.4 build and see if it
makes a difference.

BTW, instead of using faceting for text
mining/clustering/visualization purposes, maybe we could build a separate
feature in Solr for this. Many of the commercial search engines I have
experience with (Google Search Appliance, Vivisimo, etc.) provide dynamic
term clustering based on the top N ranked documents (N is a configurable
parameter). When the facet field is highly fragmented (say, a text field),
the existing set-intersection-based approach might no longer be optimal.
Aggregating term vectors over the top N docs might be more attractive.
Another feature I would really appreciate is search-time n-gram term
clustering. Maybe that would be better suited to the "spell checker", as it
is just a different way to display alternative search terms.
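
To make the term-vector idea concrete, here is a rough sketch in Python of
what I mean by aggregating over the top N docs. It is only an illustration,
not how Solr does or would implement it: the function name top_terms, the
dict-per-document term vectors, and the sample hits are all made up for the
example. The point is that the work is bounded by the terms in the top N
hits rather than by the number of unique terms in the whole index.

```python
from collections import Counter


def top_terms(ranked_docs, n=100, k=10, stopwords=frozenset()):
    """Sum per-document term frequencies over the first n ranked docs
    and return the k most frequent terms as (term, count) pairs."""
    totals = Counter()
    for term_vector in ranked_docs[:n]:
        for term, freq in term_vector.items():
            if term not in stopwords:
                totals[term] += freq
    return totals.most_common(k)


# Hypothetical term vectors for three hits, already sorted by relevance.
hits = [
    {"solr": 3, "facet": 2, "cache": 1},
    {"facet": 4, "filter": 2},
    {"solr": 1, "filter": 1, "vector": 5},
]

# Only the top 2 hits contribute, so "vector" is never counted.
print(top_terms(hits, n=2, k=3))
```

The same loop could run client-side over the term vectors of the top hits,
which is the fallback I mentioned earlier.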

-Yao


Michael Ludwig-4 wrote:
> 
> Yao Ge wrote:
> 
>> The facet query is considerably slower compared to other facets from
>> structured database fields (with highly repeated values). What I found
>> interesting is that even after I constrained search results to just a
>> few hundred hits using other facets, these text facets are still very
>> slow.
>>
>> I understand that text fields are not good candidates for faceting as
>> they can contain a very large number of unique values. However, why is
>> it still slow after my matching documents are reduced to hundreds? Is it
>> because the whole filter is cached (regardless of the matching docs) and
>> I don't have a large enough filter cache to fit the whole list?
> 
> Very interesting questions! I think an answer would both require and
> further an understanding of how filters work, which might even lead to
> a more general guideline on when and how to use filters and facets.
> 
> Even though faceting appears to have changed in 1.4 vs 1.3, it would
> still be interesting to understand the 1.3 side of things.
> 
>> Lastly, what I really want is to give users a chance to visualize
>> and filter on the top relevant words in the free-text fields. Are there
>> alternatives to the facet field approach? Term vectors? I can do
>> client-side processing based on the top N (say 100) hits for this, but
>> it is my last option.
> 
> Also a very interesting data mining question! I'm sorry I don't have any
> answers for you. Maybe someone else does.
> 
> Best,
> 
> Michael Ludwig
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Faceting-on-text-fields-tp23872891p23950084.html
Sent from the Solr - User mailing list archive at Nabble.com.
