The goal there is to make clustering lighter and more closely tied to
the query.
The summary produced is a fragment of the document content surrounding
the query terms.
Whether this improves the quality of the clusters is arguable, but it
certainly makes the clustering operation lighter, as the content used to
produce the clusters is much smaller than the full content.
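
To make the "fragment surrounding the query terms" idea concrete, here is a
rough Python sketch. It is only an illustration and not the Solr highlighter:
the make_fragment function, its character-based window and the sample text
are all assumptions of mine.

```python
def make_fragment(content: str, query_term: str, frag_size: int = 100) -> str:
    """Return up to frag_size characters of content around the first
    case-insensitive match of query_term (hypothetical helper, not Solr code).
    Falls back to the leading frag_size characters when nothing matches."""
    pos = content.lower().find(query_term.lower())
    if pos < 0:
        return content[:frag_size]
    # Centre the window on the match, clamped to the start of the text.
    centre = pos + len(query_term) // 2
    start = max(0, centre - frag_size // 2)
    return content[start:start + frag_size]


# Made-up document: a lot of filler with one relevant sentence in the middle.
doc = (
    "Apache Solr is a search platform. " * 20
    + "Carrot2 clustering groups results. "
    + "More filler text. " * 20
)
fragment = make_fragment(doc, "clustering", frag_size=60)
print(len(doc), len(fragment))
print(fragment)
```

The clustering algorithm would then see only the short fragment instead of
the whole document, which is where the saving comes from.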

We can of course debate whether the window of text surrounding the query
matches really helps to cluster the documents more precisely.
That is not an easy research topic, and it depends heavily on the use
case.
For this reason the user should decide whether to go with the lighter
summary approach or the more comprehensive full-content approach.
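
As a sketch of how the two approaches differ at request time, the Python
below builds the query string for each one. The parameter names
carrot.produceSummary, carrot.fragSize, carrot.title, carrot.snippet and
LingoClusteringAlgorithm.desiredClusterCountBase come from this thread and
the quoted documentation; the host, collection name, handler path and field
names are assumptions, so adapt them to your own setup.

```python
from urllib.parse import urlencode

# Assumed endpoint: host, collection and handler path are placeholders.
BASE_URL = "http://localhost:8983/solr/mycollection/clustering"


def clustering_params(query, use_summary, frag_size=100, cluster_count_base=None):
    """Build request parameters for the summary or the full-content approach."""
    params = {
        "q": query,
        "clustering": "true",
        "carrot.title": "title",      # assumed field name
        "carrot.snippet": "content",  # assumed field name
        "carrot.produceSummary": "true" if use_summary else "false",
    }
    if use_summary:
        # Only meaningful when carrot.produceSummary is true.
        params["carrot.fragSize"] = str(frag_size)
    if cluster_count_base is not None:
        # The Lingo parameter that governs the number of clusters.
        params["LingoClusteringAlgorithm.desiredClusterCountBase"] = str(
            cluster_count_base
        )
    return params


summary_url = BASE_URL + "?" + urlencode(clustering_params("solr", True, frag_size=50))
full_url = BASE_URL + "?" + urlencode(clustering_params("solr", False))
print(summary_url)
print(full_url)
```

Running the same query against both URLs and comparing the cluster labels is
probably the simplest way to judge which approach suits a given use case.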

Cheers

2015-06-02 3:21 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:

> Thank you so much Alessandro.
>
> But I do not find any difference in the quality of the clustering results
> when I change the hl.fragsize to a  even though I've set my
> carrot.produceSummary to true.
>
>
> Regards,
> Edwin
>
>
> On 1 June 2015 at 17:31, Alessandro Benedetti <benedetti.ale...@gmail.com>
> wrote:
>
> > Only to clarify the initial mail: the carrot.fragSize has nothing to do
> > with the number of clusters produced.
> >
> > When you select to work with the field summary (you will work only on
> > snippets from the original content, snippets produced by the highlight
> > of the query in the content), the fragSize will specify the size of
> > these fragments.
> >
> > From Carrot documentation :
> >
> > carrot.produceSummary
> >
> > When true, the carrot.snippet
> > <https://wiki.apache.org/solr/ClusteringComponent#carrot.snippet> field
> > (if no snippet field, then the carrot.title
> > <https://wiki.apache.org/solr/ClusteringComponent#carrot.title> field)
> > will be highlighted and the highlighted text will be used for clustering.
> > Highlighting is recommended when the snippet field contains a lot of
> > content. Highlighting can also increase the quality of clustering because
> > the clustered content will get an additional query-specific context.
> >
> > carrot.fragSize
> >
> > The frag size to use for highlighting. Meaningful only when
> > carrot.produceSummary
> > <https://wiki.apache.org/solr/ClusteringComponent#carrot.produceSummary>
> > is true. If not specified, the default highlighting fragsize (hl.fragsize)
> > will be used. If that isn't specified, then 100.
> >
> >
> > Cheers
> >
> > 2015-06-01 2:00 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
> >
> > > Thank you Stanislaw for the links. Will read them up to better
> > > understand how the algorithm works.
> > >
> > > Regards,
> > > Edwin
> > >
> > > On 29 May 2015 at 17:22, Stanislaw Osinski <
> > > stanislaw.osin...@carrotsearch.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > The number of clusters primarily depends on the parameters of the
> > > > specific clustering algorithm. If you're using the default Lingo
> > > > algorithm, the number of clusters is governed by
> > > > the LingoClusteringAlgorithm.desiredClusterCountBase parameter. Take a
> > > > look at the documentation (
> > > > https://cwiki.apache.org/confluence/display/solr/Result+Clustering#ResultClustering-TweakingAlgorithmSettings
> > > > )
> > > > for some more details (the "Tweaking at Query-Time" section shows how
> > > > to pass the specific parameters at request time). A complete overview
> > > > of the Lingo clustering algorithm parameters is here:
> > > > http://doc.carrot2.org/#section.component.lingo.
> > > >
> > > > Stanislaw
> > > >
> > > > --
> > > > Stanislaw Osinski, stanislaw.osin...@carrotsearch.com
> > > > http://carrotsearch.com
> > > >
> > > > > On Fri, May 29, 2015 at 4:29 AM, Zheng Lin Edwin Yeo <
> > > > > edwinye...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > > I'm trying to increase the number of cluster results shown during
> > > > > > the search. I tried to set carrot.fragSize=20 but only 15 cluster
> > > > > > labels are shown. Even when I tried to set carrot.fragSize=5, there
> > > > > > are also 15 labels shown.
> > > > > >
> > > > > > Is this the correct way to do this? I understand that setting it to
> > > > > > 20 might not necessarily mean 20 labels will be shown, as the setting
> > > > > > is for the maximum number. But when I set this to 5, it should reduce
> > > > > > the number of labels to 5?
> > > > >
> > > > > I'm using Solr 5.1.
> > > > >
> > > > >
> > > > > Regards,
> > > > > Edwin
> > > > >
> > > >
> > >
> >
> >
> >
>



-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
