Hi Staszek,

I added the parameter you suggested (LingoClusteringAlgorithm.clusterMergingThreshold) to the searchComponent section that configures the clustering module, but changing its value did not have any effect on my search results.
However, when I used the Carrot2 Workbench, I could see the effect of changing the value (the number of clusters went down from 6 to 2). Here is the XML snippet for the searchComponent:

<searchComponent
    name="clusteringComponent"
    enable="${solr.clustering.enabled:false}"
    class="org.apache.solr.handler.clustering.ClusteringComponent">
  <!-- Declare an engine -->
  <lst name="engine">
    <!-- The name, only one can be named "default" -->
    <str name="name">default</str>
    <!--
      Class name of Carrot2 clustering algorithm. Currently available
      algorithms are:

      * org.carrot2.clustering.lingo.LingoClusteringAlgorithm
      * org.carrot2.clustering.stc.STCClusteringAlgorithm

      See http://project.carrot2.org/algorithms.html for the
      algorithm's characteristics.
    -->
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <!--
      Overriding values for Carrot2 default algorithm attributes. For
      a description of all available attributes, see:
      http://download.carrot2.org/stable/manual/#chapter.components.
      Use attribute key as name attribute of str elements below. These
      can be further overridden for individual requests by specifying
      attribute key as request parameter name and attribute value as
      parameter value.
    -->
    <str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>
    <str name="LingoClusteringAlgorithm.clusterMergingThreshold">0.0</str>
  </lst>
</searchComponent>

I would appreciate any insights into this behavior.

Thanks,
Ramdev

On Mar 30, 2011, at 11:51 AM, Stanislaw Osinski wrote:

Hi Ramdev,

Both of the clustering algorithms that ship with Solr (Lingo and STC) are designed to allow one document to appear in more than one cluster, which actually does make sense in many scenarios. There's no easy way to force them to produce hard clusterings because this would require a complete change in the way the algorithms work.
If you need each document to belong to exactly one cluster, you'd have to post-process the clusters to remove the redundant document assignments. Alternatively, in the case of the Lingo algorithm, you can try lowering "LingoClusteringAlgorithm.clusterMergingThreshold" to a value in the range of 0.2 to 0.5. If you do that, clusters containing overlapping documents will get merged. For more information about this attribute, see: http://download.carrot2.org/stable/manual/#section.attribute.LingoClusteringAlgorithm.clusterMergingThreshold

Cheers,
Staszek

On Wed, Mar 30, 2011 at 18:21, Markus Jelsma <markus.jel...@openindex.io> wrote:

Yes, you can set engine-specific parameters. Check the comments in your snippet.

> Hi:
>
> I recently included the clustering component in Solr and updated the
> requestHandler accordingly (in solrconfig.xml). Snippet of the config for
> the clustering:
>
> <searchComponent
>     name="clusteringComponent"
>     enable="${solr.clustering.enabled:false}"
>     class="org.apache.solr.handler.clustering.ClusteringComponent">
>   <!-- Declare an engine -->
>   <lst name="engine">
>     <!-- The name, only one can be named "default" -->
>     <str name="name">default</str>
>     <!--
>       Class name of Carrot2 clustering algorithm. Currently available
>       algorithms are:
>
>       * org.carrot2.clustering.lingo.LingoClusteringAlgorithm
>       * org.carrot2.clustering.stc.STCClusteringAlgorithm
>
>       See http://project.carrot2.org/algorithms.html for the
>       algorithm's characteristics.
>     -->
>     <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
>     <!--
>       Overriding values for Carrot2 default algorithm attributes. For
>       a description of all available attributes, see:
>       http://download.carrot2.org/stable/manual/#chapter.components.
>       Use attribute key as name attribute of str elements below. These
>       can be further overridden for individual requests by specifying
>       attribute key as request parameter name and attribute value as
>       parameter value.
>     -->
>     <str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>
>   </lst>
>   <lst name="engine">
>     <str name="name">stc</str>
>     <str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
>   </lst>
> </searchComponent>
>
> Snippet of the config for the requestHandler:
>
> <requestHandler name="standard" class="solr.SearchHandler" default="true">
>   <!-- default values for query parameters -->
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
>     <!--
>     <int name="rows">10</int>
>     <str name="fl">*</str>
>     <str name="version">2.1</str>
>     -->
>     <bool name="clustering">true</bool>
>     <str name="clustering.engine">default</str>
>     <bool name="clustering.results">true</bool>
>     <!-- The title field -->
>     <str name="carrot.title">headline</str>
>     <str name="carrot.url">pi</str>
>     <!-- The field to cluster on -->
>     <str name="carrot.snippet">headline</str>
>     <!-- produce summaries -->
>     <bool name="carrot.produceSummary">true</bool>
>     <!-- the maximum number of labels per cluster -->
>     <!--<int name="carrot.numDescriptions">5</int>-->
>     <!-- produce sub clusters -->
>     <bool name="carrot.outputSubClusters">false</bool>
>   </lst>
>   <arr name="last-components">
>     <str>clusteringComponent</str>
>   </arr>
> </requestHandler>
>
> When I perform a search, the clusters section of the Solr response
> shows results that are not quite consistent: two documents are reported
> in two different clusters.
>
> Are there parameters that can be set to prevent this from happening?
>
> Thanks much,
>
> Ramdev
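Staszek's first suggestion, post-processing the clusters so that each document keeps only one assignment, could look roughly like the Java sketch below. It is a minimal sketch under stated assumptions: the clusters have already been read out of the Solr response into plain label/document-id pairs (the `HardClusterer` and `Cluster` names are hypothetical and are not part of Solr or Carrot2), and a cluster that comes earlier in the list wins any document it shares with a later one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper (not a Solr or Carrot2 API): turns an overlapping
 * clustering into a hard one by keeping each document only in the first
 * cluster that claims it.
 */
public class HardClusterer {

    /** A flat cluster: a label plus the ids of the documents assigned to it. */
    public static final class Cluster {
        public final String label;
        public final List<String> docIds;

        public Cluster(String label, List<String> docIds) {
            this.label = label;
            this.docIds = Collections.unmodifiableList(new ArrayList<>(docIds));
        }
    }

    /**
     * Returns new clusters in which every document appears at most once.
     * Clusters are visited in the order given, so the caller decides the
     * priority before calling. Clusters left with no documents are dropped.
     */
    public static List<Cluster> toHardClusters(List<Cluster> softClusters) {
        Set<String> assigned = new HashSet<>();
        List<Cluster> result = new ArrayList<>();
        for (Cluster c : softClusters) {
            List<String> kept = new ArrayList<>();
            for (String id : c.docIds) {
                // Set.add returns false when the document was already claimed.
                if (assigned.add(id)) {
                    kept.add(id);
                }
            }
            if (!kept.isEmpty()) {
                result.add(new Cluster(c.label, kept));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy input standing in for clusters parsed from a Solr response.
        List<Cluster> soft = Arrays.asList(
                new Cluster("Elections", Arrays.asList("d1", "d2", "d3")),
                new Cluster("Budget", Arrays.asList("d3", "d4")),
                new Cluster("Parliament", Arrays.asList("d1", "d2")));
        for (Cluster c : toHardClusters(soft)) {
            // Prints "Elections -> [d1, d2, d3]" then "Budget -> [d4]".
            System.out.println(c.label + " -> " + c.docIds);
        }
    }
}
```

The "first cluster wins" rule is only one possible policy. If larger clusters should take priority instead, the input list could be sorted by `docIds.size()` in descending order before calling `toHardClusters`; note that this trades away some of the overlap information the algorithms deliberately produce.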