s is here:
http://doc.carrot2.org/#section.component.lingo.
Stanislaw
--
Stanislaw Osinski, stanislaw.osin...@carrotsearch.com
http://carrotsearch.com
On Fri, May 29, 2015 at 4:29 AM, Zheng Lin Edwin Yeo
wrote:
> Hi,
>
> I'm trying to increase the number of cluster results to be shown
Hi,
> I have a Solr instance using the clustering component (with the Lingo
> algorithm) working perfectly. However when I get back the cluster results
> only the ID's of these come back with it. What is the easiest way to
> retrieve full documents instead? Should I parse these IDs into a new que
Hi Sebastián,
Looking quickly through the code of the clustering component, there's
currently no way to output only clusters. Let me see if this can be easily
implemented.
Stanislaw
--
Stanislaw Osinski, stanislaw.osin...@carrotsearch.com
http://carrotsearch.com
On Tue, May 6, 2014 at 6:
> Thank you Ahmet, Staszek and Tomnaso ;)
> So the only way to obtain offline clustering is to move to a customisation!
> I will take a look at the interface of the API. (If you can give me a link
> to the class, it will be appreciated; if not, I will find it myself.)
>
The API stub is
the or
>
> That's weird. As far as I know there is no such thing. There is
> classification stuff but I haven't heard of clustering.
>
> http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html
I think the wording on the wiki page needs some clarification -- Solr
cont
> Thanks, I'm new to the clustering libraries. I finally made this
> connection when I started browsing through the carrot2 source. I had
> pulled down a smaller MM document collection from our test environment. It
> was not ideal as it was mostly structured, but small. I foolishly thought
> I
--
Stanislaw Osinski, stanislaw.osin...@carrotsearch.com
http://carrotsearch.com
On Thu, Oct 17, 2013 at 11:49 PM, youknow...@heroicefforts.net <
youknow...@heroicefforts.net> wrote:
> Would someone help me out with the syntax for setting
> Tokenizer.documentFields in the ClusteringComp
> I mean measuring the similarity between the documents in each cluster,
> and also the difference between documents in one cluster and those in another.
>
> I saw the sample code ClusteringQualityBenchmark.java.
> However, I do not know how to make use of it for assessing my Solr
> clustering performance.
>
> Was the picture generated using Lingo 3G algorithms?
> I saw some sub-clusters inside it.
> Nice pic :)
>
That is correct.
I am interested to learn it.
> How long is the Lingo 3G trial period?
>
I'll send you the details in a private e-mail in a second.
> Is there any way to programmatical
1
Staszek
--
Stanislaw Osinski
http://carrotsearch.com
On Fri, Nov 30, 2012 at 4:44 PM, Jorge Luis Betancourt Gonzalez <
jlbetanco...@uci.cu> wrote:
> Hi all:
>
> I'm thinking of using Nutch combined with Solr to index some news sites in
> an intranet. And I was wondering
Stanislaw Osinski, stanislaw.osin...@carrotsearch.com
http://carrotsearch.com
I have a very large Solr index. I want to tag all documents with terms that
> better represent each document, like this
> <
> http://search.carrotsearch.com/carrot2-webapp/search?source=web&view=folders&am
>
> Is there any workaround in Solr/Carrot2 so that we could pass tokens that
> have been filtered with a custom tokenizer/filters instead of the raw text
> that it currently uses for clustering?
>
> I also read an issue at the following link:
>
> https://issues.apache.org/jira/browse/SOLR-2917
>
>
> Is wri
>
> 3) Measure the size of the index folder, multiply by 8 to get a clue of
>> total index size
>>
> With 12,000 docs my index folder size is 33 MB.
> PS: I use "solr.clustering.enabled=true"
Clustering is performed at search time, it doesn't affect the size of the
index (but obviously it does a
wrote:
> On 20/05/2012 11:43, Stanislaw Osinski wrote:
>
> Hi Bruno,
>>
>> Here's the wiki documentation for Solr's clustering component:
>>
>> http://wiki.apache.org/solr/ClusteringComponent
>
feFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:150)
> at sun.reflect.UnsafeObjectFieldAccessorImpl.set(UnsafeObjectFieldAccessorImpl.java:63)
> at java.lang.reflect.Field.set(Field.java:657)
> at org.carrot2.uti
o make the
clustering component and Carrot2 JARs available to the context classloader
by copying them to WEB-INF/lib of the WAR.
Staszek
On Sun, May 20, 2012 at 6:16 PM, Stanislaw Osinski <
stanislaw.osin...@carrotsearch.com> wrote:
> Interesting... let me investigate.
>
> S.
>
>
eFieldAccessorImpl.java:150)
> at sun.reflect.UnsafeObjectFieldAccessorImpl.set(UnsafeObjectFieldAccessorImpl.java:63)
> at java.lang.reflect.Field.set(Field.java:657)
> at org.carrot2.util.attribute.AttributeBinder$AttributeBinderActionBind.
Hi Koji,
It's fixed in trunk and 3.6.1 branch now. If you hit any other issues with
this, let me know.
Staszek
On Sun, May 20, 2012 at 1:02 PM, Koji Sekiguchi wrote:
> Hi Staszek,
>
> I'll wait for your fix. Thank you!
>
> Koji Sekiguchi from iPad2
>
> On 2012/05/
Hi Bruno,
Here's the wiki documentation for Solr's clustering component:
http://wiki.apache.org/solr/ClusteringComponent
For configuration examples, take a look at the Configuration section:
http://wiki.apache.org/solr/ClusteringComponent#Configuration.
If you hit any problems, let me know.
St
Hi Koji,
You're right, the current code overwrites the custom tokenizer though it
shouldn't. LuceneCarrot2TokenizerFactory is there to avoid circular
dependencies (Carrot2 default tokenizer depends on Lucene), but it
shouldn't be an issue with custom tokenizers.
I'll try to commit a fix later tod
)
> Just thought I'd document it somewhere for a proper fix to be done in the
> 4.0 release.
>
> No issues arose for me but then again Erick mentions it's only used in
> Carrot2 contrib which I'm not using in my deployment.
>
> Thanks for the help!
> Nick
>
Hi Nick,
Which version of Solr do you have in mind? The official 3.x line or 4.0?
The quick and dirty fix to try would be to just replace Guava r05 with the
latest version, chances are it will work (we did that in the past though
the version number difference was smaller).
The proper fix would b
Hi,
Can you paste the logs from the second run?
Thanks,
Staszek
On Wed, Jan 25, 2012 at 00:12, Christopher J. Bottaro wrote:
> On Tuesday, January 24, 2012 at 3:07 PM, Christopher J. Bottaro wrote:
> > SEVERE: java.lang.NoClassDefFoundError:
> org/carrot2/core/ControllerFactory
> > at
);
}
Let me know if this did the trick.
Cheers,
S.
On Thu, Dec 1, 2011 at 10:43, Vadim Kisselmann
wrote:
> Hi Stanislaw,
> did you already have time to create a patch?
> If not, can you tell me please which lines in which class in source code
> are relevant?
> Thanks and regards
>
> But my actual live system runs on Solr 1.4.1. I can only change my
> solrconfig.xml and integrate new packages...
> I'll check the possibility of upgrading from 1.4.1 to 3.5 with the same index
> (without reindexing) with luceneMatchVersion 2.9.
> I hope it works...
>
Another option would be to chec
Hi,
It looks like some serialization issue related to writing integer ids to
the output. I've just tried a similar configuration on Solr 3.5 and the
integer identifiers looked fine. Can you try the same configuration on Solr
3.5?
Thanks,
Staszek
On Tue, Nov 29, 2011 at 12:03, Vadim Kisselmann
Hi,
You're right -- currently Carrot2 clustering ignores the Solr analysis
chain and uses its own pipeline. It is possible to integrate with Solr's
analysis components to some extent, see the discussion here:
https://issues.apache.org/jira/browse/SOLR-2917.
Staszek
> > Hi
> > Trying to use carr
Hi Pablo,
The reason clustering doesn't work with the "text" field is that the field
is not stored:
For clustering to work, you'll need to keep your documents' titles and
content in stored fields.
Staszek
On Fri, Aug 12, 2011 at 10:28, Pablo Queixalos wrote:
> Hi,
>
>
>
>
>
> I am using s
The "docs" array contained in each cluster contains ids of documents
belonging to the cluster, so for each id you need to look up the document's
content, which comes earlier in the response (in the response/docs array).
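
The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the response was requested as JSON and already parsed into a dict, the clusters sit under a top-level "clusters" key with "labels" and "docs" entries, and the unique key field is called "id" (the function name and the `id_field` parameter are hypothetical).

```python
def documents_by_cluster(solr_response, id_field="id"):
    """Map each cluster label to the full documents listed in its "docs" array."""
    # Index the returned documents by their unique key for fast lookup.
    # Assumption: the unique key field is named "id" unless told otherwise.
    docs_by_id = {
        str(doc[id_field]): doc
        for doc in solr_response["response"]["docs"]
    }
    result = {}
    for cluster in solr_response.get("clusters", []):
        label = ", ".join(cluster["labels"])
        # Resolve every id in the cluster to the document that came
        # earlier in the same response; skip ids that were not returned.
        result[label] = [
            docs_by_id[str(doc_id)]
            for doc_id in cluster["docs"]
            if str(doc_id) in docs_by_id
        ]
    return result
```

Because both parts arrive in the same response, no second query should be needed as long as the fields you want are included in the returned documents.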
Cheers,
Staszek
On Thu, Jun 30, 2011 at 11:50, Romi wrote:
> wanted to use
Hi Walter,
That makes sense, but this has always been a multi-core setup, so the paths
> have not changed, and the clustering component worked fine for core0. The
> only thing new is I have fine tuned core1 (to begin implementing it).
> Previously the solrconfig.xml file was very basic. I replaced
>
> I am asking about filtering after clustering. Faceting is based on a
> single field, so if we need to filter we can search in the related field. But
> clusters are created from multiple fields, so how can we create a
> filter for them?
>
> Example
>
> after clustering you get the foll
It looks like the whole clustering component JAR is not in the classpath. I
remember that I once dealt with a similar issue in Solr 1.4 and the cause
was the relative path of the tag being resolved against the core's
instanceDir, which made the path incorrect when directly copying and pasting
from
Hi,
Can you post the full stack trace? I'd need to know if it's
really org.apache.solr.handler.clustering.ClusteringComponent that's missing
or some other class ClusteringComponent depends on.
Cheers,
Staszek
On Thu, Jun 30, 2011 at 04:19, Walter Closenfleight <
walter.p.closenflei...@gmail.co
>
> and my second question is: does clustering affect indexes?
>
No, it doesn't. Clustering is performed only on the search results produced
by Solr, it doesn't change anything in the index.
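
To illustrate that clustering is switched on per request rather than at index time, here is a minimal Python sketch that only builds the request URL. The host, the core path and the "/clustering" handler name are assumptions for illustration; `clustering`, `clustering.results` and `rows` are the request parameters as described on the ClusteringComponent wiki page, and `clustering_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def clustering_url(base_url, query, rows=100):
    """Build a search URL that asks Solr to cluster the returned rows."""
    params = [
        ("q", query),
        # Only the documents returned by the search (up to `rows`) are clustered.
        ("rows", rows),
        # Clustering is enabled per request; nothing is written to the index.
        ("clustering", "true"),
        ("clustering.results", "true"),
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical handler path; adjust it to your own solrconfig.xml.
url = clustering_url("http://localhost:8983/solr/clustering", "data mining", rows=50)
```

Dropping the two `clustering` parameters from the request should give you the plain search results from the same, unchanged index.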
Cheers,
Staszek
I don't quite follow, I must admit. Maybe it's faceting you're after?
http://wiki.apache.org/solr/SolrFacetingOverview
Staszek
On Wed, Jun 22, 2011 at 08:40, nilay@gmail.com wrote:
> Can you please tell me how I can apply a filter to cluster data in Solr?
>
> Currently I am storing docid and
Hi,
Currently, only the clustering of search results is implemented in Solr,
clustering of the whole index is not possible out of the box. In other
words, clustering applies only to the records you fetch during searching.
For example, if you set rows=10, only the 10 returned documents will be
clus
>
> Is it possible to use the clustering component to use predefined clusters
> generated by Mahout?
Actually, the existing Solr ClusteringComponent's API has been designed to
deal with both search results clustering (implemented by Carrot2) and
off-line clustering of the whole index. The latter
Hi Bryan,
You'll also need to make sure your
${solr.dir}/contrib/clustering/lib directory is in the classpath; that
directory contains the Carrot2 JARs that
provide the classes you're missing. I think the example solrconfig.xml
has the relevant declarations.
Cheers,
S.
On Tue, Jun 7, 2011
Hi Bryan,
You'll also need to make sure your ${solr.home}/contrib/clustering/lib
directory is in the classpath; that directory contains the Carrot2 JARs that
provide the classes you're missing. I think the example solrconfig.xml has
the relevant declarations.
Cheers,
S.
On Tue, Jun 7, 2011
>
> Both of the clustering algorithms that ship with Solr (Lingo and STC) are
>> designed to allow one document to appear in more than one cluster, which
>> actually does make sense in many scenarios. There's no easy way to force
>> them to produce hard clusterings because this would require a comp
ct. However, I am happy that adding the threshold to my request URL
> produces the desired results
>
> let me know if I can do any more tests and I will do so. Thanks much
>
> Ramdev
>
>
>
> On Mar 31, 2011, at 10:18 AM, Stanislaw Osinski wrote:
>
> I added the parameter as you suggested.
> (LingoClusteringAlgorithm.clusterMergingThreshold) into the searchComponent
> section that describes the Clustering module
> Changing the value of the parameter did not have any effect on my search
> results.
>
> However, when I used the Carrot2 wor
> Both of the clustering algorithms that ship with Solr (Lingo and STC) are
> designed to allow one document to appear in more than one cluster, which
> actually does make sense in many scenarios. There's no easy way to force
> them to produce hard clusterings because this would require a complete
Hi Ramdev,
Both of the clustering algorithms that ship with Solr (Lingo and STC) are
designed to allow one document to appear in more than one cluster, which
actually does make sense in many scenarios. There's no easy way to force
them to produce hard clusterings because this would require a compl
Hi,
I think the exception is caused by the fact that you're trying to use the
latest version of Carrot2 with Solr 1.4.x. There are two alternative
solutions here:
* as described in http://wiki.apache.org/solr/ClusteringComponent,
invoke "ant get-libraries"
to get the compatible JAR files.
or
*
ame", SolrQuery.ORDER.asc);
the results should be sorted first by 'type' (only one letter, 'A'
or 'B')
and then by name.
How can I define 'OR' or 'AND' relations here?
Best regards,
Stanislaw
2010/9/13 Dennis Gearo
nd there is only one time in index)
If I'm sorting by only one text field, I get "normal" results without
problems.
Where could I have made a mistake, or is it a bug?
Best regards,
Stanislaw
> The solr schema has the fields, id, name and desc.
>
> I would like to get docs:["name Field here" ] instead of the doc Id
> field as in
> "docs":["200066", "195650",
>
The idea behind using the document ids was that based on them you could
access the individual documents' content, inc
the group (cluster) of documents.
The description is usually a phrase or a number of phrases. The "docs" field
lists the ids of documents that the algorithm assigned to the cluster.
Can you give an example of the input and output you'd expect?
Thanks!
Stanislaw
Hi all!
I can't load my custom queries from an external file, as described here:
https://issues.apache.org/jira/browse/SOLR-784
This option seems not to be implemented in the current version, 1.4.1, of Solr.
Was it removed, or will it only arrive in a newer version?
regards,
Stanislaw
> The patch should also work with trunk, but I haven't verified it yet.
>
I've just added a patch against solr trunk to
https://issues.apache.org/jira/browse/SOLR-1804.
S.
Hi Matt,
I'm attempting to get the carrot based clustering component (in trunk) to
> work. I see that the clustering contrib has been disabled for the time
> being. Does anyone know if this will be re-enabled soon, or even better,
> know how I could get it working as it is?
>
I've recently create
Hi,
In my SolrJ, I used ModifiableSolrParams and I set ("rows",50) but it
> still returns less than 10 for each cluster.
>
Oh, the number of documents per cluster very much depends on the
characteristics of your documents, it often happens that the algorithms
create larger numbers of smaller clus
Hi,
I am attempting to cluster a query. It kinda works, but where my
> (regular) query returns 500 results the cluster only shows 1-10 hits for
> each cluster (5 clusters). Never more than 10 docs and I know it's not
> right. What could be happening here? It should be showing dozens of
> documents
ngine from Carrot Search.
Thanks!
Dawid Weiss, Stanislaw Osinski
Carrot Search, i...@carrot-search.com
Hi,
It might also be interesting to add some logging of clustering time (just
filed: https://issues.apache.org/jira/browse/SOLR-1809) to see what the
index search vs clustering proportions are.
Cheers,
S.
On Fri, Mar 5, 2010 at 03:26, Erick Erickson wrote:
> Search time is only partially depen
> I'll give stopword treatment a try, but the problem is that we perform
> POS tagging and then use payloads to keep only nouns and adjectives, and we
> thought that it could be interesting to perform clustering only with these
> elements, to avoid senseless words.
>
POS tagging could help a
Hi Joan,
I'm trying to use carrot2 (now I started with the workbench) and I can
> cluster any field, but, the text used for clustering is the original raw
> text, the one that was indexed, without any of the processing performed by
> the tokenizer or filters.
> So I get stop words.
>
The easiest
at
i...@carrotsearch.com for details.
Carrot Search Labs shares some small pieces of software we created when
working on Carrot2 and Lingo3G. Please see http://labs.carrotsearch.com for
details and downloads.
Thanks!
Dawid Weiss, Stanislaw Osinski
Carrot Search, i...@carrot-search.com
> You need, in addition to the ones shipped:
> http://repo1.maven.org/maven2/colt/colt/1.2.0/colt-1.2.0.jar
> http://download.carrot2.org/maven2/org/carrot2/nni/1.0.0/nni-1.0.0.jar
>
> http://mirrors.ibiblio.org/pub/mirrors/maven2/org/simpleframework/simple-xml/1.7.3/simple-xml-1.7.3.jar
> http://r
-new-clustering-capabilities/
)
Release notes:
http://project.carrot2.org/release-3.1.0-notes.html
On-line demo:
http://search.carrot2.org
Download:
http://download.carrot2.org
Project website:
http://project.carrot2.org
Thanks,
Staszek
--
Stanislaw Osinski, http://carrot2.org
Hi,
It seems like the problem can be on two layers: 1) getting the right
contents of stop* files for Carrot2, 2) making sure Solr picks up the
changes.
I tried your quick and dirty hack too, but it didn't work either. Phrases like
> "Carbon Atoms in the Group" with "in" still appear in my clustering labe
Hi there,
I tried to apply the stoplabels following the instructions you gave in the
> Solr clustering wiki, but it didn't work.
>
> I am running the patched Solr on Tomcat. So, to enable the stop labels, I added
> "-cp " to my system's CATALINA_OPTS. I
> tried to change the file name from stoplabels
Hi,
On Thu, Aug 13, 2009 at 19:29, Mark Bennett wrote:
There are comments in the Solr materials about having an option to cluster
> based on the entire document set, and some warning about this being atypical
> and possibly slow. And from what you're saying, for a big enough docset, it
> mi
Hi,
On Tue, Aug 11, 2009 at 22:19, Mark Bennett wrote:
Carrot2 has several pluggable algorithms to choose from, though I have no
> evidence that they're "better" than Lucene's. Where TF/IDF is sort of a one
> step algebraic calculation, some clustering algorithms use iterative
> approaches, e
Hi,
Sorry for being late to the party, let me try to clear some doubts about
Carrot2.
Do you know under what circumstances or application should we cluster the
> whole corpus of documents vs just the search results?
I think it depends on what you're trying to achieve. If you'd like to give
the
>
> Hmm, I saw the comment in ClusteringDocumentList.java of Carrot2:
>
> /*
>  * If you know what query generated the documents you're about to cluster, pass
>  * the query to the algorithm, which will usually increase clustering quality.
>  */
> attributes.put(AttributeNames.QUERY, "data mining"
>
> 1. if q=*:* is requested, Carrot2 will receive "MatchAllDocsQuery"
>> via attributes. Is it OK?
>>
>
> Yes, it only clusters on the Doc List, not the Doc Set (in other words,
> it's your rows that matter)
Just to add to that: Carrot2 should be able to cluster up to ~1000 search
results, but b
Hi there,
> Is it possible to specify more than one snippet field, or should I use a copy
> field to copy two or three fields into a single field and specify it as the
> snippet field?
Currently, you can specify only one snippet field, so you'd need to use
a copy field.
Cheers,
S.
Hi.
> I built Solr from SVN this morning. I am using the clustering example. I
> have added my own schema.xml.
>
> The problem is that even though I changed the carrot.snippet field from
> features to filecontent, the clustering results have not changed a bit.
> Please note the features field is also there in m
>
> How would we enable people via SOLR-769 to do this?
Good point, Grant! To apply the modified stopwords.* and stoplabels.* files
to Solr, simply make them available in the classpath. For the example Solr
runner scripts that would be something like:
java -cp
-Dsolr.solr.home=./clustering/solr
Hi Antonio,
> To answer your question about the minimum term: I am working with
> "joke text", very short in length, so the clusters are not so meaningful. I
> mean lots of adverbs and nouns. I thought increasing it might give me fewer
> clusters but a bit more meaningful ones (maybe not).
Clusteri
Hi Antonio,
- is there any way to have a minimum number of labels per cluster?
The current search results clustering algorithms (from Carrot2) by design
generate one label per cluster, so there is no way to force them to create
more. What is the reason you'd like to have more labels per cluster?
I
generic Solr access UI would be great.
>
> Lance
>
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Stanislaw
> Osinski
> Sent: Saturday, August 18, 2007 2:23 AM
> To: solr-user@lucene.apache.org
> Subject: Re: solr + carrot2
upose implementation by cloning the Lucene implementation.
I'm not sure if I'm getting you right here... By "implementation" do you
mean adding to the Swing application an option for pulling data from Solr
(with a configuration dialog for Solr URL etc.)?
Thanks,
Stanislaw
A.
Thanks,
Stanislaw
--
Stanislaw Osinski, [EMAIL PROTECTED]
http://www.carrot-search.com
On 17/08/07, Pieter Berkel <[EMAIL PROTECTED]> wrote:
>
> Any updates on this? It certainly would be quite interesting to see how
> well carrot2 clustering can be integrated with solr,
Hi All,
A bit of self-promotion again :) I hope you don't find it off topic,
after all, some folks are using Carrot2 with Lucene and Solr, and Nutch has
a Carrot2-based clustering plugin.
Staszek
[EMAIL PROTECTED]
___
>
> Has anyone looked into using carrot2 clustering with solr?
>
> I know this is integrated with nutch:
>
> http://lucene.apache.org/nutch/apidocs/org/apache/nutch/clustering/carrot2/Clusterer.html
>
> It looks like carrot has support to read results from a solr index:
>
> http://demo.carrot2.org/