Thanks for your recommendation, Toke.
I will try to ask in the Carrot2 forum.
Regards,
Edwin
On 26 August 2015 at 18:45, Toke Eskildsen wrote:
> On Wed, 2015-08-26 at 15:47 +0800, Zheng Lin Edwin Yeo wrote:
>
> > Now I've tried to increase the carrot.fragSize to 75 and
> > carrot.summarySnippets t
On Wed, 2015-08-26 at 15:47 +0800, Zheng Lin Edwin Yeo wrote:
> Now I've tried to increase the carrot.fragSize to 75 and
> carrot.summarySnippets to 2, and set the carrot.produceSummary to
> true. With this setting, I'm mostly able to get the cluster results
> back within 2 to 3 seconds when I set
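(For reference, a minimal sketch of a clustering request using these
parameters; the host, port, and collection name are placeholders:

    http://localhost:8983/solr/collection1/clustering?q=*:*&rows=100
        &carrot.produceSummary=true&carrot.fragSize=75&carrot.summarySnippets=2

carrot.produceSummary switches clustering from full field content to
highlighter-style fragments, which is why fragSize and summarySnippets
affect both the speed and the quality of the clusters.)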
Hi Toke,
Thank you for the link.
I'm using Solr 5.2.1, but I think the bundled Carrot2 will be a slightly
older version, as I'm using the latest carrot2-workbench-3.10.3, which was
only released recently. I've changed all the settings like fragSize and
desiredClusterCountBase to be the same on both si
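(For reference, a hedged sketch of mirroring those attributes on the Solr
side in solrconfig.xml, assuming the Lingo algorithm; the values shown are
placeholders:

    <searchComponent name="clustering"
                     class="solr.clustering.ClusteringComponent">
      <lst name="engine">
        <str name="name">lingo</str>
        <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
        <int name="LingoClusteringAlgorithm.desiredClusterCountBase">20</int>
      </lst>
    </searchComponent>

Keeping these identical to the Workbench settings makes the two sets of
results comparable.)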
On Wed, 2015-08-26 at 10:10 +0800, Zheng Lin Edwin Yeo wrote:
> I'm currently trying out the Carrot2 Workbench and getting it to call Solr
> to see how it does the clustering. Although it still takes some time to do
> the clustering, the results of the clusters are much better than mine. I
> thin
Hi Toke,
Thank you for your reply.
I'm currently trying out the Carrot2 Workbench and getting it to call Solr
to see how it does the clustering. Although it still takes some time to do
the clustering, the results of the clusters are much better than mine. I
think it's probably due to the differe
On Tue, 2015-08-25 at 10:40 +0800, Zheng Lin Edwin Yeo wrote:
> I would like to confirm: when I set rows=100, does it mean that it only builds
> the clusters based on the first 100 records that are returned by the search,
> and if I have 1000 records that match the search, all the remaining 900
> rec
Thank you Upayavira for your reply.
I would like to confirm: when I set rows=100, does it mean that it only builds
the clusters based on the first 100 records that are returned by the search,
and if I have 1000 records that match the search, all the remaining 900
records will not be considered for c
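(As I understand the clustering component, yes: it only sees the documents
actually returned, so with rows=100 only those 100 are clustered and the
other 900 matches are ignored. A sketch, with a placeholder query:

    /select?q=some+query&rows=100&clustering=true&clustering.results=true

Raising rows improves cluster coverage at the cost of clustering time.)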
I honestly suspect your performance issue is down to the number of terms
you are passing into the clustering algorithm, not to memory usage as
such. If you have *huge* documents and cluster across them, performance
will be slower, by definition.
Clustering is usually done offline, for example on a
Hi Alexandre,
I've tried to use just index=true, and the speed is still the same, not any
faster. If I set store=false, no results come back with the clustering. Is
this because the fields are not stored, and the clustering requires indexed
fields that are also stored?
I've also increased my
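(For illustration, a hedged sketch of the two schema.xml variants being
tried here, with a hypothetical field name:

    <field name="content" type="text_general" indexed="true" stored="true"/>
    <field name="content" type="text_general" indexed="true" stored="false"/>

Solr's result clustering reads the text to cluster from stored field
values, which would explain why stored=false yields no clusters.)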
Yes, I'm using store=true.
However, this field needs to be stored, as my program requires this field to
be returned during normal searching. I tried lazyLoading=true, but it's
not working.
Would you do a copyField for the content, and not set stored="true" for
that field, so that field wil
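(A sketch of that copyField idea, with hypothetical field names: the
original stays stored so it can be returned, while the copy is index-only
for searching:

    <field name="content" type="text_general" indexed="false" stored="true"/>
    <field name="content_search" type="text_general" indexed="true" stored="false"/>
    <copyField source="content" dest="content_search"/>

Queries would then target content_search, while fl returns content.)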
We use 8GB to 10GB heaps for indexes of that size all the time.
Bill Bell
Sent from mobile
> On Aug 23, 2015, at 8:52 AM, Shawn Heisey wrote:
>
>> On 8/22/2015 10:28 PM, Zheng Lin Edwin Yeo wrote:
>> Hi Shawn,
>>
>> Yes, I've increased the heap size to 4GB already, and I'm using a machine
>> with 32
On Sat, Aug 22, 2015 at 9:31 PM, Zheng Lin Edwin Yeo
wrote:
> Hi,
>
> I'm using Solr 5.2.1, and I've indexed about 1GB of data into Solr.
>
> However, I find that clustering is exceedingly slow after I index this 1GB of
> data. It took almost 30 seconds to return the cluster results wh
And be aware: I'm sure that the more terms in your documents, the slower
clustering will be. So it isn't just the number of docs; their size
counts in this instance.
A simple test would be to build an index with just the first 1000 terms
of your clustering fields, and see if that makes a diff
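(One hedged way to approximate that test without reprocessing the source
documents is copyField's maxChars attribute, which truncates the copied
value; note it limits characters rather than terms, and the field names
here are placeholders:

    <field name="content_cluster" type="text_general" indexed="true" stored="true"/>
    <copyField source="content" dest="content_cluster" maxChars="10000"/>

Point the clustering fields at content_cluster and compare timings.)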
You're confusing clustering with searching. Sure, Solr can index
and search lots of data, but clustering is essentially finding ad-hoc
similarities between arbitrary documents. It must take each of
the documents in the result size you specify from your result
set and try to find commonalities.
For perf i
Are you by any chance doing store=true on the fields you want to search?
If so, you may want to switch to just index=true. Of course, they will
then not come back in the results, but do you really want to sling
huge content fields around?
The other option is to do lazyLoading=true and not request
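(A sketch of that second option, assuming the standard <query> section of
solrconfig.xml:

    <enableLazyFieldLoading>true</enableLazyFieldLoading>

combined with an fl parameter that omits the huge content field, e.g.
fl=id,title, so the large stored values are only loaded when actually
requested.)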
Hi Shawn and Toke,
I only have 520 docs in my data, but each of the documents is quite big in
size; in Solr, they take up 221MB. So when I set it to read from the top
1000 rows, it should just be reading all the 520 docs that are indexed?
Regards,
Edwin
On 23 August 2015 at 22:52, Shawn Heisey
On 8/22/2015 10:28 PM, Zheng Lin Edwin Yeo wrote:
> Hi Shawn,
>
> Yes, I've increased the heap size to 4GB already, and I'm using a machine
> with 32GB RAM.
>
> Is it recommended to further increase the heap size to like 8GB or 16GB?
Probably not, but I know nothing about your data. How many So
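(For anyone experimenting with this, a hedged sketch of raising the heap
with the Solr 5.x start script; 8g is just an example value:

    bin/solr start -m 8g

More heap only helps if clustering is actually memory-bound, which this
thread suggests it is not.)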
Zheng Lin Edwin Yeo wrote:
> However, I find that clustering is exceedingly slow after I index this 1GB of
> data. It took almost 30 seconds to return the cluster results when I set it
> to cluster the top 1000 records, and still takes more than 3 seconds when I
> set it to cluster the top 100 record
Hi Shawn,
Yes, I've increased the heap size to 4GB already, and I'm using a machine
with 32GB RAM.
Is it recommended to further increase the heap size to like 8GB or 16GB?
Regards,
Edwin
On 23 Aug 2015 10:23, "Shawn Heisey" wrote:
> On 8/22/2015 7:31 PM, Zheng Lin Edwin Yeo wrote:
> > I'm usin
On 8/22/2015 7:31 PM, Zheng Lin Edwin Yeo wrote:
> I'm using Solr 5.2.1, and I've indexed about 1GB of data into Solr.
>
> However, I find that clustering is exceedingly slow after I index this 1GB of
> data. It took almost 30 seconds to return the cluster results when I set it
> to cluster the top
Hi,
I'm using Solr 5.2.1, and I've indexed about 1GB of data into Solr.
However, I find that clustering is exceedingly slow after I index this 1GB of
data. It took almost 30 seconds to return the cluster results when I set it
to cluster the top 1000 records, and still takes more than 3 seconds when