Re: Solr performance is slow with just 1GB of data indexed

2015-08-26 Thread Zheng Lin Edwin Yeo
Thanks for your recommendation Toke. Will try to ask in the carrot forum. Regards, Edwin On 26 August 2015 at 18:45, Toke Eskildsen wrote: > On Wed, 2015-08-26 at 15:47 +0800, Zheng Lin Edwin Yeo wrote: > > > Now I've tried to increase the carrot.fragSize to 75 and > > carrot.summarySnippets t

Re: Solr performance is slow with just 1GB of data indexed

2015-08-26 Thread Toke Eskildsen
On Wed, 2015-08-26 at 15:47 +0800, Zheng Lin Edwin Yeo wrote: > Now I've tried to increase the carrot.fragSize to 75 and > carrot.summarySnippets to 2, and set the carrot.produceSummary to > true. With this setting, I'm mostly able to get the cluster results > back within 2 to 3 seconds when I set
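
For readers following along, the settings quoted in the message above can be sketched as a request URL. This is only an illustration: the host, collection name (`collection1`), and the `/clustering` handler path are assumptions and do not come from the thread. The `carrot.*` parameter names are the ones mentioned in the message, and `rows` is the number of top search hits handed to the clustering step.

```python
from urllib.parse import urlencode

# Hypothetical host, collection, and handler path; adjust to your own setup.
SOLR_BASE = "http://localhost:8983/solr/collection1/clustering"


def clustering_url(query, rows=100, frag_size=75, snippets=2):
    """Build a clustering request URL using the summary settings from the thread."""
    params = {
        "q": query,
        "rows": rows,                     # number of top hits passed to clustering
        "clustering": "true",
        "carrot.produceSummary": "true",  # cluster on highlighted fragments
        "carrot.fragSize": frag_size,     # fragment size in characters
        "carrot.summarySnippets": snippets,
        "wt": "json",
    }
    return SOLR_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(clustering_url("solr"))
```

Keeping `rows` and the fragment settings as function arguments makes it easy to time the same query at different sizes, which is how the thread compares the 100-row and 1000-row cases.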

Re: Solr performance is slow with just 1GB of data indexed

2015-08-26 Thread Zheng Lin Edwin Yeo
Hi Toke, Thank you for the link. I'm using Solr 5.2.1, but I think the bundled Carrot2 will be a slightly older version, as I'm using the latest carrot2-workbench-3.10.3, which was only released recently. I've changed all the settings like fragSize and desiredClusterCountBase to be the same on both si

Re: Solr performance is slow with just 1GB of data indexed

2015-08-25 Thread Toke Eskildsen
On Wed, 2015-08-26 at 10:10 +0800, Zheng Lin Edwin Yeo wrote: > I'm currently trying out the Carrot2 Workbench and getting it to call Solr > to see how it does the clustering. Although it still takes some time to do > the clustering, the results of the clustering are much better than mine. I > thin

Re: Solr performance is slow with just 1GB of data indexed

2015-08-25 Thread Zheng Lin Edwin Yeo
Hi Toke, Thank you for your reply. I'm currently trying out the Carrot2 Workbench and getting it to call Solr to see how it does the clustering. Although it still takes some time to do the clustering, the results of the clustering are much better than mine. I think it's probably due to the differe

Re: Solr performance is slow with just 1GB of data indexed

2015-08-25 Thread Toke Eskildsen
On Tue, 2015-08-25 at 10:40 +0800, Zheng Lin Edwin Yeo wrote: > I would like to confirm: when I set rows=100, does it mean that it only builds > the clusters based on the first 100 records that are returned by the search, > and if I have 1000 records that match the search, all the remaining 900 > rec

Re: Solr performance is slow with just 1GB of data indexed

2015-08-24 Thread Zheng Lin Edwin Yeo
Thank you Upayavira for your reply. I would like to confirm: when I set rows=100, does it mean that it only builds the clusters based on the first 100 records that are returned by the search, and if I have 1000 records that match the search, all the remaining 900 records will not be considered for c

Re: Solr performance is slow with just 1GB of data indexed

2015-08-24 Thread Upayavira
I honestly suspect your performance issue is down to the number of terms you are passing into the clustering algorithm, not to memory usage as such. If you have *huge* documents and cluster across them, performance will be slower, by definition. Clustering is usually done offline, for example on a

Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Zheng Lin Edwin Yeo
Hi Alexandre, I've tried using just index=true, and the speed is still the same, not any faster. If I set store=false, no results come back from the clustering. Is this because the indexed fields are not stored, and clustering requires indexed fields that are also stored? I've also increased my

Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Zheng Lin Edwin Yeo
Yes, I'm using store=true. However, this field needs to be stored, as my program requires it to be returned during normal searching. I tried lazyLoading=true, but it's not working. Would you use a copy field for the content, and not set stored="true" for that field? So that field wil

Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Bill Bell
We use 8gb to 10gb for those size indexes all the time. Bill Bell Sent from mobile > On Aug 23, 2015, at 8:52 AM, Shawn Heisey wrote: > >> On 8/22/2015 10:28 PM, Zheng Lin Edwin Yeo wrote: >> Hi Shawn, >> >> Yes, I've increased the heap size to 4GB already, and I'm using a machine >> with 32

Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Jimmy Lin
unsubscribe On Sat, Aug 22, 2015 at 9:31 PM, Zheng Lin Edwin Yeo wrote: > Hi, > > I'm using Solr 5.2.1, and I've indexed about 1GB of data into Solr. > > However, I find that clustering is exceedingly slow after indexing this 1GB of > data. It took almost 30 seconds to return the cluster results wh

Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Upayavira
And be aware that I'm sure the more terms in your documents, the slower clustering will be. So it isn't just the number of docs, the size of them counts in this instance. A simple test would be to build an index with just the first 1000 terms of your clustering fields, and see if that makes a diff
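
The test suggested in the message above, indexing only the first 1000 terms of the clustering fields, could be prepared on the client side before the documents are sent to Solr. This is a rough sketch under stated assumptions: the document shape and the `content` field name are hypothetical, and splitting on whitespace is a simplification that will not match the terms produced by Solr's own analyzers.

```python
def first_terms(text, limit=1000):
    """Keep only the first `limit` whitespace-separated terms of a field value."""
    # Whitespace splitting is an assumption; Solr's analyzers tokenize differently.
    return " ".join(text.split()[:limit])


def truncate_fields(doc, fields, limit=1000):
    """Return a copy of `doc` with the named text fields cut to `limit` terms."""
    trimmed = dict(doc)
    for name in fields:
        if name in trimmed:
            trimmed[name] = first_terms(trimmed[name], limit)
    return trimmed


if __name__ == "__main__":
    # Hypothetical document with one very large text field.
    doc = {"id": "1", "content": "word " * 5000}
    small = truncate_fields(doc, ["content"])
    print(len(small["content"].split()))
```

Indexing the trimmed copies into a separate test core and rerunning the same clustering query would show whether document size, rather than document count, dominates the response time.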

Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Erick Erickson
You're confusing clustering with searching. Sure, Solr can index lots of data, but clustering is essentially finding ad-hoc similarities between arbitrary documents. It must take each of the documents in the result size you specify from your result set and try to find commonalities. For perf i

Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Alexandre Rafalovitch
Are you by any chance doing store=true on the fields you want to search? If so, you may want to switch to just index=true. Of course, they will then not come back in the results, but do you really want to sling huge content fields around? The other option is to do lazyLoading=true and not request

Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Zheng Lin Edwin Yeo
Hi Shawn and Toke, I only have 520 docs in my data, but each of the documents is quite big. In Solr, it is using 221MB. So when I set it to read from the top 1000 rows, it should just be reading all the 520 docs that are indexed? Regards, Edwin On 23 August 2015 at 22:52, Shawn Heisey

Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Shawn Heisey
On 8/22/2015 10:28 PM, Zheng Lin Edwin Yeo wrote: > Hi Shawn, > > Yes, I've increased the heap size to 4GB already, and I'm using a machine > with 32GB RAM. > > Is it recommended to further increase the heap size to like 8GB or 16GB? Probably not, but I know nothing about your data. How many So

Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Toke Eskildsen
Zheng Lin Edwin Yeo wrote: > However, I find that clustering is exceedingly slow after indexing this 1GB of > data. It took almost 30 seconds to return the cluster results when I set it > to cluster the top 1000 records, and still took more than 3 seconds when I > set it to cluster the top 100 record

Re: Solr performance is slow with just 1GB of data indexed

2015-08-22 Thread Zheng Lin Edwin Yeo
Hi Shawn, Yes, I've increased the heap size to 4GB already, and I'm using a machine with 32GB RAM. Is it recommended to further increase the heap size to like 8GB or 16GB? Regards, Edwin On 23 Aug 2015 10:23, "Shawn Heisey" wrote: > On 8/22/2015 7:31 PM, Zheng Lin Edwin Yeo wrote: > > I'm usin

Re: Solr performance is slow with just 1GB of data indexed

2015-08-22 Thread Shawn Heisey
On 8/22/2015 7:31 PM, Zheng Lin Edwin Yeo wrote: > I'm using Solr 5.2.1, and I've indexed about 1GB of data into Solr. > > However, I find that clustering is exceedingly slow after indexing this 1GB of > data. It took almost 30 seconds to return the cluster results when I set it > to cluster the top

Solr performance is slow with just 1GB of data indexed

2015-08-22 Thread Zheng Lin Edwin Yeo
Hi, I'm using Solr 5.2.1, and I've indexed about 1GB of data into Solr. However, I find that clustering is exceedingly slow after indexing this 1GB of data. It took almost 30 seconds to return the cluster results when I set it to cluster the top 1000 records, and still took more than 3 seconds when