Re: How to do a Data sharding for data in a database table

2015-07-02 Thread Erick Erickson
bq: Does Solr automatically load the search index into memory after the index is built? No. That's what the autowarm counts on your queryResultCache and filterCache are intended to facilitate. Also, after every commit, a newSearcher event is fired and any warmup queries you have configured in the n

Re: How to do a Data sharding for data in a database table

2015-07-02 Thread wwang525
Hi, I worked with other search solutions before, and cache management is important in boosting performance. Apart from the cache generated due to user's requests, loading the search index into memory is the very initial step after the index is built. This is to ensure search results to be retrieve

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread Erick Erickson
bq: The index size is only 1 M records. A 10 times of the record size (> 10M) will likely bring the total response time to > 1 second This is an extrapolation you simply cannot make. Plus you cannot really tell anything from just a few queries about system performance. In fact you must disregard t

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread wwang525
Hi All, I did many tests with very consistent test results. Each query was executed after re-indexing, and only one request was sent to query the index. I disabled filterCache and queryResultCache for this test based on Erick's recommendation. The test document was posted to this email list earli

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread wwang525
Test_results_round_2.doc -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4215016.html Sent from the Solr - User mailing list archive

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread Erick Erickson
I'd set filterCache and queryResultCache to zero (size and autowarm count). Leave documentCache alone IMO, as it's used to store documents on disk as they pass through various query components and doesn't autowarm anyway. I'd think taking it out would skew your results because of multiple decompress

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread wwang525
Hi, I am currently investigating the queries with a much smaller index size (1M) to see the effect of grouping and faceting on performance degradation. This will allow me to do a lot of tests in a short period of time. However, it looks like the query is executed much faster the second time. This is tested

Re: How to do a Data sharding for data in a database table

2015-06-27 Thread Erick Erickson
Hmmm, indeed it does. Never mind ;) I guess the thing I'd be looking at is garbage collection, here's a very good writeup: http://lucidworks.com/blog/garbage-collection-bootcamp-1-0/ Kind of a shot in the dark, but it's possible. Good luck! Erick On Thu, Jun 25, 2015 at 3:26 PM, Wenbin Wang wro

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread Wenbin Wang
Hi Guys, I have no problem changing it to 2. However, we are talking about two different applications. Solr 4.7 has two applications: example and example-DIH. The application example-DIH is the one I started with since it works with a database. The example-DIH has a default setting of 4. Re

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread Shawn Heisey
On 6/25/2015 10:27 AM, Wenbin Wang wrote: > To clarify the work: > > We are very early in the investigative phase, and the indexing is NOT done > continuously. > > I indexed the data once through Admin UI, and test the query. If I need to > index again, I can use curl or through the Admin UI. > > T

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread Wenbin Wang
To clarify the work: We are very early in the investigative phase, and the indexing is NOT done continuously. I indexed the data once through Admin UI, and test the query. If I need to index again, I can use curl or through the Admin UI. The Solr 4.7 seems to have a default setting of maxWarming

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread Erick Erickson
You're missing the point. One of the things that can really affect response time is too-frequent commits. The fact that the commit configurations have been commented out indicates that the commits are happening either manually (curl, HTTP request or the like) _or_ you have, say, a SolrJ client that

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread Wenbin Wang
Hi Erick, The configuration is largely the default one, and I have not made many changes. I am also quite new to Solr, although I have a lot of experience in other search products. The whole list of fields needs to be retrieved, so I do not have much of a choice. The total size of the index files is

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread Erick Erickson
bq: Try not to store fields as much as possible. Why? Storing fields certainly adds lots of size to the _disk_ files, but has much less effect on memory requirements than one might think. The *.fdt and *.fdx files in your index are used for the stored data, and they're only read for the top N doc

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread wwang525
schema.xml solrconfig.xml

Re: How to do a Data sharding for data in a database table

2015-06-24 Thread William Bell
1GB is too small to start. Try starting the same on both: -Xms8196m -Xmx8196m We use 12GB for these on a similar sized index and it works well. Send schema.xml and solrconfig.xml. Try not to store fields as much as possible. On Wed, Jun 24, 2015 at 8:08 AM, wwang525 wrote: > Hi All, > > I b

Re: How to do a Data sharding for data in a database table

2015-06-24 Thread wwang525
Hi All, I built the Solr index with 14 M records. I have > 20 G RAM in my local machine, and the Solr instance was started with -Xms1024m -Xmx8196m The following query: http://localhost:8983/solr/db-mssql/select?q=*:*&fq=GatewayCode:(YYZ)&fq=DestCode:(CUN)&fq=Duration:(5 OR 6 OR 7 OR 8)&fq=Date
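The query above combines several fq filters, some of which contain spaces ("5 OR 6 OR 7 OR 8"), so the URL needs encoding before it is sent from a script. A minimal sketch of building such a request URL, assuming Python's standard library is used on the client side; the helper name build_select_url is hypothetical, and the Date filter is left out because it is cut off in the snippet:

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, filters):
    """Return a Solr select URL with one fq parameter per filter.

    build_select_url is an illustrative helper, not part of Solr or SolrJ.
    """
    # A list of pairs (rather than a dict) lets the fq key repeat.
    params = [("q", query)] + [("fq", f) for f in filters]
    return base_url + "?" + urlencode(params)


url = build_select_url(
    "http://localhost:8983/solr/db-mssql/select",
    "*:*",
    ["GatewayCode:(YYZ)", "DestCode:(CUN)", "Duration:(5 OR 6 OR 7 OR 8)"],
)
print(url)
```

Keeping each condition in its own fq parameter, as the original query does, means every filter can be cached separately in the filterCache when that cache is enabled.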

RE: How to do a Data sharding for data in a database table

2015-06-19 Thread Carlos Maroto
@lucene.apache.org Subject: RE: How to do a Data sharding for data in a database table Also, since you are tuning for relative times, you can tune on the smaller index. Surely, you will want to test at scale. But tuning query, analyzer or schema options is usually easier to do on a smaller index. If you

RE: How to do a Data sharding for data in a database table

2015-06-19 Thread Reitzel, Charles
-user@lucene.apache.org Subject: Re: How to do a Data sharding for data in a database table Do be aware that turning on &debug=query adds a load. I've seen the debug component take 90% of the query time. (to be fair it usually takes a much smaller percentage). But you'll see a sectio

Re: How to do a Data sharding for data in a database table

2015-06-19 Thread Erick Erickson
ms while the grouping queries take 60-80ms in a test >> environment (< 1M docs). >> > >> > This is ok for us, since we wrote our app to take the grouping queries >> out of the critical path (async query in parallel with two primary queries >> and some work in middle ti

Re: How to do a Data sharding for data in a database table

2015-06-19 Thread Wenbin Wang
ork for > most cases. > > > > -----Original Message- > > From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] > > Sent: Friday, June 19, 2015 9:52 AM > > To: solr-user@lucene.apache.org > > Subject: RE: How to do a Data sharding for data in a database

Re: How to do a Data sharding for data in a database table

2015-06-19 Thread Erick Erickson
orting steps, but > which? Sometime the schema details matter for performance. Folks on this > list can help with that. > > -Charlie > > -----Original Message----- > From: Wenbin Wang [mailto:wwang...@gmail.com] > Sent: Friday, June 19, 2015 7:55 AM > To: solr-u

RE: How to do a Data sharding for data in a database table

2015-06-19 Thread Reitzel, Charles
ang [mailto:wwang...@gmail.com] Sent: Friday, June 19, 2015 7:55 AM To: solr-user@lucene.apache.org Subject: Re: How to do a Data sharding for data in a database table I have enough RAM (30G) and Hard disk (1000G). It is not I/O bound or computer disk bound. In addition, the Solr was started w

RE: How to do a Data sharding for data in a database table

2015-06-19 Thread Reitzel, Charles
] Sent: Friday, June 19, 2015 7:55 AM To: solr-user@lucene.apache.org Subject: Re: How to do a Data sharding for data in a database table I have enough RAM (30G) and Hard disk (1000G). It is not I/O bound or computer disk bound. In addition, the Solr was started with maximal 4G for JVM, and

Re: How to do a Data sharding for data in a database table

2015-06-19 Thread Wenbin Wang
I have enough RAM (30G) and Hard disk (1000G). It is not I/O bound or computer disk bound. In addition, the Solr was started with maximal 4G for JVM, and index size is < 2G. In a typical test, I made sure enough free RAM of 10G was available. I have not tuned any parameter in the configuration, it

Re: How to do a Data sharding for data in a database table

2015-06-18 Thread Erick Erickson
You've repeated your original statement. Shawn's observation is that 10M docs is a very small corpus by Solr standards. You either have very demanding document/search combinations or you have a poorly tuned Solr installation. On reasonable hardware I expect 25-50M documents to have sub-second resp

Re: How to do a Data sharding for data in a database table

2015-06-18 Thread wwang525
The query without load is still under 1 second. But under load, response time can be much longer due to queued-up queries. We would like to shard the data to something like 6 M / shard, which will still give an under-1-second response time under load. What are some best practices to shard the dat
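For illustration only, the arithmetic behind a "6 M / shard" target can be sketched as below. This assumes a manual scheme where each database row is assigned to a shard by hashing its primary key; the function names and the row key "row-42" are hypothetical, and this is not a method proposed in the thread. SolrCloud can also route documents by hashing the unique key itself, which makes manual assignment unnecessary in that setup.

```python
import math
import zlib


def shard_count(total_docs, max_docs_per_shard):
    """Smallest number of shards that keeps each shard under the limit."""
    return max(1, math.ceil(total_docs / max_docs_per_shard))


def shard_for(doc_id, num_shards):
    """Map a row's primary key to a shard index in [0, num_shards).

    crc32 is used because, unlike the built-in hash(), it gives the same
    value for the same key on every run.
    """
    return zlib.crc32(str(doc_id).encode("utf-8")) % num_shards


num_shards = shard_count(10_000_000, 6_000_000)
print(num_shards, shard_for("row-42", num_shards))
```

With hash-based assignment the shards come out roughly equal in size, but a query that cannot be restricted to one shard still has to be sent to all of them and merged.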

Re: How to do a Data sharding for data in a database table

2015-06-18 Thread Jack Krupansky
10M doesn't sound too demanding. How complex are your queries? How complex is your data - like number of fields and size, like very large documents? Are you sure you have enough RAM to fully cache your index? Are your queries compute-bound or I/O bound? If I/O-bound, get more RAM. If compute-bo