Do be aware that turning on &debug=query adds load. I've seen the debug component take 90% of the query time (to be fair, it usually takes a much smaller percentage).
But if you set debug=all, you'll see a section at the end of the response
with the time each component took, so you'll have a sense of the relative
time used by each component.

Best,
Erick

On Fri, Jun 19, 2015 at 11:06 AM, Wenbin Wang <wwang...@gmail.com> wrote:

> As for now, the index size is 6.5 M records, and the performance is good
> enough. I will re-build the index for all the records (14 M) and test it
> again with debug turned on.
>
> Thanks
>
> On Fri, Jun 19, 2015 at 12:10 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> First and most obvious thing to try:
>>
>> bq: the Solr was started with maximal 4G for JVM, and index size is < 2G
>>
>> Bump your JVM to 8G, perhaps 12G. The size of the index on disk is very
>> loosely coupled to JVM requirements. It's quite possible that you're
>> spending all your time in GC cycles. Consider gathering GC
>> characteristics, see:
>> http://lucidworks.com/blog/garbage-collection-bootcamp-1-0/
>>
>> As Charles says, on the face of it the system you describe should handle
>> quite a load, so it feels like things can be tuned and you won't have to
>> resort to sharding. Sharding inevitably imposes some overhead so it's
>> best to go there last.
>>
>> From my perspective, this is, indeed, an XY problem. You're assuming
>> that sharding is your solution. But you really haven't identified the
>> _problem_ other than "queries are too slow". Let's nail down the reason
>> queries are taking a second before jumping into sharding. I've just
>> spent too much of my life fixing the wrong thing ;)
>>
>> It would be useful to see a couple of sample queries so we can get a
>> feel for how complex they are. Especially if you append, as Charles
>> mentions, "debug=true"
>>
>> Best,
>> Erick
>>
>> On Fri, Jun 19, 2015 at 7:02 AM, Reitzel, Charles
>> <charles.reit...@tiaa-cref.org> wrote:
>> > Grouping does tend to be expensive.
>> > Our regular queries typically return in 10-15ms while the grouping
>> > queries take 60-80ms in a test environment (< 1M docs).
>> >
>> > This is ok for us, since we wrote our app to take the grouping queries
>> > out of the critical path (async query in parallel with two primary
>> > queries and some work in middle tier). But this approach is unlikely
>> > to work for most cases.
>> >
>> > -----Original Message-----
>> > From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org]
>> > Sent: Friday, June 19, 2015 9:52 AM
>> > To: solr-user@lucene.apache.org
>> > Subject: RE: How to do a Data sharding for data in a database table
>> >
>> > Hi Wenbin,
>> >
>> > To me, your instance appears well provisioned. Likewise, your analysis
>> > of test vs. production performance makes a lot of sense. Perhaps your
>> > time would be well spent tuning the query performance for your app
>> > before resorting to sharding?
>> >
>> > To that end, what do you see when you set debugQuery=true? Where does
>> > Solr spend the time? My guess would be in the grouping and sorting
>> > steps, but which? Sometimes the schema details matter for performance.
>> > Folks on this list can help with that.
>> >
>> > -Charlie
>> >
>> > -----Original Message-----
>> > From: Wenbin Wang [mailto:wwang...@gmail.com]
>> > Sent: Friday, June 19, 2015 7:55 AM
>> > To: solr-user@lucene.apache.org
>> > Subject: Re: How to do a Data sharding for data in a database table
>> >
>> > I have enough RAM (30G) and Hard disk (1000G). It is not I/O bound or
>> > computer disk bound. In addition, the Solr was started with maximal 4G
>> > for JVM, and index size is < 2G. In a typical test, I made sure enough
>> > free RAM of 10G was available. I have not tuned any parameter in the
>> > configuration, it is default configuration.
>> >
>> > The number of fields for each record is around 10, and the number of
>> > results to be returned per page is 30.
>> > So the response time should not be affected by network traffic, and it
>> > is tested in the same machine. The query has a list of 4 search
>> > parameters, and each parameter takes a list of values or date range.
>> > The results will also be grouped and sorted. The response time of a
>> > typical single request is around 1 second. It can be > 1 second with
>> > more demanding requests.
>> >
>> > In our production environment, we have 64 cores, and we need to
>> > support > 300 concurrent users, that is about 300 concurrent requests
>> > per second. Each core will have to process about 5 requests per second.
>> > The response time under this load will not be 1 second any more. My
>> > estimate is that an average of 200 ms response time of a single request
>> > would be able to handle 300 concurrent users in production. There is no
>> > plan to increase the total number of cores 5 times.
>> >
>> > In a previous test, a search index around 6M data size was able to
>> > handle > 5 requests per second in each core of my 8-core machine.
>> >
>> > By doing data sharding from one single index of 13M to 2 indexes of 6
>> > or 7 M/each, I am expecting much faster response time that can meet the
>> > demand of production environment. That is the motivation of doing data
>> > sharding. However, I am also open to solutions that can improve the
>> > performance of the index of 13M to 14M size so that I do not need to do
>> > a data sharding.
>> >
>> > On Fri, Jun 19, 2015 at 12:39 AM, Erick Erickson <erickerick...@gmail.com>
>> > wrote:
>> >
>> >> You've repeated your original statement. Shawn's observation is that
>> >> 10M docs is a very small corpus by Solr standards. You either have
>> >> very demanding document/search combinations or you have a poorly tuned
>> >> Solr installation.
>> >>
>> >> On reasonable hardware I expect 25-50M documents to have sub-second
>> >> response time.
>> >>
>> >> So what we're trying to do is be sure this isn't an "XY" problem, from
>> >> Hossman's apache page:
>> >>
>> >> Your question appears to be an "XY Problem" ... that is: you are
>> >> dealing with "X", you are assuming "Y" will help you, and you are
>> >> asking about "Y" without giving more details about the "X" so that we
>> >> can understand the full issue. Perhaps the best solution doesn't
>> >> involve "Y" at all?
>> >> See Also: http://www.perlmonks.org/index.pl?node_id=542341
>> >>
>> >> So again, how would you characterize your documents? How many fields?
>> >> What do queries look like? How much physical memory on the machine?
>> >> How much memory have you allocated to the JVM?
>> >>
>> >> You might review:
>> >> http://wiki.apache.org/solr/UsingMailingLists
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Thu, Jun 18, 2015 at 3:23 PM, wwang525 <wwang...@gmail.com> wrote:
>> >> > The query without load is still under 1 second. But under load,
>> >> > response time can be much longer due to queued-up queries.
>> >> >
>> >> > We would like to shard the data to something like 6 M / shard, which
>> >> > will still give an under 1 second response time under load.
>> >> >
>> >> > What are some best practices to shard the data? For example, we could
>> >> > shard the data by date range, but that is pretty dynamic, and we could
>> >> > shard data by some other properties, but if the data is not evenly
>> >> > distributed, you may not be able to shard it anymore.
>> >> >
>> >> > --
>> >> > View this message in context:
>> >> > http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4212803.html
>> >> > Sent from the Solr - User mailing list archive at Nabble.com.
>> >>
>> >
>> > *************************************************************************
>> > This e-mail may contain confidential or privileged information.
>> > If you are not the intended recipient, please notify the sender
>> > immediately and then delete it.
>> >
>> > TIAA-CREF
>> > *************************************************************************
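
For reference, the capacity estimate in Wenbin's message (64 cores, about 300 requests per second, a 200 ms target) can be checked with a few lines of arithmetic. This is only a rough sketch: the function names are made up for illustration, and it assumes each query occupies exactly one core for its whole response time with load spread evenly across cores, which the thread does not establish.

```python
# Back-of-envelope check of the capacity estimate quoted in the thread.
# Assumption (not from the thread): each query occupies exactly one core
# for its full response time, and load is spread evenly across cores.


def per_core_rate(total_requests_per_sec: float, cores: int) -> float:
    """Requests per second that each core must absorb."""
    return total_requests_per_sec / cores


def max_response_time_ms(total_requests_per_sec: float, cores: int) -> float:
    """Longest average response time (ms) a core can afford before
    requests start to queue, under the one-query-per-core assumption."""
    return 1000.0 / per_core_rate(total_requests_per_sec, cores)


if __name__ == "__main__":
    # Figures taken from the thread: 300 requests/second, 64 cores.
    rate = per_core_rate(300, 64)
    budget = max_response_time_ms(300, 64)
    print(f"per-core rate: {rate:.1f} req/s")
    print(f"response-time budget: {budget:.0f} ms")
```

Under those assumptions each core sees roughly 4.7 requests per second, and the response-time budget comes out a little above 200 ms, which is consistent with the "about 5 requests per second" and "200 ms" figures above. It says nothing about GC pauses or grouping cost, which is what the debug timing output is meant to reveal.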