I have plenty of RAM (30 GB) and disk (1,000 GB), so the machine is not I/O
or disk bound. In addition, Solr was started with a maximum of 4 GB for the
JVM, and the index size is < 2 GB. In a typical test, I made sure at least
10 GB of free RAM was available. I have not tuned any parameters; it is the
default configuration.

Each record has around 10 fields, and 30 results are returned per page.
Since the tests run on the same machine, the response time should not be
affected by network traffic. A query has a list of 4 search parameters, and
each parameter takes either a list of values or a date range. The results
are also grouped and sorted. The response time of a typical single request
is around 1 second, and it can be > 1 second for more demanding requests.

In our production environment, we have 64 cores and need to support > 300
concurrent users, which is about 300 concurrent requests per second. Each
core will have to process about 5 requests per second. The response time
under this load will no longer be 1 second. My estimate is that an average
single-request response time of 200 ms would allow us to handle 300
concurrent users in production. There is no plan to increase the total
number of cores by 5 times.
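For what it's worth, the 200 ms figure falls out of Little's law
(concurrency = throughput x latency): with 64 cores each serving one
request at a time, 300 req/s leaves a latency budget of 64/300 s per
request. A quick back-of-the-envelope check using only the numbers above:

```python
# Capacity sketch using Little's law; the inputs are the figures
# quoted in this thread, not measured values.
cores = 64
target_rps = 300          # ~300 concurrent requests per second

per_core_rps = target_rps / cores       # load each core must absorb
latency_budget_s = cores / target_rps   # max average latency per request

print(f"per-core load:  {per_core_rps:.1f} req/s")
print(f"latency budget: {latency_budget_s * 1000:.0f} ms")
```

That gives roughly 4.7 req/s per core and a ~213 ms budget, which is
consistent with the ~5 req/s and ~200 ms estimates above.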

In a previous test, a search index of around 6M documents was able to
handle > 5 requests per second on each core of my 8-core machine.

By sharding the single 13M-document index into 2 indexes of 6-7M documents
each, I am expecting a much faster response time that can meet the demands
of the production environment. That is the motivation for the data
sharding. However, I am also open to any solution that can improve the
performance of the 13M-14M-document index so that I do not need to shard at
all.
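On the question from my earlier message of how to split the data evenly
when date ranges and other properties are skewed: one common approach is to
route each document by a stable hash of its unique key, which spreads
documents evenly regardless of skew. A minimal sketch (the `doc-N` IDs and
the shard count of 2 are just assumptions for illustration; SolrCloud's
default compositeId router does hash-based routing like this for you):

```python
import hashlib

NUM_SHARDS = 2  # e.g. two indexes of 6-7M documents each


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Route a document to a shard by a stable hash of its ID.

    Using md5 rather than Python's built-in hash() keeps the
    assignment stable across processes and restarts.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Hash routing distributes documents roughly evenly no matter how
# skewed the dates or other properties are:
counts = [0] * NUM_SHARDS
for i in range(10_000):
    counts[shard_for(f"doc-{i}")] += 1
print(counts)  # two counts, each close to 5,000
```

The trade-off is that hash routing gives up the ability to prune shards by
date at query time; every query fans out to all shards.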





On Fri, Jun 19, 2015 at 12:39 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> You've repeated your original statement. Shawn's
> observation is that 10M docs is a very small corpus
> by Solr standards. You either have very demanding
> document/search combinations or you have a poorly
> tuned Solr installation.
>
> On reasonable hardware I expect 25-50M documents to have
> sub-second response time.
>
> So what we're trying to do is be sure this isn't
> an "XY" problem, from Hossman's apache page:
>
> Your question appears to be an "XY Problem" ... that is: you are dealing
> with "X", you are assuming "Y" will help you, and you are asking about "Y"
> without giving more details about the "X" so that we can understand the
> full issue.  Perhaps the best solution doesn't involve "Y" at all?
> See Also: http://www.perlmonks.org/index.pl?node_id=542341
>
> So again, how would you characterize your documents? How many
> fields? What do queries look like? How much physical memory on the
> machine? How much memory have you allocated to the JVM?
>
> You might review:
> http://wiki.apache.org/solr/UsingMailingLists
>
>
> Best,
> Erick
>
> On Thu, Jun 18, 2015 at 3:23 PM, wwang525 <wwang...@gmail.com> wrote:
> > The query without load is still under 1 second. But under load, response
> time
> > can be much longer due to the queued up query.
> >
> > We would like to shard the data to something like 6 M / shard, which will
> > still give a under 1 second response time under load.
> >
> > What are some best practice to shard the data? for example, we could
> shard
> > the data by date range, but that is pretty dynamic, and we could shard
> data
> > by some other properties, but if the data is not evenly distributed, you
> may
> > not be able shard it anymore.
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4212803.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
