Thanks, Erick and Shawn.

I have reduced the number of rows per page from 500K to 100K.
I also increased zkClientTimeout to 30 seconds so that I don't run into
ZK timeout issues. The ZK cluster is deployed on hosts other than the
SolrCloud hosts.
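
For reference, I bumped the timeout in the <solrcloud> section of
solr.xml (a sketch of the stock file; the value is in milliseconds):

  <solrcloud>
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
  </solrcloud>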

However, I had been trying to increase the number of rows per page for
the following reason: running ingestion at the same time as queries has
made reading results from Solr with the cursor approach about 5 times
slower. I can currently read 1M sorted documents in 1 hour (88 bytes of
data per document).
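
In case it's relevant, the read loop is essentially the standard SolrJ
cursor pattern (a sketch; the zkHost string, collection name, and sort
field are placeholders, and "id" is assumed to be the uniqueKey):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CloudSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.params.CursorMarkParams;

  CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181/solr");
  client.setDefaultCollection("collection1");

  SolrQuery q = new SolrQuery("*:*");
  q.setRows(100000);
  // cursorMark requires a sort with the uniqueKey as a tiebreaker
  q.setSort(SolrQuery.SortClause.asc("id"));

  String cursorMark = CursorMarkParams.CURSOR_MARK_START;
  boolean done = false;
  while (!done) {
    q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
    QueryResponse rsp = client.query(q);
    // ... read rsp.getResults() here ...
    String next = rsp.getNextCursorMark();
    done = cursorMark.equals(next); // unchanged mark means we're done
    cursorMark = next;
  }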

What could be the reason for the slow query execution? I am running the
Solr servers with heap=16g and off-heap=16g, where the off-heap memory
is used as the block cache. Do ingestion and query execution both make
heavy use of the block cache? Should I increase the block cache size in
order to improve query performance? Should I increase slab.count or
MaxDirectMemorySize?
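
My understanding (please correct me if I'm wrong) is that each block
cache slab is 128 MB, so slab.count and MaxDirectMemorySize have to move
together: slab.count * 128 MB must fit under -XX:MaxDirectMemorySize.
Roughly, in solrconfig.xml (a sketch; the values are illustrative):

  <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
    <bool name="solr.hdfs.blockcache.enabled">true</bool>
    <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
    <!-- 120 slabs * 128 MB = 15 GB of off-heap cache -->
    <int name="solr.hdfs.blockcache.slab.count">120</int>
  </directoryFactory>

with something like -XX:MaxDirectMemorySize=16g on the JVM.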

Thanks!

On Sat, Nov 19, 2016 at 8:13 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Returning 500K rows is, as Shawn says, not Solr's sweet spot.
>
> My guess: all the work you're doing trying to return that many
> rows, particularly in SolrCloud mode, is simply overloading
> your system to the point that the ZK connection times out. Don't
> do that. If you need that many rows, use Shawn's cursorMark
> option or export/streaming aggregations instead; both are much
> better choices...
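>
> The /export handler, for example, streams the whole sorted result set
> without paging. A sketch (host, collection, and field names are
> placeholders; /export requires docValues on every field in fl and sort):
>
>   curl "http://localhost:8983/solr/collection1/export?q=*:*&sort=id+asc&fl=id"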
>
> Consider what happens on a sharded request:
> - the initial node sends a sub-request to a replica for each shard.
> - each replica returns its candidate top N (doc ID and sort criteria)
> - the initial node sorts these lists (1M from each replica in your
> example) to get the true top N
> - the initial node requests the docs from each replica that made it
> into the true top N
> - each replica goes to disk, decompresses the doc and pulls out the fields
> - each replica sends its portion of the top N to the initial node
> - an enormous packet containing all 1M final docs is assembled and
> returned to the client.
> - this sucks up bandwidth and resources
> - that's bad enough on its own, and if your ZK nodes are on the same
> box as your Solr nodes they're even more likely to run into a timeout.
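>
> To put rough numbers on it: with rows=1M on, say, 5 shards, the
> initial node has to merge 5M (doc ID, sort value) entries and then
> re-assemble 1M full documents, while with rows=100 the same merge
> touches only 500 entries.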
>
>
> Best,
> Erick
>
> On Fri, Nov 18, 2016 at 8:45 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> > On 11/18/2016 6:50 PM, Chetas Joshi wrote:
> >> The numFound is millions but I was also trying with rows=1 million. I
> >> will reduce it to 500K.
> >>
> >> I am sorry. It is state.json. I am using Solr 5.5.0.
> >>
> >> One of the things I am not able to understand is why my ingestion job is
> >> complaining about "Cannot talk to ZooKeeper - Updates are disabled."
> >>
> >> I have a Spark streaming job that continuously ingests into Solr. My
> >> shards are always up and running. The moment I start a query on
> >> SolrCloud, it starts running into this exception. However, as you
> >> said, ZK will only update the state of the cluster when the shards go
> >> down. Then why is my job trying to contact ZK when the cluster is up,
> >> and why is the exception about updating ZK?
> >
> > SolrCloud and SolrJ (CloudSolrClient) both maintain constant connections
> > to all the ZooKeeper servers they are configured to use.  If ZooKeeper
> > quorum is lost, SolrCloud will go read-only -- no updating is possible.
> > That is what is meant by "updates are disabled."
> >
> > Solr and Lucene are optimized for very low row counts, typically two or
> > three digits.  Asking for hundreds of thousands of rows is problematic.
> > The cursorMark feature is designed for efficient queries when paging
> > deeply into results, but it assumes your rows value is relatively small,
> > and that you will be making many queries to get a large number of
> > results, each of which will be fast and won't overload the server.
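> >
> > At the HTTP level the pattern looks like this (a sketch, with "id"
> > as the uniqueKey tiebreaker the cursor requires):
> >
> >   q=*:*&sort=timestamp+asc,id+asc&rows=500&cursorMark=*
> >
> > Each response includes a nextCursorMark; send it as cursorMark on the
> > next request, and stop when it comes back unchanged.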
> >
> > Since it appears you are having a performance issue, here are a few
> > things I have written on the topic:
> >
> > https://wiki.apache.org/solr/SolrPerformanceProblems
> >
> > Thanks,
> > Shawn
> >
>
