On 4/1/2016 8:56 PM, Erick Erickson wrote:
> bq: The bottleneck is definitely Solr.
>
> Since you commented out the server.add(doclist), you're right to focus
> there. I've seen a few things that help.
>
> 1> batch the documents, i.e. in the doclist above the list should be
> on the order of 1,000 docs.
Shawn:
bq: The bottleneck is definitely Solr.
Since you commented out the server.add(doclist), you're right to focus
there. I've seen a few things that help.
1> batch the documents, i.e. in the doclist above the list should be
on the order of 1,000 docs. Here
are some numbers I worked up one time ...
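The batching advice above can be sketched as below. `BatchingIndexer` is a hypothetical helper, not part of SolrJ: it buffers documents and hands each full batch to a sink in one call, so there is one round trip per ~1,000 docs instead of one per doc. With SolrJ you would pass `solrClient::add` (the `SolrClient.add(Collection<SolrInputDocument>)` overload) as the sink; a plain `Consumer` stands in here so the flush logic is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical helper: buffers docs and sends them to the sink in batches. */
class BatchingIndexer {
    private final int batchSize;
    private final Consumer<List<Map<String, Object>>> sink; // e.g. wraps solrClient::add
    private final List<Map<String, Object>> buffer = new ArrayList<>();

    BatchingIndexer(int batchSize, Consumer<List<Map<String, Object>>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    void add(Map<String, Object> doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) flush();
    }

    /** Send whatever is buffered; call once more after the last add(). */
    void flush() {
        if (buffer.isEmpty()) return;
        sink.accept(new ArrayList<>(buffer)); // one round trip for the whole batch
        buffer.clear();
    }
}

class BatchDemo {
    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BatchingIndexer indexer =
                new BatchingIndexer(1000, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 2500; i++) {
            indexer.add(Map.<String, Object>of("id", "doc-" + i));
        }
        indexer.flush(); // don't forget the final partial batch
        System.out.println(batchSizes); // prints [1000, 1000, 500]
    }
}
```

The easy mistake this guards against is forgetting the final `flush()`, which silently drops the last partial batch.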
On 3/24/2016 11:57 AM, tedsolr wrote:
> My post was scant on details. The numbers I gave for collection sizes are
> projections for the future. I am in the midst of an upgrade that will be
> completed within a few weeks. My concern is that I may not be able to
> produce the throughput necessary to index an entire collection quickly ...
Well, for comparison I routinely get 20K docs/second on my Mac Pro
indexing Wikipedia docs. I _think_ I have 4 shards when I do this, all
in the same JVM. I'd be surprised if you can't get your 5K docs/sec,
but you may indeed need more shards.
All that said, 4G for the JVM is kind of constrained ...
Hi Erick,
My post was scant on details. The numbers I gave for collection sizes are
projections for the future. I am in the midst of an upgrade that will be
completed within a few weeks. My concern is that I may not be able to
produce the throughput necessary to index an entire collection quickly ...
Impossible to say if for no other reason than you haven't told us
how many physical machines this is spread over ;).
For the process you've outlined to work, all the fields are stored,
right? So why not use Atomic Updates? You still have to query
the docs.
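The atomic update Erick suggests sends only the changed fields and lets Solr rebuild the rest of the document from stored fields (which is why "all fields stored" matters). A minimal sketch of the JSON payload POSTed to the /update handler; the field names and values here are hypothetical:

```json
[
  { "id":    "doc-42",
    "price": { "set": 9.99 },
    "views": { "inc": 1 }
  }
]
```

`set` replaces a field's value and `inc` increments a numeric field; any field not mentioned is carried over unchanged from the stored copy.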
About querying. If I'm reading this right ...