Very simple way to know where to start looking: just don't send the
docs to Solr.

Somewhere you have some code like:

SolrClient client = new CloudSolrClient...
while (more docs from the DB) {
    doc_list = build_document_list()
    client.add(doc_list);
}

Just comment out the client.add line and run the program. Very often
in cases like this, the slow part is getting the data from the DB
rather than Solr being the bottleneck.
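
If you want actual numbers rather than a guess, a throwaway harness
along these lines times the two sides separately. Purely a sketch:
the ZooKeeper address, collection name, and the two helper methods at
the bottom are placeholders for your own JDBC code, and it assumes
SolrJ 7.x's CloudSolrClient.Builder, so adjust the client
construction to however you already build yours.

import java.util.Collections;
import java.util.List;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexTiming {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("zk1:2181"), Optional.empty()).build()) {
            client.setDefaultCollection("mycollection");   // placeholder collection
            long dbNanos = 0, solrNanos = 0;
            while (moreDocsFromDb()) {
                long t0 = System.nanoTime();
                List<SolrInputDocument> docList = buildDocumentList();  // read a chunk from Oracle
                dbNanos += System.nanoTime() - t0;

                long t1 = System.nanoTime();
                client.add(docList);            // comment this out to test the DB side alone
                solrNanos += System.nanoTime() - t1;
            }
            System.out.printf("DB read: %ds, Solr add: %ds%n",
                    dbNanos / 1_000_000_000L, solrNanos / 1_000_000_000L);
        }
    }

    // stand-ins for your JDBC paging code
    static boolean moreDocsFromDb() { return false; }
    static List<SolrInputDocument> buildDocumentList() { return Collections.emptyList(); }
}

If the DB-read time dominates even with client.add commented out, no
amount of Solr-side tuning will help.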

Another very easy thing to look at: are your CPUs running hot? If
your Solr nodes are just idling along at, say, 10%, then you're not
feeding them docs fast enough.

Final thing to check: Are you batching? Batching docs can
significantly increase throughput, see:
https://lucidworks.com/2015/10/05/really-batch-updates-solr-2/
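
A rough sketch of what client-side batching looks like (the ResultSet
iteration, the column mapping, and the batch size of 1000 are all
placeholders; it assumes a SolrClient built as in the sketch above):

import java.io.IOException;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

class BatchIndexer {
    // Reads rows from an open JDBC ResultSet and sends them to Solr in
    // batches of batchSize docs rather than one add() per row.
    static void indexInBatches(SolrClient client, ResultSet rs, int batchSize)
            throws SQLException, SolrServerException, IOException {
        List<SolrInputDocument> batch = new ArrayList<>(batchSize);
        while (rs.next()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", rs.getString("ID"));   // hypothetical column mapping
            // ... map the rest of your columns ...
            batch.add(doc);
            if (batch.size() >= batchSize) {
                client.add(batch);    // one request for batchSize docs
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            client.add(batch);        // flush the final partial batch
        }
    }
}

And commit once at the end (or let autoCommit handle it) rather than
committing per batch.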

And Solr simply shouldn't be running out of memory unless you're
sending utterly massive documents or huge numbers of docs in a single
request. When indexing, the default ramBufferSizeMB is 100, meaning
that when the in-memory structures exceed 100MB they are flushed to
disk. What are your commit intervals (both soft and hard)?
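
If you haven't looked at them, these are the relevant knobs in
solrconfig.xml; the values below are only illustrative, not
recommendations:

<indexConfig>
  <ramBufferSizeMB>100</ramBufferSizeMB>      <!-- the default; flush past 100MB -->
</indexConfig>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>                  <!-- hard commit every 60 seconds -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>300000</maxTime>                 <!-- soft commit every 5 minutes -->
  </autoSoftCommit>
</updateHandler>

Hard commits with openSearcher=false flush segments and roll over the
transaction log without opening a new searcher; soft commits are what
control visibility.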

Best,
Erick

On Thu, Feb 15, 2018 at 5:56 AM, Bernd Fehling
<bernd.fehl...@uni-bielefeld.de> wrote:
> So it is not SolrJ but Solr that is your problem?
>
> In your first email there was nothing about heap exceptions, only the
> loading runtime.
>
> What do you mean by "injecting too many rows"? What is "too many"?
>
> Some numbers while loading from scratch:
> - single node 412GB index
> - 92 fields
> - 123.6 million docs
> - 1.937 billion terms
> - loading from file system
> - indexing time 9 hrs 5 min
> - using SolrJ ConcurrentUpdateSolrClient
> --- queueSize=10000, threads=12
> --- waitFlush=true, waitSearcher=true, softcommit=false
> And, Solr must be configured to "swallow" all this :-)
>
>
> You say "8GB per node" so it is SolrCloud?
>
> Anything other than the heap exception?
>
> How many commits?
>
> Regards
> Bernd
>
>
> On 15.02.2018 at 10:31, LOPEZ-CORTES Mariano-ext wrote:
>> Injecting too many rows into Solr throws a Java heap exception (more memory?
>> We have 8GB per node).
>>
>> Does DIH support paging queries?
>>
>> Thanks!
>>
>> -----Original Message-----
>> From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
>> Sent: Thursday, February 15, 2018 10:13
>> To: solr-user@lucene.apache.org
>> Subject: Re: Reading data from Oracle
>>
>> And where is the bottleneck?
>>
>> Is it reading from Oracle or injecting to Solr?
>>
>> Regards
>> Bernd
>>
>>
>> On 15.02.2018 at 08:34, LOPEZ-CORTES Mariano-ext wrote:
>>> Hello
>>>
>>> We have to delete our Solr collection and feed it periodically from an 
>>> Oracle database (up to 40M rows).
>>>
>>> We've done the following test: from a Java program, we read chunks of data
>>> from Oracle and inject them into Solr (via SolrJ).
>>>
>>> The problem: it is really, really slow (1.5 nights).
>>>
>>> Is there a faster method to do that?
>>>
>>> Thanks in advance.
>>>
