Good morning

I have the following situation: I have to index the OCR of about 550,000
newspaper pages, averaging 3,500 words per page. Since I create one document
per word, the number of records is very large (on the order of 1.9 billion).

At the moment I have one Solr instance and 8 servers that all read from and
write to that same instance at the same time. At the beginning everything is
fine, but after a while, whenever I add, delete, or commit, I get a timeout
error from the Solr server.

I suspect the problem is that each commit covers a very large number of docs
(in practice, for a 30-page newspaper I do 105,000 adds and commit only at
the end). If all 8 servers do this within a short time of each other, I think
it creates problems for Solr.

What can I do to solve the problem?
Should I commit after each add?
Is it possible to configure the Solr server so that I only send the add and
delete commands and the server commits on its own, according to the available
resources, as it seems to do for the optimize command?
Reading the documentation, I found the following configuration, but I do not
know whether it solves my problem:

<deletionPolicy class="solr.SolrDeletionPolicy">
  <str name="maxCommitsToKeep">1</str>
  <str name="maxOptimizedCommitsToKeep">0</str>
  <str name="maxCommitAge">1DAY</str>
</deletionPolicy>
<infoStream>false</infoStream>
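
For comparison, the deletionPolicy block above controls how many old commit
points are kept on disk, not when commits happen. The server-side commit
behaviour asked about is normally configured with the autoCommit and
autoSoftCommit elements inside the updateHandler section of solrconfig.xml.
The following is only a minimal sketch: the element names are standard Solr
settings, but every numeric value is an assumption chosen for illustration,
not a tuned recommendation for this workload.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes pending adds and deletes to disk. -->
  <autoCommit>
    <!-- Assumed value: commit at most 60 seconds after the first pending update. -->
    <maxTime>60000</maxTime>
    <!-- Assumed value: or as soon as 100,000 documents are pending. -->
    <maxDocs>100000</maxDocs>
    <!-- Do not open a new searcher on every hard commit. -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: makes the changes visible to searches. -->
  <autoSoftCommit>
    <!-- Assumed value: refresh search visibility every 5 minutes. -->
    <maxTime>300000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With settings along these lines, the 8 indexing servers would send only add
and delete commands and leave out the explicit commit, so the Solr instance
decides when to commit. Whether this removes the timeouts would have to be
checked against the real load.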



Thanks for your consideration
Massimiliano Randazzo
