On 10/1/2020 6:57 AM, Manisha Rahatadkar wrote:
We are using Apache Solr 7.7 on the Windows platform. The data is synced to 
Solr using Solr.Net commits, in batches. The documents are very large (~0.5GB 
on average) and Solr indexing is taking a long time. The total document size is 
~200GB. Because the Solr commit is done as part of an API call, the API calls 
are failing as document indexing is not completed.

A single document is five hundred megabytes? What kind of documents do you have? You can't index something that big without tweaking configuration parameters that most people don't even know exist. Assuming you can get it working at all, there's no way that indexing a document like that is going to be fast.
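For reference, the limits that usually get in the way with very large documents live in solrconfig.xml. A rough sketch -- the values shown are illustrative starting points, not recommendations:

    <!-- solrconfig.xml: raise the POST body limits (values are in KB)
         so a very large document is not rejected up front. -->
    <requestDispatcher>
      <requestParsers multipartUploadLimitInKB="1048576"
                      formdataUploadLimitInKB="1048576"/>
    </requestDispatcher>

    <!-- Give the index writer more RAM to buffer such documents. -->
    <indexConfig>
      <ramBufferSizeMB>1024</ramBufferSizeMB>
    </indexConfig>

Even with those raised, a half-gigabyte document will still be slow to analyze and index.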

   1.  What is your advice on syncing such a large volume of data to Solr KB.

What is "KB"?  I have never heard of this in relation to Solr.

   2.  Because of the search requirements, almost 8 fields are defined as Text 
fields.

I can't figure out what you are trying to say with this statement.

   3.  Currently Solr_JAVA_MEM is set to 2gb. Is that enough for such a large 
volume of data?

If just one of the documents you're sending to Solr really is five hundred megabytes, then 2 gigabytes would probably be just barely enough to index one document into an empty index, and garbage collection would likely run so frequently that it would make things REALLY slow. I have no way to predict how much heap you will need; that will require experimentation. I can tell you that 2GB is definitely not enough.
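For whoever does that experimentation: on Windows the heap is set via SOLR_JAVA_MEM in bin\solr.in.cmd. A sketch -- the 8g figure is purely an illustrative starting point, not a recommendation:

    REM bin\solr.in.cmd -- set min and max heap for the Solr JVM.
    REM Start with a generous value, watch GC behavior, and adjust.
    set SOLR_JAVA_MEM=-Xms8g -Xmx8g

Setting -Xms and -Xmx to the same value avoids heap resizing pauses while you measure.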

   4.  How to set up Solr in production on Windows? Currently it's set up as a 
standalone engine and the client is asked to take a backup of the drive. Is 
there any better way to do this? How to set up for disaster recovery?

I would suggest NOT doing it on Windows. My reasons for that come down to costs -- a Windows Server license isn't cheap.

That said, there's nothing wrong with running on Windows, but you're on your own as far as running it as a service. We only have a service installer for UNIX-type systems. Most of the testing for that is done on Linux.
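If you do stay on Windows, one approach people in the community often use is wrapping bin\solr.cmd with a third-party service manager such as NSSM, and using the replication handler for index backups instead of raw drive copies. A rough sketch -- the paths, service name, core name, and backup location below are all placeholders:

    REM Install Solr as a Windows service with NSSM (paths are examples):
    nssm install Solr "C:\solr-7.7.0\bin\solr.cmd" "start -f -p 8983"

    REM Trigger an on-demand index backup via the replication handler
    REM ("mycore" and the location are placeholders):
    curl "http://localhost:8983/solr/mycore/replication?command=backup&location=D:\solr-backups"

A backup taken that way is consistent even while indexing continues, which a plain drive copy is not.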

   5.  How to benchmark the system requirements for such a huge data set

I do not know what all your needs are, so I have no way to answer this. You're going to know a lot more about that than any of us do.

Thanks,
Shawn
