Hello!

We are evaluating Solr for use in our organization. We are past the functional 
tests and are now looking at choosing the best deployment topology.
Here are some details about the structure of the problem: the application deals 
with storing and retrieving artifacts of various types. Artifacts are stored 
in projects. Each project can have hundreds of thousands of artifacts (total 
across all types). Our largest customers have hundreds of projects (~300-800), 
though the vast majority have tens of projects (~30-100).

Core granularity
In terms of core granularity, a core per project seems sensible to me, as 
pushing everything into a single core will probably be too much. The entities 
themselves will have a special type field to distinguish them.
Moreover, not all of the projects may be active at a given time, so this 
allows the indexes of inactive projects to remain dormant on disk.
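
To make the core-per-project idea concrete, here is a rough sketch of how a 
per-project core could be requested through Solr's CoreAdmin API (the CREATE 
action with the name and instanceDir parameters). The base URL, the 
"project_<id>" naming scheme and the helper names are my own assumptions for 
illustration, not something we have settled on. The instance directory with 
its configuration is assumed to exist already on the node.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local Solr instance on the default port.
SOLR_BASE = "http://localhost:8983/solr"


def core_name(project_id):
    """Hypothetical naming scheme: one core per project."""
    return f"project_{project_id}"


def create_core_url(project_id, base=SOLR_BASE):
    """Build the CoreAdmin CREATE request URL for a project's core."""
    name = core_name(project_id)
    params = {"action": "CREATE", "name": name, "instanceDir": name}
    return f"{base}/admin/cores?{urlencode(params)}"


def create_core(project_id, base=SOLR_BASE):
    """Send the CREATE request; raises on HTTP or connection errors."""
    with urlopen(create_core_url(project_id, base), timeout=30) as resp:
        return resp.read()
```

Each node in the cluster would issue the same request against its own local 
Solr instance when a project is created.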


Availability and synchronization
Our application is deployed on premise at our customers' sites, so we cannot 
demand many extra resources from them (e.g. dedicated indexing servers). We 
pretty much need to make do with what is already there.

For now, we are planning to use the DIH (DataImportHandler) to maintain the 
index. Each node in the app cluster will have its own local index. When a 
project is created (or the feature is enabled on an existing project), a core 
is created for it on each of the nodes, a full import is executed, and then a 
delta import is scheduled to run on each of the nodes. This gives us 
simplicity, but I am wondering about the performance and memory consumption 
costs. Also, I am wondering whether we should use replication for this 
purpose. The requirement is for the index to be updated once every 30 
seconds. Are delta imports designed for this?
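
For reference, this is roughly the scheduling loop I have in mind on each 
node. It is only a sketch under a few assumptions: the DIH request handler is 
registered at /dataimport in each core's solrconfig.xml, the cores follow a 
hypothetical "project_<id>" naming scheme, and the function names are made up 
for illustration. The full-import and delta-import commands and the clean and 
commit parameters are the standard DIH ones.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local Solr instance on the default port.
SOLR_BASE = "http://localhost:8983/solr"


def dih_url(core, command, base=SOLR_BASE, handler="dataimport"):
    """Build a DIH request URL, e.g. for full-import or delta-import."""
    params = {"command": command}
    if command == "delta-import":
        # Keep existing documents; only apply the changes.
        params["clean"] = "false"
    params["commit"] = "true"
    return f"{base}/{core}/{handler}?{urlencode(params)}"


def run_delta_imports(cores, interval_s=30, rounds=None,
                      trigger=None, sleep=time.sleep):
    """Trigger a delta import on every core, then wait interval_s seconds.

    rounds=None loops forever; trigger and sleep are injectable for testing.
    """
    if trigger is None:
        trigger = lambda url: urlopen(url, timeout=10).read()
    done = 0
    while rounds is None or done < rounds:
        for core in cores:
            trigger(dih_url(core, "delta-import"))
        done += 1
        sleep(interval_s)
    return done
```

With several hundred cores per node, one pass of this loop issues several 
hundred requests every 30 seconds, which is exactly the cost I am unsure 
about.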

I understand that this is a very complex problem in general. I tried to 
highlight the most significant aspects and would appreciate some initial 
guidance. Note that we are planning to execute performance and stress testing 
no matter what but the assumption is that the topology of the solution can be 
predetermined with the existing data.



