On 7/31/2019 6:47 AM, profiuser wrote:
We have about 400,000,000 items in a Solr collection.
We have set the autoCommit property for this collection to 15 minutes.
It is a big collection and we use some caches etc., which is why we have
such a large autoCommit value.

I would set autoCommit to 60 seconds (a value of 60000) with openSearcher set to false. This will not affect change visibility in any way, but it will keep your transaction logs from becoming huge. Commits that do NOT open a new searcher are very fast.

Then I would use autoSoftCommit as a failsafe on change visibility. Start with a value between two and five minutes.
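
A rough sketch of those two settings, assuming they are applied through the Config API instead of editing solrconfig.xml by hand. The base URL and collection name are placeholders, the helper name is made up for this example, and 120000 is just one pick from the two-to-five-minute range. The request is only built here, not sent.

```python
import json
import urllib.request

# Placeholder values, not taken from the original setup.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "mycollection"


def build_commit_config_request(base_url, collection,
                                hard_commit_ms=60000, soft_commit_ms=120000):
    """Build (but do not send) a Config API request that sets commit timing."""
    payload = {
        "set-property": {
            # Hard commit every 60 s without opening a new searcher.
            "updateHandler.autoCommit.maxTime": hard_commit_ms,
            "updateHandler.autoCommit.openSearcher": False,
            # Soft commit as the failsafe for change visibility.
            "updateHandler.autoSoftCommit.maxTime": soft_commit_ms,
        }
    }
    return urllib.request.Request(
        f"{base_url}/{collection}/config",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_commit_config_request(SOLR_BASE, COLLECTION)
# To apply it against a running Solr: urllib.request.urlopen(request)
```

The same three values can equally be written into the autoCommit and autoSoftCommit sections of solrconfig.xml.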

This has the disadvantage that we do not have NRT searches.

We would like to have NRT at least for searching for the newly added items.

We read about the new "Category Routed Aliases" functionality in Solr
version 8.1.

And we got an idea: we could add a routing field to our collection
schema.
At indexing time we check whether the item is new and set the routing
field to "new"; if the item is older than some time period, we set the
value to "old".
We would have one category routed alias, routedCollection, with
2 collections behind it, old and new.

If we index a new item, the router chooses the new collection and the item
is inserted into it. After some period we reindex the item, decide that it
is old, and set the routing field to "old". The router then decides to
update (insert) the item in collection old. We expected that Solr would
automatically check uniqueness across all routed collections, and that if
Solr found the item in another collection, it would delete it there
automatically. But it does not!

Is this expected behaviour?

I know very little about the new routed collection capability, but in general, I would not expect Solr to check more than one collection for an existing ID value when it is indexing. I don't think there's anything happening at that level that even knows about other collections. If you want to split your index into hot and cold pieces, you're probably going to need to have your indexing software be aware of that and either figure out where to send deletes, or just send deletes to all parts of the index.
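
To illustrate the "send deletes to all parts" option, here is a minimal sketch of what the indexing software might do when an item moves from one collection to the other. The base URL is a placeholder, the helper name is made up for this example, and the collection names old and new are taken from the description above; the names Solr actually creates behind the alias may differ, so substitute the real ones. The code only builds the requests and does not send them.

```python
import json
import urllib.request

SOLR_BASE = "http://localhost:8983/solr"  # placeholder
COLLECTIONS = ["old", "new"]  # assumed names of the two collections


def build_delete_requests(base_url, collections, doc_id, keep_in):
    """Build one delete-by-id request per collection except keep_in."""
    requests = []
    for name in collections:
        if name == keep_in:
            continue  # this is where the fresh copy of the document lives
        body = json.dumps({"delete": {"id": doc_id}}).encode("utf-8")
        requests.append(urllib.request.Request(
            f"{base_url}/{name}/update",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        ))
    return requests


# Item "item-42" was just reindexed into "old", so remove it everywhere else.
delete_requests = build_delete_requests(
    SOLR_BASE, COLLECTIONS, "item-42", keep_in="old"
)
```

Sending the delete to every other collection costs a few extra requests but avoids having to track where each item currently lives.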

What kind of lag time do you think about when you imagine near real time indexing? Note that extremely short NRT times may not be achievable, especially with the large index you're using. A good starting point in my opinion is 30000, which is 30 seconds.

What I would do is use the autoCommit and autoSoftCommit settings that I mentioned above, and include a "commitWithin" parameter on all indexing requests. The commitWithin would be for NRT.
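
A small sketch of an indexing request that carries commitWithin, using the 30000 starting point mentioned above. The base URL, the document, the field name "routing", and the helper name are all placeholders for illustration. Again the request is only built, not sent.

```python
import json
import urllib.parse
import urllib.request

SOLR_BASE = "http://localhost:8983/solr"  # placeholder


def build_index_request(base_url, target, docs, commit_within_ms=30000):
    """Build an update request asking Solr to commit within the given time."""
    query = urllib.parse.urlencode({"commitWithin": commit_within_ms})
    return urllib.request.Request(
        f"{base_url}/{target}/update?{query}",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "routing" is a hypothetical name for the routing field in the schema.
index_request = build_index_request(
    SOLR_BASE, "routedCollection", [{"id": "item-42", "routing": "new"}]
)
```

With this in place, the autoSoftCommit interval only matters for updates that arrive without a commitWithin value.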

Thanks,
Shawn
