We are working on a proposal and feel that the streaming API together with the
export handler will best fit our use cases. We already have a structure in
Solr in which we use graph queries to produce a hierarchy. Now we need to
join a few more collections to that structure. We have 5 different
collections:

Collection 1 - 800k records
Collection 2 - 200k records
Collection 3 - 7k records
Collection 4 - 6 million records
Collection 5 - 150k records

We are using the strategy below:

innerJoin(intersect(innerJoin(Collection 1, Collection 2),
                    innerJoin(Collection 3, Collection 4)),
          Collection 5)
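To make the strategy concrete, here is a rough sketch of how such an expression could be built and sent to the /stream handler. The join field `joinKey`, the collection names `collection1` through `collection5`, and the host URL are placeholders, not our real schema. The sketch also assumes every join input is read through the export handler and sorted on the join field.

```python
import json
import urllib.parse
import urllib.request

JOIN_FIELD = "joinKey"  # hypothetical join field; replace with the real one


def search(collection, field=JOIN_FIELD):
    """Build a search() source that streams the full result set via /export."""
    return (
        f'search({collection}, q="*:*", fl="{field}", '
        f'sort="{field} asc", qt="/export")'
    )


def build_expression(field=JOIN_FIELD):
    """Nest the joins in the same order as the strategy described above."""
    on = f'on="{field}"'
    left = f'innerJoin({search("collection1", field)}, {search("collection2", field)}, {on})'
    right = f'innerJoin({search("collection3", field)}, {search("collection4", field)}, {on})'
    both = f"intersect({left}, {right}, {on})"
    return f'innerJoin({both}, {search("collection5", field)}, {on})'


def run(expr, base_url="http://localhost:8983/solr/collection1/stream"):
    """POST the expression to the /stream handler (the URL is an assumption)."""
    data = urllib.parse.urlencode({"expr": expr}).encode("utf-8")
    with urllib.request.urlopen(base_url, data=data) as resp:
        return json.load(resp)["result-set"]["docs"]


if __name__ == "__main__":
    # Print the expression only; call run() against a live Solr to execute it.
    print(build_expression())
```

In this form every search() uses q="*:*", so each collection, including the 6 million records of collection 4, is exported and sorted in full before it reaches the join.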
Performance becomes too slow once Collection 4 is included. With just
Collections 1, 2 and 5 the results come back in 2 seconds. The moment I
include Collection 4 in the query I see a performance impact. I believe
exporting the large result set from Collection 4 is causing the issue.
Currently I am using a single-shard collection with no replica. As a first
option I am thinking of increasing the memory, since processing docValues
needs more memory. If that does not work, I can look at parallel streams or
sharding. Kindly advise: is there anything else I am missing?