We are working on a proposal and feel that the streaming API together with the
export handler will best fit our use cases. We already have a structure in Solr
in which we use graph queries to produce a hierarchical structure. Now we need
to join a couple more collections to that structure.

We have 5 different collections:

Collection 1 - 800k records
Collection 2 - 200k records
Collection 3 - 7k records
Collection 4 - 6 million records
Collection 5 - 150k records

We are using the strategy below:

innerJoin(intersect(innerJoin(Collection 1, Collection 2),
                    innerJoin(Collection 3, Collection 4)),
          Collection 5)

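Spelled out in full, the expression looks roughly like the sketch below. The
collection names (collection1 to collection5), the join field joinKey and the
fl lists are placeholders, not our real schema. The sketch assumes every
exported field has docValues enabled, as the /export handler requires, and that
every stream is sorted on the join field, which innerJoin and intersect need.

```
# Placeholder names: collection1..collection5, joinKey and the fl lists are assumptions.
innerJoin(
  intersect(
    innerJoin(
      search(collection1, q="*:*", fl="joinKey", sort="joinKey asc", qt="/export"),
      search(collection2, q="*:*", fl="joinKey", sort="joinKey asc", qt="/export"),
      on="joinKey"),
    innerJoin(
      search(collection3, q="*:*", fl="joinKey", sort="joinKey asc", qt="/export"),
      search(collection4, q="*:*", fl="joinKey", sort="joinKey asc", qt="/export"),
      on="joinKey"),
    on="joinKey"),
  search(collection5, q="*:*", fl="joinKey", sort="joinKey asc", qt="/export"),
  on="joinKey")
```

With q="*:*" on collection4, all 6 million records are exported and sorted
before the join can discard anything, which is where I suspect the time goes.
The # line is only an annotation; drop it if your Solr version rejects comments.
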
Performance becomes too slow once Collection 4 is included. With just
Collections 1, 2 and 5 the results come back in 2 seconds. The moment I include
Collection 4 in the query I see a performance impact. I believe exporting the
large result set from Collection 4 is causing the issue.

Currently I am using a single-shard collection with no replicas. I am thinking
of increasing memory as the first option, since processing docValues needs more
memory. If that does not work, I can look at parallel streams / sharding.

Kindly advise: is there anything else I am missing?
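
For reference, the parallel stream option mentioned above would look roughly
like this. It is only an untested sketch: workerCollection, the worker count
and joinKey are placeholders. Each search partitions on the join field through
partitionKeys so that matching tuples are sent to the same worker.

```
# Placeholder names: workerCollection, collection3, collection4, joinKey; workers="4" is arbitrary.
parallel(workerCollection,
  innerJoin(
    search(collection3, q="*:*", fl="joinKey", sort="joinKey asc", qt="/export", partitionKeys="joinKey"),
    search(collection4, q="*:*", fl="joinKey", sort="joinKey asc", qt="/export", partitionKeys="joinKey"),
    on="joinKey"),
  workers="4",
  sort="joinKey asc")
```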