I am not sure I fully understood your use case, but let me suggest a few possible solutions:
1) Query-time join approach: you keep two collections, one static with all the pages, and one that just stores lightweight documents describing each crawling interaction:

1) id, content -> Pages
2) pageId, experimentId, crawlingCycleId -> CrawlingInteractions

Then your query will be something like this (to retrieve pageId):

http://localhost:8983/solr/select?q={!join+from=id+to=pageId}text:query&fq=crawlingCycleId:[N TO K]

Retrieving the entire page is more problematic, as you would have to reverse the join and join on millions of items. I am not sure that is going to work.

2) You use atomic updates [1], and for each experiment and iteration you just add the fields you want (experimentId and crawlingCycleId). Be careful here: atomic updates do not mean Solr avoids rewriting the entire document (true in-place updates are only possible under certain conditions, which I think do not apply to your use case), but at least it will give you some advantage, as the POST requests pushing the updates will be much more lightweight.

[1] https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html

-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
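P.S. A rough sketch of how a client could build the join query from approach 1, assuming the collection/field names above (Pages, CrawlingInteractions, pageId, crawlingCycleId) and Solr on localhost:8983; these names are illustrative, not confirmed from your schema:

```python
from urllib.parse import urlencode

def join_query_url(base_url, text_query, cycle_from, cycle_to):
    """Build the query-time join URL (approach 1).

    Matches Pages documents on text_query, joins their 'id' values into
    the CrawlingInteractions documents' 'pageId' field, and filters the
    resulting interactions by crawling cycle. Note: if Pages and
    CrawlingInteractions are separate cores, the join also needs a
    fromIndex=Pages local param ({!join from=id to=pageId fromIndex=Pages}).
    """
    params = {
        "q": "{!join from=id to=pageId}text:%s" % text_query,
        "fq": "crawlingCycleId:[%d TO %d]" % (cycle_from, cycle_to),
        "fl": "pageId",
    }
    return base_url + "/select?" + urlencode(params)

url = join_query_url("http://localhost:8983/solr", "query", 3, 7)
```

urlencode takes care of escaping the `{!join ...}` local params and the range brackets, which is easy to get wrong when pasting the URL by hand.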
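And a sketch of the atomic-update request body from approach 2, using the "add" operation so that repeated experiments/iterations accumulate values in multi-valued fields; again, the field names are assumptions based on the description above:

```python
import json

def atomic_update_payload(page_id, experiment_id, cycle_id):
    """Atomic-update body (approach 2): tag an existing page document
    with an experiment id and crawling-cycle id. Each field value is
    wrapped in {"add": ...} so Solr appends rather than overwrites;
    use {"set": ...} instead if the fields are single-valued.
    """
    return json.dumps([{
        "id": page_id,
        "experimentId": {"add": experiment_id},
        "crawlingCycleId": {"add": cycle_id},
    }])

body = atomic_update_payload("page-42", "exp-1", 7)
# POST this body to /solr/<collection>/update with
# Content-Type: application/json, then commit as usual.
```

The document still gets rewritten internally by Solr, as noted above, but the payload on the wire is just the id plus the two new fields instead of the full page content.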