I'd like to be able to define, within a single Solr core, a set of indexes in multiple directories. This would be really useful for indexing in Hadoop or integrating with Katta, where an EmbeddedSolrServer is distributed to the Hadoop cluster, indexes are generated in parallel, and the results are returned to Solr slave servers. It seems like this could be done using a custom IndexReaderFactory that opens a MultiReader over the directories; see the sketch below. SolrIndexWriter usage in this context would be limited to incremental updates (if anything).
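A rough sketch of what such a factory might look like against the Solr 1.4-era IndexReaderFactory API (the MultiDirectoryReaderFactory class name and the extraDirs init parameter are made up for illustration; exact signatures vary between versions):

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.IndexReaderFactory;

/**
 * Sketch: open a MultiReader over the core's own index directory plus a
 * configurable list of extra directories (e.g. shards copied back from a
 * Hadoop/Katta indexing job).
 */
public class MultiDirectoryReaderFactory extends IndexReaderFactory {

  private final List<String> extraPaths = new ArrayList<String>();

  @Override
  public void init(NamedList args) {
    super.init(args);
    // e.g. <arr name="extraDirs"><str>/data/shard1</str>...</arr> in solrconfig.xml
    List dirs = (List) args.get("extraDirs");
    if (dirs != null) {
      for (Object d : dirs) {
        extraPaths.add(d.toString());
      }
    }
  }

  @Override
  public IndexReader newReader(Directory indexDir, boolean readOnly) throws IOException {
    List<IndexReader> readers = new ArrayList<IndexReader>();
    // The core's normal index still participates, so incremental updates stay visible.
    readers.add(IndexReader.open(indexDir, readOnly));
    for (String path : extraPaths) {
      Directory dir = FSDirectory.open(new File(path));
      readers.add(IndexReader.open(dir, true));
    }
    // closeSubReaders = true so closing the MultiReader closes the sub-readers too.
    return new MultiReader(readers.toArray(new IndexReader[readers.size()]), true);
  }
}
```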
It would also be great for Solr docSet caching to operate at the SegmentReader level, so that small incremental updates don't trigger a massive cache regeneration. Maybe there's a way to trick Solr into doing this today by running one EmbeddedSolrServer instance per large segment/shard and executing a local distributed query across them? That way each EmbeddedSolrServer maintains caches that are not disturbed by updates to the other shards. Ideally, if I had to use multiple cores, I'd rather not have to maintain separate copies of /conf on disk, but could pass the same in-memory representation of solrconfig and schema into each core (see the sketch below)?
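A sketch of that multi-core idea using the Solr 1.4-era core APIs, parsing solrconfig.xml and schema.xml once and handing the same objects to every shard core. Core names, data-dir paths, and the instance dir are placeholders, and constructor signatures differ between Solr versions, so treat this as illustrative only:

```java
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.CoreDescriptor;
import org.apache.solr.core.SolrConfig;
import org.apache.solr.core.SolrCore;
import org.apache.solr.core.SolrResourceLoader;
import org.apache.solr.schema.IndexSchema;

/**
 * Sketch: register several cores (one per large segment/shard) that all share
 * a single in-memory SolrConfig and IndexSchema, instead of each core needing
 * its own conf/ directory on disk.
 */
public class SharedConfigCores {
  public static void main(String[] args) throws Exception {
    SolrResourceLoader loader = new SolrResourceLoader("solr");
    CoreContainer container = new CoreContainer(loader);

    // Parse solrconfig.xml and schema.xml once from the shared instance dir...
    SolrConfig config = new SolrConfig(loader, "solrconfig.xml", null);
    IndexSchema schema = new IndexSchema(config, "schema.xml", null);

    // ...and pass the same objects into every shard core (placeholder data dirs).
    String[] shardDataDirs = { "/data/shard1", "/data/shard2" };
    for (int i = 0; i < shardDataDirs.length; i++) {
      String name = "shard" + (i + 1);
      CoreDescriptor cd = new CoreDescriptor(container, name, "solr");
      SolrCore core = new SolrCore(name, shardDataDirs[i], config, schema, cd);
      container.register(name, core, false);
    }

    // Each shard is then queried through its own EmbeddedSolrServer, whose
    // caches are only invalidated when that shard's index actually changes.
    EmbeddedSolrServer shard1 = new EmbeddedSolrServer(container, "shard1");
  }
}
```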