I have a similar use case. Check out the export capability and
cursorMark.
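
A minimal sketch of the cursorMark approach, for illustration only. It assumes the uniqueKey field is called "id" and that the JSON response writer is available; the helper names (fetch_page, iter_all_docs) and the page size are made up for this example. The pieces that matter are the sort on the uniqueKey field, starting with cursorMark=*, and stopping once Solr returns the same cursor it was given.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_page(base_url, params):
    """Send one select request to Solr and return the decoded JSON response."""
    with urlopen(base_url + "/select?" + urlencode(params)) as resp:
        return json.load(resp)


def iter_all_docs(base_url, rows=1000, fetch=fetch_page):
    """Yield every document in the collection using cursorMark deep paging."""
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {
            "q": "*:*",
            "sort": "id asc",  # assumption: uniqueKey is "id"; the sort must include it
            "rows": rows,
            "cursorMark": cursor,
            "wt": "json",
        }
        data = fetch(base_url, params)
        for doc in data["response"]["docs"]:
            yield doc
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means there are no more results
            break
        cursor = next_cursor
```

You would call it with the core URL, e.g. iter_all_docs("http://localhost:8983/solr/SolrA") (hypothetical host and core name). The fetch argument is only there so the paging loop can be exercised without a running Solr.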
-Joe
On 2/2/2015 8:14 AM, Matteo Grolla wrote:
Hi,
I'm thinking about having an instance of Solr (SolrA) with all fields
stored and only id indexed, in addition to a normal production instance of
Solr (SolrB) that is used for searches.
This would allow me to read only what changed since the previous crawl, update
SolrA, and send the full document to SolrB, without forcing SolrB to store all
fields.
In addition, I have some batch jobs that work on the whole collection; running
them against SolrA would allow me to detect the documents that changed and
submit only those to SolrB.
The point is that to run these jobs I'll need to scan through all documents in
SolrA: I'll query on *:* and then go through all pages, which is not the
typical usage of Solr.
SolrA will contain a few tens of GB of data coming from hundreds of thousands
of docs.
Do you think I'm going to run into trouble using Solr this way?
I'd like to use Solr (for SolrA) for ease of maintenance, because our sysadmins
are already trained on Solr.
Thanks