Mark,

Agreed that replication wouldn't help; I was dreaming that there was some intermediate format used in replication.
Ideally you are right: I could just reindex the data and go on with life, but my case is not so simple. Currently we have a set of processes that run against the raw artifact to index things of interest within the text document. I don't believe I have an easy way to reindex right now (I need to check with the folks who wrote this), but that would be my preference.

Andrzej,

Isn't the codec stuff merged with trunk now? Admittedly I know very little about Lucene's index format, but I'd be willing to be a guinea pig if you need a tester.

On Thu, Dec 8, 2011 at 5:34 AM, Andrzej Bialecki <a...@getopt.org> wrote:
> On 08/12/2011 05:00, Mark Miller wrote:
>>
>> Replication just copies the index, so I'm not sure how this would help
>> offhand?
>>
>> With SolrCloud this is a breeze - just fire up another replica for a shard
>> and the current index will replicate to it.
>>
>> If you were willing to export the data to some portable format and then
>> pull it back in, why not just store the original data and reindex?
>
>
> This was actually one of the situations that motivated that jira issue -
> there are scenarios where reindexing, or keeping the original data, is very
> costly, in terms of space, time, I/O, pre-processing costs, curating,
> merging, etc, etc...
>
> The good news is that once the recent work on the codecs is merged with the
> trunk then we can revisit this issue and implement it with much less effort
> than before - we could even start by modifying SimpleTextCodec to be more
> lenient, and proceed from there.
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com