Re: Solr re-indexing in case of store=false

Erick Erickson Mon, 09 May 2016 21:25:54 -0700

Stored data is compressed by default; anecdotally there's about
a 2:1 compression ratio.


But the _other_ reason not to store all the data is that
it then gets replicated. If you have master/slave or SolrCloud
with replicas, you have N copies of your index and each and
every one of them has a copy of all your stored data....

Best,
Erick
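
[Editor's illustration, not part of the original message] The point about
replication multiplying stored data can be put into rough numbers. This is a
minimal sketch: the 2:1 ratio is the anecdotal figure quoted above, while the
100 GB of raw field data, the three index copies, and the function name
stored_footprint_gb are hypothetical values chosen only for illustration.

```python
def stored_footprint_gb(raw_gb, copies, compression_ratio=2.0):
    """Rough cluster-wide disk use of stored fields, in GB.

    raw_gb            -- uncompressed size of the stored field data (assumed)
    copies            -- number of index copies (master plus slaves, or replicas)
    compression_ratio -- anecdotal 2:1 default taken from the message above
    """
    if copies < 1 or compression_ratio <= 0:
        raise ValueError("copies must be >= 1 and compression_ratio > 0")
    per_copy_gb = raw_gb / compression_ratio  # size on disk in one copy
    return per_copy_gb * copies               # every copy holds all stored data


# Hypothetical example: 100 GB of raw stored data held in 3 index copies.
print(stored_footprint_gb(100, 3))  # 150.0
```

Under those assumptions, compression halves the cost within each copy, but
the replicas still bring the total above the raw size.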

On Mon, May 9, 2016 at 6:14 AM, Ali Nazemian <alinazem...@gmail.com> wrote:
> Dear Erick,
> Hi,
> Thank you very much. About the storing part you are right, unless the
> primary datastore uses some kind of data compression, which in my case it
> does (I am using Cassandra as a primary datastore), and I am not sure
> whether Solr has any kind of compression or not.
> According to your reply, it seems that I have to do it the hard way: I
> mean using the primary datastore to build the index from scratch.
>
> Sincerely,
>
> On Sun, May 8, 2016 at 11:07 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> bq: I would be grateful if somebody could introduce another way of
>> re-indexing the whole data without using another datastore
>>
>> Not possible currently. Consider what's _in_ the index when stored="false".
>> The actual terms are the output of the entire analysis chain, including
>> stemming, stopword removal, synonym substitution etc. Since the
>> indexing process is lossy, you simply cannot reconstruct the original
>> stream from the indexed terms.
>>
>> I suppose one _could_ do this in the case of docValues only index with
>> the new return-values-from-docvalues functionality, but even that's lossy
>> because the order of returned values may not be the original insertion
>> order. And if that suits your needs, a pretty simple driver program would
>> suffice.
>>
>> To do this from indexed-only terms you'd have to somehow store the
>> original version of each term or store some codes indicating exactly
>> how to reconstruct the original stream, which very possibly would take
>> up as much space as if you'd just stored the values anyway. _And_ it
>> would burden everyone else who didn't want to do this with a bloated
>> index.
>>
>> Best,
>> Erick
>>
>> On Sun, May 8, 2016 at 4:25 AM, Ali Nazemian <alinazem...@gmail.com>
>> wrote:
>> > Dear all,
>> > Hi,
>> > I was wondering, is it possible to re-index Solr 6.0 data in case of
>> > store=false? I am using Solr as a secondary datastore, and for the sake
>> > of space efficiency all the fields (except id) are set to store=false.
>> > Currently, due to some changes in application business, the Solr schema
>> > should change, and in order to see the effect of the schema change on
>> > old data, I have to re-index. I know that one way of re-indexing in
>> > Solr is reading data from one collection (core) and inserting it into
>> > another one, but this solution is not possible for store=false fields,
>> > and re-indexing the whole data through the primary datastore is kind of
>> > costly, so I would be grateful if somebody could introduce another way
>> > of re-indexing the whole data without using another datastore.
>> >
>> > Sincerely,
>> >
>> > --
>> > A.Nazemian
>>
>
>
>
> --
> A.Nazemian
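
[Editor's illustration, not part of the original thread] The argument that
indexing is lossy can be shown with a toy analysis chain. This is only a
sketch: the stopword list, the suffix-stripping "stemmer", and the function
names below are invented for the example and are not Solr's or Lucene's
actual analyzers. It shows two different inputs collapsing to the same
indexed terms, so the original text cannot be recovered from the terms alone.

```python
# Toy stand-ins for an analysis chain: lowercasing, stopword removal, stemming.
STOPWORDS = {"the", "a", "an", "of", "is", "are"}


def toy_stem(token):
    """Strip a few common suffixes; a crude placeholder for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Return the terms that would be indexed for `text` under this toy chain."""
    tokens = text.lower().split()  # case information is discarded here
    return [toy_stem(t) for t in tokens if t not in STOPWORDS]


first = analyze("The Dogs are running")
second = analyze("a dog RUNNING")
print(first)            # ['dog', 'runn']
print(first == second)  # True: both inputs yield the same terms
```

Since both inputs map to the same term list, nothing in the index says which
one was submitted; that is the information stored="true" would have kept.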
