Doesn’t re-indexing a document just delete/replace….

It’s complicated. For the individual document, yes. The problem
arises because the field definition is inconsistent _between_
documents, and segment merging blows things up.

Consider: I have segment1 with documents indexed under the old
schema (String in this case). I change my schema and index the same
field as a Text type, and those documents land in a new segment, segment2.

Eventually, a segment merge happens and these two segments get merged
into a single new segment. How should the field be handled? Should it
be defined as String or Text in the new segment? If you convert the docs
with a Text definition for the field to String,
you’d lose the ability to search for individual tokens. If you convert the
String to Text, you don’t have any guarantee that the information is even
available.
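
To make the merge problem concrete, here is a toy sketch in Python. It is not how Lucene stores or merges segments; the dict layout, the `merge_terms` and `search` helpers, and the sample text are all made up for illustration. It only shows what a type-blind merge of a String segment and a Text segment would leave you with.

```python
# Toy model only: real Lucene segments are far more complex than dicts.

# segment1: written under the old schema, the whole value is one untokenized term.
segment1 = {"schema_type": "string",
            "terms": {"Quick Brown Fox": [1]}}  # term -> list of doc ids

# segment2: written under the new schema, the value is analyzed into lowercase tokens.
segment2 = {"schema_type": "text",
            "terms": {"quick": [2], "brown": [2], "fox": [2]}}


def merge_terms(a, b):
    """Naively union the term dictionaries, ignoring the field type (hypothetical)."""
    merged = {}
    for seg in (a, b):
        for term, doc_ids in seg["terms"].items():
            merged.setdefault(term, []).extend(doc_ids)
    return merged


def search(terms, term):
    """Exact term lookup, which is all an inverted index can do."""
    return terms.get(term, [])


merged = merge_terms(segment1, segment2)

# Both documents contain the same original text, yet no single query finds both.
token_hits = search(merged, "fox")              # finds only doc 2
exact_hits = search(merged, "Quick Brown Fox")  # finds only doc 1
print(token_hits, exact_hits)
```

Whichever type the merged segment claims to have, one of the two documents is indexed the wrong way for it.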

This is just the tip of the iceberg in terms of trying to change the
definition of a field. Take the case of changing the analysis chain:
say you use a phonetic filter on a field, then decide to remove it, and
you did not store the original. Erick might be encoded as “ENXY”, so the
original data is simply not there to convert. Ditto for removing a
stemmer, lowercasing, applying a regex, ...
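
A small Python sketch of why this cannot be undone. The `toy_analyze` function is a stand-in I made up (lowercasing plus a crude plural strip), not any real Solr or Lucene analyzer, but the point carries over: analysis is many-to-one, so the indexed tokens do not determine the original text.

```python
def toy_analyze(text):
    """Hypothetical analysis chain: lowercase, then strip a trailing 's'."""
    tokens = []
    for word in text.split():
        token = word.lower()
        # Crude stand-in for a stemmer; real stemmers are far more involved.
        if token.endswith("s") and len(token) > 3:
            token = token[:-1]
        tokens.append(token)
    return tokens


a = toy_analyze("Running Dogs")
b = toy_analyze("running dog")

# Two different inputs produce identical index terms, so there is no way
# to tell from the index alone which one was originally submitted.
print(a == b)
```

If the field is not stored, the only copy of the original input is wherever the document came from.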


From Mike McCandless:

"This really is the difference between an index and a database:
 we do not store, precisely, the original documents.  We store 
an efficient derived/computed index from them.  Yes, Solr/ES 
can add database-like behavior where they hold the true original 
source of the document and use that to rebuild Lucene indices 
over time.  But Lucene really is just a "search index" and we 
need to be free to make important improvements with time."

And all that aside, you have to re-index all the docs anyway or
your search results will be inconsistent. So even ignoring the
impossible task of covering all the possibilities on the fly, it’s
better to plan on re-indexing.
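
For what planning on re-indexing might look like, here is a rough Python sketch that only builds the JSON bodies for the delete-everything, change-schema, re-index sequence quoted below. It sends nothing. I am assuming the documents come from some external system of record (`source_docs` here is invented sample data), that you post these bodies to your collection's JSON update handler yourself, and that you change the schema between the delete and the adds. The `messagetext` field name is from the original question; the helper names and batch size are arbitrary.

```python
import json


def delete_all_payload():
    """JSON body for a delete-by-query that matches every document."""
    return json.dumps({"delete": {"query": "*:*"}})


def batches(docs, size):
    """Yield consecutive slices of docs with at most `size` items each."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def add_payloads(docs, size=2):
    """One JSON array of documents per batch (batch size is an arbitrary choice)."""
    return [json.dumps(batch) for batch in batches(docs, size)]


# Hypothetical documents pulled from the system of record, not from Solr.
source_docs = [
    {"id": "1", "messagetext": "first message"},
    {"id": "2", "messagetext": "second message"},
    {"id": "3", "messagetext": "third message"},
]

delete_body = delete_all_payload()
payloads = add_payloads(source_docs, size=2)
print(delete_body)
print(len(payloads))
```

The design point is simply that the source of truth lives outside the index, so a full rebuild is always possible.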

Best,
Erick


> On Oct 16, 2020, at 3:16 PM, David Hastings <hastings.recurs...@gmail.com> 
> wrote:
> 
> "If you want to
> keep the same field name, you need to delete all of the
> documents in the index, change the schema, and reindex."
> 
> actually doesnt re-indexing a document just delete/replace anyways assuming
> the same id?
> 
> On Fri, Oct 16, 2020 at 3:07 PM Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
> 
>> Just as a side note,
>> 
>>> indexed="true"
>> If you are storing 32K message, you probably are not searching it as a
>> whole string. So, don't index it. You may also want to mark the field
>> as 'large' (and lazy):
>> 
>> https://lucene.apache.org/solr/guide/8_2/field-type-definitions-and-properties.html#field-default-properties
>> 
>> When you are going to make it a text field, you will probably be
>> having the same issues as well.
>> 
>> And honestly, if you are not storing those fields to search, maybe you
>> need to consider the architecture. Maybe those fields do not need to
>> be in Solr at all, but in external systems. Solr (or any search
>> system) should not be your system of records since - as the other
>> reply showed - some of the answers are "reindex everything".
>> 
>> Regards,
>>   Alex.
>> 
>> On Fri, 16 Oct 2020 at 14:02, yaswanth kumar <yaswanth...@gmail.com>
>> wrote:
>>> 
>>> I am using solr 8.2
>>> 
>>> Can I change the schema fieldtype from string to solr.TextField
>>> without indexing?
>>> 
>>>    <field name="messagetext" type="string" indexed="true"
>> stored="true"/>
>>> 
>>> The reason is that string has only 32K char limit where as I am looking
>> to
>>> store more than 32K now.
>>> 
>>> The contents on this field doesn't require any analysis or tokenized but
>> I
>>> need this field in the queries and as well as output fields.
>>> 
>>> --
>>> Thanks & Regards,
>>> Yaswanth Kumar Konathala.
>>> yaswanth...@gmail.com
>> 
