In addition, what happens at query time when documents have
been indexed under different field types? Well, it doesn’t work well.

The full set of steps for uninterrupted searching is:

1. Add the new text field.
2. Reindex to populate it.
3. Switch querying to use the new text field.
4. Change the old string field to indexed="false" stored="false" and/or stop
including that field in search updates and/or populating it with copyField.
5. Reindex again to clean up all occurrences of the old field.
6. Remove the old field from the schema.
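
As a rough sketch of the schema side of those steps: if the collection uses a
managed schema, steps 1, 4, and 6 can be expressed as Schema API commands
(add-field, replace-field, delete-field) POSTed as JSON to
/solr/<collection>/schema. The snippet below only builds and prints the
payloads; it does not contact Solr. The new field name "messagetext_txt" and
the "text_general" type are assumptions for illustration, and
schema_commands() is a hypothetical helper, not part of Solr.

```python
import json

# Assumed names for illustration; adjust to your own schema.
OLD_FIELD = "messagetext"      # the string field from the original question
NEW_FIELD = "messagetext_txt"  # hypothetical name for the new text field
NEW_TYPE = "text_general"      # assumed field type; must exist in your schema


def schema_commands(old_field, new_field, new_type):
    """Return (step, payload) pairs for the schema changes, in order.

    Steps 2, 3, and 5 (reindexing and switching queries) happen outside
    the schema, so they have no payload here.
    """
    return [
        ("step 1: add the new text field",
         {"add-field": {"name": new_field, "type": new_type,
                        "indexed": True, "stored": True}}),
        ("step 4: stop indexing and storing the old field",
         {"replace-field": {"name": old_field, "type": "string",
                            "indexed": False, "stored": False}}),
        ("step 6: remove the old field",
         {"delete-field": {"name": old_field}}),
    ]


if __name__ == "__main__":
    for step, payload in schema_commands(OLD_FIELD, NEW_FIELD, NEW_TYPE):
        print(step)
        print(json.dumps(payload, indent=2))
```

Each payload would be sent only after the preceding reindex or query switch
has finished. With a hand-edited schema.xml, the same three changes are made
in the file instead.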

I just finished this process on two big clusters in prod. We had
created a bunch of extra fields for a series of A/B tests on 
relevance improvements. Those tests were finished, so we 
needed to remove those from the index. It was slightly simpler
because we had already stopped querying those fields.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 16, 2020, at 12:57 PM, David Hastings <hastings.recurs...@gmail.com> 
> wrote:
> 
> Gotcha, thanks for the explanation.  another small question if you
> dont mind, when deleting docs they arent actually removed, just tagged as
> deleted, and the old field/field type is still in the index until
> merged/optimized as well, wouldnt that cause almost the same conflicts
> until then?
> 
> On Fri, Oct 16, 2020 at 3:51 PM Erick Erickson <erickerick...@gmail.com>
> wrote:
> 
>> Doesn’t re-indexing a document just delete/replace….
>> 
>> It’s complicated. For the individual document, yes. The problem
>> comes because the field is inconsistent _between_ documents, and
>> segment merging blows things up.
>> 
>> Consider. I have segment1 with documents indexed with the old
>> schema (String in this case). I  change my schema and index the same
>> field as a text type.
>> 
>> Eventually, a segment merge happens and these two segments get merged
>> into a single new segment. How should the field be handled? Should it
>> be defined as String or Text in the new segment? If you convert the docs
>> with a Text definition for the field to String,
>> you’d lose the ability to search for individual tokens. If you convert the
>> String to Text, you don’t have any guarantee that the information is even
>> available.
>> 
>> This is just the tip of the iceberg in terms of trying to change the
>> definition of a field. Take the case of changing the analysis chain,
>> say you use a phonetic filter on a field then decide to remove it and
>> do not store the original. Erick might be encoded as “ENXY” so the
>> original data is simply not there to convert. Ditto removing a
>> stemmer, lowercasing, applying a regex, …...
>> 
>> 
>> From Mike McCandless:
>> 
>> "This really is the difference between an index and a database:
>> we do not store, precisely, the original documents.  We store
>> an efficient derived/computed index from them.  Yes, Solr/ES
>> can add database-like behavior where they hold the true original
>> source of the document and use that to rebuild Lucene indices
>> over time.  But Lucene really is just a "search index" and we
>> need to be free to make important improvements with time."
>> 
>> And all that aside, you have to re-index all the docs anyway or
>> your search results will be inconsistent. So leaving aside the
>> impossible task of covering all the possibilities on the fly, it’s
>> better to plan on re-indexing….
>> 
>> Best,
>> Erick
>> 
>> 
>>> On Oct 16, 2020, at 3:16 PM, David Hastings <
>> hastings.recurs...@gmail.com> wrote:
>>> 
>>> "If you want to
>>> keep the same field name, you need to delete all of the
>>> documents in the index, change the schema, and reindex."
>>> 
>>> actually doesnt re-indexing a document just delete/replace anyways
>> assuming
>>> the same id?
>>> 
>>> On Fri, Oct 16, 2020 at 3:07 PM Alexandre Rafalovitch <
>> arafa...@gmail.com>
>>> wrote:
>>> 
>>>> Just as a side note,
>>>> 
>>>>> indexed="true"
>>>> If you are storing 32K message, you probably are not searching it as a
>>>> whole string. So, don't index it. You may also want to mark the field
>>>> as 'large' (and lazy):
>>>> 
>>>> 
>> https://lucene.apache.org/solr/guide/8_2/field-type-definitions-and-properties.html#field-default-properties
>>>> 
>>>> When you are going to make it a text field, you will probably be
>>>> having the same issues as well.
>>>> 
>>>> And honestly, if you are not storing those fields to search, maybe you
>>>> need to consider the architecture. Maybe those fields do not need to
>>>> be in Solr at all, but in external systems. Solr (or any search
>>>> system) should not be your system of records since - as the other
>>>> reply showed - some of the answers are "reindex everything".
>>>> 
>>>> Regards,
>>>>  Alex.
>>>> 
>>>> On Fri, 16 Oct 2020 at 14:02, yaswanth kumar <yaswanth...@gmail.com>
>>>> wrote:
>>>>> 
>>>>> I am using solr 8.2
>>>>> 
>>>>> Can I change the schema fieldtype from string to solr.TextField
>>>>> without indexing?
>>>>> 
>>>>>   <field name="messagetext" type="string" indexed="true"
>>>> stored="true"/>
>>>>> 
>>>>> The reason is that string has only 32K char limit where as I am looking
>>>> to
>>>>> store more than 32K now.
>>>>> 
>>>>> The contents on this field doesn't require any analysis or tokenized
>> but
>>>> I
>>>>> need this field in the queries and as well as output fields.
>>>>> 
>>>>> --
>>>>> Thanks & Regards,
>>>>> Yaswanth Kumar Konathala.
>>>>> yaswanth...@gmail.com
>>>> 
>> 
>> 
