Gotcha, thanks for the explanation. Another small question, if you
don't mind: when deleting docs, they aren't actually removed, just tagged
as deleted, and the old field/field type is still in the index until
merged/optimized as well. Wouldn't that cause almost the same conflicts
until then?

On Fri, Oct 16, 2020 at 3:51 PM Erick Erickson <erickerick...@gmail.com>
wrote:

> Doesn’t re-indexing a document just delete/replace….
>
> It’s complicated. For the individual document, yes. The problem
> comes because the field is inconsistent _between_ documents, and
> segment merging blows things up.
>
> Consider. I have segment1 with documents indexed with the old
> schema (String in this case). I change my schema and index the same
> field as a text type.
>
> Eventually, a segment merge happens and these two segments get merged
> into a single new segment. How should the field be handled? Should it
> be defined as String or Text in the new segment? If you convert the docs
> with a Text definition for the field to String,
> you’d lose the ability to search for individual tokens. If you convert the
> String to Text, you don’t have any guarantee that the information is even
> available.
>
> This is just the tip of the iceberg in terms of trying to change the
> definition of a field. Take the case of changing the analysis chain:
> say you use a phonetic filter on a field, then decide to remove it, and
> you did not store the original. Erick might be encoded as “ENXY”, so the
> original data is simply not there to convert. Ditto removing a
> stemmer, lowercasing, applying a regex, …
>
>
> From Mike McCandless:
>
> "This really is the difference between an index and a database:
>  we do not store, precisely, the original documents.  We store
> an efficient derived/computed index from them.  Yes, Solr/ES
> can add database-like behavior where they hold the true original
> source of the document and use that to rebuild Lucene indices
> over time.  But Lucene really is just a "search index" and we
> need to be free to make important improvements with time."
>
> And all that aside, you have to re-index all the docs anyway or
> your search results will be inconsistent. So leaving aside the
> impossible task of covering all the possibilities on the fly, it’s
> better to plan on re-indexing….
>
> Best,
> Erick
>
>
> > On Oct 16, 2020, at 3:16 PM, David Hastings
> > <hastings.recurs...@gmail.com> wrote:
> >
> > "If you want to
> > keep the same field name, you need to delete all of the
> > documents in the index, change the schema, and reindex."
> >
> > Actually, doesn't re-indexing a document just delete/replace anyway,
> > assuming the same id?
> >
> > On Fri, Oct 16, 2020 at 3:07 PM Alexandre Rafalovitch
> > <arafa...@gmail.com> wrote:
> >
> >> Just as a side note,
> >>
> >>> indexed="true"
> >> If you are storing a 32K message, you probably are not searching it as a
> >> whole string. So, don't index it. You may also want to mark the field
> >> as 'large' (and lazy):
> >>
> >>
> >> https://lucene.apache.org/solr/guide/8_2/field-type-definitions-and-properties.html#field-default-properties
> >>
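> >> A minimal sketch of what that might look like, assuming the field
> >> is single-valued and only needs to be returned, not searched. The
> >> attribute values here are illustrative assumptions, not taken from
> >> the original schema:
> >>
> >> ```xml
> >> <!-- Assumed values. large="true" requires stored="true" and
> >>      multiValued="false"; docValues is switched off because
> >>      docValues on a string field also limits the value size. -->
> >> <field name="messagetext" type="string" indexed="false"
> >>        stored="true" docValues="false" multiValued="false"
> >>        large="true"/>
> >> ```
> >>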
> >> If you make it a text field, you will probably have the same
> >> issues as well.
> >>
> >> And honestly, if you are not storing those fields to search, maybe you
> >> need to reconsider the architecture. Maybe those fields do not need to
> >> be in Solr at all, but in external systems. Solr (or any search
> >> system) should not be your system of record since - as the other
> >> reply showed - some of the answers are "reindex everything".
> >>
> >> Regards,
> >>   Alex.
> >>
> >> On Fri, 16 Oct 2020 at 14:02, yaswanth kumar <yaswanth...@gmail.com>
> >> wrote:
> >>>
> >>> I am using solr 8.2
> >>>
> >>> Can I change the schema fieldtype from string to solr.TextField
> >>> without indexing?
> >>>
> >>>    <field name="messagetext" type="string" indexed="true"
> >>>           stored="true"/>
> >>>
> >>> The reason is that string has only a 32K char limit, whereas I am
> >>> looking to store more than 32K now.
> >>>
> >>> The contents of this field don't require any analysis or
> >>> tokenization, but I need this field in the queries as well as in
> >>> the output fields.
> >>>
> >>> --
> >>> Thanks & Regards,
> >>> Yaswanth Kumar Konathala.
> >>> yaswanth...@gmail.com
> >>
>
>
