Gotcha, thanks for the explanation. Another small question, if you don't mind: when deleting docs, they aren't actually removed, just tagged as deleted, and the old field/field type is still in the index until merged/optimized as well. Wouldn't that cause almost the same conflicts until then?
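
To make the premise of that question concrete, here is a toy model in plain Python. It is only a sketch: `ToySegment` and `merge` are hypothetical names and are not Lucene's or Solr's actual data structures. It shows only that a delete flags a document rather than removing it, so values written under the old field type stay physically present until a merge rewrites the segment. It does not model how Lucene tracks field definitions, so it says nothing about whether a conflict would still be reported.

```python
from dataclasses import dataclass, field


@dataclass
class ToySegment:
    """Hypothetical stand-in for an index segment (not Lucene's real structure)."""
    docs: dict = field(default_factory=dict)    # doc id -> (field type, value)
    deleted: set = field(default_factory=set)   # ids flagged as deleted

    def add(self, doc_id, field_type, value):
        self.docs[doc_id] = (field_type, value)

    def delete(self, doc_id):
        # A delete only flags the document; its data stays in the segment.
        if doc_id in self.docs:
            self.deleted.add(doc_id)

    def live_ids(self):
        return sorted(set(self.docs) - self.deleted)

    def field_types_present(self):
        # Field types of everything physically in the segment, deleted or not.
        return {ftype for ftype, _ in self.docs.values()}


def merge(*segments):
    """Copy only live documents into a new segment, dropping flagged ones."""
    merged = ToySegment()
    for seg in segments:
        for doc_id in seg.live_ids():
            merged.add(doc_id, *seg.docs[doc_id])
    return merged


# Old segment: two docs written while the field was a string type.
old = ToySegment()
old.add("1", "string", "hello")
old.add("2", "string", "world")

# Re-index both ids with the field as a text type: delete the old copy, add the new.
new = ToySegment()
for doc_id, value in [("1", "hello"), ("2", "world")]:
    old.delete(doc_id)
    new.add(doc_id, "text", value)

merged = merge(old, new)
```

In this toy, `old.live_ids()` is empty after the re-index, yet `old.field_types_present()` still contains `"string"` until `merge` produces a segment that holds only the `"text"` copies.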
On Fri, Oct 16, 2020 at 3:51 PM Erick Erickson <erickerick...@gmail.com> wrote:
> Doesn’t re-indexing a document just delete/replace….
>
> It’s complicated. For the individual document, yes. The problem
> comes because the field is inconsistent _between_ documents, and
> segment merging blows things up.
>
> Consider. I have segment1 with documents indexed with the old
> schema (String in this case). I change my schema and index the same
> field as a text type.
>
> Eventually, a segment merge happens and these two segments get merged
> into a single new segment. How should the field be handled? Should it
> be defined as String or Text in the new segment? If you convert the docs
> with a Text definition for the field to String,
> you’d lose the ability to search for individual tokens. If you convert the
> String to Text, you don’t have any guarantee that the information is even
> available.
>
> This is just the tip of the iceberg in terms of trying to change the
> definition of a field. Take the case of changing the analysis chain,
> say you use a phonetic filter on a field then decide to remove it and
> do not store the original. Erick might be encoded as “ENXY” so the
> original data is simply not there to convert. Ditto removing a
> stemmer, lowercasing, applying a regex, …...
>
>
> From Mike McCandless:
>
> "This really is the difference between an index and a database:
> we do not store, precisely, the original documents. We store
> an efficient derived/computed index from them. Yes, Solr/ES
> can add database-like behavior where they hold the true original
> source of the document and use that to rebuild Lucene indices
> over time. But Lucene really is just a "search index" and we
> need to be free to make important improvements with time."
>
> And all that aside, you have to re-index all the docs anyway or
> your search results will be inconsistent. So leaving aside the
> impossible task of covering all the possibilities on the fly, it’s
> better to plan on re-indexing….
>
> Best,
> Erick
>
>
>
> On Oct 16, 2020, at 3:16 PM, David Hastings <hastings.recurs...@gmail.com> wrote:
> >
> > "If you want to
> > keep the same field name, you need to delete all of the
> > documents in the index, change the schema, and reindex."
> >
> > actually doesnt re-indexing a document just delete/replace anyways assuming
> > the same id?
> >
> > On Fri, Oct 16, 2020 at 3:07 PM Alexandre Rafalovitch <arafa...@gmail.com>
> > wrote:
> >
> >> Just as a side note,
> >>
> >>> indexed="true"
> >> If you are storing 32K message, you probably are not searching it as a
> >> whole string. So, don't index it. You may also want to mark the field
> >> as 'large' (and lazy):
> >>
> >> https://lucene.apache.org/solr/guide/8_2/field-type-definitions-and-properties.html#field-default-properties
> >>
> >> When you are going to make it a text field, you will probably be
> >> having the same issues as well.
> >>
> >> And honestly, if you are not storing those fields to search, maybe you
> >> need to consider the architecture. Maybe those fields do not need to
> >> be in Solr at all, but in external systems. Solr (or any search
> >> system) should not be your system of records since - as the other
> >> reply showed - some of the answers are "reindex everything".
> >>
> >> Regards,
> >> Alex.
> >>
> >> On Fri, 16 Oct 2020 at 14:02, yaswanth kumar <yaswanth...@gmail.com>
> >> wrote:
> >>>
> >>> I am using solr 8.2
> >>>
> >>> Can I change the schema fieldtype from string to solr.TextField
> >>> without indexing?
> >>>
> >>> <field name="messagetext" type="string" indexed="true" stored="true"/>
> >>>
> >>> The reason is that string has only 32K char limit where as I am looking to
> >>> store more than 32K now.
> >>>
> >>> The contents on this field doesn't require any analysis or tokenized but I
> >>> need this field in the queries and as well as output fields.
> >>>
> >>> --
> >>> Thanks & Regards,
> >>> Yaswanth Kumar Konathala.
> >>> yaswanth...@gmail.com
> >>
> >
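
Erick's point in the quoted thread, that an analysis chain can discard the original data, can be sketched with a toy example. The `toy_analyze` function below is a made-up lossy transform (lowercase, then drop vowels after the first letter). It is an assumption for illustration only, not a real phonetic filter or any Solr analyzer, and the tokens it produces are not what Solr would produce.

```python
def toy_analyze(text: str) -> str:
    """Hypothetical lossy analysis step: lowercase, then drop vowels after the first letter."""
    lowered = text.lower()
    return lowered[:1] + "".join(ch for ch in lowered[1:] if ch not in "aeiou")


# Toy index that keeps only the analyzed token, never the original input.
index = {}
for doc_id, name in [(1, "Erick"), (2, "Erack"), (3, "ERICK")]:
    index.setdefault(toy_analyze(name), []).append(doc_id)

# All three different inputs collapse onto one token, so nothing in the
# index can tell which original string a given document contained.
tokens = sorted(index)
```

Because several distinct inputs map to the same token, there is no inverse function that could rebuild the original values from this toy index, which is why changing the analysis would require the source documents again.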