The field is stored somewhere
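
One way to see where the bytes go is to total the index files by extension for each build (with and without the copyField) and compare the two. This follows Erick's point, quoted below, that each field lives in its own part of the .tim, .pos, etc. files. It is a minimal sketch, not a Solr tool: the function names `index_size_by_extension` and `size_delta` are made up for illustration. It also assumes the segment files are not packed into compound (.cfs) files, in which case the per-extension breakdown would not be visible.

```python
import os
from collections import defaultdict


def index_size_by_extension(index_dir):
    """Sum file sizes (in bytes) under index_dir, grouped by file extension."""
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            # Files without an extension (e.g. segments_N) are grouped together.
            ext = os.path.splitext(name)[1] or "(none)"
            totals[ext] += os.path.getsize(os.path.join(root, name))
    return dict(totals)


def size_delta(before, after):
    """Return {extension: after - before} in bytes for every extension seen."""
    extensions = sorted(set(before) | set(after))
    return {ext: after.get(ext, 0) - before.get(ext, 0) for ext in extensions}
```

Run `index_size_by_extension` on the index directory of the same shard in both builds and pass the two results to `size_delta`. If the copyField destination is really indexed, the term dictionary and postings files would be expected to grow.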
> On Dec 26, 2019, at 3:22 PM, Nicolas Paris <nicolas.pa...@riseup.net> wrote:
>
> Hi Erick
>
> Below is part of the managed-schema. There are 1k section* fields. For the
> second experiment, I removed the copyField, dropped the collection and
> re-indexed everything. To measure the index size, I went to solr-cloud and
> looked in the cloud part: 40 GB per shard. I also looked at the folder
> size. I made some tests and the _text_ field is indexed.
>
> <field name="_text_" type="text_fr" indexed="true" stored="false" multiValued="true"/>
> <dynamicField name="section*" type="text_fr" indexed="true" stored="true" multiValued="true"/>
> <copyField source="section*" dest="_text_"/>
>
> <fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.PatternReplaceFilterFactory" pattern="\p{Punct}" replacement=" " replace="all"/>
>     <filter class="solr.ICUFoldingFilterFactory"/>
>     <!-- removes l', etc -->
>     <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt" format="snowball"/>
>     <filter class="solr.FrenchLightStemFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms-fr.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.PatternReplaceFilterFactory" pattern="\p{Punct}" replacement=" " replace="all"/>
>     <filter class="solr.ICUFoldingFilterFactory"/>
>     <!-- removes l', etc -->
>     <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt" format="snowball"/>
>     <filter class="solr.FrenchLightStemFilterFactory"/>
>   </analyzer>
> </fieldType>
>
>> On Thu, Dec 26, 2019 at 02:16:32PM -0500, Erick Erickson wrote:
>> This simply cannot be true unless the destination copyField is
>> indexed=false, docValues=false stored=false. I.e. “some circumstances” means
>> there’s really no use in using the copyField in the first place. I suppose
>> that if you don’t store any term vectors, no position information, nothing
>> except, say, the terms, then maybe you’ll have extremely minimal size. But
>> even in that case, I’d use the original field in an “fq” clause which
>> doesn’t use any scoring in place of using the copyField.
>>
>> Each field is stored in a separate part of the relevant files (.tim, .pos,
>> etc). Term frequencies are kept on a _per field_ basis for instance.
>>
>> So this pretty much has to be small sample size or other measurement error.
>>
>> Best,
>> Erick
>>
>>>> On Dec 26, 2019, at 9:27 AM, Nicolas Paris <nicolas.pa...@riseup.net> wrote:
>>>
>>> Anyway, that's good news: copy field does not increase index size in
>>> some circumstances:
>>> - the copied fields and the target field share the same datatype
>>> - the target field is not stored
>>>
>>> this is tested on text fields
>>>
>>>
>>> On Wed, Dec 25, 2019 at 11:42:23AM +0100, Nicolas Paris wrote:
>>>>
>>>> On Wed, Dec 25, 2019 at 05:30:03AM -0500, Dave wrote:
>>>>> #2 you initially said you were talking about 1k documents.
>>>>
>>>> Hi Dave. Again, sorry for the confusion. This is 1k fields
>>>> (general_text), over 50M large documents copied into one _text_ field.
>>>> 4 shards, 40GB per shard in both cases, with/without the _text_ field
>>>>
>>>>>
>>>>>> On Dec 25, 2019, at 3:07 AM, Nicolas Paris <nicolas.pa...@riseup.net> wrote:
>>>>>>
>>>>>>>
>>>>>>> If you are redoing the indexing after changing the schema and
>>>>>>> reloading/restarting, then you can ignore me.
>>>>>>
>>>>>> I am sorry to say that I have to ignore you. Indeed, my tests include
>>>>>> recreating the collection from scratch - with and without the copy
>>>>>> fields.
>>>>>> In both cases the index size is the same! (while the _text_ field is
>>>>>> working correctly)
>>>>>>
>>>>>>> On Tue, Dec 24, 2019 at 05:32:09PM -0700, Shawn Heisey wrote:
>>>>>>>> On 12/24/2019 5:11 PM, Nicolas Paris wrote:
>>>>>>>> Do you mean "copy fields" is only an action of changing the schema?
>>>>>>>> I was thinking it was adding a new field and eventually a new index to
>>>>>>>> the collection
>>>>>>>
>>>>>>> The copy that copyField does happens at index time. Reindexing is
>>>>>>> required after changing the schema, or nothing happens.
>>>>>>>
>>>>>>> If you are redoing the indexing after changing the schema and
>>>>>>> reloading/restarting, then you can ignore me.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Shawn
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> nicolas
>>>>>
>>>>
>>>> --
>>>> nicolas
>>>>
>>>
>>> --
>>> nicolas
>>
>
> --
> nicolas