Re: does copyField increase index size?

David Hastings Thu, 26 Dec 2019 13:38:48 -0800

The field is stored somewhere
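
If you want to see where the bytes actually go, here is a rough sketch, not a definitive tool. It assumes you can read each core's data/index directory on disk, and the function names are made up for illustration. It sums file sizes per Lucene file extension so that the run with the copyField can be diffed against the run without it. If _text_ is really indexed, I would expect the growth to show up in the terms and postings files (.tim, .doc, .pos) rather than in stored fields (.fdt), since _text_ is stored="false".

```python
from collections import defaultdict
from pathlib import Path


def index_size_by_extension(index_dir):
    """Sum file sizes in bytes under index_dir, grouped by file extension.

    index_dir is assumed to be a core's data/index directory (hypothetical path).
    Files without an extension (e.g. segments_N) are grouped under "(none)".
    """
    totals = defaultdict(int)
    for path in Path(index_dir).rglob("*"):
        if path.is_file():
            totals[path.suffix or "(none)"] += path.stat().st_size
    return dict(totals)


def compare(before, after):
    """Return {extension: after - before} for every extension seen in either run."""
    extensions = sorted(set(before) | set(after))
    return {ext: after.get(ext, 0) - before.get(ext, 0) for ext in extensions}


# Example use (paths are placeholders, not real locations):
#   without_copy = index_size_by_extension("/path/to/core_without_copyfield/data/index")
#   with_copy = index_size_by_extension("/path/to/core_with_copyfield/data/index")
#   print(compare(without_copy, with_copy))
```

Comparing per extension rather than the total folder size makes it harder for merges or deleted-but-not-yet-purged segments in one part of the index to hide a change in another, though both runs should still be measured in a comparable state (for instance after the same optimize or merge policy).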


> On Dec 26, 2019, at 3:22 PM, Nicolas Paris <nicolas.pa...@riseup.net> wrote:
> 
> Hi Eric
> 
> Below is part of the managed-schema. There are 1k section* fields. For
> the second experiment, I removed the copyField, dropped the collection and
> re-indexed everything. To measure the index size, I went to SolrCloud and
> looked in the Cloud section: 40GB per shard. I also looked at the folder
> size. I ran some tests and the _text_ field is indexed.
> 
>    <field name="_text_" type="text_fr" indexed="true" stored="false" multiValued="true"/>
>    <dynamicField name="section*" type="text_fr" indexed="true" stored="true" multiValued="true"/>
>    <copyField source="section*" dest="_text_"/>
> 
>    <fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
>      <analyzer type="index">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.PatternReplaceFilterFactory" pattern="\p{Punct}" replacement=" " replace="all"/>
>        <filter class="solr.ICUFoldingFilterFactory"/>
>        <!-- removes l', etc -->
>        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt" format="snowball"/>
>        <filter class="solr.FrenchLightStemFilterFactory"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms-fr.txt" ignoreCase="true" expand="true"/>
>        <filter class="solr.PatternReplaceFilterFactory" pattern="\p{Punct}" replacement=" " replace="all"/>
>        <filter class="solr.ICUFoldingFilterFactory"/>
>        <!-- removes l', etc -->
>        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt" format="snowball"/>
>        <filter class="solr.FrenchLightStemFilterFactory"/>
>      </analyzer>
>    </fieldType>
> 
> 
> 
> 
> 
>> On Thu, Dec 26, 2019 at 02:16:32PM -0500, Erick Erickson wrote:
>> This simply cannot be true unless the destination copyField is 
>> indexed=false, docValues=false, stored=false. I.e. “some circumstances” means 
>> there’s really no use in using the copyField in the first place. I suppose 
>> that if you don’t store any term vectors, no position information, nothing 
>> except, say, the terms, then maybe you’ll have extremely minimal size. But 
>> even in that case, I’d use the original field in an “fq” clause, which 
>> doesn’t use any scoring, in place of using the copyField.
>> 
>> Each field is stored in a separate part of the relevant files (.tim, .pos, 
>> etc). Term frequencies are kept on a _per field_ basis for instance.
>> 
>> So this pretty much has to be small sample size or other measurement error.
>> 
>> Best,
>> Erick
>> 
>>>> On Dec 26, 2019, at 9:27 AM, Nicolas Paris <nicolas.pa...@riseup.net> 
>>>> wrote:
>>> 
>>> Anyway, that's good news: copyField does not increase index size in
>>> some circumstances:
>>> - the copied fields and the target field share the same datatype
>>> - the target field is not stored
>>> 
>>> this is tested on text fields
>>> 
>>> 
>>> On Wed, Dec 25, 2019 at 11:42:23AM +0100, Nicolas Paris wrote:
>>>> 
>>>> On Wed, Dec 25, 2019 at 05:30:03AM -0500, Dave wrote:
>>>>> #2 you initially said you were talking about 1k documents. 
>>>> 
>>>> Hi Dave. Again, sorry for the confusion. This is 1k fields
>>>> (general_text), over 50M large documents, copied into one _text_ field. 
>>>> 4 shards, 40GB per shard in both cases, with/without the _text_ field.
>>>> 
>>>>> 
>>>>>> On Dec 25, 2019, at 3:07 AM, Nicolas Paris <nicolas.pa...@riseup.net> 
>>>>>> wrote:
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> If you are redoing the indexing after changing the schema and
>>>>>>> reloading/restarting, then you can ignore me.
>>>>>> 
>>>>>> I am sorry to say that I have to ignore you. Indeed, my tests include
>>>>>> recreating the collection from scratch - with and without the copy
>>>>>> fields.
>>>>>> In both cases the index size is the same ! (while the _text_ field is
>>>>>> working correctly)
>>>>>> 
>>>>>>> On Tue, Dec 24, 2019 at 05:32:09PM -0700, Shawn Heisey wrote:
>>>>>>>> On 12/24/2019 5:11 PM, Nicolas Paris wrote:
>>>>>>>> Do you mean "copy fields" is only an action of changing the schema ?
>>>>>>>> I was thinking it was adding a new field and eventually a new index to
>>>>>>>> the collection
>>>>>>> 
>>>>>>> The copy that copyField does happens at index time.  Reindexing is
>>>>>>> required after changing the schema, or nothing happens.
>>>>>>> 
>>>>>>> If you are redoing the indexing after changing the schema and
>>>>>>> reloading/restarting, then you can ignore me.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Shawn
>>>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> nicolas
>>>>> 
>>>> 
>>>> -- 
>>>> nicolas
>>>> 
>>> 
>>> -- 
>>> nicolas
>> 
> 
> -- 
> nicolas
