Re: Best field definition which is only use for filter query.

Erik Hatcher Wed, 22 Jul 2020 05:09:16 -0700

Wouldn’t a “string” field be as good, if not better, for this use case?

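A minimal sketch of what that could look like in the schema, assuming the field is named `rules` (as in the query quoted below) and that a `string` type backed by solr.StrField is defined the way Solr's default configsets define it. With a string field, fq=rules:203 becomes an exact term match; range queries and numeric sorting on the field would no longer behave numerically.

```xml
<!-- Assumption: a plain StrField type, as shipped in Solr's default configsets -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- Assumption: field name "rules" matches the fq in the thread.
     indexed="true" keeps term lookups fast; docValues="false" avoids the
     extra column-oriented structure when the field is only filtered on. -->
<field name="rules" type="string" indexed="true" stored="false"
       docValues="false" multiValued="true"/>
```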

> On Jul 22, 2020, at 08:02, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> fq clauses are just like the q clause except for two things:
> 1> no scoring is done
> 2> the entire result set _can_ be stored in the filterCache.
> 
> So a field that is neither indexed nor has docValues can’t be used in either 
> an fq or a q clause.
> 
> The thread you reference assumes docValues=true (the default in some 
> versions of Solr). And yes, searching on a field that has only docValues 
> will be very, very slow. Think “table scan”.
> 
> Also, the default pint type is not as efficient for single-value searches 
> like this; the Trie fields are better. Trie support will be kept until 
> there’s a good alternative for the single-value lookup with pint.
> 
> So for what you’re doing, I’d change to TrieInt, docValues=false, 
> indexed=true. If you have neither docValues=true nor indexed=true, the query 
> won’t work at all. You’ll have to adequately size your hardware if index 
> size is a concern.
> 
> Best,
> Erick
> 
>> On Jul 22, 2020, at 7:18 AM, Raj Yadav <rajkum...@cse.ism.ac.in> wrote:
>> 
>> Below is the sample document
>> 
>> *{"filedA": 1,"filedB": "","filedC": "Sher","filedD":
>> "random","rules":[203,7843,43,283,6603,83,513,5303,243,103,323,163,403,363,5333,2483,313,703,523,503,563,8543,1003,483,1083,2043,6523,603,963,683,5353,763,443,643,743,723,1123,843,1243,1663,1803,1403,1783,7563,3843,1843,1523,1203,1563,1703,1883,8913,1923,1323,5313,1623,1963,2033,2763,2623,2083,2123,2143,123,2183,2333,8183,7323,2323,7243,2313,2463,2423,2383,5833,2343,2503,2663,8263,3083,2683,2543,8313,2883,2923,3043,2703,3243,3123,2263,3003,2393,3203,3163,6243,3283,3443,3343,3403,1913,3323,3483,3603,3723,3763,8333,3563,863,3683,3643,3523,3803,8323,3883,4003,3923,4043,4173,1163,2963,1743,6593,4083,4103,4143,1363,3983,4183,4223,6623,4383,1443,4303,4263,4403,4423,4283,4343,5043,4923,4983,4993,6633,4503,5843,8073,4663]}*
>> As you can see we have 5 fields and one of the field names is "rules".
>> Field Definition:
>> <field name="geo_rules" type="pint" indexed="true" stored="false"
>> multiValued="true">
>> 
>> The only operation that we do on this field is filtering.
>> example: => fq=rules:203
>> 
>> *Problems:*
>> 1. The `rules` field is marked indexed="true", and it consumes a large
>> percentage of the total index size.
>> 2. A large share of our document update requests are mainly for this
>> (rules) field.
>> 
>> If I mark this field `indexed=false` (by default the pint field type has
>> docValues=true):
>> *<field name="geo_rules" type="pint" indexed="false" stored="false"
>> multiValued="true">*
>> then the following thread suggests that the filter operation (which is also
>> one kind of search operation) will be very slow:
>> https://lucene.472066.n3.nabble.com/Facet-performance-problem-td4375925.html
>> 
>> Is there a way to avoid indexed=true for the `rules` field without
>> hurting our search (filtering) performance? Or is there any other solution
>> that reduces our total index size without increasing search (filter)
>> latency?
>> 
>> Regards,
>> Raj
>
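
Erick's suggestion above (TrieInt, docValues=false, indexed=true) could be sketched as the schema fragment below. The type name `tint`, the field name `rules`, and precisionStep="0" are assumptions for illustration and are not taken from the thread. Trie fields are deprecated in Solr 7 and later, so check that your version still ships them.

```xml
<!-- Assumption: type name "tint" is hypothetical; solr.TrieIntField is the Trie-based int class.
     precisionStep="0" indexes a single term per value, which suits exact-match
     filters such as fq=rules:203 and keeps the index smaller than a non-zero step. -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="0"
           positionIncrementGap="0" docValues="false"/>

<!-- Assumption: field name "rules" matches the fq in the thread. -->
<field name="rules" type="tint" indexed="true" stored="false"
       docValues="false" multiValued="true"/>
```

Changing the type of an existing field requires reindexing all documents before the new definition takes effect.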
