Re: java.lang.IllegalStateException: Too many values for UnInvertedField faceting on field content

Erick Erickson Mon, 20 Jul 2015 21:39:54 -0700

This really seems like an XY problem. _Why_ are you faceting on a
tokenized field? What are you really trying to accomplish? Faceting on a
general-purpose content field that is analyzed is often A Bad Thing. Try
going into the admin UI >> Schema Browser for that field, and you'll see
how many unique terms it contains. Faceting on that many unique terms is
rarely useful to the end user, so my suspicion is that either you're not
doing what you think you are, or you have an unusual use case. Either way,
we need to understand what use case you're trying to support in order to
respond helpfully.
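
As a rough illustration of checking the term count without the UI, the
sketch below builds a request URL for Solr's Luke request handler
(/admin/luke). The base URL and the collection name "collection1" are
placeholders, not taken from this thread, and the response is assumed to
include a per-field "distinct" term count; verify that against your Solr
version.

```python
from urllib.parse import urlencode


def luke_url(base_url, collection, field):
    """Build a Luke request handler URL asking for statistics on one field.

    base_url and collection are hypothetical placeholders; only the field
    name "content" comes from the thread.
    """
    # numTerms=0 skips the top-terms listing; only the field summary is wanted.
    params = {"fl": field, "numTerms": 0, "wt": "json"}
    return f"{base_url}/{collection}/admin/luke?{urlencode(params)}"


# Example with assumed host and collection names.
print(luke_url("http://localhost:8983/solr", "collection1", "content"))
```

Fetching that URL (with curl or a browser) and reading the count for
"content" should show roughly how many facet values are being requested.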


You say that using facet.method=enum works, which is very surprising. That
method uses the filterCache to create a bitset for each unique term, which
is totally incompatible with the uninverted field error you're reporting,
so I clearly don't understand something about your setup. Are you _sure_?
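
To make it easy to confirm which method is actually in effect, here is a
minimal sketch that builds an explicit facet.method=enum request. The
host, the collection name "collection1", and the limit/mincount defaults
are assumptions for illustration; the facet.* parameter names are standard
Solr request parameters.

```python
from urllib.parse import urlencode


def enum_facet_url(base_url, collection, field, limit=20, mincount=1):
    """Build a select URL that facets on `field` using the enum method.

    base_url, collection, limit and mincount are illustrative assumptions.
    """
    params = [
        ("q", "*:*"),               # match all documents
        ("rows", 0),                # only the facet counts are of interest
        ("facet", "true"),
        ("facet.field", field),
        ("facet.method", "enum"),   # enumerate terms, one filter per term
        ("facet.limit", limit),     # cap the number of returned values
        ("facet.mincount", mincount),
        ("wt", "json"),
    ]
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Example with assumed host and collection names.
print(enum_facet_url("http://localhost:8983/solr", "collection1", "content"))
```

Running the same request with and without facet.method=enum, and comparing
the results against the log, should show whether the exception really
disappears.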

Best,
Erick

On Mon, Jul 20, 2015 at 9:32 PM, Ali Nazemian <alinazem...@gmail.com> wrote:
> Dear Toke and Davidphilip,
> Hi,
> The field type text_fa has a custom language-specific normalizer and
> charFilter. Here is the schema.xml definition for this field type:
> <fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <charFilter class="com.ictcert.lucene.analysis.fa.FarsiCharFilterFactory"/>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="com.ictcert.lucene.analysis.fa.FarsiNormalizationFilterFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fa.txt" />
>       </analyzer>
>       <analyzer type="query">
>         <charFilter class="com.ictcert.lucene.analysis.fa.FarsiCharFilterFactory"/>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="com.ictcert.lucene.analysis.fa.FarsiNormalizationFilterFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fa.txt" />
>       </analyzer>
>     </fieldType>
>
> I did try facet.method=enum and it works fine. Did you mean that
> faceting on an analyzed field is actually wrong?
>
> Best regards.
>
> On Mon, Jul 20, 2015 at 8:07 PM, Toke Eskildsen <t...@statsbiblioteket.dk>
> wrote:
>
>> Ali Nazemian <alinazem...@gmail.com> wrote:
>> > I have a collection of 1.6m documents in Solr 5.2.1.
>> > [...]
>> > Caused by: java.lang.IllegalStateException: Too many values for
>> > UnInvertedField faceting on field content
>> > [...]
>> > <field name="content" type="text_fa" stored="true" indexed="true"
>> > default="noval" termVectors="true" termPositions="true"
>> > termOffsets="true"/>
>>
>> You are hitting an internal limit in Solr. As davidphilip tells you, the
>> solution is docValues, but they cannot be enabled for text fields. You need
>> String fields, but the name of your field suggests that you need
>> analysis and tokenization, which cannot be done on String fields.
>>
>> > Would you please help me to solve this problem?
>>
>> With the information we have, it does not seem to be easy to solve: It
>> seems like you want to facet on all terms in your index. As they need to be
>> String (to use docValues), you would have to do all the splitting on white
>> space, normalization etc. outside of Solr.
>>
>> - Toke Eskildsen
>>
>
>
>
> --
> A.Nazemian
