Re: java.lang.IllegalStateException: Too many values for UnInvertedField faceting on field content

Ali Nazemian Wed, 22 Jul 2015 03:59:37 -0700

Dear Alessandro,
Thank you very much.
Yeah, sure, that is far better; I did not think of that ;)

Best regards.

On Wed, Jul 22, 2015 at 2:31 PM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:

> In addition to Erick's answer:
> I agree 100% with your observations, but I would add that DocValues
> should actually be provided for all non-tokenized fields, rather than only
> for all non-analysed fields.
>
> In the end, there would be no practical difference if you built the
> docValues structures for fields that use a KeywordTokenizer (followed, for
> example, by a lowercase token filter).
> Some charFilters before the tokenizer and simple token filters after it can
> actually be useful when sorting or faceting (let's take those two as the
> main uses of DocValues).
>
> Of course, relaxing the use of DocValues from primitive types to analysed
> types can be problematic, but there are scenarios where it can be a good
> fit. I should study in a little more depth the current constraints that
> prevent docValues from being applied to analysed fields.
>
> Cheers
>
>
> 2015-07-21 5:38 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:
>
> > This really seems like an XY problem. _Why_ are you faceting on a
> > tokenized field? What are you really trying to accomplish? Faceting on a
> > generalized content field that is analyzed is often A Bad Thing. Try going
> > into the admin UI >> Schema Browser for that field, and you'll see how many
> > unique terms you have in that field. Faceting on that many unique terms is
> > rarely useful to the end user, so my suspicion is that you're not doing
> > what you think you are. Or you have an unusual use-case. Either way, we
> > need to understand what use-case you're trying to support in order to
> > respond helpfully.
> >
> > You say that using facet.method=enum works; this is very surprising. That
> > method uses the filterCache to create a bitset for each unique term, which
> > is totally incompatible with the uninverted field error you're reporting,
> > so I clearly don't understand something about your setup. Are you _sure_?
> >
> > Best,
> > Erick
> >
> > On Mon, Jul 20, 2015 at 9:32 PM, Ali Nazemian <alinazem...@gmail.com> wrote:
> > > Dear Toke and Davidphilip,
> > > Hi,
> > > The field type text_fa has a custom language-specific normalizer and
> > > charFilter; here is the relevant schema.xml definition for this field:
> > > <fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
> > >   <analyzer type="index">
> > >     <charFilter class="com.ictcert.lucene.analysis.fa.FarsiCharFilterFactory"/>
> > >     <tokenizer class="solr.StandardTokenizerFactory"/>
> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > >     <filter class="com.ictcert.lucene.analysis.fa.FarsiNormalizationFilterFactory"/>
> > >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fa.txt" />
> > >   </analyzer>
> > >   <analyzer type="query">
> > >     <charFilter class="com.ictcert.lucene.analysis.fa.FarsiCharFilterFactory"/>
> > >     <tokenizer class="solr.StandardTokenizerFactory"/>
> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > >     <filter class="com.ictcert.lucene.analysis.fa.FarsiNormalizationFilterFactory"/>
> > >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fa.txt" />
> > >   </analyzer>
> > > </fieldType>
> > >
> > > I did try facet.method=enum and it works fine. Do you mean that
> > > faceting on an analyzed field is actually wrong?
> > >
> > > Best regards.
> > >
> > > On Mon, Jul 20, 2015 at 8:07 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> > >
> > >> Ali Nazemian <alinazem...@gmail.com> wrote:
> > >> > I have a collection of 1.6m documents in Solr 5.2.1.
> > >> > [...]
> > >> > Caused by: java.lang.IllegalStateException: Too many values for
> > >> > UnInvertedField faceting on field content
> > >> > [...]
> > >> > <field name="content" type="text_fa" stored="true" indexed="true"
> > >> > default="noval" termVectors="true" termPositions="true"
> > >> > termOffsets="true"/>
> > >>
> > >> You are hitting an internal limit in Solr. As davidphilip tells you, the
> > >> solution is docValues, but they cannot be enabled for text fields. You
> > >> need String fields, but the name of your field suggests that you need
> > >> analysis and tokenization, which cannot be done on String fields.
> > >>
> > >> > Would you please help me to solve this problem?
> > >>
> > >> With the information we have, this does not seem easy to solve: it
> > >> seems like you want to facet on all terms in your index. As they need
> > >> to be Strings (to use docValues), you would have to do all the splitting
> > >> on whitespace, normalization, etc. outside of Solr.
> > >>
> > >> - Toke Eskildsen
> > >>
> > >
> > >
> > >
> > > --
> > > A.Nazemian
> >
>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
A.Nazemian
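
Toke's suggestion in this thread (do the splitting on whitespace and the normalization outside Solr, then index the resulting terms into a String field so that docValues can be used) could look roughly like the schema.xml fragment below. This is a sketch under stated assumptions, not a tested configuration: the field name content_terms is hypothetical, the stock "string" type (solr.StrField) from the example schema is assumed to exist, and the client is assumed to send one already-normalized term per value.

```xml
<!-- Sketch only: "content_terms" is a hypothetical field name. -->
<!-- Assumes the stock "string" type (solr.StrField) from the example schema. -->
<!-- StrField does no analysis, so each value sent by the client must already
     be a single, normalized term (split, lowercased, stopwords removed). -->
<!-- multiValued + docValues: one value per term, so faceting can read docValues
     instead of building an UnInvertedField for the analyzed "content" field. -->
<field name="content_terms" type="string" indexed="true" stored="false"
       multiValued="true" docValues="true"/>
```

Faceting would then target the new field (facet.field=content_terms), while the analyzed content field remains available for search. A copyField from content would not help here, because copyField copies the raw input value, not the tokens produced by the text_fa analyzer.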
