In addition to Erick's answer:
I agree 100% with your observations, but I would add that DocValues
should actually be available for all non-tokenized fields, rather than
only for non-analysed fields.

In the end there would be no practical difference if you built the
docValues structures for fields that use a KeywordTokenizer (followed,
for example, by a LowerCaseFilter), because the whole value is still
emitted as a single token.
Some char filters before the tokenizer and simple token filters after it
can actually be useful when sorting or faceting (let's take those two as
the main uses of DocValues).
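
To make this concrete, here is a minimal sketch of the kind of field type I
have in mind. The names string_lowercase and category_lc are made up for
illustration and are not taken from Ali's schema. It only shows the analysis
chain; it does not enable docValues, since that is exactly what is not
allowed on text fields today.

```xml
<!-- Hypothetical field type: the name "string_lowercase" is illustrative. -->
<fieldType name="string_lowercase" class="solr.TextField">
  <analyzer>
    <!-- KeywordTokenizer emits the whole input as one token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- A simple token filter: still exactly one term per value -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using it. No docValues attribute here: as Toke notes
     in the quoted message, docValues cannot be enabled for text fields. -->
<field name="category_lc" type="string_lowercase" indexed="true" stored="false"/>
```

Such a field is analysed but not really tokenized: "Foo Bar" and "foo bar"
both end up as the single term "foo bar", which is what you would want for
case-insensitive sorting or faceting.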

Of course, extending DocValues from primitive types to analysed types
can be problematic, but there are scenarios where it can be a good fit.
I should look more deeply into the constraints that currently prevent
docValues from being applied to analysed fields.

Cheers



2015-07-21 5:38 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:

> This really seems like an XY problem. _Why_ are you faceting on a
> tokenized field?
> What are you really trying to accomplish? Because faceting on a generalized
> content field that's an analyzed field is often A Bad Thing. Try going
> into the
> admin UI>> Schema Browser for that field, and you'll see how many unique
> terms
> you have in that field. Faceting on that many unique terms is rarely
> useful to the
> end user, so my suspicion is that you're not doing what you think you
> are. Or you
> have an unusual use-case. Either way, we need to understand what use-case
> you're trying to support in order to respond helpfully.
>
> You say that using facet.enum works, this is very surprising. That method
> uses
> the filterCache to create a bitset for each unique term. Which is totally
> incompatible with the uninverted field error you're reporting, so I
> clearly don't
> understand something about your setup. Are you _sure_?
>
> Best,
> Erick
>
> On Mon, Jul 20, 2015 at 9:32 PM, Ali Nazemian <alinazem...@gmail.com>
> wrote:
> > Dear Toke and Davidphilip,
> > Hi,
> > The fieldtype text_fa has some custom language specific normalizer and
> > charfilter, here is the schema.xml value related for this field:
> > <fieldType name="text_fa" class="solr.TextField"
> positionIncrementGap="100">
> >       <analyzer type="index">
> >         <charFilter
> > class="com.ictcert.lucene.analysis.fa.FarsiCharFilterFactory"/>
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >         <filter
> > class="com.ictcert.lucene.analysis.fa.FarsiNormalizationFilterFactory"/>
> >         <filter class="solr.StopFilterFactory" ignoreCase="true"
> > words="lang/stopwords_fa.txt" />
> >       </analyzer>
> >       <analyzer type="query">
> >         <charFilter
> > class="com.ictcert.lucene.analysis.fa.FarsiCharFilterFactory"/>
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >         <filter
> > class="com.ictcert.lucene.analysis.fa.FarsiNormalizationFilterFactory"/>
> >         <filter class="solr.StopFilterFactory" ignoreCase="true"
> > words="lang/stopwords_fa.txt" />
> >       </analyzer>
> >     </fieldType>
> >
> > I did try the facet.method=enum and it works fine. Did you mean that
> > actually applying facet on analyzed field is wrong?
> >
> > Best regards.
> >
> > On Mon, Jul 20, 2015 at 8:07 PM, Toke Eskildsen <t...@statsbiblioteket.dk>
> > wrote:
> >
> >> Ali Nazemian <alinazem...@gmail.com> wrote:
> >> > I have a collection of 1.6m documents in Solr 5.2.1.
> >> > [...]
> >> > Caused by: java.lang.IllegalStateException: Too many values for
> >> > UnInvertedField faceting on field content
> >> > [...]
> >> > <field name="content" type="text_fa" stored="true" indexed="true"
> >> > default="noval" termVectors="true" termPositions="true"
> >> > termOffsets="true"/>
> >>
> >> You are hitting an internal limit in Solr. As davidphilip tells you, the
> >> solution is docValues, but they cannot be enabled for text fields. You
> need
> >> String fields, but the name of your field suggests that you need
> >> analyzation & tokenization, which cannot be done on String fields.
> >>
> >> > Would you please help me to solve this problem?
> >>
> >> With the information we have, it does not seem to be easy to solve: It
> >> seems like you want to facet on all terms in your index. As they need
> to be
> >> String (to use docValues), you would have to do all the splitting on
> white
> >> space, normalization etc. outside of Solr.
> >>
> >> - Toke Eskildsen
> >>
> >
> >
> >
> > --
> > A.Nazemian
>



-- 
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
