Dear Alessandro, Thank you very much. Yeah sure it is far better, I did not think of that ;)
Best regards.

On Wed, Jul 22, 2015 at 2:31 PM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:

> In addition to Erick's answer:
> I agree 100% with your observations, but I would add that DocValues
> should actually be provided for all non-tokenized fields, rather than
> for all non-analysed fields.
>
> In the end there will be no practical difference if you build the
> docValues structures for fields that have a keywordTokenizer (and, for
> example, a lowercaseTokenFilter following it).
> Some charFilters before and a simple token filter after can actually be
> useful when sorting or faceting (let's simplify those 2 as the main uses
> for DocValues).
>
> Of course, relaxing the use of DocValues from primitive types to analysed
> types can be problematic, but there are scenarios where it can be a good
> fit. I should study in a little more depth what the current constraints
> are that block docValues from being applied to analysed fields.
>
> Cheers
>
> 2015-07-21 5:38 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:
>
> > This really seems like an XY problem. _Why_ are you faceting on a
> > tokenized field? What are you really trying to accomplish? Because
> > faceting on a generalized content field that's an analyzed field is
> > often A Bad Thing. Try going into the admin UI >> Schema Browser for
> > that field, and you'll see how many unique terms you have in that
> > field. Faceting on that many unique terms is rarely useful to the end
> > user, so my suspicion is that you're not doing what you think you are.
> > Or you have an unusual use-case. Either way, we need to understand what
> > use-case you're trying to support in order to respond helpfully.
> >
> > You say that using facet.method=enum works; this is very surprising.
> > That method uses the filterCache to create a bitset for each unique
> > term, which is totally incompatible with the uninverted field error
> > you're reporting, so I clearly don't understand something about your
> > setup. Are you _sure_?
> >
> > Best,
> > Erick
> >
> > On Mon, Jul 20, 2015 at 9:32 PM, Ali Nazemian <alinazem...@gmail.com> wrote:
> > > Dear Toke and Davidphilip,
> > > Hi,
> > > The fieldtype text_fa has a custom language-specific normalizer and
> > > charfilter. Here is the schema.xml definition for this field type:
> > >
> > > <fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
> > >   <analyzer type="index">
> > >     <charFilter class="com.ictcert.lucene.analysis.fa.FarsiCharFilterFactory"/>
> > >     <tokenizer class="solr.StandardTokenizerFactory"/>
> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > >     <filter class="com.ictcert.lucene.analysis.fa.FarsiNormalizationFilterFactory"/>
> > >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fa.txt" />
> > >   </analyzer>
> > >   <analyzer type="query">
> > >     <charFilter class="com.ictcert.lucene.analysis.fa.FarsiCharFilterFactory"/>
> > >     <tokenizer class="solr.StandardTokenizerFactory"/>
> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > >     <filter class="com.ictcert.lucene.analysis.fa.FarsiNormalizationFilterFactory"/>
> > >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fa.txt" />
> > >   </analyzer>
> > > </fieldType>
> > >
> > > I did try facet.method=enum and it works fine. Did you mean that
> > > faceting on an analyzed field is actually wrong?
> > >
> > > Best regards.
> > >
> > > On Mon, Jul 20, 2015 at 8:07 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> > >
> > >> Ali Nazemian <alinazem...@gmail.com> wrote:
> > >> > I have a collection of 1.6m documents in Solr 5.2.1.
> > >> > [...]
> > >> > Caused by: java.lang.IllegalStateException: Too many values for
> > >> > UnInvertedField faceting on field content
> > >> > [...]
> > >> > <field name="content" type="text_fa" stored="true" indexed="true"
> > >> > default="noval" termVectors="true" termPositions="true"
> > >> > termOffsets="true"/>
> > >>
> > >> You are hitting an internal limit in Solr. As davidphilip tells you,
> > >> the solution is docValues, but they cannot be enabled for text fields.
> > >> You need String fields, but the name of your field suggests that you
> > >> need analysis & tokenization, which cannot be done on String fields.
> > >>
> > >> > Would you please help me to solve this problem?
> > >>
> > >> With the information we have, it does not seem to be easy to solve: it
> > >> seems like you want to facet on all terms in your index. As they need
> > >> to be String (to use docValues), you would have to do all the
> > >> splitting on white space, normalization etc. outside of Solr.
> > >>
> > >> - Toke Eskildsen
> > >>
> > >
> > > --
> > > A.Nazemian
> >
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>

--
A.Nazemian
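
For reference, a minimal schema.xml sketch of the kind of non-tokenized but analysed field type Alessandro describes (a keyword tokenizer followed by a lowercase filter). The names text_exact_lower and title_exact are hypothetical and do not come from the thread. As Toke notes for Solr 5.2.1, docValues cannot be enabled on text fields, so this only illustrates the analysis chain under discussion, not a working docValues setup.

```xml
<!-- Hypothetical field type: the whole input is kept as a single token
     (KeywordTokenizer), then lowercased. Each value yields exactly one
     term, so it behaves like a normalized string for sorting or faceting. -->
<fieldType name="text_exact_lower" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using that type. docValues is deliberately not set:
     per the thread, it is not supported on solr.TextField here. -->
<field name="title_exact" type="text_exact_lower" indexed="true" stored="false"/>
```

Because such a field produces one term per value, the number of unique terms stays close to the number of distinct values, unlike the fully tokenized content field that triggered the UnInvertedField error.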