Hi Wael,
You can try out JSON faceting: it is not just a different request/response format, it also uses a different implementation. In any case, you will have to index the documents differently in order to be able to use docValues.
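A minimal sketch of what such a JSON Facet request body could look like, built in Python. The field names `words_ss`, `user_id` and `created_at`, the facet label `word_counts`, and the sample values are assumptions for illustration only; they are not taken from the schema in this thread. The body would be POSTed to the collection's `/query` handler.

```python
import json


def build_word_count_request(user, start, end, field="words_ss", top_n=100):
    """Build a JSON Request API body with a terms facet on `field`.

    All field names here are hypothetical placeholders. The user value is
    interpolated without escaping, which a real client would need to handle.
    """
    return {
        "query": "*:*",
        # Filters narrow the 100M-document index to one user and date range.
        "filter": [
            f'user_id:"{user}"',
            f"created_at:[{start} TO {end}]",
        ],
        # No documents are needed in the response, only the facet buckets.
        "limit": 0,
        "facet": {
            "word_counts": {"type": "terms", "field": field, "limit": top_n}
        },
    }


body = build_word_count_request(
    "some_user", "2017-11-01T00:00:00Z", "2017-11-07T00:00:00Z"
)
print(json.dumps(body, indent=2))
```

This only builds the request; the faceted field still has to be a docValues-enabled string field for the request to avoid uninverting.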
HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

> On 7 Nov 2017, at 09:26, Wael Kader <w...@softech-lb.com> wrote:
>
> Hi,
>
> The whole index has 100M documents, but when I add the criteria, it will
> filter the data to maybe 10k rows at most.
> The facet isn't working when the total number of records in the index is
> 100M, but it was working at 5M.
>
> I have social media & RSS data in the index and I am trying to get the word
> count for a specific user on specific date intervals.
>
> Regards,
> Wael
>
> On Mon, Nov 6, 2017 at 3:42 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> _Why_ do you want to get the word counts? Faceting on all of the
>> tokens for 100M docs isn't something Solr is ordinarily used for. As
>> Emir says it'll take a huge amount of memory. You can use one of the
>> function queries (termfreq IIRC) that will give you the count of any
>> individual term you have and will be very fast.
>>
>> But getting all of the word counts in the index is probably not
>> something I'd use Solr for.
>>
>> This may be an XY problem, you're asking how to do something specific
>> (X) without explaining what the problem you're trying to solve is (Y).
>> Perhaps there's another way to accomplish (Y) if we knew more about
>> what it is.
>>
>> Best,
>> Erick
>>
>> On Mon, Nov 6, 2017 at 4:15 AM, Emir Arnautović
>> <emir.arnauto...@sematext.com> wrote:
>>> Hi Wael,
>>> You are faceting on an analyzed field. This results in the field being
>>> uninverted (the fieldValueCache being built) on the first call after
>>> every commit. This is both time and memory consuming (you can check in
>>> the admin console stats how much memory it took).
>>> What you need to do is create a multivalued string field (not text), do
>>> the parsing (the analysis steps) on the client side, and store the
>>> values like that. This will allow you to enable docValues on that field
>>> and avoid building the fieldValueCache.
>>>
>>> HTH,
>>> Emir
>>>
>>>> On 6 Nov 2017, at 13:06, Wael Kader <w...@softech-lb.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am using a custom field. Below is the field definition.
>>>> I am using this because I don't want stemming.
>>>>
>>>> <fieldType name="text_no_stem2" class="solr.TextField"
>>>>            positionIncrementGap="100">
>>>>   <analyzer type="index">
>>>>     <charFilter class="solr.MappingCharFilterFactory"
>>>>                 mapping="mapping-ISOLatin1Accent.txt"/>
>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>     <filter class="solr.StopFilterFactory"
>>>>             ignoreCase="true"
>>>>             words="stopwords.txt"
>>>>             enablePositionIncrements="true"/>
>>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>>             protected="protwords.txt"
>>>>             generateWordParts="0"
>>>>             generateNumberParts="1"
>>>>             catenateWords="1"
>>>>             catenateNumbers="1"
>>>>             catenateAll="0"
>>>>             splitOnCaseChange="1"
>>>>             preserveOriginal="1"/>
>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>>   </analyzer>
>>>>   <analyzer type="query">
>>>>     <charFilter class="solr.MappingCharFilterFactory"
>>>>                 mapping="mapping-ISOLatin1Accent.txt"/>
>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>     <filter class="solr.SynonymFilterFactory"
>>>>             synonyms="synonyms.txt"
>>>>             ignoreCase="true" expand="true"/>
>>>>     <filter class="solr.StopFilterFactory"
>>>>             ignoreCase="true"
>>>>             words="stopwords.txt"
>>>>             enablePositionIncrements="true"/>
>>>>     <!--ORIGINAL generateNumberParts="1"-->
>>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>>             protected="protwords.txt"
>>>>             generateWordParts="0"
>>>>             catenateWords="0"
>>>>             catenateNumbers="0"
>>>>             catenateAll="0"
>>>>             splitOnCaseChange="1"
>>>>             preserveOriginal="1"/>
>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>     <!-- ORIGINAL filter class="solr.SnowballPorterFilterFactory"
>>>>          language="English" protected="protwords.txt"/-->
>>>>     <!-- Webel: switch off Porter-stemmer algorithm to enforce whole
>>>>          word match -->
>>>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>>   </analyzer>
>>>> </fieldType>
>>>>
>>>> Regards,
>>>> Wael
>>>>
>>>> On Mon, Nov 6, 2017 at 10:29 AM, Emir Arnautović <
>>>> emir.arnauto...@sematext.com> wrote:
>>>>
>>>>> Hi Wael,
>>>>> Can you provide your field definition and a sample query?
>>>>>
>>>>> Thanks,
>>>>> Emir
>>>>>
>>>>>> On 6 Nov 2017, at 08:30, Wael Kader <w...@softech-lb.com> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I have an index with around 100 million documents.
>>>>>> I have a multivalued column that I am saving big chunks of text
>>>>>> data in. The server has around 20 GB of RAM and 4 CPUs.
>>>>>>
>>>>>> I was faceting on it to get a word cloud, and it took around 1
>>>>>> second to retrieve when the data was 5-10 million documents.
>>>>>> Now I have more data and it's taking minutes to get the results
>>>>>> (that is, if it gets them and Solr doesn't crash). What's the best
>>>>>> way to make it run? Or maybe it's not scalable on my current schema
>>>>>> and design with news articles.
>>>>>>
>>>>>> I am looking to find the best solution for this. Maybe create
>>>>>> another index to split the data while inserting it, or maybe if I
>>>>>> change some settings in SolrConfig or add some RAM, it would
>>>>>> perform better.
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Wael
>>>>
>>>> --
>>>> Regards,
>>>> Wael
>
> --
> Regards,
> Wael
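
The client-side analysis suggested in this thread (tokenize before indexing and store the tokens in a multivalued string field with docValues) could be sketched roughly as follows. This is only an approximation under stated assumptions: the target field name `words_ss` and the tiny stopword set are hypothetical, Unicode accent folding stands in for `mapping-ISOLatin1Accent.txt`, and the WordDelimiterFilter step of `text_no_stem2` is not reproduced at all.

```python
import unicodedata

# Hypothetical stand-in for the contents of stopwords.txt.
STOPWORDS = {"the", "a", "and", "of"}


def fold_accents(text):
    """Strip combining accents (rough substitute for the accent mapping file)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text, stopwords=STOPWORDS):
    """Whitespace-tokenize, lowercase, drop stopwords, de-duplicate in order."""
    seen = set()
    tokens = []
    for raw in fold_accents(text).split():
        token = raw.lower()
        if token in stopwords or token in seen:
            continue
        seen.add(token)
        tokens.append(token)
    return tokens


def to_solr_doc(doc_id, text):
    """Build a document whose tokens go into a multivalued string field."""
    return {"id": doc_id, "words_ss": analyze(text)}


doc = to_solr_doc("1", "The café and the Café of Beirut")
print(doc["words_ss"])  # ['cafe', 'beirut']
```

Because the tokens are stored as plain string values, the field can be declared with docValues enabled, which is what removes the need to build the fieldValueCache at facet time.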