Hi,

The whole index has 100M documents, but once I add the filter criteria the result set is at most about 10k rows. Faceting fails now that the index holds 100M records, although it worked at 5M.
I have social media & RSS data in the index and I am trying to get the word count for a specific user on specific date intervals.

Regards,
Wael

On Mon, Nov 6, 2017 at 3:42 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> _Why_ do you want to get the word counts? Faceting on all of the
> tokens for 100M docs isn't something Solr is ordinarily used for. As
> Emir says it'll take a huge amount of memory. You can use one of the
> function queries (termfreq IIRC) that will give you the count of any
> individual term you have and will be very fast.
>
> But getting all of the word counts in the index is probably not
> something I'd use Solr for.
>
> This may be an XY problem, you're asking how to do something specific
> (X) without explaining what the problem you're trying to solve is (Y).
> Perhaps there's another way to accomplish (Y) if we knew more about
> what it is.
>
> Best,
> Erick
>
> On Mon, Nov 6, 2017 at 4:15 AM, Emir Arnautović
> <emir.arnauto...@sematext.com> wrote:
> > Hi Wael,
> > You are faceting on an analyzed field. This results in the field being
> > uninverted - fieldValueCache being built - on first call after every
> > commit. This is both time and memory consuming (you can check in admin
> > console in stats how much memory it took).
> > What you need to do is to create a multivalued string field (not text)
> > and parse values (do analysis steps) on client side and store it like
> > that. This will allow you to enable docValues on that field and avoid
> > building fieldValueCache.
> >
> > HTH,
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >> On 6 Nov 2017, at 13:06, Wael Kader <w...@softech-lb.com> wrote:
> >>
> >> Hi,
> >>
> >> I am using a custom field. Below is the field definition.
> >> I am using this because I don't want stemming.
> >>
> >> <fieldType name="text_no_stem2" class="solr.TextField"
> >>            positionIncrementGap="100">
> >>   <analyzer type="index">
> >>     <charFilter class="solr.MappingCharFilterFactory"
> >>                 mapping="mapping-ISOLatin1Accent.txt"/>
> >>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>
> >>     <filter class="solr.StopFilterFactory"
> >>             ignoreCase="true"
> >>             words="stopwords.txt"
> >>             enablePositionIncrements="true"
> >>             />
> >>     <filter class="solr.WordDelimiterFilterFactory"
> >>             protected="protwords.txt"
> >>             generateWordParts="0"
> >>             generateNumberParts="1"
> >>             catenateWords="1"
> >>             catenateNumbers="1"
> >>             catenateAll="0"
> >>             splitOnCaseChange="1"
> >>             preserveOriginal="1"/>
> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >>
> >>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>   </analyzer>
> >>   <analyzer type="query">
> >>     <charFilter class="solr.MappingCharFilterFactory"
> >>                 mapping="mapping-ISOLatin1Accent.txt"/>
> >>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>             ignoreCase="true" expand="true"/>
> >>     <filter class="solr.StopFilterFactory"
> >>             ignoreCase="true"
> >>             words="stopwords.txt"
> >>             enablePositionIncrements="true"
> >>             />
> >>     <!--ORIGINAL generateNumberParts="1"-->
> >>     <filter class="solr.WordDelimiterFilterFactory"
> >>             protected="protwords.txt"
> >>             generateWordParts="0"
> >>             catenateWords="0"
> >>             catenateNumbers="0"
> >>             catenateAll="0"
> >>             splitOnCaseChange="1"
> >>             preserveOriginal="1"/>
> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >>     <!-- ORIGINAL filter class="solr.SnowballPorterFilterFactory"
> >>          language="English" protected="protwords.txt"/-->
> >>     <!-- Webel: switch off Porter-stemmer algorithm to enforce whole
> >>          word match -->
> >>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>   </analyzer>
> >> </fieldType>
> >>
> >> Regards,
> >> Wael
> >>
> >> On Mon, Nov 6, 2017 at 10:29 AM, Emir Arnautović
> >> <emir.arnauto...@sematext.com> wrote:
> >>
> >>> Hi Wael,
> >>> Can you provide your field definition and sample query.
> >>>
> >>> Thanks,
> >>> Emir
> >>> --
> >>> Monitoring - Log Management - Alerting - Anomaly Detection
> >>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>>
> >>>> On 6 Nov 2017, at 08:30, Wael Kader <w...@softech-lb.com> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> I am having an index with around 100 Million documents.
> >>>> I have a multivalued column that I am saving big chunks of text
> >>>> data in. It has around 20 GB of RAM and 4 CPUs.
> >>>>
> >>>> I was doing faceting on it to get word cloud but it was taking
> >>>> around 1 second to retrieve when the data was 5-10 Million.
> >>>> Now I have more data and it's taking minutes to get the results
> >>>> (that is if it gets it and SOLR doesn't crash). What's the best way
> >>>> to make it run or maybe it's not scalable to make it run on my
> >>>> current schema and design with News articles.
> >>>>
> >>>> I am looking to find the best solution for this. Maybe create
> >>>> another index to split the data while inserting it or maybe if I
> >>>> change some settings in SolrConfig or add some RAM, it would
> >>>> perform better.
> >>>>
> >>>> --
> >>>> Regards,
> >>>> Wael
> >>
> >> --
> >> Regards,
> >> Wael

--
Regards,
Wael
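
For readers following the thread, Emir's suggestion (run the analysis steps on the client and store the resulting tokens in a multivalued string field with docValues) might look roughly like the sketch below. It is only an approximation of the posted analyzer chain: it folds accents, splits on whitespace, drops stopwords and lowercases, but it does not reproduce WordDelimiterFilterFactory. The stopword set, the `words_ss` field name and the helper names are hypothetical and not taken from the thread.

```python
import unicodedata
from typing import Dict, List

# Hypothetical stopword list; the real one would be loaded from stopwords.txt.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in"}


def fold_accents(text: str) -> str:
    """Strip combining accents, a rough stand-in for the accent-mapping char filter."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str) -> List[str]:
    """Whitespace-tokenize, lowercase, drop stopwords and repeats (first-seen order)."""
    seen = set()
    tokens = []
    for raw in fold_accents(text).split():
        token = raw.lower()
        if token in STOPWORDS or token in seen:
            continue
        seen.add(token)
        tokens.append(token)
    return tokens


def to_solr_doc(doc_id: str, text: str) -> Dict[str, object]:
    """Build an update document; 'words_ss' is an assumed multivalued string field."""
    return {"id": doc_id, "words_ss": analyze(text)}


if __name__ == "__main__":
    print(to_solr_doc("doc-1", "The Café and the cafe serve Solr fans"))
```

Dropping repeated words within a document is a design choice in this sketch: it assumes the word cloud is built from facet counts, which count matching documents rather than occurrences.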
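
Erick's alternative, the `termfreq` function query, returns the count of one known term per document instead of faceting over every token. A minimal sketch of the request parameters for "one user, one date interval" follows. The field names `content`, `author` and `published_at` are assumptions, since the thread does not show the schema, and the helper only builds the parameters without sending a request.

```python
from typing import Dict, List, Union
from urllib.parse import urlencode

Params = Dict[str, Union[str, List[str]]]


def build_termfreq_params(term: str, user: str, start: str, end: str) -> Params:
    """Build Solr select parameters returning termfreq for one term per document.

    Field names (content, author, published_at) are hypothetical.
    """
    if "'" in term or '"' in user:
        # Keep the sketch simple: refuse values that would need escaping.
        raise ValueError("quotes in term or user are not handled in this sketch")
    return {
        "q": "*:*",
        # Filter queries narrow 100M docs down to one user's date window.
        "fq": [f'author:"{user}"', f"published_at:[{start} TO {end}]"],
        # Pseudo-field 'tf' holds the term's frequency in each matching doc.
        "fl": f"id,tf:termfreq(content,'{term}')",
        "rows": "100",
    }


if __name__ == "__main__":
    params = build_termfreq_params(
        "solr", "wael", "2017-11-01T00:00:00Z", "2017-11-06T00:00:00Z"
    )
    # doseq=True emits one fq=... pair per filter query.
    print(urlencode(params, doseq=True))
```

Summing the `tf` values across the returned documents on the client gives the total for that term in the window; this only helps when the terms of interest are known in advance.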