Re: Faceting Word Count

2017-11-09 Thread Toke Eskildsen
On Wed, 2017-11-08 at 16:58 +0200, Wael Kader wrote: > Facets are taking around 1 minute to return data now. Can you verify if this is due to updates causing a new searcher to be opened or if it just takes that long? An easy way to test it is to stop updating the index, then do a few calls with different

Re: Faceting Word Count

2017-11-08 Thread alessandro.benedetti
Apart from the performance, getting a "word cloud" from a subset of documents is a slightly different problem than getting the facets out of it. If my understanding is correct, what you want is to extract the "significant terms" out of your result set.[1] Using faceting is a rough approximation

Re: Faceting Word Count

2017-11-08 Thread Wael Kader
Hi, I want to know the best option for getting a word cloud in Solr. Is it saving the data as multivalued, using vectors, or JSON faceting (which didn't work for me)? Terms doesn't work because I can't provide any criteria. I don't mind changing the design but I need to know the best feasible way that won't

Re: Faceting Word Count

2017-11-08 Thread Emir Arnautović
Hi Wael, You can try out JSON faceting - it’s not just about the request/response format; it uses a different implementation as well. In any case you will have to index documents differently in order to be able to use docValues. HTH Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr &
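
As a rough illustration of the JSON faceting suggestion above, here is a minimal sketch of a request body for a terms facet. The field name `text`, the filter `source:rss`, and the facet label `words` are hypothetical and not taken from the thread; the body would be POSTed to the collection's search handler.

```python
import json


def build_json_facet_request(field, filter_query, limit=100):
    """Build a JSON request body asking for the top terms of one field.

    Assumption: `field` and `filter_query` are placeholders; the thread
    does not show the real field definition or query.
    """
    return {
        "query": "*:*",
        "filter": [filter_query],  # restricts the documents being counted
        "limit": 0,                # no documents returned, only facet counts
        "facet": {
            "words": {"type": "terms", "field": field, "limit": limit},
        },
    }


body = json.dumps(build_json_facet_request("text", "source:rss"))
print(body)
```

Setting the top-level `limit` to 0 keeps the response small, since only the bucket counts are of interest for a word cloud.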

Re: Faceting Word Count

2017-11-07 Thread Erick Erickson
bq: 10k as a max number of rows. This doesn't matter. In order to facet on the word count, Solr has to be prepared to facet on all possible docs. For all Solr knows, a _single_ document may contain every word so the size of the structure that contains the counters has to be prepared for N buckets,
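
To make the sizing argument above concrete, here is a back-of-the-envelope sketch. It assumes N is the number of distinct terms in the field and that each bucket holds a 4-byte counter; both are illustrative assumptions, and the figure of 50 million distinct terms is made up for the example.

```python
def estimate_counter_bytes(num_buckets, bytes_per_counter=4):
    """Estimate memory for one counter per bucket.

    Assumption: one fixed-size integer counter per bucket. The number of
    documents matched by the query does not appear in this formula at all.
    """
    return num_buckets * bytes_per_counter


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


# Hypothetical: 50 million distinct terms in the faceted field.
needed = estimate_counter_bytes(50_000_000)
print(f"{needed} bytes, about {to_mib(needed):.0f} MiB per request")
```

The point of the sketch is that filtering down to 10k rows changes nothing on the right-hand side of the estimate.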

Re: Faceting Word Count

2017-11-07 Thread Wael Kader
Hi, The whole index has 100M documents, but when I add the criteria, it filters the data down to maybe 10k rows at most. The facet isn't working now that the total number of records in the index is 100M, but it was working at 5M. I have social media & RSS data in the index and I am trying to get the wo

Re: Faceting Word Count

2017-11-06 Thread Jokin C
He said that he's using it to get a word cloud. If it's not related to the search and it's a generic word cloud of the index, using the Luke request handler to get the first 250 or 500 words could work. http://localhost:8983/solr/core/admin/luke?fl=text&numTerms=500&wt=json On Mon, Nov 6, 2017 at 4:4
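
The Luke request above can be assembled programmatically; the following is a minimal sketch. The helper names are hypothetical, and the flat `[term, count, term, count, ...]` layout handled by `pair_top_terms` is an assumption about how the top terms come back in the JSON response, so check it against an actual response.

```python
from urllib.parse import urlencode


def build_luke_url(base_url, core, field, num_terms):
    """Build a Luke request handler URL asking for the top terms of one field."""
    params = {"fl": field, "numTerms": num_terms, "wt": "json"}
    return f"{base_url}/{core}/admin/luke?{urlencode(params)}"


def pair_top_terms(flat):
    """Turn a flat [term, count, term, count, ...] list into (term, count) pairs.

    Assumption: the response lists top terms in this alternating layout.
    """
    return list(zip(flat[0::2], flat[1::2]))


url = build_luke_url("http://localhost:8983/solr", "core", "text", 500)
print(url)
```

Note that this only covers the index-wide case described above; the Luke handler takes no query, so it cannot restrict the terms to a filtered subset of documents.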

Re: Faceting Word Count

2017-11-06 Thread Erick Erickson
_Why_ do you want to get the word counts? Faceting on all of the tokens for 100M docs isn't something Solr is ordinarily used for. As Emir says it'll take a huge amount of memory. You can use one of the function queries (termfreq IIRC) that will give you the count of any individual term you have an
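
For the function-query route mentioned above, here is a minimal sketch of request parameters that return `termfreq` as a pseudo-field. The field name `text`, the `id` field, and the alias `tf` are hypothetical; `termfreq` reports the count of the term within each returned document, not a total across the index.

```python
from urllib.parse import urlencode, parse_qs


def build_termfreq_params(field, term, query="*:*", rows=10):
    """Build select parameters that return termfreq(field, term) per document.

    Assumption: documents have an `id` field; `tf` is just a label chosen here.
    """
    if "'" in term:
        # Quoting is kept deliberately simple in this sketch.
        raise ValueError("terms containing a single quote are not handled")
    return {
        "q": query,
        "fl": f"id,tf:termfreq({field},'{term}')",
        "rows": rows,
        "wt": "json",
    }


query_string = urlencode(build_termfreq_params("text", "solr"))
print(query_string)
```

This suits looking up a handful of known terms; it does not discover which terms are frequent.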

Re: Faceting Word Count

2017-11-06 Thread Emir Arnautović
Hi Wael, You are faceting on an analyzed field. This results in the field being uninverted - fieldValueCache being built - on the first call after every commit. This is both time and memory consuming (you can check in the admin console stats how much memory it took). What you need to do is to create multiv

Re: Faceting Word Count

2017-11-06 Thread Wael Kader
Hi, I am using a custom field. Below is the field definition. I am using this because I don't want stemming. Regards,

Re: Faceting Word Count

2017-11-06 Thread Emir Arnautović
Hi Wael, Can you provide your field definition and a sample query? Thanks, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 6 Nov 2017, at 08:30, Wael Kader wrote: > > Hello, > > I am having an index

Faceting Word Count

2017-11-05 Thread Wael Kader
Hello, I have an index with around 100 million documents. I have a multivalued column in which I am saving big chunks of text data. The server has around 20 GB of RAM and 4 CPUs. I was doing faceting on it to get a word cloud, but it was taking around 1 second to retrieve when the data was 5-10 Million