On Wed, 2017-11-08 at 16:58 +0200, Wael Kader wrote:
> Facets are taking around 1 minute to return data now.
Can you verify if this is due to updates causing a new searcher to be
opened, or if it just takes that long? An easy way to test is to stop
updating the index, then do a few calls with different
Apart from the performance, getting a "word cloud" from a subset of documents
is a slightly different problem than getting the facets out of it.
If my understanding is correct, what you want is to extract the "significant
terms" out of your results set.[1]
Using faceting is a rough approximation
Hi,
I want to know the best option for getting word cloud in SOLR.
Is it saving the data as multivalued, using term vectors, or JSON faceting
(which didn't work for me)? The terms component doesn't work because I can't
provide any criteria.
I don't mind changing the design but I need to know the best feasible way
that won't
Hi Wael,
You can try out JSON faceting - it’s not just about request/response format, but it
uses a different implementation as well. In any case you will have to index documents
differently in order to be able to use docValues.
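For illustration only, a JSON Faceting request on such a field might look like this (the field name `keywords_ss`, the query, and the limits are placeholders, not values from this thread):

```json
{
  "query": "*:*",
  "limit": 0,
  "facet": {
    "word_cloud": {
      "type": "terms",
      "field": "keywords_ss",
      "limit": 200
    }
  }
}
```

This body would be POSTed to the collection's `/query` endpoint; `"limit": 0` suppresses the document results so only the facet buckets come back.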
HTH
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
bq: 10k as a max number of rows.
This doesn't matter. In order to facet on the word count, Solr has to
be prepared to facet on all possible docs. For all Solr knows, a
_single_ document may contain every word so the size of the structure
that contains the counters has to be prepared for N buckets,
Hi,
The whole index has 100M documents, but when I add the criteria it filters
the data down to maybe 10k rows at most.
Faceting isn't working when the total number of records in the index is
100M, but it was working at 5M.
I have social media & RSS data in the index and I am trying to get the wo
He said that he's using it to get a word cloud; if it's not related to the
search and it's a generic word cloud of the index, using the Luke request
handler to get the first 250 or 500 words could work.
http://localhost:8983/solr/core/admin/luke?fl=text&numTerms=500&wt=json
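As a sketch, the response from a call like the one above can be turned into (term, count) pairs for a word cloud. This assumes the default `json.nl` flat serialization, where a NamedList such as `topTerms` comes back as an alternating `["term1", count1, "term2", count2, ...]` array; the sample response below is abbreviated and hypothetical:

```python
import json

def top_terms_from_luke(luke_json: str, field: str) -> list[tuple[str, int]]:
    """Parse a Luke handler JSON response and return (term, count) pairs.

    Assumes the default flat NamedList style, where topTerms is
    serialized as ["term1", count1, "term2", count2, ...].
    """
    data = json.loads(luke_json)
    flat = data["fields"][field]["topTerms"]
    # Pair up every even-index term with the odd-index count that follows it.
    return list(zip(flat[0::2], flat[1::2]))

# Hypothetical, abbreviated Luke response for illustration:
sample = '{"fields": {"text": {"topTerms": ["solr", 1200, "search", 950]}}}'
print(top_terms_from_luke(sample, "text"))
# → [('solr', 1200), ('search', 950)]
```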
On Mon, Nov 6, 2017 at 4:4
_Why_ do you want to get the word counts? Faceting on all of the
tokens for 100M docs isn't something Solr is ordinarily used for. As
Emir says it'll take a huge amount of memory. You can use one of the
function queries (termfreq IIRC) that will give you the count of any
individual term you have an
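For reference, a `termfreq` function query can be aliased into the field list to get the count of one known term per document; a sketch, where the field name `text` and the term `solr` are placeholders:

```
http://localhost:8983/solr/core/select?q=*:*&fl=id,cnt:termfreq(text,'solr')
```

This returns a per-document count for that single term only, so it doesn't replace faceting for a full word cloud, but it avoids building the giant counter structure.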
Hi Wael,
You are faceting on an analyzed field. This results in the field being uninverted -
the fieldValueCache being built - on the first call after every commit. This is both
time and memory consuming (you can check in the admin console stats how much
memory it took).
What you need to do is to create multiv
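The message is cut off here, so the following is only a guess at the direction: since docValues are not supported on analyzed text fields, the usual approach is to write the extracted tokens into a separate multiValued string field at index time, e.g. a schema.xml entry such as (field name hypothetical):

```xml
<field name="keywords" type="string" indexed="true" stored="false"
       multiValued="true" docValues="true"/>
```

Faceting on a docValues-backed field like this avoids the fieldValueCache uninversion cost described above.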
Hi,
I am using a custom field. Below is the field definition.
I am using this because I don't want stemming.
Regards,
Hi Wael,
Can you provide your field definition and sample query.
Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> On 6 Nov 2017, at 08:30, Wael Kader wrote:
>
> Hello,
>
> I am having an index
Hello,
I have an index with around 100 million documents.
I have a multivalued field that I am saving big chunks of text data in. The
server has around 20 GB of RAM and 4 CPUs.
I was doing faceting on it to get word cloud but it was taking around 1
second to retrieve when the data was 5-10 Million