Spell check with data from database and not from English dictionary

2020-01-22 Thread seeteshh
Hello all, can the spell check feature be configured with words/data fetched from a database instead of the English dictionary? Regards, Seetesh Hindlekar -- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Early termination in Lucene 8

2020-01-22 Thread Wei
Hi, I am excited to see that Lucene 8 introduced BlockMax WAND as a major speed improvement (https://issues.apache.org/jira/browse/LUCENE-8135). My question is: how does it integrate with facet requests, when numFound won't be exact? I did some searching but haven't found any documentation on this. Any
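A rough way to picture why the hit count stops being exact: with block-max pruning, the scorer keeps an upper bound on the scores inside each block of documents and skips whole blocks that cannot beat the current k-th best hit, so the skipped documents are never counted. The Java sketch below is a simplified, hypothetical illustration of that idea (a flat score array, with precomputed block maxima standing in for the per-block impacts stored in the index), not Lucene's implementation. In Lucene 8 itself, a count that was cut short is reported through TotalHits with relation GREATER_THAN_OR_EQUAL_TO.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Simplified illustration of block-max pruning for top-k retrieval (not Lucene's code). */
final class BlockMaxTopK {

    /** Result of a top-k search: best doc ids first, plus how many docs were actually scored. */
    static final class Result {
        final List<Integer> topDocs;
        final int scoredDocs;

        Result(List<Integer> topDocs, int scoredDocs) {
            this.topDocs = topDocs;
            this.scoredDocs = scoredDocs;
        }
    }

    static Result topK(double[] scores, int blockSize, int k) {
        if (k <= 0 || blockSize <= 0) throw new IllegalArgumentException("k and blockSize must be > 0");
        int n = scores.length;
        int numBlocks = (n + blockSize - 1) / blockSize;

        // Stand-in for index-time metadata: the maximum score inside each block.
        double[] blockMax = new double[numBlocks];
        for (int b = 0; b < numBlocks; b++) {
            double max = Double.NEGATIVE_INFINITY;
            for (int d = b * blockSize; d < Math.min(n, (b + 1) * blockSize); d++) {
                max = Math.max(max, scores[d]);
            }
            blockMax[b] = max;
        }

        // Min-heap whose head is the weakest current hit (lower score, then higher doc id).
        Comparator<Integer> weakestFirst =
                Comparator.<Integer>comparingDouble(d -> scores[d]).thenComparing(Comparator.reverseOrder());
        PriorityQueue<Integer> heap = new PriorityQueue<>(weakestFirst);

        int scored = 0;
        for (int b = 0; b < numBlocks; b++) {
            // Skip the whole block if even its best doc cannot enter a full top-k.
            if (heap.size() == k && blockMax[b] <= scores[heap.peek()]) {
                continue;
            }
            for (int d = b * blockSize; d < Math.min(n, (b + 1) * blockSize); d++) {
                scored++;
                if (heap.size() < k) {
                    heap.add(d);
                } else if (scores[d] > scores[heap.peek()]) {
                    heap.poll();
                    heap.add(d);
                }
            }
        }

        // Best hit first: higher score, then lower doc id.
        List<Integer> top = new ArrayList<>(heap);
        top.sort(weakestFirst.reversed());
        return new Result(top, scored);
    }
}
```

For example, with scores {0.1, 0.9, 0.2, 0.3, 0.8, 0.05, 0.04, 0.03, 0.02, 0.01, 0.7, 0.6}, a block size of 3 and k = 2, it returns docs [1, 4] after scoring only 6 of the 12 documents; the other 6 are exactly the hits an exact numFound would have needed to visit.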

Re: BooleanQueryBuilder is not adding parenthesis around the query

2020-01-22 Thread Edward Ribeiro
Cool. Glad to help. :) Cheers, Edward On Wed, Jan 22, 2020, 16:44, Arnold Bronley wrote: > I knew about the + and other signs and their connections to MUST and other > operators. What I did not understand was why it was not adding parentheses > around the expression. In your first reply

Re: Does it make sense docValues="true" for _root_ field for uniqueBlock()

2020-01-22 Thread Mikhail Khludnev
It's hard to predict whether it will be faster to read docValues files or to uninvert the field ad hoc and read it from the heap. Only a test can tell. On Wed, Jan 22, 2020 at 11:08 PM kumar gaurav wrote: > HI Mikhail > > for example :- 6GB index size (Parent-child documents) > indexing in 12 hours interval . >

QParser does not retain double quotes

2020-01-22 Thread Arnold Bronley
Hi, I have the following code that does some parsing with a QParser plugin. I noticed that it does not retain the double quotes in filterQueryString. How can I make it retain the double quotes? QParser.getParser(filterQueryString, null, req).getQuery(); filterQueryString passed = id:"x:1234"
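A likely explanation, assuming id is a string field: the quotes in id:"x:1234" are query syntax that delimits the term, so the parser builds a term query for the text x:1234 and the quotes are not part of the resulting Query object (its toString() prints id:x:1234). If the original text is needed later, it would have to be kept from filterQueryString itself. The hypothetical mini-parser below (not Solr's QParser, and simplified: no escapes inside quotes) shows that the quoted form and the backslash-escaped form id:x\:1234 denote the same field and term.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

/**
 * Hypothetical mini-parser (not Solr's QParser) for a single field:value clause.
 * It shows that quotes only delimit the term text; they never end up in the term.
 */
final class FieldTermParser {
    static Map.Entry<String, String> parse(String clause) {
        int colon = clause.indexOf(':');
        if (colon <= 0) throw new IllegalArgumentException("expected field:value, got " + clause);
        String field = clause.substring(0, colon);
        String raw = clause.substring(colon + 1);

        // Quoted form: strip the delimiting quotes; everything inside is taken literally.
        if (raw.length() >= 2 && raw.startsWith("\"") && raw.endsWith("\"")) {
            return new SimpleEntry<>(field, raw.substring(1, raw.length() - 1));
        }

        // Unquoted form: a backslash escapes the next character (e.g. \: for a literal colon).
        StringBuilder term = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '\\' && i + 1 < raw.length()) {
                term.append(raw.charAt(++i));
            } else {
                term.append(c);
            }
        }
        return new SimpleEntry<>(field, term.toString());
    }
}
```

Both FieldTermParser.parse("id:\"x:1234\"") and FieldTermParser.parse("id:x\\:1234") yield the field id and the term x:1234.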

Re: Does it make sense docValues="true" for _root_ field for uniqueBlock()

2020-01-22 Thread kumar gaurav
Hi Mikhail, for example: 6 GB index (parent-child documents), indexing at 12-hour intervals. I need to use uniqueBlock in a JSON facet for child faceting. Should I use docValues="true" for the _root_ field? Thanks. Regards, Kumar Gaurav On Thu, Jan 23, 2020 at 1:28 AM Mikhail Khludnev w

Re: Does it make sense docValues="true" for _root_ field for uniqueBlock()

2020-01-22 Thread Mikhail Khludnev
It depends on the environment. On Wed, Jan 22, 2020 at 9:31 PM kumar gaurav wrote: > Hi Everyone > > Should i use docValues="true" for _root_ field to improve nested child > json.facet performance ? i am using uniqueBlock() . > > > Thanks in advance . > > regards > Kumar Gaurav > -- Sincerely yours M

Re: BooleanQueryBuilder is not adding parenthesis around the query

2020-01-22 Thread Arnold Bronley
I knew about the + and other signs and their connections to MUST and other operators. What I did not understand was why it was not adding parentheses around the expression. In your first reply you mentioned that - 'roughly, a builder for each query enclosed in "parenthesis"' - that was the key po

Re: BooleanQueryBuilder is not adding parenthesis around the query

2020-01-22 Thread Arnold Bronley
Thanks, Edward. This was the exact answer I was looking for :) On Wed, Jan 22, 2020 at 1:08 PM Edward Ribeiro wrote: > If you are using Lucene's BooleanQueryBuilder then you need to do nesting > of your queries (roughly, a builder for each query enclosed in > "parenthesis"). > > A query like (t

Re: BooleanQueryBuilder is not adding parenthesis around the query

2020-01-22 Thread Edward Ribeiro
Oh, you asked about the meaning of the plus sign too. Well, I recommend reading a book* or any tutorial, but for the clauses of boolean queries there are three occurrences, SHOULD, MUST and MUST_NOT, which roughly translate to OR, AND, and NOT, respectively. The plus sign means MUST, the minus sign mea
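To make the prefix convention concrete, here is a toy splitter (hypothetical, not part of Lucene or Solr) that walks the top level of a query string and labels each clause: a leading + means MUST, a leading - means MUST_NOT, and no prefix means SHOULD. It treats parentheses as grouping only and ignores quoted phrases and escapes, so it is an illustration rather than a parser. Applied to "+(topics:29)^2 (topics:38)^3 +(-id:41135)", the query string from the "Lucene query to Solr query" thread, it yields MUST (topics:29)^2, SHOULD (topics:38)^3 and MUST (-id:41135).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy splitter (not Lucene's query parser) that labels top-level clauses by their prefix. */
final class ClauseClassifier {
    static List<String> classify(String query) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        // Iterate one position past the end; the trailing space sentinel flushes the last clause.
        for (int i = 0; i <= query.length(); i++) {
            char c = i < query.length() ? query.charAt(i) : ' ';
            if (c == '(') depth++;
            if (c == ')') depth--;
            if (Character.isWhitespace(c) && depth == 0) {
                if (current.length() > 0) out.add(label(current.toString()));
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        return out;
    }

    // "+" -> MUST, "-" -> MUST_NOT, no prefix -> SHOULD.
    private static String label(String clause) {
        if (clause.startsWith("+")) return "MUST " + clause.substring(1);
        if (clause.startsWith("-")) return "MUST_NOT " + clause.substring(1);
        return "SHOULD " + clause;
    }
}
```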

Re: Solr 8.0 Json Facets are slow - need help

2020-01-22 Thread kumar gaurav
Also, it doesn't look like the box is slow, because for the following query the prepare time is 3 ms but the facet time is 84 ms on the same box. I don't know why the prepare time was huge for that example :( . debug: { - rawquerystring: "{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}", - queryst

Does it make sense docValues="true" for _root_ field for uniqueBlock()

2020-01-22 Thread kumar gaurav
Hi everyone, should I use docValues="true" for the _root_ field to improve nested child json.facet performance? I am using uniqueBlock(). Thanks in advance. Regards, Kumar Gaurav
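For context on what is being computed: uniqueBlock(_root_) counts, per facet bucket, the number of distinct parent blocks (distinct _root_ values) among the matching child documents, rather than the number of child documents. The toy model below (hypothetical types and data, not Solr code) shows that counting. Looking up each child's _root_ value is the per-document step that, as Mikhail notes in his reply, is served either from docValues or from an uninverted in-heap copy of the field.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of what uniqueBlock(_root_) computes per facet bucket (not Solr code). */
final class UniqueBlockDemo {
    /** Hypothetical matching child doc: its facet value and the _root_ id of its parent block. */
    static final class Child {
        final String facetValue;
        final String root;

        Child(String facetValue, String root) {
            this.facetValue = facetValue;
            this.root = root;
        }
    }

    static Map<String, Integer> uniqueBlockCounts(List<Child> matchedChildren) {
        // Collect the distinct _root_ values seen in each bucket.
        Map<String, Set<String>> rootsPerBucket = new TreeMap<>();
        for (Child c : matchedChildren) {
            rootsPerBucket.computeIfAbsent(c.facetValue, k -> new HashSet<>()).add(c.root);
        }
        // The bucket count is the number of distinct parents, not the number of children.
        Map<String, Integer> counts = new TreeMap<>();
        rootsPerBucket.forEach((bucket, roots) -> counts.put(bucket, roots.size()));
        return counts;
    }
}
```

For four matching children (red under parents p1, p1 and p2; blue under p2), it returns blue=1 and red=2, whereas a plain document count would report red=3.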

Re: Solr 8.0 Json Facets are slow - need help

2020-01-22 Thread kumar gaurav
Many thanks, Mikhail. Also, can you please answer: should I use docValues="true" for the _root_ field to improve this json.facet performance? On Wed, Jan 22, 2020 at 11:42 PM Mikhail Khludnev wrote: > Initial request refers unknown (to me) query parser {!simpleFilter, I > can't comment on it. >

Re: Solr 8.0 Json Facets are slow - need help

2020-01-22 Thread Mikhail Khludnev
The initial request refers to a query parser unknown to me, {!simpleFilter, so I can't comment on it. Parsing queries took, in millis: - time: 261, while preparing a query usually takes a moment. I suspect the box is really slow per se or is under heavy load. And then facets took about 6 times more - facet_module

Re: BooleanQueryBuilder is not adding parenthesis around the query

2020-01-22 Thread Edward Ribeiro
If you are using Lucene's BooleanQueryBuilder, then you need to nest your queries (roughly, one builder for each query enclosed in "parenthesis"). A query like (text:child AND text:toys) OR age:12 would be: Query query1 = new TermQuery(new Term("text", "toys")); Query query2 = new TermQuery
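Why nesting produces parentheses can be shown with a toy query tree (hypothetical classes, not Lucene's API) whose toString() mimics the convention Lucene's BooleanQuery uses: + for MUST, - for MUST_NOT, no prefix for SHOULD, and parentheses only around a nested boolean query. Building (text:child AND text:toys) OR age:12 as two levels renders as (+text:child +text:toys) age:12, while putting all three clauses into one builder renders as +text:child +text:toys age:12, which makes age:12 merely optional instead of an alternative.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy query tree (not Lucene's classes) that renders clauses the way Lucene's
 * BooleanQuery.toString() does: "+" for MUST, "-" for MUST_NOT, no prefix for SHOULD,
 * and parentheses only around a nested boolean query.
 */
final class ToyQuery {
    enum Occur { MUST, SHOULD, MUST_NOT }

    private final String term; // non-null for a leaf such as text:toys, null for a boolean node
    private final List<Occur> occurs = new ArrayList<>();
    private final List<ToyQuery> children = new ArrayList<>();

    private ToyQuery(String term) {
        this.term = term;
    }

    static ToyQuery term(String field, String text) {
        return new ToyQuery(field + ":" + text);
    }

    static ToyQuery bool() {
        return new ToyQuery(null);
    }

    ToyQuery add(ToyQuery child, Occur occur) {
        children.add(child);
        occurs.add(occur);
        return this;
    }

    @Override
    public String toString() {
        if (term != null) return term;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) sb.append(' ');
            if (occurs.get(i) == Occur.MUST) sb.append('+');
            if (occurs.get(i) == Occur.MUST_NOT) sb.append('-');
            ToyQuery child = children.get(i);
            // Only a nested boolean node gets wrapped in parentheses.
            if (child.term == null) {
                sb.append('(').append(child).append(')');
            } else {
                sb.append(child);
            }
        }
        return sb.toString();
    }
}
```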

Apache Solr HTTP health endpoint for blackbox_exporter probings

2020-01-22 Thread Daniel Trüssel
Hey, searching with DuckDuckGo I found no HTTP health endpoint for Solr. I use https://github.com/prometheus/blackbox_exporter to probe our apps. JMX_exporter is not an option; I need to use blackbox. Please point me in the right direction. Kind regards, Daniel

Re: Lucene query to Solr query

2020-01-22 Thread Edward Ribeiro
equivalent to "+(topics:29)^2 (topics:38)^3 +(-id:41135)", I mean. :) Edward On Wed, Jan 22, 2020 at 1:51 PM Edward Ribeiro wrote: > Hi, > > A more or less equivalent query (using Solr's LuceneQParser) to > "topics:29^2 AND (-id:41135) topics:38^3" would be: > > topics:29^2 AND (-id:41135) topi

Re: Lucene query to Solr query

2020-01-22 Thread Edward Ribeiro
Hi, A more or less equivalent query (using Solr's LuceneQParser) to "topics:29^2 AND (-id:41135) topics:38^3" would be: topics:29^2 AND (-id:41135) topics:38^3 Edward On Mon, Jan 20, 2020 at 1:10 AM Arnold Bronley wrote: > Hi, > > I have a Lucene query as following (toString represenation of

Re: Is it possible to add stemming in a text_exact field

2020-01-22 Thread Edward Ribeiro
Hi, one possible solution would be to create a second field (e.g., text_general) that uses DefaultTokenizer, or another tokenizer that breaks the string into tokens, and use a copyField to copy the content from text_exact to text_general. Then, you can use the edismax parser to search both fields, but g
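A sketch of the request side of that approach, using only the JDK: it builds the URL query string for an edismax search across both fields. The boosts (text_exact^2 over text_general^1) are assumptions chosen for illustration, since the reply is cut off before any values. Note that for "restaurants" to match "restaurant", the text_general analyzer would also need a stemming filter; the text_general type in Solr's default configset does not include one.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Builds a query string for an edismax search over text_exact and text_general.
 * The boost values are illustrative assumptions, not taken from the thread.
 */
final class EdismaxParams {
    static String build(String userQuery) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("defType", "edismax");
        params.put("q", userQuery);
        // Exact-field matches score higher; the tokenized copy broadens recall.
        params.put("qf", "text_exact^2 text_general^1");
        return params.entrySet().stream()
                .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // requires Java 10+
    }
}
```

EdismaxParams.build("restaurants dubai") returns defType=edismax&q=restaurants+dubai&qf=text_exact%5E2+text_general%5E1, which can be appended to a /select request URL.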

Re: Solr 8.0 Json Facets are slow - need help

2020-01-22 Thread kumar gaurav
Hi Mikhail, here is the full debug log. Please have a look. debug: { - rawquerystring: "{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}", - querystring: "{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}", - parsedquery: "AllParentsAware(ToParentBlockJoin

Re: Solr 7.7 heap space is getting full

2020-01-22 Thread Michael Gibney
Rajdeep, you say that "suddenly" heap space is getting full ... does this mean that some variant of this configuration was working for you at some point, or just that the failure happens quickly? If heap space and faceting are indeed the bottleneck, you might make sure that you have docValues enab

Is it possible to add stemming in a text_exact field

2020-01-22 Thread Dhanesh Radhakrishnan
Hello, I'm facing an issue with stemming. My search query "restaurant dubai" returns results, but if I search "restaurants dubai" it returns no data. How can I stem "restaurants dubai" so that it matches "restaurant dubai"? I'm using a text_exact field for search. Here is the field definition
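Assuming the (truncated) text_exact definition has no stemming filter, this mismatch is expected: "restaurants" and "restaurant" are indexed as different terms, so only identical forms match. Stemming has to run in both the index-time and the query-time analyzer (in Solr, via a filter such as EnglishMinimalStemFilterFactory or PorterStemFilterFactory). The deliberately naive plural stripper below is only a hypothetical illustration of that symmetry, not any Lucene stemmer.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

/**
 * Deliberately naive plural stripper, used only to illustrate that the same analysis
 * must be applied at index time and at query time. Not Porter, Snowball, or KStem.
 */
final class NaivePluralStemmer {
    static String stem(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        if (t.length() > 3 && t.endsWith("ies")) return t.substring(0, t.length() - 3) + "y";
        if (t.length() > 3 && t.endsWith("s") && !t.endsWith("ss")) return t.substring(0, t.length() - 1);
        return t;
    }

    /** Applies identical analysis to indexed text and to the user's query. */
    static String analyze(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .map(NaivePluralStemmer::stem)
                .collect(Collectors.joining(" "));
    }
}
```

Because both sides pass through analyze(), "restaurants dubai" and "restaurant dubai" reduce to the same terms, restaurant dubai.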

Re: regarding Extracting text from Images

2020-01-22 Thread Steve Ge
In my experience, enabling Tika at the server level can use up heap memory under a high volume of extraction and bring down Solr entirely, likely because the garbage collector cannot keep up with the load; even tuning the garbage collector didn't resolve the problem completely. Not recommen

Query Regarding SOLR cross collection join

2020-01-22 Thread Doss
Hi, Solr version 8.3.1 (10 nodes), ZooKeeper ensemble (3 nodes). One of our use cases requires joins: we are joining 2 large indexes. As required by Solr, one index (2 GB) has one shard and 10 replicas, and the other has 10 shards (40 GB per shard). The query takes too much time, sometimes minutes
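For reference on what such a query evaluates, here is a toy in-memory model of the semantics of Solr's {!join from=... to=...} query: collect the from-field values of the documents matching the inner query in one index, then return the documents of the other index whose to-field holds one of those values. Field names and data are hypothetical, and this is not how Solr executes the join; it only makes visible that every join key matched by the inner query must be gathered before the second step can run.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Toy model of join semantics over in-memory "documents" (not Solr's implementation). */
final class ToyJoin {
    static List<Map<String, String>> join(List<Map<String, String>> fromIndex,
                                          Predicate<Map<String, String>> innerQuery,
                                          String fromField,
                                          List<Map<String, String>> toIndex,
                                          String toField) {
        // Step 1: gather the join keys from every inner-query match in the "from" index.
        Set<String> keys = new HashSet<>();
        for (Map<String, String> doc : fromIndex) {
            if (innerQuery.test(doc) && doc.containsKey(fromField)) {
                keys.add(doc.get(fromField));
            }
        }
        // Step 2: keep documents in the other index whose join field value is among the keys.
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> doc : toIndex) {
            if (keys.contains(doc.get(toField))) {
                out.add(doc);
            }
        }
        return out;
    }
}
```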

Re: SolrCloud upgrade concern

2020-01-22 Thread Jason Gerlowski
Hi Arnold, The stability and complexity issues Mark highlighted in his post aren't just imagined - there are real, sometimes serious, bugs in SolrCloud features. But at the same time there are many many stable deployments out there where SolrCloud is a real success story for users. Small example

null:org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /roles.json

2020-01-22 Thread sotna
Hi. We have SolrCloud enabled in a production environment (2 Solr nodes [16 GB RAM each] and 3 ZooKeeper nodes, each hosted on a separate server). Occasionally Solr loses its connection to ZooKeeper and search stops working. After we restart all ZooKeeper nodes at once, it starts working again. I Solr l

Re: Solr 8.0 Json Facets are slow - need help

2020-01-22 Thread Mikhail Khludnev
Screenshot didn't come through the list. That excerpt doesn't have any informative numbers. On Tue, Jan 21, 2020 at 5:18 PM kumar gaurav wrote: > Hi Mikhail > > Thanks for your reply . Please help me in this . > > Followings are the screenshot:- > > [image: image.png] > > > [image: image.png] > >

Re: Solr 7.7 heap space is getting full

2020-01-22 Thread Toke Eskildsen
On Sun, 2020-01-19 at 21:19 -0500, Mehai, Lotfi wrote: > I had a similar issue with a large number of facets. There is no way > (at least that I know of) you can get an acceptable response time from > a search engine with a high number of facets. Just for the record then it is doable under specific circumst

Re: regarding Extracting text from Images

2020-01-22 Thread Retro
Good day, we solved the situation. Here is what was used and changed: in our installation we used Tesseract version 3.05, Tika version 1.17, and Solr version 7.4. We actually had Tika version 1.17, not 1.18. 1. Changed from HOCR to TXT in the file parseContext.xml 2. Had to start SOLR as a root