Re: Using BasicAuth with SolrJ Code

2017-04-12 Thread Zheng Lin Edwin Yeo
This is what I get when I run the code. org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/testing: Expected mime type application/octet-stream but got text/html. Error 401 require authentication HTTP ERROR 401 Problem accessin

Re: Grouped Result sort issue

2017-04-12 Thread alessandro.benedetti
"You're telling Solr to return the highest scoring doc in each group. However, you're asking to order the _groups_ in ascending score order (i.e. the group with the lowest scoring doc first) of _any_ doc in that group, not just the one(s) returned. These are two separate things. " This is quit
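As a rough illustration of the two separate settings discussed in the quote above, here is a minimal sketch of the request parameters. The field name `category` and the query text are hypothetical; `sort` and `group.sort` are the standard Solr grouping parameters. The sketch only shows that the two orderings are configured independently; which document's score represents a group for `sort` is exactly what this thread is debating.

```python
from urllib.parse import urlencode

def grouped_query_params(query, group_field):
    """Build grouping parameters; query and group_field are hypothetical inputs."""
    return {
        "q": query,
        "group": "true",
        "group.field": group_field,
        "sort": "score asc",         # ordering of the groups themselves
        "group.sort": "score desc",  # ordering of documents inside each group
        "group.limit": "1",          # return only the top document per group
    }

params = grouped_query_params("text:solr", "category")
query_string = urlencode(params)
print(query_string)
```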

Re: simple matches not catching at query time

2017-04-12 Thread alessandro.benedetti
hi John, I am a bit confused here. Let's focus on one field and one document. Given this parsed phrase query : manufacturer_split_syn:"vendor vendor" and the document 1 : D1 {"id":"1" "manufacturer_split_syn" : "vendor"} Are you expecting this to match ? because it shouldn't ... let's try to

RE: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain

2017-04-12 Thread Pratik Thaker
Hi All, I have been facing this issue for a very long time; can you please provide your suggestions on it? Regards, Pratik Thaker -Original Message- From: Pratik Thaker [mailto:pratik.tha...@smartstreamrdu.com] Sent: 09 February 2017 21:24 To: 'solr-user@lucene.apache.org' Subject: RE: DistributedU

maxDoc ten times greater than numDoc

2017-04-12 Thread Markus Jelsma
Hi, One of our 2 shard collections is rather small and gets all its entries reindexed every 20 minutes or so. Now I just noticed maxDoc is ten times greater than numDoc; the merger is never scheduled, but settings are default. We just overwrite the existing entries, all of them. Here are the sta

Re: maxDoc ten times greater than numDoc

2017-04-12 Thread alessandro.benedetti
Hi Markus, maxDocs includes deletions : Deleted Docs: 74026 + Num Docs: 8336 = Max Doc:82362 Cheers - --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io -- View this message in context: http://lucene.472066.n3.nab
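The arithmetic in the message above can be checked with a short sketch. The function name is made up for illustration; the two input figures are the ones quoted in the message.

```python
def deleted_ratio(num_docs, deleted_docs):
    """Return (max_doc, fraction of max_doc that is deleted documents)."""
    max_doc = num_docs + deleted_docs
    return max_doc, deleted_docs / max_doc

max_doc, ratio = deleted_ratio(num_docs=8336, deleted_docs=74026)
print(max_doc, f"{ratio:.1%}")  # 82362 89.9%
```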

RE: maxDoc ten times greater than numDoc

2017-04-12 Thread Markus Jelsma
Hello - I know it includes all those deleted/overwritten documents. But having 89.9% deleted documents is quite unreasonable, so I would expect the mergeScheduler to kick in at least once in a while. It doesn't with default settings, so I am curious what is wrong. Our large regular search cluste

Re: Filtering results by minimum relevancy score

2017-04-12 Thread alessandro.benedetti
I am not completely sure that the potential benefit of merging fewer docs in sharded pagination outweighs the additional time needed to apply the filtering function query. I would need to investigate the frange internals in more detail. Cheers - --- Alessandro Benedetti Search C

Re: simple matches not catching at query time

2017-04-12 Thread John Blythe
you can view some of my analyses here that has caused me grief and confusion: http://imgur.com/a/Fcht3 here is a debug output: "rawquerystring":"\"ZIMMER:ZIMMER US\"", "querystring":"\"ZIMMER:ZIMMER US\"", "parsedquery":"(+DisjunctionMaxQuery((manufacturer_syn:\"zimmer zimmer\" | manufact

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Dorian Hoxha
@alessandro Elastic-search has it: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-min-score.html On Wed, Apr 12, 2017 at 1:49 PM, alessandro.benedetti wrote: > I am not completely sure that the potential benefit of merging less docs in > sharded pagination overcom

RE: maxDoc ten times greater than numDoc

2017-04-12 Thread alessandro.benedetti
This may be incorrect, but I think that even if a merge happened and the disk space is actually released, the deleted docs count will still be there. What about your index size ? is the index 10 times bigger than expected ? Cheers - --- Alessandro Benedetti Search Consultant, R

Autosuggestion

2017-04-12 Thread OTH
Hello, Is there any recommended way to achieve auto-suggestion in textboxes using Solr? I'm new to Solr, but right now I have achieved this functionality by using an example I found online, doing this: I added a copy field, which is of the following type:

Enable Gzip compression Solr 6.0

2017-04-12 Thread Mahmoud Almokadem
Hello, How can I enable Gzip compression for Solr 6.0 to save bandwidth between the server and clients? Thanks, Mahmoud

Re: maxDoc ten times greater than numDoc

2017-04-12 Thread Shawn Heisey
On 4/12/2017 5:11 AM, Markus Jelsma wrote: > One of our 2 shard collections is rather small and gets all its entries > reindexed every 20 minutes orso. Now i just noticed maxDoc is ten times > greater than numDoc, the merger is never scheduled but settings are default. > We just overwrite the ex

Re: Autosuggestion

2017-04-12 Thread Andrea Gazzarini
Hi, I think you got an old post. I would have a look at the built-in feature, first. These posts can help you to get a quick overview: https://cwiki.apache.org/confluence/display/solr/Suggester http://alexbenedetti.blogspot.it/2015/07/solr-you-complete-me.html https://lucidworks.com/2015/03/04/

Re: simple matches not catching at query time

2017-04-12 Thread Mikhail Khludnev
John, Double quotes are the sign of a phrase query (and round braces inside double quotes are a beast that is horrible to think about). Since the query is a disjunction of phrases and the shingle, it has no chance of matching any of the indexed values from the screenshots. Probably you need to flip autoGeneratePhraseQ

Stopping a node from receiving any requests temporarily.

2017-04-12 Thread Callum Lamb
We have a Solr cluster that still takes queries that join between cores (I know, bad). We can't change that anytime soon, however, and I was hoping there was a band-aid I could use in the meantime to make deployments of new nodes cleaner. When we want to add a new node to the cluster we'll have a brief

Re: Enable Gzip compression Solr 6.0

2017-04-12 Thread Rick Leir
Hi Mahmoud, I assume you are running Solr 'behind' a web application, so Solr is not directly on the net. The gzip compression is an Apache thing, and relates to your web application. Connections to Solr are within your infrastructure, so you might not want to gzip them. But maybe your setup i

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Shawn Heisey
On 4/10/2017 8:59 AM, David Kramer wrote: > I’ve done quite a bit of searching on this. Pretty much every page I > find says it’s a bad idea and won’t work well, but I’ve been asked to > at least try it to reduce the number of completely unrelated results > returned. We are not trying to normalize

Re: Stopping a node from receiving any requests temporarily.

2017-04-12 Thread Callum Lamb
Forgot to mention. We're using solr 5.5.2 in Solr cloud mode. Everything is single sharded at the moment as the collections are still quite small. On Wed, Apr 12, 2017 at 3:30 PM, Callum Lamb wrote: > We have a Solr cluster that still takes queries that join between cores (I > know, bad). We can

Re: Getting error while excuting full import

2017-04-12 Thread Shawn Heisey
On 4/10/2017 3:47 AM, ankur.168 wrote: > Hi All,I am trying to use solr with 2 cores interacting with 2 different > databases, one core is executing full-import successfully where as when I am > running for 2nd one it is throwing table or view not found exception. If I > am using the query directly

Re: Stopping a node from receiving any requests temporarily.

2017-04-12 Thread Erick Erickson
No good ideas here with current Solr. I just raised SOLR-10484 for the generic ability to take a replica out of action (including the ADDREPLICA operation). Your understanding is correct, Solr will route requests to active replicas. Is it possible that you can load the "from" core first _then_ add

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Erick Erickson
Well, just because ES has it doesn't mean it's A Good Thing. IMO, it's just a "feel good" kind of thing for people who don't really understand scoring. From that page: "Note, most times, this does not make much sense, but is provided for advanced use cases." I've written enough weasel-worded cav

Re: maxDoc ten times greater than numDoc

2017-04-12 Thread Erick Erickson
Yes, this is very strange. My bet: you have something custom, a setting, indexing code, whatever that is getting in the way. Second possibility (really stretching here): your merge settings are set to 10 segments having to exist before merging and somehow not all the docs in the segments are repla

Re: Grouped Result sort issue

2017-04-12 Thread Erick Erickson
Alessandro: I should have been explicit that I'm hypothesizing somewhat here, so believe me at your own risk ;) bq: So it means that group sorting is independent of the group head sorting that's my hypothesis, but it's _not_ based on knowing the code. Best, Erick On Wed, Apr 12, 2017 at 2:05 A

RE: Solr 6.4. Can't index MS Visio vsdx files

2017-04-12 Thread Allison, Timothy B.
The release candidate for POI was just cut...unfortunately, I think after Nick Burch fixed the 'PolylineTo' issue...thank you, btw, for opening that! That'll be done within a week unless there are surprises. Once that's out, I have to update a few things, but I'd think we'd have a candidate for

Re: Stopping a node from receiving any requests temporarily.

2017-04-12 Thread Callum Lamb
We can do that in most cases and that's what we've been doing up until now to prevent failed requests. All the more incentive to get rid of those joins then I guess! Thanks. On Wed, Apr 12, 2017 at 4:16 PM, Erick Erickson wrote: > No good ideas here with current Solr. I just raised SOLR-10484

What does the replication factor parameter in collections api do?

2017-04-12 Thread Johannes Knaus
Hi, I am still quite new to Solr. I have the following setup: A SolrCloud setup with 38 nodes, maxShardsPerNode=2, implicit routing with routing field, and replication factor=2. Now, I want to add replica. This works fine by first increasing the maxShardsPerNode to a higher number and then a

KeywordTokenizer and multiValued field

2017-04-12 Thread Walter Underwood
Does the KeywordTokenizer make each value into a unitary string or does it take the whole list of values and make that a single string? I really hope it is the former. I can’t find this in the docs (including JavaDocs). wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.or

Re: KeywordTokenizer and multiValued field

2017-04-12 Thread Andrea Gazzarini
Hi Wunder, I think it's the first option: if you have 3 values then the analyzer chain is executed three times. Andrea On 12/04/17 18:45, Walter Underwood wrote: Does the KeywordTokenizer make each value into a unitary string or does it take the whole list of values and make that a single st

Re: What does the replication factor parameter in collections api do?

2017-04-12 Thread Erick Erickson
really <3>. replicationFactor is used to set up your collection initially, you have to be able to change your topology afterwards so it's ignored thereafter. Once your replica is added, it's automatically made use of by the collection. On Wed, Apr 12, 2017 at 9:30 AM, Johannes Knaus wrote: > Hi,
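Since replicationFactor only applies at creation time, later replicas are added explicitly. A minimal sketch of building such a Collections API request follows, assuming a hypothetical base URL, collection name, and shard name; `ADDREPLICA`, `collection`, `shard`, and `node` are the Collections API parameter names.

```python
from urllib.parse import urlencode

def add_replica_url(base_url, collection, shard, node=None):
    """Build an ADDREPLICA request URL; all argument values are hypothetical."""
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    if node is not None:
        # Optionally pin the new replica to a specific node.
        params["node"] = node
    return f"{base_url}/admin/collections?{urlencode(params)}"

url = add_replica_url("http://localhost:8983/solr", "mycollection", "shard1")
print(url)
```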

Solr 6.4 - Transient core loading is extremely slow with HDFS and S3

2017-04-12 Thread Amarnath palavalli
Hello, I am using S3 as the primary store for data directory of core. To achieve this, I have the following in Solrconfig.xml: ** * s3a://amar-hdfs/solr* * /usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop* * true* * 4* * true* * 16384* * true* * true* * 16* * 192* * * When I acces

Re: Long GC pauses while reading Solr docs using Cursor approach

2017-04-12 Thread Chetas Joshi
Thanks for your response Shawn and Wunder. Hi Shawn, Here is the system config: Total system memory = 512 GB each server handles two 500 MB cores Number of solr docs per 500 MB core = 200 MM The average heap usage is around 4-6 GB. When the read starts using the Cursor approach, the heap usage

Re: Filtering results by minimum relevancy score

2017-04-12 Thread David Kramer
The idea is to not return poorly matching results, not to limit the number of results returned. One query may have hundreds of excellent matches and another query may have 7. So cutting off by the number of results is trivial but not useful. Again, we are not doing this for performance reasons

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Ahmet Arslan
Hi David, A function query named "query" returns the score for the given subquery. Combined with the frange query parser, this is possible. I tried it in the past. I am searching for the original post; I think it was Yonik's. https://cwiki.apache.org/confluence/display/solr/Function+Queries Ahmet

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Walter Underwood
Fine. It can’t be done. If it was easy, Solr/Lucene would already have the feature, right? Solr is a vector-space engine. Some early engines (Verity VDK) were probabilistic engines. Those do give an absolute estimate of the relevance of each hit. Unfortunately, the relevance of results is just

RE: Solr 6.4 - Transient core loading is extremely slow with HDFS and S3

2017-04-12 Thread Cahill, Trey
Hi Amarnath, From this log snippet: " 2017-04-12 17:53:44.900 INFO (searcherExecutor-12-thread-1-processing-x:amar1) [ x:amar1] o.a.s.c.SolrCore [amar1] Registered new searcher Searcher@3f61e7f2[amar1] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_16(6.4.2):c97790) Uninv

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Ahmet Arslan
Hi, I cannot find it. However it should be something like  q=hello&fq={!frange l=0.5}query($q) Ahmet On Wednesday, April 12, 2017, 10:07:54 PM GMT+3, Ahmet Arslan wrote: Hi David, A function query named "query" returns the score for the given subquery.  Combined with frange query parser this is
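The suggestion above can be sketched as a small parameter builder. The helper name is made up for illustration; the `fq` value is the expression from the message, with the lower bound made configurable. The threshold is only meaningful relative to scores for the same query, as others in this thread point out.

```python
from urllib.parse import urlencode

def min_score_params(user_query, min_score):
    """Build q plus an frange filter that keeps docs scoring at least min_score."""
    return {
        "q": user_query,
        # query($q) re-evaluates the main query as a function returning its score;
        # frange with l=<min_score> drops documents below that lower bound.
        "fq": f"{{!frange l={min_score}}}query($q)",
    }

params = min_score_params("hello", 0.5)
query_string = urlencode(params)
print(params["fq"])  # {!frange l=0.5}query($q)
```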

Re: Long GC pauses while reading Solr docs using Cursor approach

2017-04-12 Thread Mikhail Khludnev
And what is the rows parameter? 12 апр. 2017 г. 21:32 пользователь "Chetas Joshi" написал: > Thanks for your response Shawn and Wunder. > > Hi Shawn, > > Here is the system config: > > Total system memory = 512 GB > each server handles two 500 MB cores > Number of solr docs per 500 MB core = 200

Re: Long GC pauses while reading Solr docs using Cursor approach

2017-04-12 Thread Chetas Joshi
I am running a query that returns 10 MM docs in total and the number of rows per page is 100K. On Wed, Apr 12, 2017 at 12:53 PM, Mikhail Khludnev wrote: > And what is the rows parameter? > > 12 апр. 2017 г. 21:32 пользователь "Chetas Joshi" > написал: > > > Thanks for your response Shawn and Wu

Solr 6.2 - Creating cores via replication from master?

2017-04-12 Thread Pouliot, Scott
Is it possible to create a core on a master SOLR server and have it automatically replicated to a new slave core? We're running SOLR 6.2 at the moment, and manually creating the core on the master, and then the slave. Once we feed the master we're good to go. My manager approached me with a ch

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Doug Turnbull
David I think it can be done, but a score has no real *meaning* to your domain other than the one you engineer into it. There's no 1-100 scale that guarantees at 100 that your users will love the results. Solr isn't really a turn key solution. It requires you to understand more deeply what relevan

Re: Solr 6.2 - Creating cores via replication from master?

2017-04-12 Thread Shawn Heisey
On 4/12/2017 2:05 PM, Pouliot, Scott wrote: > Is it possible to create a core on a master SOLR server and have it > automatically replicated to a new slave core? We're running SOLR 6.2 at the > moment, and manually creating the core on the master, and then the slave. > Once we feed the master

RE: Solr 6.2 - Creating cores via replication from master?

2017-04-12 Thread Pouliot, Scott
Yeah...I need to get SOLR Cloud up and running. For some reason, I have yet to succeed with it using an external Zookeeper. Ugghhh Thanks for the confirmation! -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Wednesday, April 12, 2017 4:19 PM To

Re: Long GC pauses while reading Solr docs using Cursor approach

2017-04-12 Thread Erick Erickson
Oh my. Returning 100K rows per request is usually poor practice. One hopes these are very tiny docs. But this may well be an "XY" problem. What kinds of information are you returning in your docs and could they all be docValues types? In which case you would be waaay far ahead by using the various
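For reference, a minimal sketch of a cursor loop with a smaller page size. The `fetch` callable is a hypothetical stand-in for whatever client sends the request and returns the parsed JSON response; `cursorMark` and `nextCursorMark` are Solr's cursor paging names, and the `sort` passed in must include the uniqueKey field.

```python
def iterate_cursor(fetch, params, rows=1000):
    """Yield documents page by page using Solr cursor paging.

    fetch is a hypothetical callable: it takes a params dict and returns
    the parsed JSON response as a dict.
    """
    page_params = dict(params, rows=rows, cursorMark="*")
    while True:
        response = fetch(page_params)
        yield from response["response"]["docs"]
        next_mark = response["nextCursorMark"]
        # An unchanged cursor mark means there are no more results.
        if next_mark == page_params["cursorMark"]:
            break
        page_params["cursorMark"] = next_mark
```

Because the function is a generator, the caller never holds more than one page in memory at a time.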

Re: KeywordTokenizer and multiValued field

2017-04-12 Thread Ahmet Arslan
I don't understand the first option; what is each value? The keyword tokenizer emits a single token, analogous to the string type. On Wednesday, April 12, 2017, 7:45:52 PM GMT+3, Walter Underwood wrote: Does the KeywordTokenizer make each value into a unitary string or does it take the whole list of v

Re: Filtering results by minimum relevancy score

2017-04-12 Thread David Kramer
Thank you! That worked. From: Ahmet Arslan Date: Wednesday, April 12, 2017 at 3:15 PM To: "solr-user@lucene.apache.org" , David Kramer Subject: Re: Filtering results by minimum relevancy score Hi, I cannot find it. However it should be something like q=hello&fq={!frange l=0.5}query($q) Ah

RE: Japanese character is garbled when using TikaEntityProcessor

2017-04-12 Thread Noriyuki TAKEI
Thanks!! I appreciate your quick reply.

Re: Long GC pauses while reading Solr docs using Cursor approach

2017-04-12 Thread Chetas Joshi
I am getting back 100K results per page. The fields have docValues enabled and I am getting sorted results based on "id" and 2 more fields (String: 32 Bytes and Long: 8 Bytes). I have a solr Cloud of 80 nodes. There will be one shard that will get top 100K docs from each shard and apply merge sort

unexpected docvalues type NONE

2017-04-12 Thread Prashant Saraswat
Hi, I'm using Solr 6.4.0. The schema was created on 6.4.0 and I indexed several hundred thousand documents and everything was fine. Now I added one field to the schema: I suddenly start getting this error for certain queries ( not all queries and even for queries that have nothing to do with t

Re: Long GC pauses while reading Solr docs using Cursor approach

2017-04-12 Thread Shawn Heisey
On 4/12/2017 5:19 PM, Chetas Joshi wrote: > I am getting back 100K results per page. > The fields have docValues enabled and I am getting sorted results based on > "id" and 2 more fields (String: 32 Bytes and Long: 8 Bytes). > > I have a solr Cloud of 80 nodes. There will be one shard that will ge

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Koji Sekiguchi
Hi Walter, May I ask a tangential question? I'm curious about the following line you wrote: > Solr is a vector-space engine. Some early engines (Verity VDK) were probabilistic engines. Those do give an absolute estimate of the relevance of each hit. Unfortunately, the relevance of results is just no

Re: unexpected docvalues type NONE

2017-04-12 Thread Shawn Heisey
On 4/12/2017 8:04 PM, Prashant Saraswat wrote: > I'm using Solr 6.4.0. The schema was created on 6.4.0 and I indexed several > hundred thousand documents and everything was fine. > > Now I added one field to the schema: > > stored="true" required="false"/> > > I suddenly start getting this error f

Re: unexpected docvalues type NONE

2017-04-12 Thread Prashant Saraswat
Hi Shawn, The listing_lastmodified field was not changed. I only added a new field. I have removed the field, but I still get the error. Thanks Prashant On Wed, Apr 12, 2017 at 11:20 PM, Shawn Heisey wrote: > On 4/12/2017 8:04 PM, Prashant Saraswat wrote: > > I'm using Solr 6.4.0. The schema w

Re: Long GC pauses while reading Solr docs using Cursor approach

2017-04-12 Thread Erick Erickson
You're missing the point of my comment. Since they already are docValues, you can use the /export functionality to get the results back as a _stream_ and avoid all of the overhead of the aggregator node doing a merge sort and all of that. You'll have to do this from SolrJ, but see CloudSolrStream.
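A rough sketch of what a plain /export request looks like, assuming a hypothetical collection URL and an `id` field with docValues. The handler expects both `sort` and `fl`, and the fields involved need docValues. This only builds the URL; the message above recommends going through SolrJ's CloudSolrStream instead.

```python
from urllib.parse import urlencode

def export_url(collection_url, query, sort, fields):
    """Build an /export request URL; collection_url and fields are hypothetical."""
    params = {"q": query, "sort": sort, "fl": ",".join(fields)}
    return f"{collection_url}/export?{urlencode(params)}"

url = export_url("http://localhost:8983/solr/mycollection", "*:*", "id asc", ["id"])
print(url)
```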

Re: Enable Gzip compression Solr 6.0

2017-04-12 Thread Mahmoud Almokadem
Thanks Rick, I am already running Solr on my infrastructure and behind a web application. The web application works as a proxy in front of Solr, so I think I could compress the content on the Solr end, but I have done it on the proxy for now. Thanks again, Mahmoud > On Apr 12, 2017, at 4:31 PM, Rick Leir

Re: KeywordTokenizer and multiValued field

2017-04-12 Thread Erick Erickson
So I have a field named "key" that uses KeywordTokenizer and has multiValued="true" set. A doc like val one yet another value third My field will have exactly three indexed tokens val one yet another value third Best, Erick On Wed, Apr 12, 2017 at 2:38 PM, Ahmet Arslan wrote: > I don't
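The behaviour described above can be mimicked with a toy sketch. This is a simulation, not Solr or Lucene code, and the split of the example into the three values "val one", "yet another value", and "third" is an assumption, since the field markup did not survive in the archive.

```python
def keyword_tokenize(value):
    """Mimic a keyword-style tokenizer: the whole value becomes one token."""
    return [value]

def index_multivalued(values):
    """Run the analysis step once per value, as happens for a multiValued field."""
    tokens = []
    for value in values:
        tokens.extend(keyword_tokenize(value))
    return tokens

tokens = index_multivalued(["val one", "yet another value", "third"])
print(tokens)  # ['val one', 'yet another value', 'third']
```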