Question regarding Upgrading to SolrCloud

2017-10-05 Thread Gopesh Sharma
Hello Guys, As of now we are running Solr 3.4 with Master Slave Configuration. We are planning to upgrade it to the latest version (6.6 or 7). Questions I have before upgrading 1. Since we do not have a lot of data, is it required to move to SolrCloud or continue using it Master Slave 2

Re: FilterCache size should reduce as index grows?

2017-10-05 Thread Toke Eskildsen
On Wed, 2017-10-04 at 21:42 -0700, S G wrote: > The bit-vectors in filterCache are as long as the maximum number of > documents in a core. If there are a billion docs per core, every bit > vector will have a billion bits making its size as 10^9 / 8 = 128 mb The tricky part here is there are sparse
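
As a rough check of the arithmetic quoted above: a dense bit vector with one bit per document needs about maxDoc / 8 bytes. The sketch below is illustrative only. The function names are made up here, and it ignores object overhead, word alignment, and the sparse representations the reply goes on to mention. For a billion documents it gives 125,000,000 bytes (about 119 MiB), so the quoted 128 mb is best read as an approximation.

```python
def bitset_bytes(max_doc: int) -> int:
    """Bytes needed for a dense bit vector with one bit per document."""
    return (max_doc + 7) // 8  # round up to a whole byte


def filter_cache_upper_bound_bytes(max_doc: int, entries: int) -> int:
    """Worst case if every cache entry were a dense bit vector (an assumption)."""
    return bitset_bytes(max_doc) * entries


one_billion = 10**9
print(bitset_bytes(one_billion))                         # bytes per entry
print(round(bitset_bytes(one_billion) / 2**20, 1))       # the same in MiB
print(filter_cache_upper_bound_bytes(one_billion, 128))  # 128 dense entries
```

With 128 such entries the worst case is 16 GB, which is why the cache size matters more as the index grows.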

Re: Complexphrase treats wildcards differently than other query parsers

2017-10-05 Thread Emir Arnautović
Hi Bjarke, It is not multiterm that is causing query parser to skip analysis chain but wildcard. The majority of query parsers do not analyse query string if there are wildcards. HTH Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Traini

Re: Question regarding Upgrading to SolrCloud

2017-10-05 Thread Emir Arnautović
Hi Sharma, Please see inline answers. Thanks, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 5 Oct 2017, at 09:00, Gopesh Sharma wrote: > > Hello Guys, > > As of now we are running Solr 3.4 with

Re: Complexphrase treats wildcards differently than other query parsers

2017-10-05 Thread Bjarke Buur Mortensen
Well, according to https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/ multiterm means wildcard, range, prefix, so that is the way I'm using the word. That same article explains how analysis will be performed with wildcards if the analyzers are multi-term awar

Re: Complexphrase treats wildcards differently than other query parsers

2017-10-05 Thread Emir Arnautović
Hi Bjarke, You are right - I jumped to a wrong/old conclusion as the simplest answer to your question. I guess looking at the code could give you an answer. Thanks, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://semate

tf function query

2017-10-05 Thread Dmitry Kan
Hi, According to https://lucene.apache.org/solr/guide/6_6/function-queries.html#FunctionQueries-AvailableFunctions tf(field, term) requires a term as a second parameter. Is there a possibility to pass in an entire input query (multiterm and boolean) to the function? The context here is that we d

Re: Complexphrase treats wildcards differently than other query parsers

2017-10-05 Thread Bjarke Buur Mortensen
2017-10-05 11:29 GMT+02:00 Emir Arnautović : > Hi Bjarke, > You are right - I jumped into wrong/old conclusion as the simplest answer > to your question. No problem :-) > I guess looking at the code could give you an answer. This is what I would like to avoid out of fear that my head would ex

RE: tf function query

2017-10-05 Thread Junte Zhang
I am afraid this is not possible, since getting frequencies for phrases is not possible, unless the phrases are created as tokens (i.e. using n-grams or shingles) and indexed. If someone has a solution for this, then I am interested as well. /JZ -Original Message- From: Dmitry Kan [mai

Re: tf function query

2017-10-05 Thread Erik Hatcher
How about the query() function? Just be clever about the query you specify ;) > On Oct 5, 2017, at 06:14, Dmitry Kan wrote: > > Hi, > > According to > https://lucene.apache.org/solr/guide/6_6/function-queries.html#FunctionQueries-AvailableFunctions > > tf(field, term) requires a term as a sec
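
One way to read the suggestion above, as a sketch only: Solr's query() function returns the score of an arbitrary subquery, so a multi-term boolean query can be handed to it through parameter dereferencing. Note that this yields a relevance score, not a raw term frequency. The field name `text`, the alias `qscore`, the parameter name `qq`, and the example query are all assumptions made up for illustration.

```python
from urllib.parse import urlencode


def build_query_function_params(user_query: str, field: str = "text") -> dict:
    """Request parameters that expose the subquery score as a pseudo-field."""
    return {
        "q": "*:*",
        # The subquery lives in its own parameter and is referenced as $qq.
        "qq": f"{{!lucene df={field}}}{user_query}",
        # query($qq,0) falls back to 0 for documents the subquery does not match.
        "fl": "id,qscore:query($qq,0)",
    }


params = build_query_function_params("a OR (b AND c)")
print(urlencode(params))
```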

Re: Solr boost function taking precedence over relevance boosting

2017-10-05 Thread alessandro.benedetti
I would try to use an additive boost and the ^= boost operator: - name_property :( test^=2 ) will assign a fixed score of 2 if the match happens ( it is a constant score query) - additive boost will be 0http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

RE: Complexphrase treats wildcards differently than other query parsers

2017-10-05 Thread Allison, Timothy B.
What version of Solr are you using? I thought this had been fixed fairly recently, but I can't quickly find the JIRA. Let me take a look. Best, Tim This was one of my initial reasons for my SpanQueryParser LUCENE-5205[1] and [2], which handles analysis of multiterms even in phra

RE: Complexphrase treats wildcards differently than other query parsers

2017-10-05 Thread Allison, Timothy B.
There's every chance that I'm missing something at the Solr level, but it _looks_ at the Lucene level, like ComplexPhraseQueryParser is still not applying analysis to multiterms. When I call this on 7.0.0: QueryParser qp = new ComplexPhraseQueryParser(defaultFieldName, analyzer); return q

Re: Complexphrase treats wildcards differently than other query parsers

2017-10-05 Thread Bjarke Buur Mortensen
Thanks Tim, that might be what I'm experiencing. I'm actually quite certain of it :-) Do you remember any reason that multi term analysis is not happening in ComplexPhraseQueryParser? I'm on 6.6.1, so latest on the 6.x branch. 2017-10-05 14:34 GMT+02:00 Allison, Timothy B. : > There's every cha

Re: tf function query

2017-10-05 Thread Erick Erickson
What would you expect as output? tf(field, "a OR b AND c NOT d"). I'm not sure what term frequency would even mean in that situation. tf is a pretty simple function, it expects a single term and there's no way I know of to do what you're asking. Best, Erick On Thu, Oct 5, 2017 at 3:14 AM, Dmit

Solrcloud replication not working

2017-10-05 Thread solr2020
Hi, We are using Solr 6.4.2 in a SolrCloud setup. We have two Solr instances in the cluster, and this SolrCloud is running on Ubuntu. The problem is that replication is not happening between these two Solr instances. Sometimes it replicates 10% of the content and sometimes not. In Zookeeper ensemble we h

Re: Question regarding Upgrading to SolrCloud

2017-10-05 Thread Erick Erickson
Gopesh: There is brand new functionality in Solr 7, see: SOLR-10233, the "PULL" replica type which is a hybrid SolrCloud replica that uses master/slave type replication. You should find this in the reference guide, the 7.0 ref guide should be published soon. Meanwhile, that JIRA will let you know.

Re: Solrcloud replication not working

2017-10-05 Thread Erick Erickson
We need a lot more data to say anything useful, please read: https://wiki.apache.org/solr/UsingMailingLists What do you see in your Solr logs? What have you tried to do to diagnose this? Do you have enough disk space? Best, Erick On Thu, Oct 5, 2017 at 6:56 AM, solr2020 wrote: > Hi, > > We are

Re: FilterCache size should reduce as index grows?

2017-10-05 Thread Erick Erickson
The other thing I'd point out is that if your hit ratio is low, you might as well disable it entirely. Finally, if you have any a-priori knowledge that certain fq clauses are very unlikely to be re-used, add {!cache=false}. If you also add cost=101, then the fq clause will only be evaluated for do
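
For illustration, the local-params syntax mentioned above can be assembled like this. It is a hedged sketch: the helper names, the `inStock:true` clause, and the `mul(popularity,price)` expression are hypothetical, and the frange parser is used for the second helper on the assumption that post-filtering only applies to query types that support it when cache=false and cost is 100 or more.

```python
def uncached_fq(clause: str) -> str:
    """fq clause that bypasses the filterCache."""
    return "{!cache=false}" + clause


def post_filter_fq(expression: str, lower: int, upper: int, cost: int = 101) -> str:
    """Uncached frange filter with a cost high enough to request post-filtering."""
    return f"{{!frange l={lower} u={upper} cache=false cost={cost}}}{expression}"


print(uncached_fq("inStock:true"))
print(post_filter_fq("mul(popularity,price)", 10, 100))
```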

Re: Solrcloud replication not working

2017-10-05 Thread solr2020
Thanks. We don't see any error message, or any message at all, in the logs, and we have enough disk space. We are running Solr as the root user on the Ubuntu box, but the ZooKeeper process runs as the zookeeper user. Will that cause the problem? -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Error adding replica after a delete replica

2017-10-05 Thread Webster Homer
A colleague of mine was testing how solrcloud replica recovery works. We have had a lot of issues with replicas going into recovery mode, replicas down and in recovery failed states. So to test, he deleted a healthy replica in one of our development. First the delete operation timed out, but the r

Re: FilterCache size should reduce as index grows?

2017-10-05 Thread Yonik Seeley
On Thu, Oct 5, 2017 at 10:07 AM, Erick Erickson wrote: > The other thing I'd point out is that if your hit ratio is low, you > might as well disable it entirely. I'd normally recommend against turning it off entirely, except in *very* custom cases. Even if the user doesn't reuse filter queries,

Re: FilterCache size should reduce as index grows?

2017-10-05 Thread Yonik Seeley
On Thu, Oct 5, 2017 at 3:20 AM, Toke Eskildsen wrote: > On Wed, 2017-10-04 at 21:42 -0700, S G wrote: > > It seems that the memory limit option maxSizeMB was added in Solr 5.2: > https://issues.apache.org/jira/browse/SOLR-7372 > I am not sure if it works with all caches in Solr, but in my world it

Recommendations for number of open files?

2017-10-05 Thread Webster Homer
We have begun to see errors around too many open files on one of our solrcloud nodes. One replica tries to open >8000 files. This replica tries to start up and then fails because the open-file limit is exceeded as it tries to recover. Our solrclouds have 12 distinct collections. I would think tha

Re: Recommendations for number of open files?

2017-10-05 Thread Erick Erickson
Well, Lucene keeps an open file handle for _every_ file in _every_ index directory. So, for instance, let's say a replica has 10 segments. Each segment is 10-15 individual files. So that's 100-150 file handles right there. And indexes can have many segments. Check to see if "cfs" extensions are in
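
The back-of-the-envelope estimate above can be written down as a small calculation. This is a sketch under the stated numbers only (10 segments, 10 to 15 files each); the function names are made up, and it counts index files alone, not tlogs, sockets, or jars.

```python
def index_file_handles(segments: int, files_per_segment: int) -> int:
    """Open handles for one replica's index, one handle per file."""
    return segments * files_per_segment


def node_file_handles(replicas: int, segments: int, files_per_segment: int) -> int:
    """Same estimate summed over all replicas hosted on one node."""
    return replicas * index_file_handles(segments, files_per_segment)


print(index_file_handles(10, 10), index_file_handles(10, 15))  # range for one replica
print(node_file_handles(12, 10, 15))  # e.g. 12 replicas on a node
```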

RE: Complexphrase treats wildcards differently than other query parsers

2017-10-05 Thread Allison, Timothy B.
Prob the usual reasons...no one has submitted a patch yet, or could be a regression after LUCENE-7355. See also: https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201407.mbox/%3c1d06a081892adf4589bd83ee24b9dc3025971...@imcmbx02.mitre.org%3E I'll take a look. -Original Message-

Re: Jenkins setup for continuous build

2017-10-05 Thread Chris Hostetter
: I have some custom code in solr (which is not of good quality for : contributing back) so I need to setup my own continuous build solution. I : tried jenkins and was hoping that ant build (ant clean compile) in Execute : Shell textbox will work, but I am stuck at this ivy-fail error: : : To wor

Re: Recommendations for number of open files?

2017-10-05 Thread Webster Homer
The issue is on one of our QA collections, which means I don't have access to the systems to see; I have to go through the admins. It does have ".cfs" files in the index. However, it turns out that the replica in question has 8007 tlog files. This solrcloud is a target cloud for cdcr. The replica d

Re: Recommendations for number of open files?

2017-10-05 Thread Webster Homer
I wouldn't call it massive. The index is ~9 million documents. So not too big, the documents themselves are pretty small On Thu, Oct 5, 2017 at 12:23 PM, Erick Erickson wrote: > Well, Lucene keeps an open file handle for _every_ file in _every_ > index directory. So, for instance, let's say a re

Re: Question regarding Upgrading to SolrCloud

2017-10-05 Thread Cassandra Targett
The 7.0 Ref Guide was released Monday. An overview of the new replica types is available online here: https://lucene.apache.org/solr/guide/7_0/shards-and-indexing-data-in-solrcloud.html#types-of-replicas. The replica type is specified when you either create the collection or add a replica. On Thu
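
As a sketch of what "specified when you either create the collection or add a replica" can look like through the Collections API: the helpers below only build request URLs. The base URL, collection name, and shard name are placeholders, and a real CREATE call will usually need further parameters (for example a configset), so treat this as an assumption-laden outline rather than a complete recipe.

```python
from urllib.parse import urlencode


def create_collection_url(base_url: str, name: str, num_shards: int,
                          nrt_replicas: int, pull_replicas: int) -> str:
    """CREATE request with replica counts given per replica type."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "nrtReplicas": nrt_replicas,
        "pullReplicas": pull_replicas,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def add_replica_url(base_url: str, collection: str, shard: str,
                    replica_type: str = "pull") -> str:
    """ADDREPLICA request that names the replica type explicitly."""
    params = {
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
        "type": replica_type,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


base = "http://localhost:8983/solr"  # placeholder host
print(create_collection_url(base, "demo", 1, 1, 2))
print(add_replica_url(base, "demo", "shard1"))
```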

RE: Complexphrase treats wildcards differently than other query parsers

2017-10-05 Thread Allison, Timothy B.
After some more digging, I'm wrong even at the Lucene level. When I use the CustomAnalyzer and make my UC vowel mock filter MultitermAware, I get this with Lucene in trunk: "the* quick~" name:thE* name:qUIck~2 name:thE name:qUIck So, there's room for improvement with phrases, but the regular mu

Re: Recommendations for number of open files?

2017-10-05 Thread Webster Homer
Interestingly many of these tlog files (5428 out of 8007) have 0 length!? What would cause that? As I stated this is a cdcr target collection. On Thu, Oct 5, 2017 at 1:19 PM, Webster Homer wrote: > I wouldn't call it massive. The index is ~9 million documents. So not too > big, the documents
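
A count like the one above (total tlog files versus zero-length ones) can be reproduced with a short script. This is a minimal sketch, assuming the transaction log files sit directly in one directory and have names starting with `tlog.`; the demo runs against a throwaway temporary directory rather than a real Solr data directory.

```python
import tempfile
from pathlib import Path


def tlog_summary(tlog_dir: str) -> tuple:
    """Return (total tlog files, zero-length tlog files) in a directory."""
    files = [p for p in Path(tlog_dir).iterdir()
             if p.is_file() and p.name.startswith("tlog.")]
    empty = [p for p in files if p.stat().st_size == 0]
    return len(files), len(empty)


# Demo on a temporary directory with two empty files and one non-empty file.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "tlog.0000000000000000001").write_bytes(b"")
    (Path(demo_dir) / "tlog.0000000000000000002").write_bytes(b"")
    (Path(demo_dir) / "tlog.0000000000000000003").write_bytes(b"data")
    print(tlog_summary(demo_dir))
```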

Re: Recommendations for number of open files?

2017-10-05 Thread Erick Erickson
OK, never mind about the file handle limits, let's deal with the tlogs. Although unlimited is a good thing. Do you have buffering disabled on the target cluster? Best Erick On Thu, Oct 5, 2017 at 11:19 AM, Webster Homer wrote: > I wouldn't call it massive. The index is ~9 million documents. So

Re: Recommendations for number of open files?

2017-10-05 Thread Webster Homer
buffering is disabled. Indeed we disable it everywhere as all it seems to do is leave tlogs around forever. Autocommit is set to 60 seconds. The source cdcr request handler looks like this. The first target is the problematic one {"requestHandler":{"/cdcr":{ "name":"/cdcr", "class":"

Re: Recommendations for number of open files?

2017-10-05 Thread Webster Homer
It seems that there was a networking error just prior to the creation of the 0 length files: The files from Sep 27 are all written at 17:56. There was minor packet loss (1 out of 10 packets per 60 second interval) just prior to that time. On Thu, Oct 5, 2017 at 3:11 PM, Webster Homer wrote: > bu

Re: Rescoring from 0 - full

2017-10-05 Thread Dariusz Wojtas
Hi, Your answers have helped me a lot. I've managed to use the LTRQParserPlugin and it does what I need. Full control over scoring with its re-ranking functionality. I define my custom features and may pass custom params to them using the "efi.*" syntax. Is there something similar to define weight

Solr not preserving milliseconds precision for zero milliseconds

2017-10-05 Thread Pratik Patel
Hello Everyone, Say I have a document like one below. > { > "id":"test", > "startTime":"2013-02-10T18:36:07.000Z" > } I add this document to solr index using the admin UI and "update" request handler. It gets added successfully but when I retrieve this document back using "id"

Re: Solr test runs: test skipping logic

2017-10-05 Thread Chris Hostetter
: I am seeing that in different test runs (e.g., by executing 'ant test' on : the root folder in 'lucene-solr') a different subset of tests are skipped. : Where can I find more about it? I am trying to create parity between test : successes before and after my changes and this is causing confusio

Re: Solr not preserving milliseconds precision for zero milliseconds

2017-10-05 Thread Chris Hostetter
: > "startTime":"2013-02-10T18:36:07.000Z" ... : handler. It gets added successfully but when I retrieve this document back : using "id" I get following. ... : > "startTime":"2013-02-10T18:36:07Z", ... : As you can see, the milliseconds precision in date fiel

Re: mm is not working if you have same term multiple times in query

2017-10-05 Thread Chris Hostetter
: I'm using Solr 6.6.0. I have set mm to 100%, but when the search term is repeated : the mm param is not honoured : I have 2 docs in index : Doc1- : name=lock : Doc 2- : name=lock lock : : Now when I'm querying Solr with the query : *http://localhost:8983/solr/test2/select?defType=dismax&qf=

solr and machine learning - recommendations?

2017-10-05 Thread Phil Scadden
Now that I have got a big hunk of documents indexed with Solr, I am looking to see whether I can use some machine learning tools to try and extract bibliographic references out of the documents. Anyone got some recommendations about which kits might be good to play with for something like this? No

Re: FilterCache size should reduce as index grows?

2017-10-05 Thread S G
So for large indexes, there is a chance that filterCache of 128 can cause bad GC. And for smaller indexes, it would really not matter that much because well, the index size is small and probably whole of it is in OS-cache anyways. So perhaps a default of 64 would be a much saner choice to get the b