Hello Guys,
As of now we are running Solr 3.4 with a Master/Slave configuration. We are
planning to upgrade it to the latest version (6.6 or 7). Questions I have
before upgrading:
1. Since we do not have a lot of data, is it required to move to SolrCloud,
or can we continue using Master/Slave?
2
On Wed, 2017-10-04 at 21:42 -0700, S G wrote:
> The bit-vectors in filterCache are as long as the maximum number of
> documents in a core. If there are a billion docs per core, every bit
> vector will have a billion bits, making its size 10^9 / 8 bytes ≈ 128 MB
The tricky part here is that there are sparse
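To make that sizing concrete, a back-of-the-envelope sketch in Java (my own
illustration; it assumes a non-sparse filterCache entry of one bit per
document, and the 128-entry cache size discussed later in this thread):

public class FilterCacheMath {
    public static void main(String[] args) {
        long maxDoc = 1_000_000_000L;          // a billion docs in one core
        long bytesPerEntry = maxDoc / 8;       // one bit per doc -> 125,000,000 bytes (the "128 MB" above, loosely)
        long worstCase = 128 * bytesPerEntry;  // a full 128-entry cache ≈ 16 GB, worst case
        System.out.println(bytesPerEntry + " bytes/entry, " + worstCase + " bytes worst case");
    }
}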
Hi Bjarke,
It is not the multiterm handling that causes the query parser to skip the
analysis chain, but the wildcard: most query parsers do not analyze the
query string if it contains wildcards.
HTH
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
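For illustration, a minimal Lucene sketch of that behaviour (the field name,
analyzer and query strings are just placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

public class WildcardAnalysisSketch {
    public static void main(String[] args) throws Exception {
        QueryParser qp = new QueryParser("name", new StandardAnalyzer());
        Query plain = qp.parse("Quick");  // plain term runs through the analysis chain: name:quick
        Query wild = qp.parse("Quic*");   // wildcard term skips the chain (at most the
                                          // multi-term-aware subset of filters is applied)
        System.out.println(plain + " | " + wild);
    }
}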
Hi Sharma,
Please see inline answers.
Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> On 5 Oct 2017, at 09:00, Gopesh Sharma wrote:
>
> Hello Guys,
>
> As of now we are running Solr 3.4 with
Well, according to
https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
multiterm means
wildcard
range
prefix
so that is the way I'm using the word. That same article explains how
analysis will be performed with wildcards if the analyzers are multi-term
aware
Hi Bjarke,
You are right - I jumped to a wrong/old conclusion as the simplest answer to
your question. I guess looking at the code could give you an answer.
Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
Hi,
According to
https://lucene.apache.org/solr/guide/6_6/function-queries.html#FunctionQueries-AvailableFunctions
tf(field, term) requires a term as a second parameter. Is there a
possibility to pass in an entire input query (multiterm and boolean) to the
function?
The context here is that we d
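For reference, a minimal SolrJ sketch of the documented usage - one field and
one bare term as a pseudo-field (the field and term here are made up):

import org.apache.solr.client.solrj.SolrQuery;

public class TfSketch {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("*:*");
        // tf() takes a single field and a single term; it cannot parse a boolean query string
        q.setFields("id", "freq:tf(text,'solr')");
        System.out.println(q);
    }
}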
2017-10-05 11:29 GMT+02:00 Emir Arnautović :
> Hi Bjarke,
> You are right - I jumped to a wrong/old conclusion as the simplest answer
> to your question.
No problem :-)
I guess looking at the code could give you an answer.
>
This is what I would like to avoid out of fear that my head would explode.
I am afraid this is not possible, since getting frequencies for phrases is not
supported unless the phrases are indexed as tokens (e.g. using n-grams or
shingles). If someone has a solution for this, then I am interested
as well.
/JZ
-----Original Message-----
From: Dmitry Kan [mai
How about the query() function? Just be clever about the query you specify ;)
> On Oct 5, 2017, at 06:14, Dmitry Kan wrote:
>
> Hi,
>
> According to
> https://lucene.apache.org/solr/guide/6_6/function-queries.html#FunctionQueries-AvailableFunctions
>
> tf(field, term) requires a term as a second parameter
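Something along these lines, perhaps (a sketch; the field name and the $qq
parameter indirection are just one way to wire it up):

import org.apache.solr.client.solrj.SolrQuery;

public class QueryFunctionSketch {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("*:*");
        q.set("qq", "text:(a OR b)");              // any full subquery, boolean operators included
        q.setFields("id", "subscore:query($qq)");  // query() exposes the subquery's score as a pseudo-field
        System.out.println(q);
    }
}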
I would try to use an additive boost and the ^= boost operator:
- name_property:(test^=2) will assign a fixed score of 2 if the match
happens (it is a constant-score query)
- the additive boost will be 0
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
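As a sketch of how that combination could be sent with SolrJ (the field and
value are the ones from the example above; defType=edismax is an assumption):

import org.apache.solr.client.solrj.SolrQuery;

public class ConstantScoreBoostSketch {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("test");
        q.set("defType", "edismax");
        // ^= gives a fixed (constant) score of 2 on a match; bq adds that to the main score
        q.set("bq", "name_property:(test^=2)");
        System.out.println(q);
    }
}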
What version of Solr are you using?
I thought this had been fixed fairly recently, but I can't quickly find the
JIRA. Let me take a look.
Best,
Tim
This was one of my initial reasons for my SpanQueryParser LUCENE-5205 [1] and
[2], which handles analysis of multiterms even in phrases
There's every chance that I'm missing something at the Solr level, but it
_looks_, at the Lucene level, like ComplexPhraseQueryParser is still not
applying analysis to multiterms.
When I call this on 7.0.0:
QueryParser qp = new ComplexPhraseQueryParser(defaultFieldName, analyzer);
return qp.parse(queryString);
Thanks Tim,
that might be what I'm experiencing. I'm actually quite certain of it :-)
Do you remember any reason that multi-term analysis is not happening in
ComplexPhraseQueryParser?
I'm on 6.6.1, so latest on the 6.x branch.
2017-10-05 14:34 GMT+02:00 Allison, Timothy B. :
> There's every cha
What would you expect as output of tf(field, "a OR b AND c NOT d")? I'm
not sure what term frequency would even mean in that situation.
tf is a pretty simple function: it expects a single term, and there's
no way I know of to do what you're asking.
Best,
Erick
On Thu, Oct 5, 2017 at 3:14 AM, Dmit
Hi,
We are using Solr 6.4.2 with a SolrCloud setup. We have two Solr instances in
the cluster, running on Ubuntu. The problem is that replication is not
happening between these two instances: sometimes it replicates
10% of the content and sometimes not at all.
In Zookeeper ensemble we h
Gopesh:
There is brand new functionality in Solr 7, see: SOLR-10233, the
"PULL" replica type which is a hybrid SolrCloud replica that uses
master/slave type replication. You should find this in the reference
guide, the 7.0 ref guide should be published soon. Meanwhile, that
JIRA will let you know.
We need a lot more data to say anything useful, please read:
https://wiki.apache.org/solr/UsingMailingLists
What do you see in your Solr logs? What have you tried to do to
diagnose this? Do you have enough disk space?
Best,
Erick
On Thu, Oct 5, 2017 at 6:56 AM, solr2020 wrote:
> Hi,
>
> We are
The other thing I'd point out is that if your hit ratio is low, you
might as well disable it entirely.
Finally, if you have any a-priori knowledge that certain fq clauses
are very unlikely to be re-used, add {!cache=false}. If you also add
cost=101, then the fq clause will only be evaluated for documents
that match all the other clauses (it becomes a post-filter).
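For example, a SolrJ sketch using frange, one of the query types that can
actually run as a post-filter (the field and bound are made up):

import org.apache.solr.client.solrj.SolrQuery;

public class PostFilterSketch {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("*:*");
        // cache=false keeps the clause out of the filterCache; cost>=100 turns an
        // eligible query into a post-filter, checked only against docs that match everything else
        q.addFilterQuery("{!frange cache=false cost=101 l=100}price");
        System.out.println(q);
    }
}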
Thanks.
We don't see any error message (or any message at all) in the logs, and we
have enough disk space.
We are running Solr as the root user on the Ubuntu box, but the ZooKeeper
process runs as the zookeeper user. Will that cause the problem?
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
A colleague of mine was testing how SolrCloud replica recovery works. We
have had a lot of issues with replicas going into recovery mode, replicas
down, and in recovery-failed states. So to test, he deleted a healthy
replica in one of our development environments. First the delete operation timed out,
but the r
On Thu, Oct 5, 2017 at 10:07 AM, Erick Erickson wrote:
> The other thing I'd point out is that if your hit ratio is low, you
> might as well disable it entirely.
I'd normally recommend against turning it off entirely, except in
*very* custom cases. Even if the user doesn't reuse filter queries,
On Thu, Oct 5, 2017 at 3:20 AM, Toke Eskildsen wrote:
> On Wed, 2017-10-04 at 21:42 -0700, S G wrote:
>
> It seems that the memory limit option maxSizeMB was added in Solr 5.2:
> https://issues.apache.org/jira/browse/SOLR-7372
> I am not sure if it works with all caches in Solr, but in my world it
We have begun to see errors about too many open files on one of our
SolrCloud nodes. One replica tries to open >8000 files. The replica tries
to start up and then fails because the open-file limit is exceeded as it
tries to recover.
Our solrclouds have 12 distinct collections. I would think tha
Well, Lucene keeps an open file handle for _every_ file in _every_
index directory. So, for instance, let's say a replica has 10
segments. Each segment is 10-15 individual files. So that's 100-150
file handles right there. And indexes can have many segments.
Check to see if "cfs" extensions are in
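A quick way to eyeball that on disk (a sketch; the index path is an
assumption, adjust for your install):

import java.nio.file.*;
import java.util.stream.Stream;

public class IndexFileCount {
    public static void main(String[] args) throws Exception {
        Path dir = Paths.get("/var/solr/data/collection1/data/index");
        try (Stream<Path> files = Files.list(dir)) {
            // each of these is an open file handle while a searcher is live;
            // ".cfs" means the segment uses the compound file format (far fewer handles)
            files.forEach(p -> System.out.println(p.getFileName()));
        }
    }
}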
Prob the usual reasons...no one has submitted a patch yet, or could be a
regression after LUCENE-7355.
See also:
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201407.mbox/%3c1d06a081892adf4589bd83ee24b9dc3025971...@imcmbx02.mitre.org%3E
I'll take a look.
-----Original Message-----
: I have some custom code in solr (which is not of good quality for
: contributing back) so I need to setup my own continuous build solution. I
: tried jenkins and was hoping that ant build (ant clean compile) in Execute
: Shell textbox will work, but I am stuck at this ivy-fail error:
:
: To wor
The issue is on one of our QA collections, which means I don't have access
to the systems to see; I have to go through the admins.
It does have ".cfs" files in the index.
However, it turns out that the replica in question has 8007 tlog files.
This solrcloud is a target cloud for cdcr.
The replica d
I wouldn't call it massive. The index is ~9 million documents, so not too
big, and the documents themselves are pretty small.
On Thu, Oct 5, 2017 at 12:23 PM, Erick Erickson
wrote:
> Well, Lucene keeps an open file handle for _every_ file in _every_
> index directory. So, for instance, let's say a re
The 7.0 Ref Guide was released Monday.
An overview of the new replica types is available online here:
https://lucene.apache.org/solr/guide/7_0/shards-and-indexing-data-in-solrcloud.html#types-of-replicas.
The replica type is specified when you either create the collection or
add a replica.
On Thu
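A sketch of that with SolrJ (the collection/config names are placeholders,
and this assumes the Solr 7 createCollection overload that takes
nrt/tlog/pull replica counts):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class PullReplicaSketch {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
            // 1 shard, 1 NRT (leader-capable) replica, 0 TLOG, 2 PULL replicas
            CollectionAdminRequest.Create create =
                CollectionAdminRequest.createCollection("mycoll", "myconfig", 1, 1, 0, 2);
            System.out.println(create.process(client));
        }
    }
}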
After some more digging, I'm wrong even at the Lucene level.
When I use the CustomAnalyzer and make my UC vowel mock filter MultitermAware,
I get this with Lucene in trunk:
"the* quick~" name:thE* name:qUIck~2 name:thE name:qUIck
So, there's room for improvement with phrases, but the regular multiterm
analysis is applied
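For anyone following along, roughly what that experiment looks like with
stock factories instead of the mock filter (a sketch; the tokenizer and
filter choices are placeholders):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.custom.CustomAnalyzer;

public class NormalizeSketch {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = CustomAnalyzer.builder()
            .withTokenizer("whitespace")
            .addTokenFilter("lowercase")   // LowerCaseFilterFactory is multi-term aware
            .build();
        // normalize() runs only the multi-term-aware part of the chain
        System.out.println(analyzer.normalize("name", "thE*").utf8ToString()); // -> the*
    }
}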
Interestingly, many of these tlog files (5428 out of 8007) have 0
length!? What would cause that? As I stated, this is a cdcr target
collection.
On Thu, Oct 5, 2017 at 1:19 PM, Webster Homer
wrote:
> I wouldn't call it massive. The index is ~9 million documents. So not too
> big, the documents
OK, never mind about the file handle limits, let's deal with the
tlogs. Although unlimited is a good thing.
Do you have buffering disabled on the target cluster?
Best
Erick
On Thu, Oct 5, 2017 at 11:19 AM, Webster Homer wrote:
> I wouldn't call it massive. The index is ~9 million documents. So
Buffering is disabled. Indeed, we disable it everywhere, as all it seems to
do is leave tlogs around forever.
Autocommit is set to 60 seconds.
The source cdcr request handler looks like this. The first target is the
problematic one
{"requestHandler":{"/cdcr":{
"name":"/cdcr",
"class":"
It seems that there was a networking error just prior to the creation of
the zero-length files:
The files from Sep 27 are all written at 17:56.
There was minor packet loss (1 out of 10 packets per 60 second interval)
just prior to that time.
On Thu, Oct 5, 2017 at 3:11 PM, Webster Homer
wrote:
> bu
Hi,
Your answers have helped me a lot.
I've managed to use the LTRQParserPlugin and it does what I need: full
control over scoring with its re-ranking functionality.
I define my custom features and can pass custom params to them using the
"efi.*" syntax.
Is there something similar to define weight
Hello Everyone,
Say I have a document like one below.
> {
> "id":"test",
> "startTime":"2013-02-10T18:36:07.000Z"
> }
I add this document to the Solr index using the admin UI and the "update"
request handler. It gets added successfully, but when I retrieve this document back
using "id"
: I am seeing that in different test runs (e.g., by executing 'ant test' on
: the root folder in 'lucene-solr') a different subset of tests are skipped.
: Where can I find more about it? I am trying to create parity between test
: successes before and after my changes and this is causing confusio
: > "startTime":"2013-02-10T18:36:07.000Z"
...
: handler. It gets added successfully but when I retrieve this document back
: using "id" I get following.
...
: > "startTime":"2013-02-10T18:36:07Z",
...
: As you can see, the milliseconds precision in date fiel
: I'm using Solr 6.6.0. I have set mm to 100%, but when I have a repeated
: search term the mm param is not honoured
: I have 2 docs in index
: Doc1-
: name=lock
: Doc 2-
: name=lock lock
:
: Now when I'm querying Solr with the query
:
*http://localhost:8983/solr/test2/select?defType=dismax&qf=
Now that I have a big hunk of documents indexed with Solr, I am looking to
see whether I can try some machine learning tools to try and extract
bibliographic references out of the documents. Anyone got some recommendations
about which kits might be good to play with for something like this?
No
So for large indexes, there is a chance that a filterCache of 128 entries can
cause bad GC.
And for smaller indexes, it would really not matter that much because, well,
the index size is small and probably the whole of it is in the OS cache anyway.
So perhaps a default of 64 would be a much saner choice to get the b