>> We are using Solr to index our data. The data contains £ symbol within the
>> text and for currency. When data is exported from the source system data
>> contains £ symbol, however, when the data is imported into the Solr £ symbol
>> is converted to �.
>>
>> How can we keep the £ symbol as is
Shalin,
Given the earlier response by Erick, I am wondering when this scenario occurs,
i.e. when the replica node recovers after a time period, wouldn't it
automatically recover all the missed updates by connecting to the leader?
My understanding from the responses so far is as below (assuming
replicat
The min_rf parameter does not fail indexing. It only tells you how many
replicas received the live update. So if the value is less than what you
wanted then it is up to you to retry the update later.
On Wed, May 2, 2018 at 3:33 PM, Greenhorn Techie
wrote:
> Hi,
>
> Good Morning!!
>
> In the case
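For illustration, the min_rf check described above might look like this in
SolrJ; the collection name, ZooKeeper address and the threshold of 2 are
assumptions, not details from the thread:
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.util.NamedList;

public class MinRfSketch {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client =
             new CloudSolrClient.Builder().withZkHost("localhost:2181").build()) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "1");

      UpdateRequest req = new UpdateRequest();
      req.setParam(UpdateRequest.MIN_REPFACT, "2"); // ask Solr to report rf
      req.add(doc);

      NamedList<Object> rsp = client.request(req, "mycollection");
      // min_rf never fails the update; it only reports how many replicas
      // received it, so retrying when rf is too low is the client's job.
      int rf = client.getMinAchievedReplicationFactor("mycollection", rsp);
      if (rf < 2) {
        // queue the document for a retry later
      }
    }
  }
}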
On 5/2/2018 6:23 PM, Erick Erickson wrote:
> Perhaps this is: SOLR-11660?
That definitely looks like the problem that Michael describes. And it
indicates that restarting Solr instances after restore is a workaround.
The issue also says something that might indicate that collection reload
after r
Perhaps this is: SOLR-11660?
On Wed, May 2, 2018 at 4:46 PM, Shawn Heisey wrote:
> On 5/2/2018 3:52 PM, Michael B. Klein wrote:
>> It works ALMOST perfectly. The restore operation reports success, and if I
>> look at the UI, everything looks great in the Cloud graph view. All green,
>> one leader
On 5/2/2018 3:52 PM, Michael B. Klein wrote:
> It works ALMOST perfectly. The restore operation reports success, and if I
> look at the UI, everything looks great in the Cloud graph view. All green,
> one leader and two other active instances per collection.
>
> But once we start updating, we run i
On 5/2/2018 11:45 AM, Patrick Recchia wrote:
> Is there any logging I can turn on to know when a commit happens and/or
> when a segment is flushed?
The normal INFO-level logging that Solr ships with will log all
commits. It probably doesn't log segment flushes unless they happen as
a result of a
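For reference, the relevant logger in the log4j.properties that Solr ships
with (log4j 1.x up to Solr 7.3); the stock config already has this at INFO,
which is what produces the "start commit{...}" lines:
log4j.logger.org.apache.solr.update=INFO
Flush-level detail is only exposed through Lucene's infoStream
(<infoStream>true</infoStream> under <indexConfig> in solrconfig.xml), which
is extremely verbose, as Erick notes elsewhere in this digest.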
On 5/2/2018 1:03 PM, Mike Konikoff wrote:
> Is there a way to configure the DataImportHandler to use bind variables for
> the entity queries, to improve database performance?
Can you clarify where these variables would come from and precisely what
you want to do?
From what I can tell, you're tal
On 5/2/2018 2:56 PM, Weffelmeyer, Stacie wrote:
> Question on faceting. We have a dynamicField that we want to facet
> on. Below is the field and the type of information that field generates.
>
>
>
> cid:image001.png@01D3E22D.DE028870
>
This image is not available. This mailing list will almos
On 5/2/2018 10:58 AM, Greenhorn Techie wrote:
> The current hardware profile for our production cluster is 20 nodes, each
> with 24cores and 256GB memory. Data being indexed is very structured in
> nature and is about 30 columns or so, out of which half of them are
> categorical with a defined list
This is a problem that we’ve noted too.
This blog post discusses the underlying cause
https://opensourceconnections.com/blog/2018/02/20/edismax-and-multiterm-synonyms-oddities/
Hope that helps
On Wed, May 2, 2018 at 3:07 PM Chris Wilt wrote:
> I began with a 7.2.1 solr instance using the techpr
Hi all,
I've encountered a reproducible and confusing issue with our Solr 6.6
cluster. (Updating to 7.x is an option, but not an immediate one.) This is
in our staging environment, running on AWS. To save money, we scale our
entire stack down to zero instances every night and spin it back up every
Hi,
Question on faceting. We have a dynamicField that we want to facet on. Below
is the field and the type of information that field generates.
[cid:image001.png@01D3E22D.DE028870]
"customMetadata":["{\"controlledContent\":{\"metadata\":{\"programs\":[\"program1\"],\"departments\":[\"
All,
Percentiles only work with numbers, not dates.
If I use the ms function, I can get the number of milliseconds between NOW and
the import date. Then we can use that result in calculating the median age of
the documents using percentiles.
rows=0&stats=true&stats.field={!tag=piv1 percentiles=
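A complete version of that request might look like the following, where
import_date is an assumed field name; the 50th percentile of
ms(NOW,import_date) is the median document age in milliseconds:
rows=0&stats=true&stats.field={!func tag=piv1 percentiles='50'}ms(NOW,import_date)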
Hello,
Is anyone here able to reproduce this oddity? It shows up in all our collections
once we enable the stats page to show filterCache entries.
Is this normal? Am I completely missing something?
Thanks,
Markus
-Original message-
> From:Markus Jelsma
> Sent: Tuesday 1st May 2018 17:32
>
Thanks Shawn for the inputs, which will definitely help us to scale our
cluster better.
Regards
On 2 May 2018 at 18:15:12, Shawn Heisey (apa...@elyograg.org) wrote:
On 5/1/2018 5:33 PM, Greenhorn Techie wrote:
> Wondering what are the considerations to be aware to arrive at an optimal
> heap si
Thanks Walter and Erick for the valuable suggestions. We shall try out
various values for shards and as well other tuning metrics I discussed in
various threads earlier.
Kind Regards
On 2 May 2018 at 18:24:31, Erick Erickson (erickerick...@gmail.com) wrote:
I've seen 1.5 M docs/second. Basicall
I began with a 7.2.1 solr instance using the techproducts sample data. Next, I
added "a" as a stopword (there were originally no stopwords).
I tried two queries: "x a b" and "x b".
Here are the raw query parameters:
q=x b&fl=id,score,price&sort=score desc&qf=name^0.75 manu cat^3.0
features^
Is there a way to configure the DataImportHandler to use bind variables for
the entity queries, to improve database performance?
Thanks,
Mike
You can turn on "infoStream", but that is _very_ voluminous. The
regular Solr logs at INFO level should show commits, though.
On Wed, May 2, 2018 at 10:45 AM, Patrick Recchia
wrote:
> Shawn,
> thank you very much for your answer.
>
>
> On Wed, May 2, 2018 at 6:27 PM, Shawn Heisey wrote:
>
>> On
Shawn,
thank you very much for your answer.
On Wed, May 2, 2018 at 6:27 PM, Shawn Heisey wrote:
> On 5/2/2018 4:54 AM, Patrick Recchia wrote:
> > I'm seeing way too many commits on our solr cluster, and I don't know
> why.
>
> Are you sure there are commits happening? Do you have logs actuall
You can always increase the maximum segment size. For large indexes
that should reduce the number of segments. But watch your indexing
stats, I can't predict the consequences of bumping it to 100G for
instance. I'd _expect_ bursty I/O when those large segments started
to be created or merged
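The ceiling Erick refers to is TieredMergePolicy's maxMergedSegmentMB, 5 GB
by default; a solrconfig.xml sketch using the 100G figure above purely as an
illustration:
<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <!-- default is 5000 (5 GB); 102400 is roughly 100 GB -->
    <int name="maxMergedSegmentMB">102400</int>
  </mergePolicyFactory>
</indexConfig>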
I've seen 1.5 M docs/second. Basically the indexing throughput is gated
by two things:
1> the number of shards. Indexing throughput essentially scales up
reasonably linearly with the number of shards.
2> the indexing program that pushes data to Solr. Before thinking Solr
is the bottleneck, check ho
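On 2>, the usual first client-side fix is batching adds (and then running
several feeder threads). A minimal SolrJ sketch; the collection name and
batch size are assumptions:
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchFeeder {
  // Send documents in batches rather than one per request; several
  // threads can call this concurrently against the same SolrClient.
  static void feed(SolrClient client, Iterable<SolrInputDocument> docs)
      throws Exception {
    List<SolrInputDocument> batch = new ArrayList<>(1000);
    for (SolrInputDocument doc : docs) {
      batch.add(doc);
      if (batch.size() == 1000) {
        client.add("mycollection", batch); // one HTTP request per 1000 docs
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      client.add("mycollection", batch);
    }
  }
}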
On 5/1/2018 5:33 PM, Greenhorn Techie wrote:
> Wondering what are the considerations to be aware to arrive at an optimal
> heap size for Solr JVM? Though I did discuss this on the IRC, I am still
> unclear on how Solr uses the JVM heap space. Are there any pointers to
> understand this aspect bette
We have a similar sized cluster, 32 nodes with 36 processors and 60 GB RAM each
(EC2 C4.8xlarge). The collection is 24 million documents with four shards. The
cluster is Solr 6.6.2. All storage is SSD EBS.
We built a simple batch loader in Java. We get about one million documents per
minute with
Hi,
The current hardware profile for our production cluster is 20 nodes, each
with 24cores and 256GB memory. Data being indexed is very structured in
nature and is about 30 columns or so, out of which half of them are
categorical with a defined list of values. The expected peak indexing
throughput
Figured out that offset is used as part of the grouping patch which I applied
(SOLR-8776):
solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java
+ if (query instanceof AbstractReRankQuery) {
+   topNGroups = cmd.getOffset() + ((AbstractReRankQuery) query).getReRankDocs()
On 5/2/2018 4:54 AM, Patrick Recchia wrote:
> I'm seeing way too many commits on our solr cluster, and I don't know why.
Are you sure there are commits happening? Do you have logs actually
saying that a commit is occurring? The creation of a new segment does
not necessarily mean a commit happene
The main reason we go this route is that after a while (with default
settings) we end up with hundreds of segments and performance of course
drops abysmally as a result. By using a stepped optimize a) we don't run
into the we need the 3x+ head room issue, b) optimize performance
penalty during opt
Sounds just like it, I will check it out!
Thanks both!
Markus
-Original message-
> From:Erick Erickson
> Sent: Wednesday 2nd May 2018 17:21
> To: solr-user
> Subject: Re: Collection reload leaves dangling SolrCore instances
>
> Markus:
>
> You may well be hitting SOLR-11882.
>
>
That's a pretty open-ended question. The short form
is when the replica switches back to "active" (or green
on the admin UI) then it's been caught up.
This is all about NRT replicas.
PULL and TLOG replicas pull the segments from the
leader so the idea of "sending a doc to the replica"
doesn't rea
Hi Erick
What will happen after a replica has recovered? Does the leader continuously
check the status of the replica and resend after it has recovered, or will the
replica pull documents for indexing after recovering?
Please clarify this behavior for all of the replica types, i.e. NRT, TLOG and
PULL. (I have implemented solr
1> When the replica fails, the leader tries to resend it, and if the
resends fail,
then the follower goes into recovery which will eventually get the document
caught up.
2> Yes, the client will get a failure indication.
Best,
Erick
On Wed, May 2, 2018 at 3:03 AM, Greenhorn Techie
wrot
Two possibilities:
1> you have multiple replicas in the same JVM and are seeing commits
happen with all of them.
2> ramBufferSizeMB. when you index docs, segments are flushed when the
in-memory structures exceed this limit, is this perhaps what you're
seeing?
Best,
Erick
On Wed, May 2, 2018 at 3:
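For reference, the ramBufferSizeMB knob from 2> lives in solrconfig.xml; a
sketch showing the default:
<indexConfig>
  <!-- segments are flushed once in-memory structures reach this size;
       a flush is not a commit -->
  <ramBufferSizeMB>100</ramBufferSizeMB>
</indexConfig>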
Markus:
You may well be hitting SOLR-11882.
On Wed, May 2, 2018 at 8:18 AM, Shawn Heisey wrote:
> On 5/2/2018 4:40 AM, Markus Jelsma wrote:
>> One of our collections, that is heavy with tons of TokenFilters using large
>> dictionaries, has a lot of trouble dealing with collection reload. I remo
1> You have to prototype, see:
https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
2> No. It could be done, but it'd take some very careful work.
Basically you'd have to merge "adjacent" shards where "adjacent" is
measured by the shard range of e
On 5/2/2018 4:40 AM, Markus Jelsma wrote:
> One of our collections, that is heavy with tons of TokenFilters using large
> dictionaries, has a lot of trouble dealing with collection reload. I removed
> all custom plugins from solrconfig, dumbed the schema down and removed all
> custom filters and
And if you _do_ have a uniqueKey ("id" by default), subsequent records
will overwrite older records with the same key.
The tip from Annameneni is the first thing I'd try though, make sure
you've issued a commit.
Best,
Erick
On Wed, May 2, 2018 at 7:09 AM, ANNAMANENI RAVEENDRA
wrote:
> Possible
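Issuing that commit explicitly is a one-liner against the update handler (the
core name is an assumption):
curl 'http://localhost:8983/solr/mycore/update?commit=true'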
Just what it says. Solr/Lucene like lots of file handles, I regularly
see several thousand. If you run out of file handles Solr stops
working.
Ditto processes. Solr in particular spawns a lot of threads,
particularly when handling many incoming requests through Jetty. If
you exceed the limit, requ
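On Linux these limits are typically raised in /etc/security/limits.conf; a
sketch matching the 65000 figure from the warning quoted below, assuming Solr
runs as the user "solr":
# /etc/security/limits.conf
solr  soft  nofile  65000
solr  hard  nofile  65000
solr  soft  nproc   65000
solr  hard  nproc   65000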
Hi,
I am trying to upgrade Solr from 7.1.0 to 7.3.0.
While trying to start the Solr process, the below warnings are observed:
*** [WARN] *** Your open file limit is currently 1024.
It should be set to 65000 to avoid operational disruption.
If you no longer wish to see this warning, set SOLR_
On 5/2/2018 3:13 AM, Mohan Cheema wrote:
> We are using Solr to index our data. The data contains £ symbol within the
> text and for currency. When data is exported from the source system data
> contains £ symbol, however, when the data is imported into the Solr £ symbol
> is converted to �.
>
>
Possible cases can be:
If you don't have a unique key, then there is a high chance that you will see
less data.
Try hard commit or check your commit times (hard/soft)
On Wed, May 2, 2018 at 9:30 AM Srinivas Kashyap <
srini...@tradestonesoftware.com> wrote:
> Hi,
>
> I have standalone solr index serv
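The commit times mentioned above live in solrconfig.xml; a sketch with
illustrative intervals:
<autoCommit>
  <maxTime>60000</maxTime>           <!-- hard commit every 60 s -->
  <openSearcher>false</openSearcher> <!-- durability only, no visibility -->
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime>            <!-- soft commit: visibility every 5 s -->
</autoSoftCommit>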
BlendedInfixLookupFactory is not returning terms, but returns the field
value. If I change to FuzzyLookupFactory it works fine. Am I doing something
wrong?
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">default</str>
    <str name="lookupImpl">BlendedInfixLookupFactory</str>
    <str name="blenderType">position_linear</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="weightField">weight</str>
    <str name="suggestAnalyzerFieldType">text_suggest</str>
    <str name="contextField">language</str>
  </lst>
</searchComponent>
Hi,
I have a standalone Solr 5.2.1 index server and a core with 15 fields (all
indexed and stored).
Through DIH I'm indexing the data (around 65 million records). The index process
took 6 hours to complete. But after completion, when I checked through the Solr
admin query console (*:*), numfound
Hi,
I have few questions on sharding in a SolrCloud setup:
1. How to know the optimal number of shards required for a SolrCloud setup?
What are the factors to consider to decide on the value for *numShards*
parameter?
2. In case over-sharding has been done, i.e. if numShards has been set to
a v
A very high rate of indexing documents could cause heap usage to go high
(all the temporary objects being created live in JVM memory, so at a very high
rate heap utilization may climb).
Having caches not sized/set correctly would also result in high JVM usage,
since as searches are happening, it wil
Take a look at https://wiki.apache.org/solr/SolrPerformanceProblems. The
section "how much heap do i need" talks about that.
Caches also go on the JVM heap, so take a look at how much you are allocating
for the different caches.
Thnx
On Tue, May 1, 2018 at 7:33 PM, Greenhorn Techie
wrote:
> Hi,
>
> Wonderi
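The caches in question are declared in solrconfig.xml and live entirely on the
heap; note that a single filterCache entry can cost roughly maxDoc/8 bytes. A
sketch with the stock example sizes:
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>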
I need to use autocomplete with edismax (ngrams, edgegrams) to return shingled
suggestions. Field value "new york city" needs to return on query "ne" ->
"new", "new york", "new york city". With the suggester this is easy. But I'm
forced to use edismax because I need to apply multiple filter queries.
What
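One schema-level way to do this (field type name and analyzer choices below
are assumptions, not a tested recipe) is to shingle at index time and then
edge-n-gram the shingles, so a prefix like "ne" matches "new", "new york" and
"new york city"; the query side keeps the whole input as a single lowercased
token:
<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3"
            outputUnigrams="true"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>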
Hello,
I'm seeing way too many commits on our solr cluster, and I don't know why.
Here is the landscape:
- Each collection we create (one per day) is created with 10 shards with 2
replicas each.
- we send live data, 2B records / day. so on average 200M records/shard per
day - for a size of approx
Hello,
One of our collections, that is heavy with tons of TokenFilters using large
dictionaries, has a lot of trouble dealing with collection reload. I removed
all custom plugins from solrconfig, dumbed the schema down and removed all
custom filters and replaced a customized decompounder with L
Hi,
Good Morning!!
In the case of a SolrCloud setup with sharding and replication in place,
when a document is sent for indexing, what happens when only the shard
leader has indexed the document, but the replicas failed, for whatever
reason. Will the document be resent by the leader to the replica
Hi There,
We are using Solr to index our data. The data contains £ symbol within the text
and for currency. When data is exported from the source system data contains £
symbol, however, when the data is imported into the Solr £ symbol is converted
to �.
How can we keep the £ symbol as is when
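That � is the classic sign of bytes written in one encoding being decoded as
another: in ISO-8859-1 the £ sign is the single byte 0xA3, which is invalid on
its own in UTF-8. Declaring the charset explicitly at every hop usually fixes
it; for example, when posting documents over HTTP (collection and file names
are assumptions):
curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
     -H 'Content-Type: application/json; charset=UTF-8' \
     --data-binary @docs.json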
Hi Alessandro,
Thanks for responding.
Let me take a step back and tell you the problem I have been facing with
this. So one of the features in my LTR model is:
{
"store" : "my_feature_store",
"name" : "in_aggregated_terms",
"class" : "org.apache.solr.ltr.feature.SolrFeature",
"params" : { "q" : "