On 5/3/2018 12:55 PM, Satya Marivada wrote:
> We have a solr (6.3.0) index which is being re-indexed every night, it
> takes about 6-7 hours for the indexing to complete. During the time of
> re-indexing, the index becomes flaky and would serve inconsistent count of
> documents 70,000 at times and
Yes, we are doing a clean and full import. Is it not supposed to serve the
old (existing) index till the new index is built, and then do a cleanup,
replacing the old index once the new one is built?
Would a full import without clean avoid this problem?
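For anyone following along, the difference is just the clean parameter on the DataImportHandler request. A minimal sketch of building the two URLs (the host, collection name, and /dataimport handler path are placeholder assumptions, not taken from the poster's setup):

```python
from urllib.parse import urlencode

def dih_import_url(base, collection, clean):
    """Build a DataImportHandler full-import URL. With clean=true the
    existing documents are deleted up front; with clean=false the old
    documents stay searchable until the import's final commit."""
    params = urlencode({"command": "full-import", "clean": str(clean).lower()})
    return f"{base}/solr/{collection}/dataimport?{params}"

# clean=true is why searches can return partial counts mid-import;
# clean=false avoids that window at the cost of needing deletes handled.
print(dih_import_url("http://localhost:8983", "mycollection", False))
```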
Thanks Erick, this would be useful.
On Thu, May 3, 2018
The short form is that different replicas in a shard have different
commit points if you go by wall-clock time. So during heavy indexing,
you can happen to catch different counts. That really shouldn't
happen, though, unless you're clearing the index first on the
assumption that you're replacing
I'm not sure if that method is viable for reindexing and fetching the whole
collection at once for us, but unless there is something inherent in that
process which happens at the collection level, we could do it a few shards
at a time since it is a multi-tenant setup.
I'll see if we can setup a sm
(1) It doesn't matter whether it "affect only segments being merged".
You can't get accurate information if different segments have
different expectations.
(2) I strongly doubt it. The problem is that the "tainted" segments'
meta-data is still read when merging. If the segment consisted of
_only_
We tested the query on all replicas for the given shard, and they all have
the same issue. So deleting and adding another replica won't fix the
problem since the leader is exhibiting the behavior as well. I believe the
second replica was moved (new one added, old one deleted) between nodes and
so w
Never mind. Anything that didn't merge old segments but just threw them
away when empty (which was my idea) would possibly require as much
disk space as the index currently occupies, so it doesn't help your
disk-constrained situation.
Best,
Erick
On Thu, Oct 12, 2017 at 8:06 AM, Erick Erickson wrote:
If it's _only_ on a particular replica, here's what you could do:
Just DELETEREPLICA on it, then ADDREPLICA to bring it back. You can
define the "node" parameter on ADDREPLICA to get it back on the same
node. Then the normal replication process would pull the entire index
down from the leader.
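The two Collections API calls Erick mentions can be sketched as URL construction (a hedged sketch; the collection, shard, replica, and node values are placeholders you would read from your own cluster state):

```python
from urllib.parse import urlencode

def collections_api_url(base, action, **params):
    """Build a Collections API URL such as DELETEREPLICA or ADDREPLICA."""
    q = urlencode({"action": action, **params})
    return f"{base}/solr/admin/collections?{q}"

# Drop the bad replica, then re-add it on the same node; the new core
# then pulls the entire index down from the shard leader.
delete = collections_api_url("http://localhost:8983", "DELETEREPLICA",
                             collection="coll1", shard="shard1",
                             replica="core_node3")
add = collections_api_url("http://localhost:8983", "ADDREPLICA",
                          collection="coll1", shard="shard1",
                          node="192.168.1.10:8983_solr")
```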
My
I thought that decision would come back to bite us somehow. At the time, we
didn't have enough space available to do a fresh reindex alongside the old
collection, so the only course of action available was to index over the
old one, and the vast majority of its use worked as expected.
We're planni
bq: ...but the collection wasn't emptied first
This is what I'd suspect is the problem. Here's the issue: Segments
aren't merged identically on all replicas. So at some point you had
this field indexed without docValues, changed that and re-indexed. But
the segment merging could "read" the fir
I’m not sure of the root cause for your problem.
Solr is built to stay in sync automatically, so there is no need to script
anything in that regard.
There may be something with your environment, network, ZooKeeper setup or
similar that caused the state you were in. I would need to dig further in
Hi,
I did as you said, now it is coming ok.
And what should I look for when checking for these kinds of
issues, such as mismatched counts, a LukeRequest not returning all the fields,
etc.? The doc sync is one; how can I programmatically use the info and
sync them? Is there any method
Hi,
There is clearly something wrong when your two replicas are not in sync. Could
you go to the “Cloud->Tree” tab of the admin UI and look in the overseer queue
to see whether you find signs of stuck jobs or something?
Btw - what warnings do you see in the logs? Anything repeatedly popping up?
I would al
Hi,
a.) Yes, the index is static, not updated live. We index new documents over
old documents in this sequence: delete all docs, add 10 freshly fetched
from the db; after adding all the docs to the cloud instance, commit. Commit
happens only once per collection,
b.) I took one shard and below are the results
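As an aside, the a.) sequence above corresponds to three plain /update request bodies, and nothing becomes visible until the single final commit. A minimal sketch (the document fields here are placeholder assumptions, not the poster's schema):

```python
import json

def update_bodies(docs):
    """JSON bodies for the delete-all / add / commit sequence posted to
    /update. The deletes and adds stay invisible to searchers until the
    single commit at the end opens a new searcher."""
    delete_all = json.dumps({"delete": {"query": "*:*"}})
    add = json.dumps(docs)            # POSTed as a JSON list of documents
    commit = json.dumps({"commit": {}})
    return delete_all, add, commit

d, a, c = update_bodies([{"id": "1"}, {"id": "2"}])
```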
Could it be that your cluster is not in sync, so that when Solr picks three
nodes, results will vary depending on what replica answers?
A few questions:
a) Is your index static, i.e. not being updated live?
b) Can you try to go directly to the core menu of both replicas for each shard,
and comp
Wireshark should show you what the HTTP request actually looks like. So, a
definitive reference.
I still recommend double-checking that equivalence first. It is just a sanity
check before doing any more expensive digging.
You can also enable trace logging in the admin ui to see low level request
details
Hi Alexandre,
I am sure I am firing the same queries at the
same collection every time.
How will Wireshark help? I am sorry, I am not experienced with that tool.
On 13/08/16 17:37, Alexandre Rafalovitch wrote:
Are you sure you are issuing the same queries to the same colle
Are you sure you are issuing the same queries to the same collections and
the same request handlers?
I would verify that before all else. Using network sniffers (Wireshark) if
necessary.
Regards,
Alex
On 13 Aug 2016 8:11 PM, "Pranaya Behera" wrote:
Hi,
I am running solr 6.1.0 with solr
Hi,
I am using Java client i.e. SorlJ.
On 13/08/16 16:31, GW wrote:
No offense intended, but you are looking at a problem with your work. You
need to explain what you are doing, not what is happening.
If you are trying to use PHP and the latest PECL/PEAR, it does not work so
well. It is con
No offense intended, but you are looking at a problem with your work. You
need to explain what you are doing, not what is happening.
If you are trying to use PHP and the latest PECL/PEAR, it does not work so
well. It is considerably older than Solr 6.1.
This was the only issue I ran into with 6.1.
That idea was short lived. I excluded the document. The cluster isn't
syncing even after shutting everything down and restarting.
On Sun, Mar 18, 2012 at 2:58 PM, Matthew Parker <
mpar...@apogeeintegration.com> wrote:
> I had tried importing data from Manifold, and one document threw a Tika
> Exc
I had tried importing data from Manifold, and one document threw a Tika
Exception.
If I shut everything down and restart SOLR cloud, the system sync'd on
startup.
Could extraction errors be the issue?
On Sun, Mar 18, 2012 at 2:50 PM, Matthew Parker <
mpar...@apogeeintegration.com> wrote:
> I h
I have nodes running on ports: 8081-8084
A couple of the other SOLR cloud nodes were complaining about not being able
to talk with 8081, which is the first node brought up in the cluster.
The startup process is:
1. start 3 zookeeper nodes
2. wait until complete
3. start first solr node.
4. wait until c
I think he's asking if all the nodes (same machine or not) return a
response. Presumably you have different ports for each node since they
are on the same machine.
On Sun, 2012-03-18 at 14:44 -0400, Matthew Parker wrote:
> The cluster is running on one machine.
>
> On Sun, Mar 18, 2012 at 2:07 PM
The cluster is running on one machine.
On Sun, Mar 18, 2012 at 2:07 PM, Mark Miller wrote:
> From every node in your cluster you can hit http://MACHINE1:8084/solr in
> your browser and get a response?
>
> On Mar 18, 2012, at 1:46 PM, Matthew Parker wrote:
>
> > My cloud instance finally tried to
From every node in your cluster you can hit http://MACHINE1:8084/solr in your
browser and get a response?
On Mar 18, 2012, at 1:46 PM, Matthew Parker wrote:
> My cloud instance finally tried to sync. It looks like it's having connection
> issues, but I can bring the SOLR instance up in the brow
This might explain another thing I'm seeing. If I take a node down,
clusterstate.json still shows it as active. Also if I'm running 4 nodes,
take one down and assign it a new port, clusterstate.json will show 5 nodes
running.
On Sat, Mar 17, 2012 at 10:10 PM, Mark Miller wrote:
> Nodes talk to Z
Nodes talk to ZooKeeper as well as to each other. You can see the addresses
they are trying to use to communicate with each other in the 'cloud' view of
the Solr Admin UI. Sometimes you have to override these, as the detected
default may not be an address that other nodes can reach. As a limited
I'm still having issues replicating in my work environment. Can anyone
explain how the replication mechanism works? Is it communicating across
ports or through zookeeper to manage the process?
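One way to watch that mechanism from the outside is the replication handler's details command on each core. A sketch (the host and core name are placeholders for your nodes):

```python
def replication_details_url(base, core):
    """URL for the replication handler's status on one core. In SolrCloud,
    recovery pulls the index from the leader over plain HTTP between the
    Solr ports; ZooKeeper only coordinates who the leader is."""
    return f"{base}/solr/{core}/replication?command=details&wt=json"

print(replication_details_url("http://localhost:8081", "collection1"))
```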
On Thu, Mar 8, 2012 at 10:57 PM, Matthew Parker <
mpar...@apogeeintegration.com> wrote:
> All,
>
> I
All,
I recreated the cluster on my machine at home (Windows 7, Java 1.6.0.23,
apache-solr-4.0-2012-02-29_09-07-30) , sent some document through Manifold
using its crawler, and it looks like it's replicating fine once the
documents are committed.
This must be related to my environment somehow. Tha
I've ensured the SOLR data subdirectories and files were completely cleaned
out, but the issue still occurs.
On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson wrote:
> Matt:
>
> Just for paranoia's sake, when I was playing around with this (the
> _version_ thing was one of my problems too) I removed
Matt:
Just for paranoia's sake, when I was playing around with this (the
_version_ thing was one of my problems too) I removed the entire data
directory as well as the zoo_data directory between experiments (and
recreated just the data dir). This included various index.2012
files and the tlog
> I'm assuming the windows configuration looked correct?
Yeah, so far I cannot spot any smoking gun... I'm confounded at the moment.
I'll re read through everything once more...
- Mark
I reindex every time I change something.
I also delete any zookeeper data too.
I'm assuming the windows configuration looked correct?
On Thu, Mar 1, 2012 at 3:39 PM, Mark Miller wrote:
P.S. FYI you will have to reindex after adding _version_ back to the schema...
>
> On Mar 1, 2012, at 3:35 PM, M
I tried publishing to /update/extract request handler using manifold, but
got the same result.
I also tried swapping out the replication handlers too, but that didn't do
anything.
Otherwise, that's it.
On Thu, Mar 1, 2012 at 3:35 PM, Mark Miller wrote:
> Any other customizations you are making
P.S. FYI you will have to reindex after adding _version_ back to the schema...
On Mar 1, 2012, at 3:35 PM, Mark Miller wrote:
> Any other customizations you are making to solrconfig?
>
> On Mar 1, 2012, at 1:48 PM, Matthew Parker wrote:
>
>> Added it back in. I still get the same result.
>>
>> On
Any other customizations you are making to solrconfig?
On Mar 1, 2012, at 1:48 PM, Matthew Parker wrote:
> Added it back in. I still get the same result.
>
> On Wed, Feb 29, 2012 at 10:09 PM, Mark Miller wrote:
> Do you have a _version_ field in your schema? I actually just came back to
> this
Added it back in. I still get the same result.
On Wed, Feb 29, 2012 at 10:09 PM, Mark Miller wrote:
> Do you have a _version_ field in your schema? I actually just came back to
> this thread with that thought and then saw your error - so that remains my
> guess.
>
> I'm going to improve the doc
Do you have a _version_ field in your schema? I actually just came back to
this thread with that thought and then saw your error - so that remains my
guess.
I'm going to improve the doc on the wiki around what needs to be defined
for SolrCloud - so far we have things in the example defaults, but i
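For reference, the field SolrCloud expects in schema.xml looks like this. The shape below is taken from the 4.x-era example schema; treat it as a sketch to adapt to your field types:

```xml
<!-- Required by SolrCloud for optimistic concurrency and peer sync -->
<field name="_version_" type="long" indexed="true" stored="true"/>
```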
Mark/Sami
I ran the system with 3 zookeeper nodes, 2 solr cloud nodes, and left
numShards set to its default value (i.e. 1)
It looks like it finally sync'd with the other one after quite a while, but
it's throwing lots of errors like the following:
org.apache.solr.common.SolrException: missing _v
Sami,
I have the latest as of the 26th. My system is running on a standalone
network so it's not easy to get code updates without a wave of paperwork.
I installed as per the detailed instructions I laid out a couple of
messages ago from today (2/29/2012).
I'm running the following query:
http:/
On Wed, Feb 29, 2012 at 7:03 PM, Matthew Parker
wrote:
> I also took out my requestHandler and used the standard /update/extract
> handler. Same result.
How did you install/start the system this time? The same way as
earlier? What kind of queries do you run?
Would it be possible for you to check
I also took out my requestHandler and used the standard /update/extract
handler. Same result.
On Wed, Feb 29, 2012 at 11:47 AM, Matthew Parker <
mpar...@apogeeintegration.com> wrote:
> I tried running SOLR Cloud with the default number of shards (i.e. 1), and
> I get the same results.
>
> On Wed,
I tried running SOLR Cloud with the default number of shards (i.e. 1), and
I get the same results.
On Wed, Feb 29, 2012 at 10:46 AM, Matthew Parker <
mpar...@apogeeintegration.com> wrote:
> Mark,
>
> Nothing appears to be wrong in the logs. I wiped the indexes and imported
> 37 files from SharePo
Mark,
Nothing appears to be wrong in the logs. I wiped the indexes and imported
37 files from SharePoint using Manifold. All 37 make it in, but SOLR still
has issues with the results being inconsistent.
Let me run my setup by you, and see whether that is the issue?
On one machine, I have three z
Hmm...this is very strange - there is nothing interesting in any of the logs?
In clusterstate.json, all of the shards have an active state?
There are quite a few of us doing exactly this setup recently, so there must be
something we are missing here...
Any info you can offer might help.
- Mar
Mark,
I got the codebase from the 2/26/2012, and I got the same inconsistent
results.
I have solr running on four ports 8081-8084
8081 and 8082 are the leaders for shard 1, and shard 2, respectively
8083 - is assigned to shard 1
8084 - is assigned to shard 2
queries come in and sometimes it see
I'll have to check on the commit situation. We have been pushing data from
SharePoint the last week or so. Would that somehow block the documents
moving between the solr instances?
I'll try another version tomorrow. Thanks for the suggestions.
On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller wrote:
Hmmm...all of that looks pretty normal...
Did a commit somehow fail on the other machine? When you view the stats for the
update handler, are there a lot of pending adds on one of the nodes? Do the
commit counts match across nodes?
You can also query an individual node with distrib=false to che
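Checking one core directly with distrib=false looks like the sketch below (host and core name are placeholders; rows=0 just returns the count):

```python
from urllib.parse import urlencode

def core_count_url(base, core):
    """Query a single core with distributed search disabled, so the
    returned numFound reflects only that core's own index."""
    params = urlencode({"q": "*:*", "rows": 0, "distrib": "false"})
    return f"{base}/solr/{core}/select?{params}"

print(core_count_url("http://localhost:8081", "collection1_shard1_replica1"))
```

Comparing this count across the replicas of one shard shows directly whether they have diverged.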
Here is most of the cluster state:
Connected to Zookeeper
localhost:2181, localhost:2182, localhost:2183
/(v=0 children=7) ""
/CONFIGS(v=0, children=1)
/CONFIGURATION(v=0 children=25)
< all the configuration files, velocity info, xslt, etc.
/NODE_STATES(v=0 child
I was trying to use the new interface. I see it using the old admin page.
Is there a piece of it you're interested in? I don't have access to the
Internet where it exists so it would mean transcribing it.
On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller wrote:
>
> On Feb 27, 2012, at 2:22 PM, Matth
On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:
> Thanks for your reply Mark.
>
> I believe the build was towards the beginning of the month. The
> solr.spec.version is 4.0.0.2012.01.10.38.09
>
> I cannot access the clusterstate.json contents. I clicked on it a couple of
> times, but nothing
Thanks for your reply Mark.
I believe the build was towards the beginning of the month. The
solr.spec.version is 4.0.0.2012.01.10.38.09
I cannot access the clusterstate.json contents. I clicked on it a couple of
times, but nothing happens. Is that stored on disk somewhere?
I configured a custom r
Hey Matt - is your build recent?
Can you visit the cloud/zookeeper page in the admin and send the contents of
the clusterstate.json node?
Are you using a custom index chain or anything out of the ordinary?
- Mark
On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote:
> TWIMC:
>
> Environment
>
I think the key here is you are a bit confused about what
the multiValued thing is all about. The fq clause says,
essentially, "restrict all my search results to the documents
where 1213206 occurs in sou_codeMetier."
That's *all* the fq clause does.
Now, by saying facet.field=sou_codeMetier you're
My interpretation of your results is that your FQ found 1281 documents
with the 1213206 value in the sou_codeMetier field. Of those results, 476 also
had 1212104 as a value... and so on. Since ALL the results will have
the field value in your FQ, I would expect the "other" values to
be equal or less
Pravesh,
Not exactly. Here is the search I do, in more details (different field name,
but same issue).
I want to get a count for a specific value of the sou_codeMetier field,
which is multivalued. I expressed this by including a fq clause :
/select/?q=*:*&facet=true&facet.field=sou_codeMetier&fq
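A fuller sketch of that style of request, with placeholder host/collection values (the field and value are taken from the thread):

```python
from urllib.parse import urlencode

def facet_query_url(base, collection, field, value):
    """Facet over a multivalued field while restricting results with fq.
    Every matching document contains `value`, so that bucket's count
    equals numFound; the other buckets count co-occurring values."""
    params = urlencode({
        "q": "*:*",
        "fq": f"{field}:{value}",
        "facet": "true",
        "facet.field": field,
        "rows": 0,
    })
    return f"{base}/solr/{collection}/select?{params}"

print(facet_query_url("http://localhost:8983", "jobs", "sou_codeMetier", "1213206"))
```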
Could you clarify the below:
>>When I make a search on facet.qua_code=1234567 ??
Are you trying to say that when you fire a fresh search for a facet item, like
q=qua_code:1234567?
This would fetch documents where the qua_code field contains either
the term 1234567 OR both terms (1234567 & 9384738..
s d wrote:
Hi, I use SOLR with the standard handler, and when I send the same exact query to
solr I get different results every time (i.e. refresh the page with the
query and get different results).
Any ideas?
Thx,
what is the query? are there any changes to the index in between?
Check the logs to
I fixed that problem with reconfiguring schema.xml.
Thanks for your help.
Jak
Grant Ingersoll wrote:
Have you setup your Analyzers, etc. so they correspond to the exact
ones that you were using in Lucene? Under the Solr Admin you can try
the analysis tool to see how your index and queries are
Have you setup your Analyzers, etc. so they correspond to the exact
ones that you were using in Lucene? Under the Solr Admin you can try
the analysis tool to see how your index and queries are treated. What
happens if you do a *:* query from the Admin query screen?
If your index is reason