solr nested multivalued fields
I would like to produce the following result in a Solr search response, but I am not sure it is possible (using Solr 3.6): John Darby Sue Berger. However, I can't seem to manage getting this tree-like structure in my results. At best I can get something that looks like the following, which is not even close: John Darby Sue Berger.

There are two problems here. First, I cannot seem to "group" these people into a meaningful tag structure as per the top example. Second, I can't for the life of me get the tags to display an attribute name like "lastName" or "firstName" when inside an array.

In my project I am pulling this data using a DIH, and from the example above one can see that this is a one-to-many relationship between groups and users. I would really appreciate it if someone has some suggestions or alternative thoughts. Any assistance would be greatly appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-nested-multivalued-fields-tp3989114.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr nested multivalued fields
Thanks. From all the material I have looked at and searched, I am inclined to believe that those are indeed my options; any others are still welcome... -- View this message in context: http://lucene.472066.n3.nabble.com/solr-nested-multivalued-fields-tp3989114p3989260.html Sent from the Solr - User mailing list archive at Nabble.com.
Same query, inconsistent result in SolrCloud
Hi! I'm facing a problem. I'm using SolrCloud 4.10.3, with 2 shards, and each shard has 2 replicas. After indexing data into the collection and running the same query,

http://localhost:8983/solr/catalog/select?q=a&wt=json&indent=true

sometimes it returns the right result:

{ "responseHeader":{ "status":0, "QTime":19, "params":{ "indent":"true", "q":"a", "wt":"json"}}, "response":{"numFound":5,"start":0,"maxScore":0.43969032,"docs":[ {},{},... ] } }

But when I re-run the same query, it returns:

{ "responseHeader":{ "status":0, "QTime":14, "params":{ "indent":"true", "q":"a", "wt":"json"}}, "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[] }, "highlighting":{}}

Only some short query words show this kind of problem. Does anyone know what's going on? Thanks. Regards, Jerome
Re: Same query, inconsistent result in SolrCloud
Dear Erick, Thank you, I fond it's the problem of my text segmentation setting. Anyway, thanks. Regards, Jerome 2015-06-21 0:43 GMT+08:00 Erick Erickson : > Just that this _shouldn't_ be going on at all. Either > 1> you've done something when setting up this collection that's > producing this incorrect result > or > 2> you've found something really bad. > > So: > 1> How did you create your collection? Did you use the Collections API > or try to define individual cores with the Core Admin api? If the latter, > you likely missed some innocent-seeming property on the core and your > collection isn't correctly configured. Please do NOT use the "core admin > API" > in SolrCloud unless you know the code very, very well. Use the > Collections API always. > > 2> Try querying each replica with &distrib=false. That'll return only the > docs on the particular replica you fire the query at. Do you have > replicas in the _same_ shard with different numbers of docs? If so, > what can you see that's different about those cores? > > 3> What does your clusterstate.json file show? > > 4> how did you index documents? > > Best, > Erick > > On Fri, Jun 19, 2015 at 8:07 PM, Jerome Yang > wrote: > > Hi! > > > > I'm facing a problem. > > I'm using SolrCloud 4.10.3, with 2 shards, each shard have 2 replicas. > > > > After index data to the collection, and run the same query, > > > > http://localhost:8983/solr/catalog/select?q=a&wt=json&indent=true > > > > Sometimes, it return the right, > > > > { > > "responseHeader":{ > > "status":0, > > "QTime":19, > > "params":{ > > "indent":"true", > > "q":"a", > > "wt":"json"}}, > > "response":{"numFound":5,"start":0,"maxScore":0.43969032,"docs":[ > > {},{},... > > > > ] > > > > } > > > > } > > > > But, when I re-run the same query, it return : > > > > { > > "responseHeader":{ > > "status":0, > > "QTime":14, > > "params":{ > > "indent":"true", > > "q":"a", > > "wt":"json"}}, > > "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[] > > }, > > "highlighting":{}} > > > > > > Just some short word will show this kind of problem. > > > > Do anyone know what's going on? > > > > Thanks > > > > Regards, > > > > Jerome >
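For reference, a minimal sketch of the per-replica check Erick suggests: query each core directly with distrib=false and compare numFound between the replicas of the same shard (the host, port, and core name below are placeholders that depend on how the collection was laid out):

  curl "http://localhost:8983/solr/catalog_shard1_replica1/select?q=a&distrib=false&wt=json&indent=true"

If two replicas of the same shard report different counts, that shard is the one to investigate.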
Send kill -9 to a node and can not delete down replicas with onlyIfDown.
Hi all, Here's the situation. I'm using Solr 5.3 in cloud mode. I have 4 nodes. After using "kill -9 pid-solr-node" to kill 2 nodes, the replicas on those two nodes are still "ACTIVE" in ZooKeeper's state.json. The problem is that when I try to delete these down replicas with the parameter onlyIfDown='true', it says: "Delete replica failed: Attempted to remove replica : demo.public.tbl/shard0/core_node4 with onlyIfDown='true', but state is 'active'."

From this link: http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/common/cloud/Replica.State.html#ACTIVE

It says: *NOTE*: when the node the replica is hosted on crashes, the replica's state may remain ACTIVE in ZK. To determine if the replica is truly active, you must also verify that its node is under /live_nodes in ZK (or use ClusterState.liveNodesContain(String)).

So, is this a bug? Regards, Jerome
Re: Send kill -9 to a node and can not delete down replicas with onlyIfDown.
What I'm doing is to simulate host crashed situation. Consider this, a host is not connected to the cluster. So, if a host crashed, I can not delete the down replicas by using onlyIfDown='true'. But in solr admin ui, it shows down for these replicas. And whiteout "onlyIfDown", it still show a failure: Delete replica failed: Attempted to remove replica : demo.public.tbl/shard0/core_node4 with onlyIfDown='true', but state is 'active'. Is this the right behavior? If a hosts gone, I can not delete replicas in this host? Regards, Jerome On Wed, Jul 20, 2016 at 1:58 AM, Justin Lee wrote: > Thanks for taking the time for the detailed response. I completely get what > you are saying. Makes sense. > On Tue, Jul 19, 2016 at 10:56 AM Erick Erickson > wrote: > > > Justin: > > > > Well, "kill -9" just makes it harder. The original question > > was whether a replica being "active" was a bug, and it's > > not when you kill -9; the Solr node has no chance to > > tell Zookeeper it's going away. ZK does modify > > the live_nodes by itself, thus there are checks as > > necessary when a replica's state is referenced > > whether the node is also in live_nodes. And an > > overwhelming amount of the time this is OK, Solr > > recovers just fine. > > > > As far as the write locks are concerned, those are > > a Lucene level issue so if you kill Solr at just the > > wrong time it's possible that that'll be left over. The > > write locks are held for as short a period as possible > > by Lucene, but occasionally they can linger if you kill > > -9. > > > > When a replica comes up, if there is a write lock already, it > > doesn't just take over; it fails to load instead. > > > > A kill -9 won't bring the cluster down by itself except > > if there are several coincidences. Just don't make > > it a habit. For instance, consider if you kill -9 on > > two Solrs that happen to contain all of the replicas > > for a shard1 for collection1. And you _happen_ to > > kill them both at just the wrong time and they both > > leave Lucene write locks for those replicas. Now > > no replica will come up for shard1 and the collection > > is unusable. > > > > So the shorter form is that using "kill -9" is a poor practice > > that exposes you to some risk. The hard-core Solr > > guys work extremely had to compensate for this kind > > of thing, but kill -9 is a harsh, last-resort option and > > shouldn't be part of your regular process. And you should > > expect some "interesting" states when you do. And > > you should use the bin/solr script to stop Solr > > gracefully. > > > > Best, > > Erick > > > > > > On Tue, Jul 19, 2016 at 9:29 AM, Justin Lee > > wrote: > > > Pardon me for hijacking the thread, but I'm curious about something you > > > said, Erick. I always thought that the point (in part) of going > through > > > the pain of using zookeeper and creating replicas was so that the > system > > > could seamlessly recover from catastrophic failures. Wouldn't an OOM > > > condition have a similar effect (or maybe java is better at cleanup on > > that > > > kind of error)? The reason I ask is that I'm trying to set up a solr > > > system that is highly available and I'm a little bit surprised that a > > kill > > > -9 on one process on one machine could put the entire system in a bad > > > state. Is it common to have to address problems like this with manual > > > intervention in production systems? Ideally, I'd hope to be able to > set > > up > > > a system where a single node dying a horrible death would never require > > > intervention. 
> > > > > > On Tue, Jul 19, 2016 at 8:54 AM Erick Erickson < > erickerick...@gmail.com> > > > wrote: > > > > > >> First of all, killing with -9 is A Very Bad Idea. You can > > >> leave write lock files laying around. You can leave > > >> the state in an "interesting" place. You haven't given > > >> Solr a chance to tell Zookeeper that it's going away. > > >> (which would set the state to "down"). In short > > >> when you do this you have to deal with the consequences > > >> yourself, one of which is this mismatch between > > >> cluster state and live_nodes. > > >> > > >> Now, that rant done the bin/solr script tries to stop Solr > >
Re: Send kill -9 to a node and can not delete down replicas with onlyIfDown.
Thanks a lot everyone! By setting onlyIfDown=false, it did remove the replica. But still return a failure message. That confuse me. Anyway, thanks Erick and Chris. Regards, Jerome On Thu, Jul 21, 2016 at 5:47 AM, Chris Hostetter wrote: > > Maybe the problem here is some confusion/ambuguity about the meaning of > "down" ? > > TL;DR: think of "onlyIfDown" as "onlyIfShutDownCleanly" > > > IIUC, the purpose of the 'onlyIfDown' is a safety valve so (by default) > the cluster will prevent you from removing a replica that wasn't shutdown > *cleanly* and is officially in a "down" state -- as recorded in the > ClusterState for the collection (either the collections state.json or the > global clusterstate.json if you have an older solr instance) > > when you kill -9 a solr node, the replicas that were hosted on that node > will typically still be listed in the cluster state as "active" -- but it > will *not* be in live_nodes, which is how solr knows that replica can't > currently be used (and leader recovery happens as needed, etc...). > > If, however, you shut the node down cleanly (or if -- for whatever reason > -- the node is up, but the replica's SolrCore is not active) then the > cluster state will record that replica as "down" > > Where things unfortunately get confusing, is that the CLUSTERSTATUS api > call -- aparently in an attempt to try and implify things -- changes the > recorded status of any replica to "down" if that replica is hosted on a > node which is not in live_nodes. > > I suspect that since hte UI uses the CLUSTERSTATUS api to get it's state > information, it doesn't display much diff between a replica shut down > cleanly and a replica that is hosted on a node which died abruptly. > > I suspect that's where your confusion is coming from? > > > Ultimately, what onlyIfDown is trying to do is help ensure that you don't > accidently delete a replica that you didn't mean to. the opertaing > assumption is that the only replicas you will (typically) delete are > replicas that you shut down cleanly ... if a replica is down because of a > hard crash, then that is an exceptional situation and presumibly you will > either: a) try to bring the replica back up; b) delete the replica using > onlyIfDown=false to indicate that you know the replica you are deleting > isn't 'down' intentionally, but you want do delete it anyway. > > > > > > On Wed, 20 Jul 2016, Erick Erickson wrote: > > : Date: Wed, 20 Jul 2016 08:26:32 -0700 > : From: Erick Erickson > : Reply-To: solr-user@lucene.apache.org > : To: solr-user > : Subject: Re: Send kill -9 to a node and can not delete down replicas with > : onlyIfDown. > : > : Yes, it's the intended behavior. The whole point of the > : onlyIfDown flag was as a safety valve for those > : who wanted to be cautious and guard against typos > : and the like. > : > : If you specify onlyIfDown=false and the node still > : isn't removed from ZK, it's not right. > : > : Best, > : Erick > : > : On Tue, Jul 19, 2016 at 10:41 PM, Jerome Yang wrote: > : > What I'm doing is to simulate host crashed situation. > : > > : > Consider this, a host is not connected to the cluster. > : > > : > So, if a host crashed, I can not delete the down replicas by using > : > onlyIfDown='true'. > : > But in solr admin ui, it shows down for these replicas. > : > And whiteout "onlyIfDown", it still show a failure: > : > Delete replica failed: Attempted to remove replica : > : > demo.public.tbl/shard0/core_node4 with onlyIfDown='true', but state is > : > 'active'. 
> : > > : > Is this the right behavior? If a hosts gone, I can not delete replicas > in > : > this host? > : > > : > Regards, > : > Jerome > : > > : > On Wed, Jul 20, 2016 at 1:58 AM, Justin Lee > wrote: > : > > : >> Thanks for taking the time for the detailed response. I completely > get what > : >> you are saying. Makes sense. > : >> On Tue, Jul 19, 2016 at 10:56 AM Erick Erickson < > erickerick...@gmail.com> > : >> wrote: > : >> > : >> > Justin: > : >> > > : >> > Well, "kill -9" just makes it harder. The original question > : >> > was whether a replica being "active" was a bug, and it's > : >> > not when you kill -9; the Solr node has no chance to > : >> > tell Zookeeper it's going away. ZK does modify > : >> > the live_nodes by itself, thu
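For reference, a sketch of the DELETEREPLICA calls discussed in this thread, using the replica named in the error message above; onlyIfDown=true is the safety valve that refuses to act while the recorded state is still 'active', so for a replica left behind by a kill -9 the flag has to be dropped or set to false:

  # refused while the cluster state still records the replica as 'active'
  curl "http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=demo.public.tbl&shard=shard0&replica=core_node4&onlyIfDown=true"

  # removes the replica regardless of its recorded state
  curl "http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=demo.public.tbl&shard=shard0&replica=core_node4&onlyIfDown=false"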
Delete replica on down node, after start down node, the deleted replica comes back.
Hi all, I have run into some strange behavior, on both Solr 6.1 and Solr 5.3. For example: there are 4 nodes in cloud mode and one of them is stopped. I delete a replica on the down node, then start the down node again, and the deleted replica comes back. Is this normal behavior?

Same situation, 4 nodes with 1 node down, and I delete a collection. After starting the down node, the replicas of that collection on the down node come back again. I cannot use the Collections API DELETE to remove the collection, it says the collection does not exist; but if I use the CREATE action to create a collection with the same name, it says the collection already exists. The only way to make things right is to clean up manually in ZooKeeper and in the data directory. How can I prevent this from happening? Regards, Jerome
In cloud mode, using implicit router. Leader changed, not available to index data, and no error occurred.
Hi all, The situation is: three hosts, host1, host2, host3. Solr version 6.1 in cloud mode, 8 Solr nodes on each host. I create a collection using the implicit router and execute indexing and index deletions; the collection works fine. Then I kill 3 nodes and some of the shards change leader. Then I index data to the new shard leaders and commit, but some of the shards still have 0 documents, and no error occurred. Checking the log on such a leader replica, it did receive and process the update request; no error is found in the log. After restarting all nodes, everything works fine. I think this is a serious bug. Can you confirm whether it is a bug or not? Regards, Jerome
Re: In cloud mode, using implicit router. Leader changed, not available to index data, and no error occurred.
I'm sure I send documents to that shard. And execute commit. I also use curl to index, but not error occurred and no documents are indexed. On Mon, Sep 19, 2016 at 11:27 PM, Erick Erickson wrote: > Are all the documents in the collection? By using implicit router, you are > assuming control of what shard each document ends up on. So my > guess is that you are not routing the docs to each shard. > > If you want Solr to automatically assign the shard to a doc, you should > be using the default compositeId routing scheme. > > If you index docs and not all of them are somewhere in the collection, > that's a problem, assuming you are routing them properly when using > the implicit router. > > Best, > Erick > > On Sun, Sep 18, 2016 at 8:04 PM, Jerome Yang wrote: > > Hi all, > > > > The situation is: > > Three hosts, host1, host2, host3. Solr version 6.1 in cloud mode. 8 solr > > nodes on each host. > > > > Create a collection using implicit router. Execute index and delete > index. > > The collection works fine. > > Then kill 3 nodes, some of shards change leader. > > Then index data to new leaders of shards, and commit. But some of shards > > still has 0 documents. And no error occurred. > > By checking the log on that leader replica, it did receive the update > > request and processed. No error found in the log. > > > > After restart all nodes, everything works fine. > > > > This is a serious bug I think. > > Can you confirm it's a bug or not? > > > > Regards, > > Jerome >
Re: In cloud mode, using implicit router. Leader changed, not available to index data, and no error occurred.
That shard did receive update request, because it shows in the log. And also commit request. But no documents indexed. On Tue, Sep 20, 2016 at 2:26 PM, Jerome Yang wrote: > I'm sure I send documents to that shard. And execute commit. > > I also use curl to index, but not error occurred and no documents are > indexed. > > On Mon, Sep 19, 2016 at 11:27 PM, Erick Erickson > wrote: > >> Are all the documents in the collection? By using implicit router, you are >> assuming control of what shard each document ends up on. So my >> guess is that you are not routing the docs to each shard. >> >> If you want Solr to automatically assign the shard to a doc, you should >> be using the default compositeId routing scheme. >> >> If you index docs and not all of them are somewhere in the collection, >> that's a problem, assuming you are routing them properly when using >> the implicit router. >> >> Best, >> Erick >> >> On Sun, Sep 18, 2016 at 8:04 PM, Jerome Yang wrote: >> > Hi all, >> > >> > The situation is: >> > Three hosts, host1, host2, host3. Solr version 6.1 in cloud mode. 8 solr >> > nodes on each host. >> > >> > Create a collection using implicit router. Execute index and delete >> index. >> > The collection works fine. >> > Then kill 3 nodes, some of shards change leader. >> > Then index data to new leaders of shards, and commit. But some of shards >> > still has 0 documents. And no error occurred. >> > By checking the log on that leader replica, it did receive the update >> > request and processed. No error found in the log. >> > >> > After restart all nodes, everything works fine. >> > >> > This is a serious bug I think. >> > Can you confirm it's a bug or not? >> > >> > Regards, >> > Jerome >> > >
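As background to Erick's point about the implicit router: the sender chooses the target shard, typically through the _route_ request parameter (or a router.field defined at collection creation). A minimal sketch, with the collection, shard, and file names as placeholders:

  # with router.name=implicit, documents go to the shard named in _route_
  curl "http://localhost:8983/solr/mycollection/update?_route_=shard1&commit=true" \
       -H 'Content-type:application/json' --data-binary @docs.json

Documents sent without a usable route are not spread across shards automatically the way the default compositeId router distributes them.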
Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.
Hi all, I'm facing a strange problem. Here's a SolrCloud setup on a single machine with 2 Solr nodes, version Solr 6.1. I create a collection called "test_collection" with 2 shards, a replication factor of 3, and the default router. I index some documents and commit, then back up this collection. After that, I restore from the backup and name the restored collection "restore_test_collection". Querying "restore_test_collection" works fine and the data is consistent.

Then I index some new documents and commit. I find that the documents are all indexed into shard1, and that the leader of shard1 does not have these new documents while the other replicas do. Has anyone hit this issue? I really need your help. Regards, Jerome
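For reference, a sketch of the Collections API backup/restore calls used in this scenario on Solr 6.1 (the location must be a path reachable by all nodes; the names and paths here are placeholders):

  # back up the source collection
  curl "http://localhost:8983/solr/admin/collections?action=BACKUP&name=test_backup&collection=test_collection&location=/backups"

  # restore it under a new collection name
  curl "http://localhost:8983/solr/admin/collections?action=RESTORE&name=test_backup&collection=restore_test_collection&location=/backups"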
Re: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.
Using curl, I did some tests.

curl 'http://localhost:8983/solr/restore_test_collection/update?commit=true&wt=json' --data-binary @test.json -H 'Content-type:application/json'

With commit=true, the leader doesn't have the new documents, but the other replicas do.

curl 'http://localhost:8983/solr/restore_test_collection/update?commitWithin=1000&wt=json' --data-binary @test.json -H 'Content-type:application/json'

With commitWithin=1000, all replicas in shard1 have the new documents, including the leader, and all new documents are routed to shard1.

On Tue, Oct 11, 2016 at 5:27 PM, Jerome Yang wrote: > Hi all, > > I'm facing a strange problem. > > Here's a solrcloud on a single machine which has 2 solr nodes, version: > solr6.1. > > I create a collection with 2 shards and replica factor is 3 with default > router called "test_collection". > Index some documents and commit. Then I backup this collection. > After that, I restore from the backup and name the restored collection > "restore_test_collection". > Query from "restore_test_collection". It works fine and data is consistent. > > Then, I index some new documents, and commit. > I find that the documents are all indexed in shard1 and the leader of > shard1 don't have these new documents but other replicas do have these new > documents. > > Anyone have this issue? > Really need your help. > > Regards, > Jerome >
Re: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.
@Mark Miller Please help~ On Tue, Oct 11, 2016 at 5:32 PM, Jerome Yang wrote: > Using curl do some tests. > > curl 'http://localhost:8983/solr/restore_test_collection/update? > *commit=true*&wt=json' --data-binary @test.json -H > 'Content-type:application/json' > > The leader don't have new documents, but other replicas have. > > curl 'http://localhost:8983/solr/restore_test_collection/update? > *commitWithin**=1000*&wt=json' --data-binary @test.json -H > 'Content-type:application/json' > All replicas in shard1 have new documents include leader, and all new > documents route to shard1. > > On Tue, Oct 11, 2016 at 5:27 PM, Jerome Yang wrote: > >> Hi all, >> >> I'm facing a strange problem. >> >> Here's a solrcloud on a single machine which has 2 solr nodes, version: >> solr6.1. >> >> I create a collection with 2 shards and replica factor is 3 with default >> router called "test_collection". >> Index some documents and commit. Then I backup this collection. >> After that, I restore from the backup and name the restored collection >> "restore_test_collection". >> Query from "restore_test_collection". It works fine and data is >> consistent. >> >> Then, I index some new documents, and commit. >> I find that the documents are all indexed in shard1 and the leader of >> shard1 don't have these new documents but other replicas do have these new >> documents. >> >> Anyone have this issue? >> Really need your help. >> >> Regards, >> Jerome >> > >
Re: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.
Hi Shawn, I just check the clusterstate.json <http://192.168.33.10:18983/solr/admin/zookeeper?detail=true&path=%2Fclusterstate.json> which is restored for "restore_test_collection". The router is "router":{"name":"compositeId"}, not implicit. So, it's a very serious bug I think. Should this bug go into jira? Please help! Regards, Jerome On Tue, Oct 11, 2016 at 8:34 PM, Shawn Heisey wrote: > On 10/11/2016 3:27 AM, Jerome Yang wrote: > > Then, I index some new documents, and commit. I find that the > > documents are all indexed in shard1 and the leader of shard1 don't > > have these new documents but other replicas do have these new documents. > > Not sure why the leader would be missing the documents but other > replicas have them, but I do have a theory about why they are only in > shard1. Testing that theory will involve obtaining some information > from your system: > > What is the router on the restored collection? You can see this in the > admin UI by going to Cloud->Tree, opening "collections", and clicking on > the collection. In the right-hand side, there will be some info from > zookeeper, with some JSON below it that should mention the router. I > suspect that the router on the new collection may have been configured > as implicit, instead of compositeId. > > Thanks, > Shawn > >
Re: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.
@Erick Please help! On Wed, Oct 12, 2016 at 10:21 AM, Jerome Yang wrote: > Hi Shawn, > > I just check the clusterstate.json > <http://192.168.33.10:18983/solr/admin/zookeeper?detail=true&path=%2Fclusterstate.json> > which > is restored for "restore_test_collection". > The router is "router":{"name":"compositeId"}, > not implicit. > > So, it's a very serious bug I think. > Should this bug go into jira? > > Please help! > > Regards, > Jerome > > > On Tue, Oct 11, 2016 at 8:34 PM, Shawn Heisey wrote: > >> On 10/11/2016 3:27 AM, Jerome Yang wrote: >> > Then, I index some new documents, and commit. I find that the >> > documents are all indexed in shard1 and the leader of shard1 don't >> > have these new documents but other replicas do have these new documents. >> >> Not sure why the leader would be missing the documents but other >> replicas have them, but I do have a theory about why they are only in >> shard1. Testing that theory will involve obtaining some information >> from your system: >> >> What is the router on the restored collection? You can see this in the >> admin UI by going to Cloud->Tree, opening "collections", and clicking on >> the collection. In the right-hand side, there will be some info from >> zookeeper, with some JSON below it that should mention the router. I >> suspect that the router on the new collection may have been configured >> as implicit, instead of compositeId. >> >> Thanks, >> Shawn >> >>
Reload schema or configs failed then drop index, can not recreate that index.
Hi all, Here's my situation, in cloud mode:
1. I created a collection called "test" and then modified the managed-schema; I got an error as shown in picture 2.
2. To get more of the error message, I checked the Solr logs and got the message shown in picture 3.
3. If I corrected the managed-schema, everything would be fine. But I dropped the index, and the index couldn't be created again, as in picture 4. I restarted gptext using "gptext-start -r" and recreated the index, and it was created successfully, as in picture 5.
Re: Reload schema or configs failed then drop index, can not recreate that index.
Sorry, wrong message. To correct, in cloud mode:
1. I created a collection called "test" and then modified the managed-schema, writing something wrong in it (for example in the "id" field); reloading the collection then fails.
2. Then I drop the collection "test" and delete the configs from ZooKeeper. This works fine; the collection is removed both from ZooKeeper and from the hard disk.
3. I upload the correct configs with the same name as before and try to create a collection named "test" again; it fails with the error "core with name '*' already exists", although the core does not actually exist.
4. After restarting the whole cluster and doing the create again, everything works fine.
I think that when the collection is deleted, something is still held somewhere and not cleaned up. Please have a look. Regards, Jerome
On Wed, Nov 23, 2016 at 10:16 AM, Jerome Yang wrote: > Hi all, > > > Here's my situation: > > In cloud mode. > >1. I created a collection called "test" and then modified the >managed-schemaI got an error as shown in picture 2. >2. To get enough error message, I checked solr logs and get message >shown in picture 3. >3. If I corrected the managed-schema, everything would be fine. But I >dropped the index. The index couldn't be created it again, like picture 4. >I restarted gptext using "gptext-start -r" and recreated the index, it was >created successfully like picture 5. > >
Re: Solr 6 Performance Suggestions
Have you run IndexUpgrader? Index Format Changes Solr 6 has no support for reading Lucene/Solr 4.x and earlier indexes. Be sure to run the Lucene IndexUpgrader included with Solr 5.5 if you might still have old 4x formatted segments in your index. Alternatively: fully optimize your index with Solr 5.5 to make sure it consists only of one up-to-date index segment. Regards, Jerome On Tue, Nov 22, 2016 at 10:48 PM, Yonik Seeley wrote: > It depends highly on what your requests look like, and which ones are > slower. > If you're request mix is heterogeneous, find the types of requests > that seem to have the largest slowdown and let us know what they look > like. > > -Yonik > > > On Tue, Nov 22, 2016 at 8:54 AM, Max Bridgewater > wrote: > > I migrated an application from Solr 4 to Solr 6. solrconfig.xml and > > schema.xml are sensibly the same. The JVM params are also pretty much > > similar. The indicces have each about 2 million documents. No particular > > tuning was done to Solr 6 beyond the default settings. Solr 4 is running > in > > Tomcat 7. > > > > Early results seem to show Solr 4 outperforming Solr 6. The first shows > an > > average response time of 280 ms while the second averages at 430 ms. The > > test cases were exactly the same, the machines where exactly the same and > > heap settings exactly the same (Xms24g, Xmx24g). Requests were sent with > > Jmeter with 50 concurrent threads for 2h. > > > > I know that this is not enough information to claim that Solr 4 generally > > outperforms Solr 6. I also know that this pretty much depends on what the > > application does. So I am not claiming anything general. All I want to do > > is get some input before I start digging. > > > > What are some things I could tune to improve the numbers for Solr 6? Have > > you guys experienced such discrepancies? > > > > Thanks, > > Max. >
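A minimal sketch of running IndexUpgrader from a Solr 5.5 distribution against one core's index directory (jar versions and paths are placeholders, and the node should be stopped while it runs):

  java -cp lucene-core-5.5.4.jar:lucene-backward-codecs-5.5.4.jar \
       org.apache.lucene.index.IndexUpgrader -verbose /var/solr/data/mycore/data/index

This rewrites any remaining 4.x segments into the 5.x format so that Solr 6 can open the index.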
Re: Reload schema or configs failed then drop index, can not recreate that index.
It's solr 6.1, cloud mode. Please ignore the first message. Just take check my second email. I mean if I modify an existing collections's managed-schema and the modification makes reload collection failed. Then I delete the collection, and delete the configs from zookeeper. After that upload an configs as the same name as before, and the managed-schema is the not modified version. Then recreate the collection, it will throw an error, "core already exists". But actually it's not. After restart the whole cluster, recreate collection will success. Regards, Jerome On Wed, Nov 23, 2016 at 3:26 PM, Erick Erickson wrote: > The mail server is pretty heavy-handed at deleting attachments, none of > your > (presumably) screenshots came through. > > You also haven't told us what version of Solr you're using. > > Best, > Erick > > On Tue, Nov 22, 2016 at 6:25 PM, Jerome Yang wrote: > > Sorry, wrong message. > > To correct. > > > > In cloud mode. > > > >1. I created a collection called "test" and then modified the > >managed-schemaI, write something wrong, for example > >"id", then reload collection would failed. > >2. Then I drop the collection "test" and delete configs form > zookeeper. > >It works fine. The collection is removed both from zookeeper and hard > disk. > >3. Upload the right configs with the same name as before, try to > create > >collection as name "test", it would failed and the error is "core > with name > >'*' already exists". But actually not. > >4. The restart the whole cluster, do the create again, everything > works > >fine. > > > > > > I think when doing the delete collection, there's something still hold in > > somewhere not deleted. > > Please have a look > > > > Regards, > > Jerome > > > > On Wed, Nov 23, 2016 at 10:16 AM, Jerome Yang wrote: > > > >> Hi all, > >> > >> > >> Here's my situation: > >> > >> In cloud mode. > >> > >>1. I created a collection called "test" and then modified the > >>managed-schemaI got an error as shown in picture 2. > >>2. To get enough error message, I checked solr logs and get message > >>shown in picture 3. > >>3. If I corrected the managed-schema, everything would be fine. But I > >>dropped the index. The index couldn't be created it again, like > picture 4. > >>I restarted gptext using "gptext-start -r" and recreated the index, > it was > >>created successfully like picture 5. > >> > >> >
Re: SolrCloud -Distribued Indexing
Hi, 1. You can use the Solr Collections API to create a collection with the "implicit" router. Please check CREATE: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1 2. There are several ways to indicate which collection you want to send a request to: a> setDefaultCollection b> sendRequest(SolrRequest request, String collection) Please check https://lucene.apache.org/solr/6_1_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.html Regards, Jerome On Wed, Nov 23, 2016 at 6:43 PM, Udit Tyagi wrote: > Hi, > > I am a solr user, I am using solr-6.3.0 version, I have some doubts for > Distributed indexing and sharding in SolrCloud pease clarify, > > 1. How can I index documents to a specific shard(I heard about document > routing not documentation is not proper for that). > > I am using solr create command from terminal to create collection i don't > have any option to specify router name while creating collection from > terminal so how can i implement implicit router for my collection. > > 2.In documentation of Solr-6.3.0 for client API solrj the way to connect to > solrcloud is specified as > > String zkHostString = "zkServerA:2181,zkServerB:2181,zkServerC:2181/solr"; > SolrClient solr = new > CloudSolrClient.Builder().withZkHost(zkHostString).build(); > > please update documentation or reply back how can i specify collection name > to > query after connecting to zookeeper. > > Any help will be appreciated,Thanks > > Regards, > Udit Tyagi >
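For point 1, a minimal sketch of creating a collection with the implicit router through the Collections API, since, as noted in the question, the terminal create command does not let you pick the router (the collection, config, and shard names are placeholders):

  curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&router.name=implicit&shards=shard1,shard2,shard3&collection.configName=myconfig&maxShardsPerNode=3"

Indexing can then target a shard by name with the _route_ request parameter.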
Re: Reload schema or configs failed then drop index, can not recreate that index.
Thanks Erick! On Fri, Nov 25, 2016 at 1:38 AM, Erick Erickson wrote: > This is arguably a bug. I raised a JIRA, see: > > https://issues.apache.org/jira/browse/SOLR-9799 > > Managed schema is not necessary to show this problem, generically if > you upload a bad config by whatever means, then > RELOAD/DELETE/correct/CREATE it fails. The steps I outlined > in the JIRA force the same replica to be created on the same Solr instance > to insure it can be reproduced at will. > > In the meantime, you can keep from having to restart Solr by: > - correcting the schema > - pushing it to Zookeeper (managed schema API does this for you) > - RELOAD the collection (do NOT delete it first). > > Since you can just RELOAD, I doubt this will be a high priority though. > > Thanks for reporting! > Erick > > > On Wed, Nov 23, 2016 at 6:37 PM, Jerome Yang wrote: > > It's solr 6.1, cloud mode. > > > > Please ignore the first message. Just take check my second email. > > > > I mean if I modify an existing collections's managed-schema and the > > modification makes reload collection failed. > > Then I delete the collection, and delete the configs from zookeeper. > > After that upload an configs as the same name as before, and the > > managed-schema is the not modified version. > > Then recreate the collection, it will throw an error, "core already > > exists". But actually it's not. > > After restart the whole cluster, recreate collection will success. > > > > Regards, > > Jerome > > > > > > On Wed, Nov 23, 2016 at 3:26 PM, Erick Erickson > > > wrote: > > > >> The mail server is pretty heavy-handed at deleting attachments, none of > >> your > >> (presumably) screenshots came through. > >> > >> You also haven't told us what version of Solr you're using. > >> > >> Best, > >> Erick > >> > >> On Tue, Nov 22, 2016 at 6:25 PM, Jerome Yang wrote: > >> > Sorry, wrong message. > >> > To correct. > >> > > >> > In cloud mode. > >> > > >> >1. I created a collection called "test" and then modified the > >> >managed-schemaI, write something wrong, for example > >> >"id", then reload collection would failed. > >> >2. Then I drop the collection "test" and delete configs form > >> zookeeper. > >> >It works fine. The collection is removed both from zookeeper and > hard > >> disk. > >> >3. Upload the right configs with the same name as before, try to > >> create > >> >collection as name "test", it would failed and the error is "core > >> with name > >> >'*' already exists". But actually not. > >> >4. The restart the whole cluster, do the create again, everything > >> works > >> >fine. > >> > > >> > > >> > I think when doing the delete collection, there's something still > hold in > >> > somewhere not deleted. > >> > Please have a look > >> > > >> > Regards, > >> > Jerome > >> > > >> > On Wed, Nov 23, 2016 at 10:16 AM, Jerome Yang > wrote: > >> > > >> >> Hi all, > >> >> > >> >> > >> >> Here's my situation: > >> >> > >> >> In cloud mode. > >> >> > >> >>1. I created a collection called "test" and then modified the > >> >>managed-schemaI got an error as shown in picture 2. > >> >>2. To get enough error message, I checked solr logs and get > message > >> >>shown in picture 3. > >> >>3. If I corrected the managed-schema, everything would be fine. > But I > >> >>dropped the index. The index couldn't be created it again, like > >> picture 4. > >> >>I restarted gptext using "gptext-start -r" and recreated the > index, > >> it was > >> >>created successfully like picture 5. > >> >> > >> >> > >> >
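A minimal sketch of the RELOAD-based workaround Erick describes, i.e. push the corrected config to ZooKeeper and reload the collection in place instead of deleting it (the ZooKeeper address, config name, and paths are placeholders):

  # upload the corrected configset with the zkcli script shipped with Solr
  server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
      -cmd upconfig -confname test_conf -confdir /path/to/corrected/conf

  # reload the collection so it picks up the fixed schema
  curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=test"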
DIH documents not indexed because of loss in XSL transformation.
Hello, I'm indexing XML files with XPathEntityProcessor, and some hundreds of documents out of 12 million are not processed. When I tried to index only one of the failing documents on its own, it didn't index either, so it's not a matter of the large number of documents. We tried to do the XSLT transformation externally, capture the transformed XML, and index that in Solr; it worked, so the document itself seems OK. The document was big, so I commented out a part of it, and then it was indexed in Solr with the XSL transform.

So I downloaded the DIH code and debugged the execution of these lines, which launch the XSL transformation, to see what was happening exactly:

SimpleCharArrayReader caw = new SimpleCharArrayReader();
xslTransformer.transform(new StreamSource(data), new StreamResult(caw));
data = caw.getReader();

It appeared that caw was missing data, so the XSLT transformer didn't work correctly. Digging further into the TransformerImpl code, I can see the content of my XML file in some buffer, but somewhere something goes wrong that I don't understand (it's getting very tricky for me). xslTransformer is of class com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.

Is there a means to change the XSLT transformer class, or is there a known size limitation in this transformer that can be increased? I've worked in Solr 4.2 and then in Solr 4.6. Thanks in advance. Regards, Jérôme Dupont, Bibliothèque Nationale de France, Département des Systèmes d'Information, Tour T3 - Quai François Mauriac, 75706 Paris Cedex 13, téléphone: 33 (0)1 53 79 45 40, e-mail: jerome.dup...@bnf.fr
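If DIH obtains its transformer through the standard JAXP factory lookup (the com.sun.org.apache.xalan.internal class in the stack is the JDK's built-in XSLTC implementation), the processor can usually be swapped by putting another XSLT engine such as Saxon on the classpath and setting the JAXP system property in the JVM options used to start Solr/Tomcat. A hedged sketch; whether DIH honours this for its xsl attribute is an assumption, not something verified here:

  # example JVM option; net.sf.saxon.TransformerFactoryImpl is Saxon's factory class
  -Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl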
using facet enum and fc in the same query.
Hello, I have a Solr index (12M docs, 45 GB) with facets, and I'm trying to improve facet query performance.
1/ I tried to use docValues on the facet fields; it didn't work well.
2/ I tried facet.threads=-1 in my queries, and it worked perfectly (from more than 15s down to 2s for the longest queries).
3/ I'm trying to use facet.method=enum. It's supposed to improve performance for facet fields with few distinct values (type of document, things like that).
My problem is that I don't know whether there is a way to specify the enum method for some facets (3 to 5000 distinct values) and the fc method for some others (up to 12M distinct values) in the same query. Is it possible with something like MyFacet.facet.method=enum? Thanks in advance for the answer.
--- Jérôme Dupont, Bibliothèque Nationale de France, Département des Systèmes d'Information, Tour T3 - Quai François Mauriac, 75706 Paris Cedex 13, téléphone: 33 (0)1 53 79 45 40, e-mail: jerome.dup...@bnf.fr
RE: RE: using facet enum and fc in the same query.
First, thanks very much for your answers, and Alan's.

>> I have a solr index (12 M docs, 45Go) with facets, and I'm trying to improve facet queries performances.
>> 1/ I tried to use docvalue on facet fields, it didn't work well
> That was surprising, as the normal result of switching to DocValues is positive. Can you elaborate on what you did and how it failed?

When I said it failed, I just meant it was a little bit slower.

>> 2/ I tried facet.threads=-1 in my queries, and worked perfectely (from more 15s to 2s for longest queries)
> That tells us that your primary problem is not IO. If your usage is normally single-threaded that can work, but it also means that you have a lot of CPU cores standing idle most of the time. How many fields are you using for faceting and how many of them are large (more unique values than the 5000 you mention)?

The "slow" request corresponds to our website search query. It is for our book catalog: some facets are for type of document, author, title, subjects, location of the book, dates... In this request we now have 35 facets. About unique values, for the "slow" query:
1 facet goes up to 4M unique values (authors)
1 facet has 250,000 unique values
1 has 5
1 has 6700
4 have between 300 and 1000
5 have between 100 and 160
16 have less than 65

>> 3/ I'm trying to use facet.method=enum. It's supposed to improve the performance for facets fileds with few differents values. (type of documents, things like that)
> Having a mix of facet methods seems like a fine idea, although my personal experience is that enums gets slower than fc quite earlier than the 5000 unique values mark. As Alan states, the call is f.myfacetfield.facet.method=enum (Remember the 'facet.'-part. See https://wiki.apache.org/solr/SimpleFacetParameters#Parameters for details).
> Or you could try Sparse Faceting (Disclaimer: I am the author), which seems to fit your setup very well: http://tokee.github.io/lucene-solr/

Right now we use Solr 4.6 and we deliver our release soon, so I'm afraid I won't have time to try it this time, but I can try for the next release (next month, I think). Thanks very much again. Jerome Dupont jerome.dupont_at#bnf.fr
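For reference, a minimal sketch of mixing the two methods in one request along the lines Alan and Toke describe; unlisted fields keep the default fc method, and the host, collection, and field names are placeholders for the catalogue fields mentioned above:

  curl "http://localhost:8983/solr/catalogue/select?q=*:*&rows=0&facet=true&facet.threads=-1&facet.field=doctype&f.doctype.facet.method=enum&facet.field=location&f.location.facet.method=enum&facet.field=author"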
[SOLR 4.4 or 4.2] indexing with dih and solrcloud
Hello, I'm trying to index documents with the Data Import Handler and SolrCloud at the same time (huge collection, need parallel indexing). First I had a DIH configuration which works with standalone Solr (indexing for two months, every week). I've transformed my configuration to "cloudify" it with one shard at the beginning (adding the config file + launching with the zkRun option). I see my Solr admin interface with the cloud panels (tree view, 1 shard connected and active...), so it seems to work.

When I index using DIH, it looks like it is working: the input XML files are read, but no documents are stored in the index, exactly as if I had set the commit argument to false. This is the answer of the DIH request:

{ "responseHeader":{ "status":0, "QTime":32871}, "initArgs":[ "defaults",[ "config","mnb-data-config.xml"]], "command":"full-import", "mode":"debug", "documents":[], "verbose-output":[ "entity:noticebib",[ "entity:processorDocument",[], ... "entity:processorDocument",[], null,"--- row #1-", "CHEMINRELATIF","3/7/000/37000143.xml", null,"-", ... "status":"idle", "importResponse":"", "statusMessages":{ "Total Requests made to DataSource":"16", "Total Rows Fetched":"15", "Total Documents Skipped":"0", "Full Dump Started":"2013-08-29 12:08:48", "Total Documents Processed":"0", "Time taken":"0:0:32.684"},

In the logs (see the excerpt at the end of this message), I see a PRE_UPDATE FINISH message, and after that some debug messages about "Could not retrieve configuration" coming from ZooKeeper. So my question is: what can be wrong in my config?
_ something about synchronization in ZooKeeper (the "could not retrieve" message)?
_ a step missing in the Data Import Handler?
I don't see how to diagnose this point.

DEBUG 2013-08-29 12:09:21,411 http-8080-1 org.apache.solr.handler.dataimport.URLDataSource (92) - Accessing URL: file:/X:/3/7/000/37000190.xml
DEBUG 2013-08-29 12:09:21,520 http-8080-1 org.apache.solr.handler.dataimport.LogTransformer (58) - Notice fichier: 3/7/000/37000190.xml
DEBUG 2013-08-29 12:09:21,520 http-8080-1 fr.bnf.solr.BnfDateTransformer (696) - NN=37000190
INFO 2013-08-29 12:09:21,520 http-8080-1 org.apache.solr.handler.dataimport.DocBuilder (267) - Time taken = 0:0:32.684
DEBUG 2013-08-29 12:09:21,536 http-8080-1 org.apache.solr.update.processor.LogUpdateProcessor (178) - PRE_UPDATE FINISH {{params (optimize=true&indent=true&start=10&commit=true&verbose=true&entity=noticebib&command=full-import&debug=true&wt=json&rows=5),defaults (config=mnb-data-config.xml)}}
INFO 2013-08-29 12:09:21,536 http-8080-1 org.apache.solr.update.processor.LogUpdateProcessor (198) - [noticesBIB] webapp=/solr-0.4.0-pfd path=/dataimportMNb params= {optimize=true&indent=true&start=10&commit=true&verbose=true&entity=noticebib&command=full-import&debug=true&wt=json&rows=5} {} 0 32871
DEBUG 2013-08-29 12:09:21,583 http-8080-1 org.apache.solr.servlet.SolrDispatchFilter (388) - Closing out SolrRequest: {{params (optimize=true&indent=true&start=10&commit=true&verbose=true&entity=noticebib&command=full-import&debug=true&wt=json&rows=5),defaults (config=mnb-data-config.xml)}}
DEBUG 2013-08-29 12:09:21,833 main-SendThread(127.0.0.1:9080) org.apache.zookeeper.client.ZooKeeperSaslClient (519) - Could not retrieve login configuration: java.lang.SecurityException: Impossible de trouver une configuration de connexion
DEBUG 2013-08-29 12:09:21,833 main-SendThread(127.0.0.1:9080) org.apache.zookeeper.client.ZooKeeperSaslClient (519) - Could not retrieve login configuration: java.lang.SecurityException: Impossible de trouver une configuration de connexion
DEBUG 2013-08-29 12:09:21,833
main-SendThread(127.0.0.1:9080) org.apache.zookeeper.client.ZooKeeperSaslClient (519) - Could not retrieve login configuration: java.lang.SecurityException: Impossible de trouver une configuration de connexion DEBUG 2013-08-29 12:09:21,833 SyncThread:0 org.apache.zookeeper.server.FinalRequestProcessor (88) - Processing request:: sessionid:0x140c98bbe43 type:getData cxid:0x39d zxid:0xfffe txntype:unknown reqpath:/overseer_elect/leader DEBUG 2013-08-29 12:09:21,833 SyncThread:0 org.apache.zookeeper.server.FinalRequestProcessor (160) - sessionid:0x140c98bbe43 type:getData cxid:0x39d zxid:0xfffe txntype:unknown reqpath:/overseer_elect/leader DEBUG 2013-08-29 12:09:21,833 main-SendThread(127.0.0.1:9080) org.apache.zookeeper.client.ZooKeeperSaslClient (519) - Could not retrieve login configuration: java.lang.SecurityException: Impossible de trouver une configuration de connexion DEBUG 2013-08-29 12:09:21,833 main-SendThread(127.0.0.1:9080) org.apache.zookeeper.client.ZooKeeperSaslClient (519) - Could not retrieve login configuration: java.lang.SecurityException: Impossible de trouver une configuration de connexion PS: At the begining, I was in solr 4.2.1 and I tried with 4.0.0, but I have the same problem. Re
Re: Re: [SOLR 4.4 or 4.2] indexing with dih and solrcloud
Hello again. Finally, I found the problem. It seems that:
_ The indexing request was done with an HTTP GET and not with a POST, because I was launching it from a bookmark in my browser. Launching the indexing of my documents from the admin interface made it work.
_ Another problem was that some documents are not indexed (in particular the first ones in the list) for some reason due to our configuration, so when I was trying with only the first ten documents it couldn't work.
Now I will try with 2 shards... Jerome
solr cloud and DIH, indexing runs only on one shard.
Hello again, I'm still trying to index with SolrCloud and DIH. I can index, but it seems that the indexing is done on only 1 shard (my goal was to parallelize it to go faster). This is my configuration: I have 2 Tomcat instances, one with ZooKeeper embedded in Solr 4.4.0 and 1 shard (port 8080), the other with the second shard (port 9180). In my admin interface I see 2 shards, and each one is a leader.

When I launch the DIH, documents are indexed, but only shard1 is doing the work:

http://localhost:8080/solr-0.4.0-pfd/noticesBIBcollection/dataimportMNb?command=full-import&entity=noticebib&optimize=true&indent=true&clean=true&commit=true&verbose=false&debug=false&wt=json&rows=1000

In my first shard, I see messages coming from my indexing process:

DEBUG 2013-09-03 11:48:57,801 Thread-12 org.apache.solr.handler.dataimport.URLDataSource (92) - Accessing URL: file:/X:/3/7/002/37002118.xml
DEBUG 2013-09-03 11:48:57,832 Thread-12 org.apache.solr.handler.dataimport.URLDataSource (92) - Accessing URL: file:/X:/3/7/002/37002120.xml
DEBUG 2013-09-03 11:48:57,966 Thread-12 org.apache.solr.handler.dataimport.LogTransformer (58) - Notice fichier: 3/7/002/37002120.xml
DEBUG 2013-09-03 11:48:57,966 Thread-12 fr.bnf.solr.BnfDateTransformer (696) - NN=37002120

In the second instance, I only have this kind of log, as if it were receiving notifications of new updates from ZooKeeper:

INFO 2013-09-03 11:48:57,323 http-9180-7 org.apache.solr.update.processor.LogUpdateProcessor (198) - [noticesBIB] webapp=/solr-0.4.0-pfd path=/update params= {distrib.from=http://172.20.48.237:8080/solr-0.4.0-pfd/noticesBIB/&update.distrib=TOLEADER&wt=javabin&version=2} {add=[37001748 (1445149264874307584), 37001757 (1445149264879550464), 37001764 (1445149264883744768), 37001786 (1445149264887939072), 37001817 (1445149264891084800), 37001819 (1445149264896327680), 37001837 (1445149264900521984), 37001861 (1445149264903667712), 37001869 (1445149264907862016), 37001963 (1445149264912056320)]} 0 41

I supposed there was a confusion between core names and the collection name, and I tried to change the name of the collection, but it solved nothing. When I go to the DIH interfaces, on shard1 I see the indexing in progress, and on shard2 "no information available". Is there something special to do to distribute the indexing process? Should I run ZooKeeper on both instances (even if it's not mandatory)? Regards, Jerome
Re: solr cloud and DIH, indexing runs only on one shard.
It works! I've done what you said:
_ In my request to get the list of documents, I added a WHERE clause filtering the select that fetches the documents to index: where noticebib.numnoticebib LIKE '%${dataimporter.request.suffixeNotice}'
_ And I called my DIH on each shard with the parameter suffixeNotice=1 or suffixeNotice=2.
Each shard indexed its part at the same time (more or less 1000 documents each). When I execute a select on the collection, I get more or less 2000 documents. Now my goal is to merge the indexes, but that's another story.
Another possibility would have been to play with the rows and start parameters, but that supposes two things:
_ knowing the number of documents
_ adding an ORDER BY clause to make sure the subsets of documents are disjoint (and even in that case I'm not completely sure, because the source database can change)
Thanks very much!! Jérôme
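For reference, a sketch of the two per-shard DIH calls described above; each shard's import selects a disjoint subset through the ${dataimporter.request.suffixeNotice} parameter, which DIH substitutes into the WHERE clause (the ports, webapp, core, and handler names are the ones from this thread):

  curl "http://localhost:8080/solr-0.4.0-pfd/noticesBIB/dataimportMNb?command=full-import&suffixeNotice=1&commit=true&wt=json"
  curl "http://localhost:9180/solr-0.4.0-pfd/noticesBIB/dataimportMNb?command=full-import&suffixeNotice=2&commit=true&wt=json"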
[DIH] Logging skipped documents
Hello, I have a question. I index documents and a small part of them are skipped (I am in onError="skip" mode). I'm trying to get a list of them, in order to analyse what's wrong with these documents. Is there a means to get the list of skipped documents, and some more information? (My onError="skip" is on an XPathEntityProcessor; the name of the file being processed would be enough.) Regards, --- Jérôme Dupont, Bibliothèque Nationale de France, Département des Systèmes d'Information, Tour T3 - Quai François Mauriac, 75706 Paris Cedex 13, téléphone: 33 (0)1 53 79 45 40, e-mail: jerome.dup...@bnf.fr
error while indexing huge filesystem with data import handler and FileListEntityProcessor
Hello, We are trying to use the Data Import Handler, in particular on a collection which contains many files (one XML file per document). Our configuration works for a small number of files, but the data import fails with an OutOfMemoryError when running it on 10M files (in several directories...). This is the content of our config.xml:

When we try it on a directory which contains 10 subdirectories, each subdirectory containing 1000 subdirectories, each one containing 1000 XML files (10M files in total), the indexing process no longer works; we get a java.lang.OutOfMemoryError (even with 512 MB and 1 GB of memory):

ERROR 2013-05-24 15:26:25,733 http-9145-2 org.apache.solr.handler.dataimport.DataImporter (96) - Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassCastException: java.lang.OutOfMemoryError cannot be cast to java.lang.Exception
at org.apache.solr.handler.dataimport.DocBuilder.execute (DocBuilder.java:266)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport (DataImporter.java:422)
at org.apache.solr.handler.dataimport.DataImporter.runCmd (DataImporter.java:487)
at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody (DataImportHandler.java:179)
at org.apache.solr.handler.RequestHandlerBase.handleRequest (RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)

Monitoring the JVM with VisualVM, I've seen that most of the time is taken by the method FileListEntityProcessor.accept (called by getFolderFiles), so I assumed that the error occurs while the list of files to be indexed is being built. Indeed, the list of files is built by this method, which is called by getFolderFiles. Basically, the list of files to index is built by getFolderFiles, itself called on the first call to nextRow(); the indexing itself starts only after that.

org/apache/solr/handler/dataimport/FileListEntityProcessor.java: private void getFolderFiles(File dir, final List> fileDetails) {

I found the variable fileDetails, which contains the list of my XML files. It contains 611345 entries (for approximately 500 MB of memory), and I have 10M XML files (more or less...). That is why I think it was not finished yet. To get the entire list I guess I would need something between 5 and 10 GB for my process. So I have several questions:
_ Is it possible to have several FileListEntityProcessor entities attached to only one XPathEntityProcessor in data-config.xml? That way I could do it in ten passes, with my 10 first-level directories.
_ Is there a roadmap to optimize this method, for example by not listing all files up front, but every 1000 documents for instance?
_ Or to store the file list in a temporary file in order to save some memory?
Regards, --- Jérôme Dupont
Re: Re: error while indexing huge filesystem with data import handler and FileListEntityProcessor
The configuration works with LineEntityProcessor, with a few documents (I haven't tested with many documents yet). For information, this is the config ... fields definition ... file:///D:/jed/noticesBib/listeNotices.txt contains the following lines:
jed/noticesBib/3/4/307/34307035.xml
jed/noticesBib/3/4/307/34307082.xml
jed/noticesBib/3/4/307/34307110.xml
jed/noticesBib/3/4/307/34307197.xml
jed/noticesBib/3/4/307/34307350.xml
jed/noticesBib/3/4/307/34307399.xml
...
(It could have contained the full location from the beginning, but I wanted to test the concatenation of the filename.) That works fine, thanks for the help!! Next step: the same without using a file (I'll write it up in another post). Regards, Jérôme
[DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor
Hello, I want to index a huge list of xml files.
_ Using FileListEntityProcessor causes an OutOfMemoryError (too many files...)
_ I can do it using a LineEntityProcessor reading a list of files generated externally, but I would prefer to generate the list in SOLR.
_ So, to avoid maintaining a list of files, I'm trying to generate the list with an sql query, and to give the list of results to XPathEntityProcessor, which will read the files.

The query (select DISTINCT...) generates this result: CHEMINRELATIF 3/0/000/3001

But the problem is that with the following configuration, no request to the db is made, according to the message returned by DIH:

"statusMessages":{ "Total Requests made to DataSource":"0", "Total Rows Fetched":"0", "Total Documents Processed":"0", "Total Documents Skipped":"0", "":"Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.", "Committed":"2013-05-30 10:23:30", "Optimized":"2013-05-30 10:23:30",

And the log:
INFO 2013-05-30 10:23:29,924 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (121) - Loading DIH Configuration: mnb-data-config.xml
INFO 2013-05-30 10:23:29,957 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (224) - Data Configuration loaded successfully
INFO 2013-05-30 10:23:29,969 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (414) - Starting Full Import
INFO 2013-05-30 10:23:30,009 http-8080-1 org.apache.solr.handler.dataimport.SimplePropertiesWriter (219) - Read dataimportMNb.properties
INFO 2013-05-30 10:23:30,045 http-8080-1 org.apache.solr.handler.dataimport.DocBuilder (292) - Import completed successfully

Has someone already done this kind of configuration, or is it just not possible? The config:

Regards,
---
Jérôme Dupont
Bibliothèque Nationale de France
Département des Systèmes d'Information
Tour T3 - Quai François Mauriac 75706 Paris Cedex 13
phone: 33 (0)1 53 79 45 40
e-mail: jerome.dup...@bnf.fr
---
Re: Re: [DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor
Hi, Thanks for your answer, it helped me move forward. The name of the entity was wrong, not consistent with the schema. Now the first entity works fine: the query is sent to the database and returns the right result.

The problem is that the second entity, which is an XPathEntityProcessor entity, doesn't read the file specified in the url attribute, but tries to execute it as an sql query on my database. I tried to put a fake query (select 1 from dual) but it changes nothing. It's as if the XPathEntityProcessor entity behaved like an SqlEntityProcessor, using the url attribute instead of the query attribute.

I forgot to say which version I use: SOLR 4.2.1 (it can be changed, it's just the beginning of the development). See below the config, and the returned message.

The verbose output:
"verbose-output":[ "entity:noticebib",[ "query","select DISTINCT SUBSTR( to_char(noticebib.numnoticebib, '9'), 3, 1) || '/' ||SUBSTR( to_char(noticebib.numnoticebib, '9'), 4, 1) || '/' ||SUBSTR( to_char(noticebib.numnoticebib, '9'), 5, 3) || '/' ||to_char(noticebib.numnoticebib) || '.xml' as CHEMINRELATIF from bnf.noticebib where numnoticebib = '3001'", "time-taken","0:0:0.141", null,"--- row #1-", "CHEMINRELATIF","3/0/000/3001.xml", null,"-", "entity:processorDocument",[ "document#1",[ "query","file:///D:/jed/noticesbib/3/0/000/3001.xml", "EXCEPTION","org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: file:///D:/jed/noticesbib/3/0/000/3001.xml Processing Document # 1\r\n\tat org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow (DataImportHandlerException.java:71)\r\n\tat ... oracle.jdbc.driver.OracleStatementWrapper.execute (OracleStatementWrapper.java:1203)\r\n\tat org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator. (JdbcDataSource.java:246)\r\n\t... 32 more\r\n", "time-taken","0:0:0.124",

This is the configuration:

Regards,
---
Jérôme Dupont
Bibliothèque Nationale de France
Département des Systèmes d'Information
---
RE: [DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor
Thanks very much, it works, with dataSource (capital S)!!! Finally, I didn't have to define a "CHEMINRELATIF" field in the configuration; it works without it. This is the definitive working configuration:

Thanks again!
---
Jérôme Dupont
Bibliothèque Nationale de France
Département des Systèmes d'Information
---
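For anyone landing on this thread later, the fix boils down to declaring two named dataSources and pointing each entity at the right one through its dataSource attribute (capital S). A rough sketch of such a configuration, reconstructed with placeholder connection details and XPaths since the actual file was stripped from the archive:

  <dataConfig>
    <dataSource name="db" type="JdbcDataSource" driver="oracle.jdbc.OracleDriver"
                url="jdbc:oracle:thin:@//dbhost:1521/service" user="..." password="..."/>
    <dataSource name="fileReader" type="FileDataSource" encoding="UTF-8"/>
    <document>
      <!-- root entity: the SQL query that produces the relative path of each xml file -->
      <entity name="noticebib" dataSource="db" rootEntity="false"
              query="select ... as CHEMINRELATIF from bnf.noticebib where ...">
        <!-- child entity: reads the file named by the parent row; without
             dataSource="fileReader" it inherits the JDBC source and tries to
             run the url as SQL, which is exactly the exception shown earlier -->
        <entity name="processorDocument" processor="XPathEntityProcessor"
                dataSource="fileReader" forEach="/record"
                url="D:/jed/noticesbib/${noticebib.CHEMINRELATIF}">
          <field column="id" xpath="/record/id"/>
        </entity>
      </entity>
    </document>
  </dataConfig>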
Weird behaviour with phrase queries
Hi, I have a problem with phrase queries: from time to time I do not get any results whereas I know something should be returned. The search is run against a field of type "text" whose definition is available at the following URL: - http://pastebin.com/Ncem7M8z

This field is defined with the following configuration: I use the following request handler: explicit 0.01 meta_text meta_text 1<1 2<-1 5<-2 7<60% 100 *:*

Depending on the kind of phrase query I use, I get either exactly what I am looking for or nothing. The index's contents are all French, so I thought about a possible problem with accents, but I have phrase queries working with "é" and "è" chars, like "académie" or "ingénieur". As you will see, the filter used in the "text" type uses the SnowballPorterFilterFactory for the English language; I plan to fix that by using the correct language for the index (French) and the following protwords http://bit.ly/i8JeX6 . But apart from this mistake with the stemmer, did I do something (else) wrong? Did I overlook something? What could explain why I do not always get results for my phrase queries?

Thanks in advance for your feedback. Best Regards, -- Jérôme
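On the stemmer point, a sketch of what a corrected French analysis chain could look like (the field type name and file names are illustrative, not taken from the pastebin definition):

  <fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <!-- strips l', d', ... before stop words and stemming -->
      <filter class="solr.ElisionFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="french_stop.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords.txt"/>
    </analyzer>
  </fieldType>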
Re: Weird behaviour with phrase queries
Hi Em, Erick, thanks for your feedback. Em: yes. Here is the stopwords.txt I use: - http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt

On Mon, Jan 24, 2011 at 6:58 PM, Erick Erickson wrote: > Try submitting your query from the admin page with &debugQuery=on and see > if that helps. The output is pretty dense, so feel free to cut-paste the > results for > help. > > Your stemmers have English as the language, which could also be > "interesting". >

Yes, I noticed that this will be fixed.

> As Em says, the analysis page may help here, but I'd start by taking out > WordDelimiterFilterFactory, SnowballPorterFilterFactory and > StopFilterFactory > and build back up if you really need them. Although, again, the analysis > page > that's accessible from the admin page may help greatly (check "debug" in > both > index and query). >

You will find attached two xml files, one with no results (noresult.xml.gz) and one with a lot of results (withresults.xml.gz). You will also find attached two screenshots showing there is a highlighted section in the "Index analyzer" section when analysing text.

> Oh, and you MUST re-index after changing your schema to have a true test. >

Yes, the problem is that reindexing takes around 12 hours, which makes it really hard to test :/

Thanks in advance for your feedback. Best Regards, -- Jérôme
Re: Weird behaviour with phrase queries
Erick,

On Mon, Jan 24, 2011 at 9:57 PM, Erick Erickson wrote: > Hmmm, I don't see any screen shots. Several things: > 1> If your stopword file has comments, I'm not sure what the effect would > be. >

Ha, I thought comments were supported in stopwords.txt

> 2> Something's not right here, or I'm being fooled again. Your withresults > xml has this line: > +DisjunctionMaxQuery((meta_text:"ecol d > ingenieur")~0.01) () > and your noresults has this line: > +DisjunctionMaxQuery((meta_text:"academi > charpenti")~0.01) DisjunctionMaxQuery((meta_text:"academi > charpenti"~100)~0.01) > > the empty () in the first one often means you're NOT going to your > configured dismax parser in solrconfig.xml. Yet that doesn't square with > your custom qt, so I'm puzzled. > > Could we see your raw query string on the way in? It's almost as if you > defined qt in one and defType in the other, which are not equivalent. >

You are right, I fixed this problem (my bad).

> 3> It may take 12 hours to index, but you could experiment with a smaller > subset. You say you know that the noresults one should return documents, what proof do > you have? If there's a single document that you know should match this, > just > index it and a few others and you should be able to make many runs until > you > get > to the bottom of this... >

I could, but I always thought I had to fully re-index after updating schema.xml. If I update only a few documents, will that take the changes into account without breaking the rest?

> And obviously your stemming is happening on the query, are you sure it's > happening at index time too? >

Since you did not get the screenshots, you will find attached the full output of the analysis for a phrase that works and for another that does not.

Thanks for your support. Best Regards, -- Jérôme
Re: Weird behaviour with phrase queries
Hi Erick,

On Tue, Jan 25, 2011 at 1:38 PM, Erick Erickson wrote: > Frankly, this puzzles me. It *looks* like it should be OK. One warning, the > analysis page sometimes is a bit misleading, so beware of that. > > But the output of your queries make it look like the query is parsing as > you > expect, which leaves the question of whether your index contains what > you think it does. You might get a copy of Luke, which allows you to > examine > what's actually in your index instead of what you think is in there. > Sometimes > there are surprises here! >

Bingo! Some data were not in the index. Indexing them obviously fixed the problem.

> I didn't mean to re-index your whole corpus, I was thinking that you could > just index a few documents in a test index so you have something small to > look at. > > Sorry I can't spot what's happening right away. >

No worries, thanks for your support :) -- Jérôme
Data not always returned
Hi all, I have a problem with my index. Even though I always index the same data over and over again, whenever I try a couple of searches (they are always the same, as they are issued by a unit test suite) I do not get the same results: sometimes I get 3 successes and 2 failures, and sometimes it is the other way around; it is unpredictable.

Here is what I am trying to do: I created a new Solr core with its specific solrconfig.xml and schema.xml. This core stores a list of towns which I plan to use with an auto-suggestion system, using ngrams (no Suggester). The indexing process is always the same:
1. the import script deletes all documents in the core (a *:* delete query) and commits
2. the import script fetches data from postgres, 100 rows at a time
3. the import script adds these 100 documents and sends a commit
4. once all the rows (around 40 000) have been imported, the script sends an optimize query

Here is what happens: I run the indexer once and search for 'foo': I get the results I expect, but if I search for 'bar' I get nothing. I reindex once again and search for 'foo': I get nothing, but if I search for 'bar' I get results. The search is made on the "name" field, which is a pretty common TextField with ngrams. I tried to physically remove the index (rm -rf path/to/index) and reindex everything as well, and not all searches work; sometimes the 'foo' search works, sometimes the 'bar' one. I tried a lot of different things but now I am running out of ideas. This is why I am asking for help.

Some useful information:
Solr version: 3.1.0
Solr Implementation Version: 3.1.0 1085815 - grantingersoll - 2011-03-26 18:00:07
Lucene Implementation Version: 3.1.0 1085809 - 2011-03-26 18:06:58
Java 1.5.0_24 on Mac Os X
solrconfig.xml and schema.xml are attached.

Thanks in advance for your help.
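Spelled out as the update messages the import script sends to /update (the XML got stripped from the archived message, and the field names are placeholders), the cycle described above is roughly:

  <!-- 1. wipe the core, then commit -->
  <delete><query>*:*</query></delete>
  <commit/>

  <!-- 2./3. add documents in batches of 100, committing after each batch -->
  <add>
    <doc>
      <field name="id">1234</field>
      <field name="name">Paris</field>
    </doc>
    <!-- ... 99 more docs ... -->
  </add>
  <commit/>

  <!-- 4. once all ~40 000 rows are in, optimize -->
  <optimize/>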
Re: Data not always returned
Hi Erick,

On Tue, Jun 7, 2011 at 11:42 PM, Erick Erickson wrote: > Well, this is odd. Several questions > > 1> what do your logs show? I'm wondering if somehow some data is getting > rejected. I have no idea why that would be, but if you're seeing indexing > exceptions that would explain it. > 2> on the admin/stats page, are maxDocs and numDocs the same in the success > /failure case? And are they equal to 40,000? > 3> what does &debugQuery=on show in the two cases? I'd expect it to be > identical, but... > 4> admin/schema browser. Look at your three fields and see if things > like unique-terms are > identical. > 5> are the rows being returned before indexing in the same order? I'm > wondering if somehow > you're getting documents overwritten by having the same id (uniqueKey). > 6> Have you poked around with Luke to see what, if anything, is dissimilar? > > These are shots in the dark, but my supposition is that somehow you're > not indexing what > you expect, the questions above might give us a clue where to look next. >

You were right, I found a nasty problem with the indexer and postgres which prevented some documents from being indexed. Once I fixed this problem everything worked fine.

Thanks a lot for your support. Best Regards, -- Jérôme
Issue with dataimport xml validation with dtd and jetty: conflict of use for user.dir variable
Hello, I use solr and dataimport to index xml files with a dtd. The dtd is referenced like this:

Previously we were using solr4 in a tomcat container. During the import process, solr tries to validate the xml file with the dtd. To find it we were defining -Duser.dir=pathToDtd, and solr could find the dtd and validation was working.

Now we are migrating to solr7 (and embedded jetty). When we start solr with -a "-Duser.dir=pathToDtd", solr doesn't start and returns an error: Cannot find jetty main class. So I removed the -a "-Duser.dir=pathToDtd" option, and solr starts. But now solr cannot open the xml files anymore, because it doesn't find the dtd during the validation stage.

Is there a way to:
- activate an xml catalog file to indicate where the dtd is? (It seems it would be the better way, but I didn't find how to do it.)
- disable dtd validation?

Regards,
---
Jérôme Dupont
Bibliothèque Nationale de France
Département des Systèmes d'Information
---
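Not an answer to the catalog question, but for experimenting with system properties without bin/solr -a: the usual place to pass extra -D flags to Solr 7's embedded Jetty is SOLR_OPTS in solr.in.sh (solr.in.cmd on Windows). Whether overriding user.dir this way avoids the "Cannot find jetty main class" problem is something to test; the path below is a placeholder:

  # solr.in.sh
  SOLR_OPTS="$SOLR_OPTS -Duser.dir=/path/to/dtd"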
Re: Solr OpenNLP named entity extraction
Hi guys,

In Solrcloud mode, where should the OpenNLP models be put? Uploaded to zookeeper? As I tested on solr 7.3.1, an absolute path on the local host does not seem to work, and a model cannot be uploaded into zookeeper if its size exceeds 1M.

Regards,
Jerome

On Wed, Apr 18, 2018 at 9:54 AM Steve Rowe wrote: > Hi Alexey, > > First, thanks for moving the conversation to the mailing list. Discussion > of usage problems should take place here rather than in JIRA. > > I locally set up Solr 7.3 similarly to you and was able to get things to > work. > > Problems with your setup: > > 1. Your update chain is missing the Log and Run update processors at the > end (I see these are missing from the example in the javadocs for the > OpenNLP NER update processor; I'll fix that): > > The Log update processor isn't strictly necessary, but, from < https://lucene.apache.org/solr/guide/7_3/update-request-processors.html#custom-update-request-processor-chain >: > > Do not forget to add RunUpdateProcessorFactory at the end of any > chains you define in solrconfig.xml. Otherwise update requests > processed by that chain will not actually affect the indexed data. > > 2. Your example document is missing an "id" field. > > 3. For whatever reason, the pre-trained model "en-ner-person.bin" doesn't > extract anything from the text "This is Steve Jobs 2". It will extract "Steve > Jobs" from the text "This is Steve Jobs in white" e.g. though. > > 4. (Not a problem necessarily) You may want to use a multi-valued "string" > field for the "dest" field in your update chain, e.g. "people_str" ("*_str" > in the default configset is so configured). > > -- > Steve > www.lucidworks.com > > > On Apr 17, 2018, at 8:23 AM, Alexey Ponomarenko > wrote: > > Hi, once more I am trying to implement named entities extraction using > this manual > > https://lucene.apache.org/solr/7_3_0//solr-analysis-extras/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.html > > > > I have modified solrconfig.xml like this: > > > > opennlp/en-ner-person.bin > > text_opennlp > > description_en > > content > > > > But when I was trying to add data using: > > > > *request:* > > > > POST > > http://localhost:8983/solr/numberplate/update?version=2.2&wt=xml&update.chain=multiple-extract > > > > This is Steve Jobs 2 > > This is text 2 > name="content">This is text for content 2 > > > > *response* > > > > 0 > > 3 > > > > But I don't see any data inserted into the *content* field or any other field. > > > > *If you need some additional data I can provide it.* > > > > Can you help me? What have I done wrong?

-- 
Pivotal Greenplum | Pivotal Software, Inc. <https://pivotal.io/>
Re: Solr OpenNLP named entity extraction
Thanks Steve!

On Tue, Jul 10, 2018 at 5:20 AM Steve Rowe wrote: > Hi Jerome, > > See the ref guide[1] for a writeup of how to enable uploading files larger > than 1MB into ZooKeeper. > > Local storage should also work - have you tried placing OpenNLP model > files in ${solr.solr.home}/lib/ ? - make sure you do the same on each node. > > [1] > https://lucene.apache.org/solr/guide/7_4/setting-up-an-external-zookeeper-ensemble.html#increasing-the-file-size-limit > > -- > Steve > www.lucidworks.com > > [...]

-- 
Pivotal Greenplum | Pivotal Software, Inc. <https://pivotal.io/>
Re: Solr OpenNLP named entity extraction
Hi Steve,

Putting the models under "${solr.solr.home}/lib/" is not working. I checked "ZkSolrResourceLoader": it seems it will first try to find the models in the config set, and if it doesn't find them, it then uses the class loader to load them from resources.

Regards,
Jerome

On Tue, Jul 10, 2018 at 9:58 AM Jerome Yang wrote: > Thanks Steve! > > On Tue, Jul 10, 2018 at 5:20 AM Steve Rowe wrote: >> Hi Jerome, >> >> See the ref guide[1] for a writeup of how to enable uploading files >> larger than 1MB into ZooKeeper. >> >> Local storage should also work - have you tried placing OpenNLP model >> files in ${solr.solr.home}/lib/ ? - make sure you do the same on each node. >> >> [...]

-- 
Pivotal Greenplum | Pivotal Software, Inc. <https://pivotal.io/>
Re: Solr OpenNLP named entity extraction
Thanks a lot Steve!

On Wed, Jul 11, 2018 at 10:24 AM Steve Rowe wrote: > Hi Jerome, > > I was able to set up a configset to perform OpenNLP NER, loading the model > files from local storage. > > There is a trick though[1]: the model files must be located *in a jar* or > *in a subdirectory* under ${solr.solr.home}/lib/ or under a directory > specified via a solrconfig.xml directive. > > I tested with the bin/solr cloud example, and put model files under the > two solr home directories, at example/cloud/node1/solr/lib/opennlp/ and > example/cloud/node2/solr/lib/opennlp/. The "opennlp/" subdirectory is > required, though its name can be anything else you choose. > > [1] As you noted, ZkSolrResourceLoader delegates to its parent classloader > when it can't find resources in a configset, and the parent classloader is > set up to load from subdirectories and jar files under > ${solr.solr.home}/lib/ or under a directory specified via a solrconfig.xml > directive. These directories themselves are not included in the set > of directories from which resources are loaded; only their children are. > > -- > Steve > www.lucidworks.com > > [...]
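To summarize the working setup for anyone who finds this thread later: on every node, put the model files in a subdirectory (or a jar) under ${solr.solr.home}/lib/, e.g. lib/opennlp/en-ner-person.bin, and make sure the update chain ends with the Run (and optionally Log) processors. A sketch, reusing the field names quoted earlier in the thread (the dest field is an assumption, following Steve's multi-valued string suggestion):

  <updateRequestProcessorChain name="multiple-extract">
    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
      <str name="modelFile">opennlp/en-ner-person.bin</str>
      <str name="analyzerFieldType">text_opennlp</str>
      <str name="source">description_en</str>
      <str name="dest">people_str</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>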
Solr 1.3 query and index perf tank during optimize
Hi, everyone, this is a problem I've had for quite a while, and have basically avoided optimizing because of it. However, eventually we will get to the point where we must delete as well as add docs continuously.

I have a Solr 1.3 index with ~4M docs at around 90G. This is a single instance running inside tomcat 6, so no replication. Merge factor is the default 10. ramBufferSizeMB is 32. maxWarmingSearchers=4. autoCommit is set at 3 sec. We continually push new data into the index, at somewhere between 1-10 docs every 10 sec or so. Solr is running on a quad-core 3.0GHz server, under IBM Java 1.6. The index is sitting on a local 15K scsi disk. There's nothing else of substance running on the box.

Optimizing the index takes about 65 min. As long as I'm not optimizing, search and indexing times are satisfactory. When I start the optimize, I see massive problems with timeouts pushing new docs into the index, and search times balloon. A typical search while optimizing takes about 1 min instead of a few seconds.

Can anyone offer me help with fixing the problem?

Thanks, Jerry Quinn
Re: Solr 1.3 query and index perf tank during optimize
Mark Miller wrote on 11/12/2009 07:18:03 PM: > Ah, the pains of optimization. Its kind of just how it is. One solution > is to use two boxes and replication - optimize on the master, and then > queries only hit the slave. Out of reach for some though, and adds many > complications.

Yes, in my use case 2 boxes isn't a great option.

> Another kind of option is to use the partial optimize feature: > > > > Using this, you can optimize down to n segments and take a shorter hit > each time.

Is this a 1.4 feature? I'm planning to migrate to 1.4, but it'll take a while since I have to port custom code forward, including a query parser.

> Also, if optimizing is so painful, you might lower the merge factor to > amortize that pain better. That's another way to slowly get there - if > you lower the merge factor, as merging takes place, the new merge factor > will be respected, and segments will merge down. A merge factor of 2 > (the lowest) will make it so you only ever have 2 segments. Sometimes > that works reasonably well - you could try 3-6 or something as well. > Then when you do your partial optimizes (and eventually a full optimize > perhaps), you won't have so far to go.

So this will slow down indexing but speed up optimize somewhat? Unfortunately right now I lose docs I'm indexing, as well as slowing searching to a crawl. Ugh. I've got plenty of CPU horsepower. This is where having the ability to optimize on another filesystem would be useful.

Would it perhaps make sense to set up a master/slave on the same machine? Then I suppose I can have an index being optimized that might not clobber the search. Would new indexed items still be dropped on the floor?

Thanks, Jerry
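For reference, the partial-optimize example that got stripped from the quoted mail is presumably along these lines (an XML message sent to /update, 1.4+ only), and the merge factor is a solrconfig.xml setting:

  <!-- merge down to at most N segments instead of all the way to 1 -->
  <optimize maxSegments="10"/>

  <!-- solrconfig.xml, inside <indexDefaults>/<mainIndex> -->
  <mergeFactor>3</mergeFactor>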
Re: Solr 1.3 query and index perf tank during optimize
ysee...@gmail.com wrote on 11/13/2009 09:06:29 AM: > > On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless > wrote: > > I think we sorely need a Directory impl that down-prioritizes IO > > performed by merging. > > It's unclear if this case is caused by IO contention, or the OS cache > of the hot parts of the index being lost by that extra IO activity. > Of course the latter would lead to the former, but without that OS > disk cache, the searches may be too slow even w/o the extra IO. Is there a way to configure things so that search and new data indexing get cached under the control of solr/lucene? Then we'd be less reliant on the OS behavior. Alternatively if there are OS params I can tweak (RHEL/Centos 5) to solve the problem, that's an option for me. Would you know if 1.4 is better behaved than 1.3? Thanks, Jerry
Re: Solr 1.3 query and index perf tank during optimize
ysee...@gmail.com wrote on 11/13/2009 09:06:29 AM: > On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless > wrote: > > I think we sorely need a Directory impl that down-prioritizes IO > > performed by merging. > > It's unclear if this case is caused by IO contention, or the OS cache > of the hot parts of the index being lost by that extra IO activity. > Of course the latter would lead to the former, but without that OS > disk cache, the searches may be too slow even w/o the extra IO.

On linux there's the ionice command to try to throttle processes. Would it be possible, and make sense, to have a separate process for optimizing that had ionice set to idle? Can the index be shared this way?

Thanks, Jerry
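For what it's worth, putting a process into the idle I/O class is a one-liner; how cleanly that applies to all the threads of a running JVM, and whether the merge threads end up being the ones throttled, is something to verify:

  # class 3 = idle: the process only gets disk time when nothing else is waiting for it
  ionice -c 3 -p <pid-of-the-optimizing-process>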
Re: Solr 1.3 query and index perf tank during optimize
Lance Norskog wrote on 11/13/2009 11:18:42 PM: > The 'maxSegments' feature is new with 1.4. I'm not sure that it will > cause any less disk I/O during optimize. It could still be useful to manage the "too many open files" problem that rears its ugly head on occasion. > The 'mergeFactor=2' idea is not what you think: in this case the index > is always "mostly optimized", so you never need to run optimize. > Indexing is always slower, because you amortize the optimize time into > little continuous chunks during indexing. You never stop indexing. You > should not lose documents. Is the space taken by deleted documents recovered in this case? Jerry
Re: Solr 1.3 query and index perf tank during optimize
Otis Gospodnetic wrote on 11/13/2009 11:15:43 PM: > Let's take a step back. Why do you need to optimize? You said: "As > long as I'm not optimizing, search and indexing times are satisfactory." :) > > You don't need to optimize just because you are continuously adding > and deleting documents. On the contrary! That's a fair question. Basically, search entries are keyed to other documents. We have finite storage, so we purge old documents. My understanding was that deleted documents still take space until an optimize is done. Therefore, if I don't optimize, the index size on disk will grow without bound. Am I mistaken? If I don't ever have to optimize, it would make my life easier. Thanks, Jerry
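For context on the space question: deleted documents are only reclaimed when the segments holding them get rewritten, which happens during ordinary background merges as well as during an optimize, so the index does not grow without bound; it just carries some dead weight between merges. If an upgrade to 1.4 is possible, there is also (if memory serves) an expungeDeletes flag on commit that rewrites only the segments containing deletions, as a cheaper alternative to a full optimize:

  <!-- sent to /update -->
  <commit expungeDeletes="true"/>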
Plans for 1.3.1?
Hi, all. Are there any plans for putting together a bugfix release? I'm not looking for particular bugs, but would like to know if bug fixes are only going to be done mixed in with new features. Thanks, Jerry Quinn
Help with Solr 1.3 lockups?
Hi, all. I'm running solr 1.3 inside Tomcat 6.0.18. I'm running a modified query parser, tokenizer, highlighter, and have a CustomScoreQuery for dates. After some amount of time, I see solr stop responding to update requests. When crawling through the logs, I see the following pattern: Jan 12, 2009 7:27:42 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true) Jan 12, 2009 7:28:11 PM org.apache.solr.common.SolrException log SEVERE: Error during auto-warming of key:org.apache.solr.search.queryresult...@ce0f92b9:java.lang.OutOfMemoryError at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122) at org.apache.lucene.index.SegmentTermEnum.term (SegmentTermEnum.java:167) at org.apache.lucene.index.SegmentMergeInfo.next (SegmentMergeInfo.java:66) at org.apache.lucene.index.MultiSegmentReader$MultiTermEnum.next (MultiSegmentReader.java:492) at org.apache.lucene.search.FieldCacheImpl$7.createValue (FieldCacheImpl.java:267) at org.apache.lucene.search.FieldCacheImpl$Cache.get (FieldCacheImpl.java:72) at org.apache.lucene.search.FieldCacheImpl.getInts (FieldCacheImpl.java:245) at org.apache.solr.search.function.IntFieldSource.getValues (IntFieldSource.java:50) at org.apache.solr.search.function.SimpleFloatFunction.getValues (SimpleFloatFunction.java:41) at org.apache.solr.search.function.BoostedQuery$CustomScorer. (BoostedQuery.java:111) at org.apache.solr.search.function.BoostedQuery$CustomScorer. (BoostedQuery.java:97) at org.apache.solr.search.function.BoostedQuery $BoostedWeight.scorer(BoostedQuery.java:88) at org.apache.lucene.search.IndexSearcher.search (IndexSearcher.java:132) at org.apache.lucene.search.Searcher.search(Searcher.java:126) at org.apache.lucene.search.Searcher.search(Searcher.java:105) at org.apache.solr.search.SolrIndexSearcher.getDocListNC (SolrIndexSearcher.java:966) at org.apache.solr.search.SolrIndexSearcher.getDocListC (SolrIndexSearcher.java:838) at org.apache.solr.search.SolrIndexSearcher.access$000 (SolrIndexSearcher.java:56) at org.apache.solr.search.SolrIndexSearcher$2.regenerateItem (SolrIndexSearcher.java:260) at org.apache.solr.search.LRUCache.warm(LRUCache.java:194) at org.apache.solr.search.SolrIndexSearcher.warm (SolrIndexSearcher.java:1518) at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1018) at java.util.concurrent.FutureTask$Sync.innerRun (FutureTask.java:314) at java.util.concurrent.FutureTask.run(FutureTask.java:149) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask (ThreadPoolExecutor.java:896) at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:735) Jan 12, 2009 7:28:11 PM org.apache.tomcat.util.net.JIoEndpoint$Acceptor run SEVERE: Socket accept failed Throwable occurred: java.lang.OutOfMemoryError at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:414) at java.net.ServerSocket.implAccept(ServerSocket.java:464) at java.net.ServerSocket.accept(ServerSocket.java:432) at org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket (DefaultServerSocketFactory.java:61) at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run (JIoEndpoint.java:310) at java.lang.Thread.run(Thread.java:735) <<<> << Java dumps core and heap at this point >> <<<> Jan 12, 2009 7:28:21 PM org.apache.solr.common.SolrException log SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SingleInstanceLock: write.lock at 
org.apache.lucene.store.Lock.obtain(Lock.java:85) at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1140) at org.apache.lucene.index.IndexWriter.(IndexWriter.java:938) at org.apache.solr.update.SolrIndexWriter. (SolrIndexWriter.java:116) at org.apache.solr.update.UpdateHandler.createMainIndexWriter (UpdateHandler.java:122) at org.apache.solr.update.DirectUpdateHandler2.openWriter (DirectUpdateHandler2.java:167) at org.apache.solr.update.DirectUpdateHandler2.addDoc (DirectUpdateHandler2.java:221) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd (RunUpdateProcessorFactory.java:59) at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate (XmlUpdateRequestHandler.java:196) at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody (XmlUpdateRequestHandler.java:123) at org.apache.solr.handler.RequestHandlerBase.handleRequest (RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at org.apache.solr.servlet.SolrDispatchFilter.execute (
Re: I get SEVERE: Lock obtain timed out
Julian Davchev wrote on 01/20/2009 10:07:48 AM: > I get SEVERE: Lock obtain timed out > > Hi, > Any documents or something I can read on how locks work and how I can > control it. When do they occur etc. > Cause the only way I got out of this mess was restarting tomcat > > SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain > timed out: SingleInstanceLock: write.lock

I've seen this with my customized setup. Before I saw the write.lock messages, I had an OutOfMemoryError, but the container didn't shut down. After that Solr spewed write lock messages and I had to restart.

So, you might want to search backwards in your logs and see if you can find when the write lock problems started and whether there is some identifiable problem preceding that.

Jerry Quinn
Re: Help with Solr 1.3 lockups?
Hi and thanks for looking at the problem ...

Mark Miller wrote on 01/15/2009 02:58:24 PM: > How much RAM are you giving the JVM? Thats running out of memory loading > a FieldCache, which can be a more memory intensive data structure. It > pretty much points to the JVM not having enough RAM to do what you want. > How many fields do you sort on? How many fields do you facet on? How > much RAM do you have available and how much have you given Solr? How > many documents are you working with?

I'm using the stock tomcat and JVM settings. I see the VM footprint sitting at 877M right now. It hasn't locked up yet this time around. There are 2 fields we facet and 1 that we sort on. The machine has 16G of memory, and the index is currently sitting at 38G, though I haven't run an optimize in a while. There are about 1 million docs in the index, though we have 3 full copies of the data stored in different fields and processed in different ways. I do a commit every 10 docs or 3 seconds, whichever comes first. We're approximating real-time updating. The index is currently sitting on NFS, which I know isn't great for performance. I didn't think it could cause reliability issues though.

> As far as rebooting a failed server, the best technique is generally > external. I would recommend a script/program on another machine that > hits the Solr instance with a simple query every now and again. If you > don't get a valid response within a reasonable amount of time, or after > a reasonable number of tries, fire off alert emails and issue a command > to that server to reboot the JVM. Or something to that effect.

I suspect I'll add a watchdog, no matter what's causing the problem here.

> However, you should figure out why you are running out of memory. You > don't want to use more resources than you have available if you can help it.

Definitely. That's on the agenda :-)

Thanks, Jerry

> - Mark > > Jerome L Quinn wrote: > > Hi, all. > > I'm running solr 1.3 inside Tomcat 6.0.18. I'm running a modified query > > parser, tokenizer, highlighter, and have a CustomScoreQuery for dates. > > > > After some amount of time, I see solr stop responding to update requests. > > When crawling through the logs, I see the following pattern: > > [...]
Re: Help with Solr 1.3 lockups?
"Lance Norskog" wrote on 01/20/2009 02:16:47 AM: > "Lance Norskog" > 01/20/2009 02:16 AM > Java 1.5 has thread-locking bugs. Switching to Java 1.6 may cure this > problem. Thanks for taking time to look at the problem. Unfortunately, this is happening on Java 1.6, so I can't put the blame there. Thanks, Jerry
Re: Help with Solr 1.3 lockups?
Mark Miller wrote on 01/26/2009 04:30:00 PM: > Just a point or two I missed: with such a large index (not doc size large, > but content wise), I imagine a lot of your 16GB of RAM is being used by > the system disk cache - which is good. Another reason you don't want to > give too much RAM to the JVM. But you still want to give it enough to > avoid the OOM :) Assuming you are using the RAM you are legitimately. > And I don't yet have a reason to think you are not.

I've bumped the JVM max memory to 2G. Hopefully that is enough. I'll be keeping an eye on it.

> Also, there has been a report or two of a lockup that didn't appear > to involve an OOM, so this is not guaranteed to solve that. However, > seeing that the lockup comes after the OOM, its the likely first thing > to fix. Once the memory problems are taken care of, the locking issue > can be addressed if you find it still remains. My bet is that fixing the > OOM will clear it up.

I've gone through my code looking for possible leaks and didn't find anything. That doesn't mean they're not there of course. I ran an analyzer on the heap dump from the last OOM event. These were the likely items it identified:

org/apache/catalina/connector/Connector      java/util/WeakHashMap$Entry      399,913,269 bytes
org/apache/catalina/connector/Connector      java/lang/Object[]               197,256,078 bytes
org/apache/lucene/search/ExtendedFieldCache  java/util/WeakHashMap$Entry[]    177,893,021 bytes
org/apache/lucene/search/ExtendedFieldCache  java/util/HashMap$Entry[]         42,490,720 bytes
org/apache/lucene/search/ExtendedFieldCache  java/util/HashMap$Entry[]         42,490,656 bytes

I'm not sure what to make of this, though.

> > You also might lower the max warming searchers setting if that makes > > sense.

I'm using the default setting of 2. I have seen an error about too many warming searchers once or twice, but not often.

Thanks, Jerry
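For readers wondering where "the stock tomcat settings" get changed: the usual place to raise the heap for a Tomcat-hosted Solr is CATALINA_OPTS (or JAVA_OPTS) in bin/setenv.sh. The values below are only illustrative, and the heap-dump flag is a HotSpot option (the IBM J9 JVM used here writes its own heapdumps on OOM by default):

  # $CATALINA_HOME/bin/setenv.sh
  export CATALINA_OPTS="$CATALINA_OPTS -Xms512m -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError"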
[1.3] help with update timeout issue?
Hi, folks, I am using Solr 1.3 pretty successfully, but am running into an issue that hits once in a long while. I'm still using 1.3 since I have some custom code I will have to port forward to 1.4. My basic setup is that I have data sources continually pushing data into Solr, around 20K adds per day. The index is currently around 100G, stored on local disk on a fast linux server. I'm trying to make new docs searchable as quickly as possible, so I currently have autocommit set to 15s. I originally had 3s but that seems to be a little too unstable. I never optimize the index since optimize will lock things up solid for 2 hours, dropping docs until the optimize completes. I'm using the default segment merging settings. Every once in a while I'm getting a socket timeout when trying to add a document. I traced it to a 20s timeout and then found the corresponding point in the Solr log. Jan 13, 2010 2:59:15 PM org.apache.solr.core.SolrCore execute INFO: [tales] webapp=/solr path=/update params={} status=0 QTime=2 Jan 13, 2010 2:59:15 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true) Jan 13, 2010 2:59:56 PM org.apache.solr.search.SolrIndexSearcher INFO: Opening searc...@26e926e9 main Jan 13, 2010 2:59:56 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: end_commit_flush Solr locked up for 41 seconds here while doing some of the commit work. So, I have a few questions. Is this related to GC? Does Solr always lock up when merging segments and I just have to live with losing the doc I want to add? Is there a timeout that would guarantee me a write success? Should I just retry in this situation? If so, how do I distinguish between this and Solr just being down? I already have had issues in the past with too many files open, so increasing the merge factor isn't an option. On a related note, I had previously asked about optimizing and was told that segment merging would take care of cleaning up deleted docs. However, I have the following stats for my index: numDocs : 2791091 maxDoc : 4811416 My understanding is that numDocs is the docs being searched and maxDoc is the number of docs including ones that will disappear after optimization. How do I get this cleanup without using optimize, since it locks up Solr for multiple hours. I'm deleting old docs daily as well. Thanks for all the help, Jerry
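For anyone following along, the 15-second autocommit mentioned above is the updateHandler setting in solrconfig.xml; a time-based trigger looks like this (maxDocs is optional and can be combined with maxTime):

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>15000</maxTime>  <!-- milliseconds -->
      <!-- <maxDocs>1000</maxDocs> -->
    </autoCommit>
  </updateHandler>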
Re: [1.3] help with update timeout issue?
Otis Gospodnetic wrote on 01/14/2010 10:07:15 PM: > See those "waitFlush=true,waitSearcher=true" ? Do things improve if > you make them false? (not sure how with autocommit without looking > at the config and not sure if this makes a difference when > autocommit triggers commits) Looking at DirectUpdateHandler2, it appears that those values are hardwired to true for autocommit. Unless there's another mechanism for changing that. > Re deleted docs, they are probably getting expunged, it's just that > you always have more deleted docs, so those 2 numbers will never be > the same without optimize. I can accept that they will always be different, but that's a large difference. Hmm, a couple weeks ago, I manually deleted a bunch of docs that had associated data get corrupted. Normally, I'd only be deleting a day's worth of docs at a time. Is there a time I could expect the old stuff to get cleaned up by without optimizing? Thanks, Jerry
Re: [1.3] help with update timeout issue?
Lance Norskog wrote on 01/16/2010 12:43:09 AM: > If your indexing software does not have the ability to retry after a > failure, you might with to change the timeout from 20 seconds to, say, > 5 minutes. I can make it retry, but I have somewhat real-time processes doing these updates. Does anyone push updates into a temporary file and then have an async process push the updates so that it can survive the lockups without worry? This seems like a real hack, but I don't want a long timeout like that in the program that currently pushes the data. One thing that worries me is that solr may not respond to searches in these windows. I'm basing that on the observation that search does not respond when solr is optimizing. Can anyone offer me insight on why these delays happen? Thanks, Jerry
Re: solr blocking on commit
ysee...@gmail.com wrote on 01/19/2010 06:05:45 PM: > On Tue, Jan 19, 2010 at 5:57 PM, Steve Conover wrote: > > I'm using latest solr 1.4 with java 1.6 on linux. I have a 3M > > document index that's 10+GB. We currently give solr 12GB of ram to > > play in and our machine has 32GB total. > > > > We're seeing a problem where solr blocks during commit - it won't > > serve /select requests - in some cases for more than 15-30 seconds. > > We'd like to somehow configure things such that there's no > > interruption in /select service. > > A commit shouldn't cause searches to block. > Could this perhaps be a stop-the-world GC pause that coincides with the commit?

This is essentially the same problem I'm fighting with. Once in a while, commit causes everything to freeze, causing add commands to time out. My large index sees pauses on the order of 50 seconds once every day or two. I have a small index of 700M on disk that sees 20 second pauses once in a while. I'm using the IBM 1.6 jvm on linux.

Jerry
Re: solr blocking on commit
ysee...@gmail.com wrote on 01/20/2010 02:24:04 PM: > On Wed, Jan 20, 2010 at 2:18 PM, Jerome L Quinn wrote: > > This is essentially the same problem I'm fighting with. Once in a while, > > commit causes everything to freeze, causing add commands to timeout. > > This could be a bit different. Commits do currently block other > update operations such as adds, but not searches.

Ah, this is good to know. Is there any logging in solr 1.3 I could turn on to verify that this is indeed what's happening for me?

Thanks, Jerry
Re: solr blocking on commit
ysee...@gmail.com wrote on 01/20/2010 02:24:04 PM: > On Wed, Jan 20, 2010 at 2:18 PM, Jerome L Quinn wrote: > > This is essentially the same problem I'm fighting with. Once in a while, > > commit causes everything to freeze, causing add commands to timeout. > > This could be a bit different. Commits do currently block other > update operations such as adds, but not searches.

How is Solr organized so that search can continue when a commit has closed the index? Also, looking at the lucene docs, commit causes a system fsync(). Won't search also get blocked by the IO traffic generated?

Thanks, Jerry
Re: solr blocking on commit
Otis Gospodnetic wrote on 01/22/2010 12:20:45 AM: > I'm missing the bigger context of this thread here, but from the > snippet below - sure, commits cause in-memory index to get written > to disk, that causes some IO, and that *could* affect search *if* > queries are running on the same box. When index and/or query volume > is high, one typically puts indexing and searching on different servers. After some more research, I realize that what we're trying to do is essentially near-real-time processing. I have data collection that is near-real-time and I'm trying to avoid arbitrary delays pushing the data into the index so that the data collection doesn't stall. On the search side, we don't have a lot of search traffic but would like it to be responsive when it comes in. We also dynamically purge old data to keep the storage requirements within limits. So, basically I'm trying to have the system tuned so that this all works well :-) I'm trying to keep search on a single system to keep the costs down as well. One thing I'm trying now is to put an intermediary in so that updates can be asynchronous. Then my data collection processes can continue without waiting for unpredictable index merges. Thanks, Jerry
Re: SolrJ commit options
Shalin Shekhar Mangar wrote on 02/25/2010 07:38:39 AM: > On Thu, Feb 25, 2010 at 5:34 PM, gunjan_versata wrote: > > > > We are using SolrJ to handle commits to our solr server.. All runs fine.. > > But whenever the commit happens, the server becomes slow and stops > > responding.. thereby resulting in TimeOut errors on our production. We are > > using the default commit with waitFlush = true, waitSearcher = true... > > > > Can I change these values so that the requests coming to solr don't block on > > a recent commit?? Also, what will be the impact of changing these values?? > > Solr does not block reads during a commit/optimize. Write operations are > queued up but they are still accepted. Are you using the same Solr server > for reads as well as writes?

I've seen similar things with Solr 1.3 (not using SolrJ). If I try to optimize the index, queries will take much longer - easily a minute or more, resulting in timeouts.

Jerry
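For completeness, the SolrJ call under discussion is the two-argument commit; a minimal sketch using the 1.4-era client classes (the URL is a placeholder, and note this only controls what the client waits for, it does not make the commit itself cheaper):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class CommitNoWait {
      public static void main(String[] args) throws Exception {
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
          // waitFlush=false, waitSearcher=false: the call returns without blocking
          // until the flush finishes and the new searcher is warmed and registered
          server.commit(false, false);
      }
  }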