Re: spellcheck causing Core Reload to hang
Hi,

Basically, it hangs only on "Core Reload" and not during queries. Furthermore, there is never any error reported in the logs; in fact, the log only records up until the Core Reload call. If I shut down and restart Solr, the next time it won't start, and still no errors in the log.

On Sat, Sep 14, 2013 at 1:53 AM, Chris Hostetter wrote:
>
> : after a lot of investigation today, I found that its the spellcheck
> : component which is causing the issue. If its turned off, all will run well
> : and core can easily reload. However, when the spellcheck is on, the core
> : wont reload instead hang forever.
>
> Can you take some stack traces while the server is hung?
>
> Do you have any firstSearcher or newSearcher warming queries configured?
> If so can you try adding "spellcheck=false" to those warming queries and
> see if it eliminates the problem?
>
> Smells like this thread...
>
> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201309.mbox/%3Calpine.DEB.2.02.1309061149310.10818@frisbee%3E
>
> ...would be good to get a jira open with a reproducible set of configs
> that demonstrates the problem semi-reliably..
>
> -Hoss

--
Regards,
Raheel Hasan
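For anyone hitting the same thing, Hoss's warming-query suggestion would translate into something roughly like the following solrconfig.xml entry. This is only a sketch -- the query string and field name are made-up placeholders, not taken from the poster's actual configuration:

  <!-- Sketch: a warming query with spellchecking explicitly disabled.
       "some_field:warmup" is a placeholder query, not the real one. -->
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">some_field:warmup</str>
        <str name="spellcheck">false</str>
      </lst>
    </arr>
  </listener>

Every parameter inside the <lst> is simply passed along with the warming request, so the same spellcheck=false can be added to a firstSearcher listener too.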
Re: spellcheck causing Core Reload to hang
Yes I have tried Spellcheck=false and with that everything works just fine. But I do need Spell check component so I cant just leave it off. On Mon, Sep 16, 2013 at 12:24 PM, Raheel Hasan wrote: > Hi, > > Basically, it hangs only on "core Reload" and not during queries. > Furthermore, there is never any error reported in the logs, in fact the log > only records until Core-Reload call. If I shut down and restart Solr, the > next time it wont start, and still no errors in the log. > > > > > On Sat, Sep 14, 2013 at 1:53 AM, Chris Hostetter > wrote: > >> >> : after a lot of investigation today, I found that its the spellcheck >> : component which is causing the issue. If its turned off, all will run >> well >> : and core can easily reload. However, when the spellcheck is on, the core >> : wont reload instead hang forever. >> >> Can you take some stack traces while the server is hung? >> >> Do you have any firstSearcher or newSearcher warming queries configured? >> If so can you try adding "spellcheck=false" to those warming queries and >> see if it eliminates the problem? >> >> Smells like this thread... >> >> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201309.mbox/%3Calpine.DEB.2.02.1309061149310.10818@frisbee%3E >> >> >> ...would be good to get a jira open with a reproducible set of configs >> that demonstrates the problem semi-reliably.. >> >> >> -Hoss >> > > > > -- > Regards, > Raheel Hasan > -- Regards, Raheel Hasan
Re: what does "UnInvertedField; UnInverted multi-valued field" means and how to fix it
Hey, thanks for the reply. So after a full day spent only on trying to figure this out, I have found the cause (the spellcheck component)... but not the solution. See my other post with the subject "spellcheck causing Core Reload to hang". I have explained it there. Thanks a lot.

On Sun, Sep 15, 2013 at 2:35 AM, Erick Erickson wrote:
> This is totally weird. Can you give us the exact
> command you are using?
>
> Best
> Erick
>
> On Fri, Sep 13, 2013 at 8:15 AM, Raheel Hasan wrote:
>
> > Hi guys,
> >
> > I have an issue here in between Solr Core and Data Indexing:
> >
> > When I build some index from a fresh setup, everything is fine: all queries
> > and additional/update indexing, everything runs fine. But when I reload
> > the Core, Solr stops from that point onward, forever.
> >
> > All I get is this line as the last line of the Solr log after the issue has
> > occurred:
> >
> > UnInvertedField; UnInverted multi-valued field
> > {field=prod_cited_id,memSize=4880,tindexSize=40,time=4,phase1=4,nTerms=35,bigTerms=4,termInstances=36,uses=0}
> >
> > Furthermore, the only way to get things working again would be to delete
> > the "data" folder inside "solr/{myCore}/"...
> >
> > So can anyone help me beat this issue and get things working again? I can't
> > afford this issue when the system is LIVE..
> >
> > Thanks a lot.
> >
> > --
> > Regards,
> > Raheel Hasan

--
Regards,
Raheel Hasan
Re: spellcheck causing Core Reload to hang
Please see the log (after solr restart) in the other msg I posted on this forum with the subject: "*Unable to connect" to "http://localhost:8983/solr/ *" Thanks. On Mon, Sep 16, 2013 at 12:25 PM, Raheel Hasan wrote: > Yes I have tried Spellcheck=false and with that everything works just > fine. But I do need Spell check component so I cant just leave it off. > > > On Mon, Sep 16, 2013 at 12:24 PM, Raheel Hasan > wrote: > >> Hi, >> >> Basically, it hangs only on "core Reload" and not during queries. >> Furthermore, there is never any error reported in the logs, in fact the log >> only records until Core-Reload call. If I shut down and restart Solr, the >> next time it wont start, and still no errors in the log. >> >> >> >> >> On Sat, Sep 14, 2013 at 1:53 AM, Chris Hostetter < >> hossman_luc...@fucit.org> wrote: >> >>> >>> : after a lot of investigation today, I found that its the spellcheck >>> : component which is causing the issue. If its turned off, all will run >>> well >>> : and core can easily reload. However, when the spellcheck is on, the >>> core >>> : wont reload instead hang forever. >>> >>> Can you take some stack traces while the server is hung? >>> >>> Do you have any firstSearcher or newSearcher warming queries configured? >>> If so can you try adding "spellcheck=false" to those warming queries and >>> see if it eliminates the problem? >>> >>> Smells like this thread... >>> >>> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201309.mbox/%3Calpine.DEB.2.02.1309061149310.10818@frisbee%3E >>> >>> >>> ...would be good to get a jira open with a reproducible set of configs >>> that demonstrates the problem semi-reliably.. >>> >>> >>> -Hoss >>> >> >> >> >> -- >> Regards, >> Raheel Hasan >> > > > > -- > Regards, > Raheel Hasan > -- Regards, Raheel Hasan
Re: solr/document/select not available
If you have two cores, then the core name should be in your URL:

http://host:8983/solr/<corename>/select?q=blah

Or you can set a default core in solr.xml.

Upayavira

On Sun, Sep 15, 2013, at 12:16 PM, Nutan wrote:
> I get this error: solr/select not available. I am using two cores, document
> and contract. Solrconfig.xml of the document core is:
>
> [solrconfig.xml snippet lost in the archive: it showed luceneMatchVersion
> LUCENE_42, a dataDir of ${solr.collection1.data.dir:}, a requestDispatcher
> section, a default "/select" SearchHandler (echoParams=explicit, rows=20),
> and an ExtractingRequestHandler mapping last_modified and contents with
> uprefix ignored_]
>
> I have defined a standard request handler but still why do I get this
> error?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr-document-select-not-available-tp4090171.html
> Sent from the Solr - User mailing list archive at Nabble.com.
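For the "default core in solr.xml" option mentioned above, the legacy solr.xml format still used in this Solr version supports a defaultCoreName attribute on the <cores> element. A rough sketch, assuming the two cores are named "document" and "contract" (the instanceDir values are guesses):

  <solr persistent="true">
    <cores adminPath="/admin/cores" defaultCoreName="document">
      <core name="document" instanceDir="document" />
      <core name="contract" instanceDir="contract" />
    </cores>
  </solr>

With that in place, /solr/select?q=... would go to the "document" core, while /solr/contract/select?q=... still addresses the other core explicitly.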
Re: Storing/indexing speed drops quickly
On Fri, 2013-09-13 at 17:32 +0200, Shawn Heisey wrote:
> Put your OS and Solr itself on regular disks in RAID1 and your Solr data
> on the SSD. Due to the eventual decay caused by writes, SSD will
> eventually die, so be ready for SSD failures to take out shard replicas.

One of the very useful properties of wear-levelling on SSDs is that the wear status of the drive can be queried. When the drive nears its EOL, replace it.

As Lucene mainly uses bulk writes when updating the index, I will add that wearing out an SSD by using it primarily for Lucene/Solr is pretty hard to do, unless one constructs a pathological setup.

Your failure argument is thus really a claim that SSDs are not reliable technology. That is a fair argument, as there have been some really rotten apples among the offerings. This is coupled with the fact that it is still a very rapidly changing technology, which makes it hard to pick an older, proven drive that is not markedly surpassed by the bleeding edge.

> So far I'm not aware of any RAID solutions that offer TRIM support,
> and without TRIM support, an SSD eventually has performance problems.

Search speed is not affected, as "only" write performance suffers without TRIM, but index update speed will be affected. Also, while it is possible to get TRIM in RAID, there is currently only a single hardware option:
http://www.anandtech.com/show/6161/intel-brings-trim-to-raid0-ssd-arrays-on-7series-motherboards-we-test-it

Regards,
- Toke Eskildsen, State and University Library, Denmark
Stop zookeeper from batch
Hi,

We have set up SolrCloud with ZooKeeper and 2 Tomcats. We are using a batch file to start ZooKeeper, upload/link the config files and start the Tomcats. Now I need to stop ZooKeeper from the batch file. How is this possible? I am using Windows Server, ZooKeeper version 3.4.5.

Please help.

Thanks,
Prasi
How to make Solr complex Join Query patch in java
Hi,

Does anyone have any idea how to write a patch in Java which will add support for complex join queries in Solr? I have the Solr source code. If you have any sample code for this, please share it with me.

Thanks
Ashim

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-make-Solr-complex-Join-Query-patch-in-java-tp4090314.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Spellcheck compounded words
Hi guys,

Did anyone solve this issue? I am having it also. It took me 3 days to figure out exactly that it is coming from "spellcheck.maxCollationTries"... Even with 1 it hangs forever. The only way to restart is to stop Solr, delete the "data" folder and then start Solr again (i.e. the index is lost!).

Regards,
Raheel

--
View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p4090320.html
Sent from the Solr - User mailing list archive at Nabble.com.
2 question about solr and lucene
Hi guys,

I have run into two questions about Solr and Lucene, and would appreciate some help.

1. Payload queries do NOT work in combination with a numerical field type. For example, I implemented my own request handler, following
http://hnagtech.wordpress.com/2013/04/19/using-payloads-with-solr-4-x/

I query in Solr: sinaTag:operate

Solr responds:

"numFound": 2, "start": 0, "maxScore": 99, "docs": [
  { "id": "1628209010",
    "followersCount": 752,
    "sinaTag": "operate|99 C2C|98 B2C|97 OnlineShopping|96 E-commercial|94",
    "score": 99 },
  { "id": "1900546410",
    "followersCount": 1002,
    "sinaTag": "Startup|99 Benz|98 PublicRelation|97 operate|96 Activity|95 Media|94 AD|93 Vehicle|92 ",
    "score": 96 } ]

This works well. But a query combined with another numerical condition, such as:

sinaTag:operate and followersCount:[752 TO 752]

returns:

{ "responseHeader": { "status": 0, "QTime": 40 },
  "response": { "numFound": 0, "start": 0, "maxScore": 0, "docs": [] } }

According to this dataset, the first record should be returned rather than NOT FOUND. I do not know why.

2. About string-field fuzzy match filtering: how do I get the score? What is the formula? When I use two or several string fuzzy matches, possibly combined with AND or OR, how is the score computed? And may I implement my own score formula class -- which interface or abstract class should I extend?

Thanks in advance.
Re-Ranking results based on DocValues with custom function.
Hi! I'm having quite an index with a lot of text and some binary data in the documents (numeric vectors of arbitrary size with associated dissimilarity functions). What I want to do is to search using common text search and then (optionally) re-rank using some custom function like http://localhost:8983/solr/select?q=*:*&sort=myCustomFunction(var1) asc I've seen that there are hooks in solrconfig.xml, but I did not find an example or some documentation. I'd be most grateful if anyone could either point me to one or give me a hint for another way to go :) Btw. Using just the DocValues for search is handled by a custom RequestHandler, which works great, but using text as a main search feature, and my DocValues for re-ranking, I'd rather just add a function for sorting and use the current, stable and well performing request handler. cheers, Mathias ps. a demo of the current system is available at: http://demo-itec.uni-klu.ac.at/liredemo/ -- Dr. Mathias Lux Assistant Professor, Klagenfurt University, Austria http://tinyurl.com/mlux-itec
RE: Spellcheck compounded words
Which version of Solr are you running? (the post you replied to was about Solr 3.3, but the latest version now is 4.4.) Please provide configuration details and the query you are running that causes the problem. Also explain exactly what the problem is (query never returns?). Also explain why you have to delete the "data" dir when you restart. With a little background information, maybe someone can help. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Rah1x [mailto:raheel_itst...@yahoo.com] Sent: Monday, September 16, 2013 5:47 AM To: solr-user@lucene.apache.org Subject: Re: Spellcheck compounded words Hi guyz, Did anyone solve this issue? I am having it also, it took me 3 days to exactly figure it out that its coming from "spellcheck.maxCollationTries"... Even with 1 it hangs forewver. The only way to restart is to stop solr, delete "data" folder and then start solr again (i.e. index lost !). Regards, Raheel -- View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p4090320.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Frequent softCommits leading to high faceting times?
Soft commits are not free, they invalidate certain caches which then have to be reloaded. I suspect you're hitting this big time. The question is always "do you really, really _need_ 1 second latency?". Set the soft commit interval to be as long as your application can stand IMO. And it may have nothing to do with facets, since things like your filterCache autowarming is done on soft commit. Ditto for the other caches you've configured in solrconfig.xml. But one thing to watch is the size of your tlog. Transaction logs are only truncated on hard commits, and can get replayed on restart. So you're risking long restart times here. See: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ FWIW, Erick On Mon, Sep 16, 2013 at 1:43 AM, Rohit Kumar wrote: > Hi, > > We are running *SOLR 4.3* with 8 Gb of index on > > Ubuntu 12.04 64 bits > Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz Single core. > 16GB RAM > > > We just started using the autoSoftCommit feature and noticed the facet > queries slowed down from milliseconds taking earlier to a minute. We have > *8 > facet fields*. > > We add close to 300 documents per second during peak interval. > > > 60 > false > > > > 1000 > > > > Here is some information i got with debugQuery. Please note that *facet > time is more than 50 seconds.* > > > 50779.0 > > 0.0 > > > 41.0 > > * > 50590.0 > * > > 0.0 > > > 0.0 > > > 0.0 > > > 5.0 > > > 143.0 > > > > Please help. > > Thanks, > Rohit Kumar >
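As a concrete illustration of Erick's advice, the relevant solrconfig.xml block could be shaped roughly like this -- the numbers are placeholders, not a recommendation for this particular index:

  <!-- Hard commit: flushes and truncates the tlog, does not open a searcher. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: opens a new searcher (and triggers cache warming),
       so set this interval as long as the application can stand. -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>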
how soft-commit works
Can anyone explain the following things about soft commit to me?

- For searches to access new documents, I think a new searcher is opened after a soft commit. How does the near-real-time requirement for soft commit match with the potentially long time taken to warm up caches for the new searcher?

- Is it a good idea to set openSearcher=false in auto commit and rely on soft auto commit to see new data in searches?

thanks
Matteo Grolla
Re: How to make Solr complex Join Query patch in java
I'd start by taking a very hard look at my data model and seeing if I can redefine the model (as translated from the DB you may be coming from). Solr does not excel at what RDBMSs are designed to do. Really. I predict that if you just try to make Solr into an RDBMS, you'll expend a lot of effort and not be satisfied with the results.

For instance, do you expect to support joins across shards, i.e. distributed support? What about "block joins"? Sub-selects (again, if so, what about distributed)? Grouping?

But if you absolutely insist on trying this, look at the existing Join code. Take a look through any classes in the Solr/Lucene source tree starting with either Join or BlockJoin. Note that they are two very different capabilities, so be aware of that as you look through them.

Best,
Erick

On Mon, Sep 16, 2013 at 6:18 AM, ashimbose wrote:

> Hi,
>
> Does anyone have any idea how to write a patch in Java which will add support
> for complex join queries in Solr? I have the Solr source code. If you have any
> sample code for this, please share it with me.
>
> Thanks
> Ashim
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-make-Solr-complex-Join-Query-patch-in-java-tp4090314.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: 2 question about solr and lucene
Are you saying you're trying to put payloads on the numeric data? If that's the case, I don't know how that works. But a couple of things:

sinaTag:operate and followersCount:[752 TO 752]

is incorrect; you must capitalize the AND, as in

sinaTag:operate AND followersCount:[752 TO 752]

Your syntax _should_ work since [] is inclusive, but I'd just try with [751 TO 753] once to be sure.

Attach &debug=all to your query and you'll see exactly how the query is parsed. You'll also see exactly how the scores are calculated in a long, complex bit of output. I _think_ that fuzzy and wildcards do a "constant score query", so don't be surprised if the calculations show you that the fuzzy matches don't change the score.

Best,
Erick

On Mon, Sep 16, 2013 at 3:08 AM, Robin Wei wrote:

> Hi, guys:
> I met two questions about solr and lucene, wish people to help out.
>
> 1. use payload query but can NOT with numerical field type. for example:
> I implemented my own requesthandler, refer to
> http://hnagtech.wordpress.com/2013/04/19/using-payloads-with-solr-4-x/
> I query in solr: sinaTag:operate
> solr response:
>
> "numFound": 2, "start": 0, "maxScore": 99, "docs": [
> { "id": "1628209010",
> "followersCount": 752,
> "sinaTag": "operate|99 C2C|98 B2C|97 OnlineShopping|96 E-commercial|94",
> "score": 99 },
> { "id": "1900546410",
> "followersCount": 1002,
> "sinaTag": "Startup|99 Benz|98 PublicRelation|97 operate|96 Activity|95 Media|94 AD|93 Vehicle|92 ",
> "score": 96 } ]
>
> This works well.
> But a query combined with another numerical condition, such as:
> sinaTag:operate and followersCount:[752 TO 752]
> { "responseHeader": { "status": 0, "QTime": 40 },
> "response": { "numFound": 0, "start": 0, "maxScore": 0, "docs": [] } }
>
> According to this dataset, the first record should be returned rather
> than NOT FOUND. I do not know why.
>
> 2. About string field fuzzy match filtering, how to get the score? What
> is the formula?
> When I use two or several string fuzzy matches, possibly AND or OR,
> how to get the score? What is the formula?
> Might I implement my own score formula class -- which interface or
> abstract class to extend?
>
> Thanks in advance.
Slow query at first time
Hi,

I'm trying to make a search with Solr 4.4, but the first time, the search is too slow. I have studied pre-warm queries, but the query response time is the same after adding one. Can anyone help me? Here's a piece of solrconfig.xml (the listener XML tags were lost in the archive; the warming query it contained was):

codigoRoteiro:95240816
0
20

and in the schema.xml (field and uniqueKey tags likewise lost):

codigoRoteiro

When I start Solr, the following message is shown:

$ java -server -Xms2048m -Xmx4096m -Dsolr.solr.home="./oracleCore/solr" -jar start.jar
.
.
.
8233 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore - QuerySenderListener done.
8235 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore - [db] Registered new searcher Searcher@30b6b67d main{StandardDirectoryReader(segments_6:34 _f(4.4):C420060)}

And here's my SolrJ sample code:

SolrServer solrServer = new HttpSolrServer(solrServerUrl);

SolrQuery query = new SolrQuery();
query.setQuery("codigoRoteiro:95240816");
query.set("start", "0");
query.set("rows", "20");
query.addField("codigoRoteiro");
query.addField("rowidString");
query.addField("descricaoRoteiro");
query.addField("numeroDias");
query.addField("numeroNoites");
query.addField("dataSaida");

Date initialTime = new Date();
QueryResponse rsp = solrServer.query(query);
SolrDocumentList docs = rsp.getResults();
Date finalTime = new Date();
System.out.println("Total time: " + (finalTime.getTime() - initialTime.getTime()) + " ms");

The response time is around 200 ms. If I remove the pre-warm query, the response time doesn't change. Shouldn't the response time be lower when using a pre-warm query?

Thanks in advance,

--
Sergio Stateri Jr.
stat...@gmail.com
Re: Solr Java Client
On 16 September 2013 02:47, Baskar Sikkayan wrote: [...] > Have a question now. > > I know in solr its flat file system and the data will be in denormalized > form. > > My question : > > Have 3 tables, > > 1) user (userid, firstname, lastname, ...) > 2) master (masterid, skills, ...) > 3) child (childid, masterid, userid, ...) > > In solr, i have added all these field for each document. > > Example, > > childid,masterid,userid,skills,firstname,lastname > > Real Data Example, > > 1(childid),1(masterid),1(userid),"java,jsp","baskar","sks" > 2(childid),1(masterid),1(userid),"java,jsp","baskar","sks" > 3(childid),1(masterid),1(userid),"java,jsp","baskar","sks" As people have already advised you, the best way to decide how to organise your data in the Solr index depends on the searches that you want to make. This is not entirely clear from your description above. The flattening sample that you show above would be suitable if the user is to search by 'child' attributes, but can be simplified otherwise. > The above data sample is from solr document. > In my search result, i will have to show all these fields. > > User may change the name at any time.The same has to be updated in solr. > > In this case, i need to find all the child id that belongs to the user and > update the username with those child ids. > > Please tell me if there is any other better approach than this. How would you know that the user name has been changed? Is there a modification date for that table. If so, it would make sense to check that against the last time indexing to Solr was done. A DIH delta-import makes this straightforward. Updates as you suggest above would be the normal way to handle things. You should batch your updates, say by running an update script at periodic intervals. Regards, Gora
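To make the delta-import route a little more concrete, a minimal data-config.xml sketch could look like the following. The table and column names come from the schema described in this thread (child/user, childid, firstname, lastname); the JDBC details and the modified_at timestamp column are assumptions, and the skills/master join is omitted for brevity:

  <dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://dbhost/db" user="u" password="p"/>
    <document>
      <entity name="child"
              pk="childid"
              query="SELECT c.childid, c.masterid, c.userid, u.firstname, u.lastname
                     FROM child c JOIN user u ON u.userid = c.userid"
              deltaQuery="SELECT c.childid FROM child c JOIN user u ON u.userid = c.userid
                          WHERE u.modified_at &gt; '${dataimporter.last_index_time}'"
              deltaImportQuery="SELECT c.childid, c.masterid, c.userid, u.firstname, u.lastname
                                FROM child c JOIN user u ON u.userid = c.userid
                                WHERE c.childid = '${dataimporter.delta.childid}'">
      </entity>
    </document>
  </dataConfig>

A periodic /dataimport?command=delta-import then re-indexes only the child rows whose user record changed since the last run.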
Re: Spellcheck compounded words
Hi, I m running 4.3.. I have posted all the details in another threat... do you want me to copy it here? or could you see that? The subject is "*spellcheck causing Core Reload to hang*". On Mon, Sep 16, 2013 at 5:50 PM, Dyer, James wrote: > Which version of Solr are you running? (the post you replied to was about > Solr 3.3, but the latest version now is 4.4.) Please provide configuration > details and the query you are running that causes the problem. Also > explain exactly what the problem is (query never returns?). Also explain > why you have to delete the "data" dir when you restart. With a little > background information, maybe someone can help. > > James Dyer > Ingram Content Group > (615) 213-4311 > > -Original Message- > From: Rah1x [mailto:raheel_itst...@yahoo.com] > Sent: Monday, September 16, 2013 5:47 AM > To: solr-user@lucene.apache.org > Subject: Re: Spellcheck compounded words > > Hi guyz, > > Did anyone solve this issue? > > I am having it also, it took me 3 days to exactly figure it out that its > coming from "spellcheck.maxCollationTries"... > > Even with 1 it hangs > forewver. The only way to restart is to stop solr, delete "data" folder and > then start solr again (i.e. index lost !). > > Regards, > Raheel > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p4090320.html > Sent from the Solr - User mailing list archive at Nabble.com. > > > -- Regards, Raheel Hasan
Re: Solr Java Client
Hi Gora, Thanks a lot for your reply. "As people have already advised you, the best way to decide how to organise your data in the Solr index depends on the searches that you want to make. This is not entirely clear from your description above. The flattening sample that you show above would be suitable if the user is to search by 'child' attributes, but can be simplified otherwise." *Yes, the search is based on the child attributes.* "How would you know that the user name has been changed? Is there a modification date for that table. If so, it would make sense to check that against the last time indexing to Solr was done. A DIH delta-import makes this straightforward." *As of now, there is no special column to know if the username has been changed. * *But, whenever the user update his name, i can track that in my java code and send the update to Solr. * *Here, I am planning to use Solr java client. * *But, all the above things are possible with Java client and also with delta-import. * *I am looking for changing the solr data whenever there is a change in the database.* *Even there is a small delay i am fine with that. * *Which one you will suggest? * *Solr Java client or DIH delta-import * *My application running on server A, database on server B and solr will be on server C. * *If i am supposed to use, Solr Java client, i may need to hit the database sometimes to get some parent data and then need to send the same to Solr. * *Guess, its a unnecessary trip. * *So confused here, if i need to go with Java client or DIH delta import. * Thanks, Baskar.S On Mon, Sep 16, 2013 at 9:23 AM, Gora Mohanty wrote: > On 16 September 2013 02:47, Baskar Sikkayan wrote: > [...] > > Have a question now. > > > > I know in solr its flat file system and the data will be in denormalized > > form. > > > > My question : > > > > Have 3 tables, > > > > 1) user (userid, firstname, lastname, ...) > > 2) master (masterid, skills, ...) > > 3) child (childid, masterid, userid, ...) > > > > In solr, i have added all these field for each document. > > > > Example, > > > > childid,masterid,userid,skills,firstname,lastname > > > > Real Data Example, > > > > 1(childid),1(masterid),1(userid),"java,jsp","baskar","sks" > > 2(childid),1(masterid),1(userid),"java,jsp","baskar","sks" > > 3(childid),1(masterid),1(userid),"java,jsp","baskar","sks" > > As people have already advised you, the best way to decide > how to organise your data in the Solr index depends on the > searches that you want to make. This is not entirely clear > from your description above. The flattening sample that you > show above would be suitable if the user is to search by > 'child' attributes, but can be simplified otherwise. > > > The above data sample is from solr document. > > In my search result, i will have to show all these fields. > > > > User may change the name at any time.The same has to be updated in solr. > > > > In this case, i need to find all the child id that belongs to the user > and > > update the username with those child ids. > > > > Please tell me if there is any other better approach than this. > > How would you know that the user name has been changed? > Is there a modification date for that table. If so, it would make > sense to check that against the last time indexing to Solr was > done. A DIH delta-import makes this straightforward. > > Updates as you suggest above would be the normal way to handle > things. You should batch your updates, say by running an update > script at periodic intervals. > > Regards, > Gora >
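Whichever route is chosen (SolrJ client or DIH), the per-document change itself can be sent as an atomic update rather than re-posting the whole document, provided all fields in the schema are stored (a requirement for atomic updates). A hedged sketch of the XML update message, assuming childid is the uniqueKey and using made-up values:

  <add>
    <doc>
      <field name="childid">1</field>
      <field name="firstname" update="set">NewName</field>
      <field name="lastname" update="set">NewLastName</field>
    </doc>
    <doc>
      <field name="childid">2</field>
      <field name="firstname" update="set">NewName</field>
      <field name="lastname" update="set">NewLastName</field>
    </doc>
  </add>

Posted to /update, each doc only rewrites the named fields for that id, so one user's name change becomes one small update per child document.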
RE: Spellcheck compounded words
I would investigate Hoss's suggestion and look at warming queries. In some cases I've seen "maxCollationTries" in warming queries to cause a hang. Unless you're trying to build your spellcheck dictionary during warming, you can safely turn spellcheck off for all warming queries. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Raheel Hasan [mailto:raheelhasan@gmail.com] Sent: Monday, September 16, 2013 8:29 AM To: solr-user@lucene.apache.org Subject: Re: Spellcheck compounded words Hi, I m running 4.3.. I have posted all the details in another threat... do you want me to copy it here? or could you see that? The subject is "*spellcheck causing Core Reload to hang*". On Mon, Sep 16, 2013 at 5:50 PM, Dyer, James wrote: > Which version of Solr are you running? (the post you replied to was about > Solr 3.3, but the latest version now is 4.4.) Please provide configuration > details and the query you are running that causes the problem. Also > explain exactly what the problem is (query never returns?). Also explain > why you have to delete the "data" dir when you restart. With a little > background information, maybe someone can help. > > James Dyer > Ingram Content Group > (615) 213-4311 > > -Original Message- > From: Rah1x [mailto:raheel_itst...@yahoo.com] > Sent: Monday, September 16, 2013 5:47 AM > To: solr-user@lucene.apache.org > Subject: Re: Spellcheck compounded words > > Hi guyz, > > Did anyone solve this issue? > > I am having it also, it took me 3 days to exactly figure it out that its > coming from "spellcheck.maxCollationTries"... > > Even with 1 it hangs > forewver. The only way to restart is to stop solr, delete "data" folder and > then start solr again (i.e. index lost !). > > Regards, > Raheel > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p4090320.html > Sent from the Solr - User mailing list archive at Nabble.com. > > > -- Regards, Raheel Hasan
Re: Spellcheck compounded words
I am building it on Commit.. true Please see my other thread for all Logs and Schema + Solrconfig settings. On Mon, Sep 16, 2013 at 7:03 PM, Dyer, James wrote: > I would investigate Hoss's suggestion and look at warming queries. In > some cases I've seen "maxCollationTries" in warming queries to cause a > hang. Unless you're trying to build your spellcheck dictionary during > warming, you can safely turn spellcheck off for all warming queries. > > James Dyer > Ingram Content Group > (615) 213-4311 > > > -Original Message- > From: Raheel Hasan [mailto:raheelhasan@gmail.com] > Sent: Monday, September 16, 2013 8:29 AM > To: solr-user@lucene.apache.org > Subject: Re: Spellcheck compounded words > > Hi, > > I m running 4.3.. > > I have posted all the details in another threat... do you want me to copy > it here? or could you see that? The subject is "*spellcheck causing Core > Reload to hang*". > > > > > On Mon, Sep 16, 2013 at 5:50 PM, Dyer, James > wrote: > > > Which version of Solr are you running? (the post you replied to was about > > Solr 3.3, but the latest version now is 4.4.) Please provide > configuration > > details and the query you are running that causes the problem. Also > > explain exactly what the problem is (query never returns?). Also explain > > why you have to delete the "data" dir when you restart. With a little > > background information, maybe someone can help. > > > > James Dyer > > Ingram Content Group > > (615) 213-4311 > > > > -Original Message- > > From: Rah1x [mailto:raheel_itst...@yahoo.com] > > Sent: Monday, September 16, 2013 5:47 AM > > To: solr-user@lucene.apache.org > > Subject: Re: Spellcheck compounded words > > > > Hi guyz, > > > > Did anyone solve this issue? > > > > I am having it also, it took me 3 days to exactly figure it out that its > > coming from "spellcheck.maxCollationTries"... > > > > Even with 1 it hangs > > forewver. The only way to restart is to stop solr, delete "data" folder > and > > then start solr again (i.e. index lost !). > > > > Regards, > > Raheel > > > > > > > > -- > > View this message in context: > > > http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p4090320.html > > Sent from the Solr - User mailing list archive at Nabble.com. > > > > > > > > > -- > Regards, > Raheel Hasan > > -- Regards, Raheel Hasan
Re: sorting using org.apache.solr.client.solrj.SolrQuery not working
Shawn, I am doing exactly the same. The output is not sorted on the "LAST_NAM" column; it is always sorted on a different column, "CLAIM_NUM", and I am not adding that sorting condition (sort on CLAIM_NUM).

solrQuery.setQuery("*:*");
solrQuery.setSort("LAST_NAM", SolrQuery.ORDER.asc);
solrQuery.setFilterQueries("String Query");

In the log I see the sorting column as "LAST_NAM". Is there a difference between "LAST_NAM asc" and "LAST_NAM+asc"? I see only this diff:

"params={sort=LAST_NAM+asc&start=0&q=*:*&wt=javabin&fq=(LAST_NAM:*D*)+AND++-CLAI_RISK_MNGT_FLG+:+Y+&version=2&rows=30} hits=196 status=0 QTime=2"

I also tried addOrUpdateSort and addSort, but it is always sorted on CLAI_CLM_NUM, not sure why?

--
View this message in context: http://lucene.472066.n3.nabble.com/sorting-using-org-apache-solr-client-solrj-SolrQuery-not-working-tp4089985p4090364.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr PingQuery
I want to add one more thing for Shawn about Zookeeper. In order to have quorum, you need to have half the servers plus one available. Because of that let's assume you have 4 machine of Zookeeper and two of them communicating within them and other two of them communicating within them. Assume that this two zookeeper sets (each of them has two zookeeper node) can not communicate with each other. This will result with a brain split. So the rule is simple. In order to have quorum, you need to have half the servers plus one available because there can not be two different sets at any time that has a number of half the servers plus one. There can be only one. 2013/9/15 Shawn Heisey > On 9/14/2013 6:57 AM, Prasi S wrote: > > I use SolrPingResponse.getStatus method to start indexing to solr. I use > > SolrCloud with external zookeeper > > > > If i send it to the Zookeeper, if zookeeper is down, it returns NOTOK. > > > > But if one of my solr is up and second solr is down, the Ping returns OK > > status. > > If your zookeeper is completely down or does not have quorum, then > SolrCloud isn't going to work right, so a ping response of NOTOK is > correct. > > A fully redundant zookeeper ensemble is at least three machines, > preferably an odd number. You can run zookeeper on the same hardware as > Solr, but it is recommended that it be a standalone process. You should > not run the solr embedded zookeeper (-DzkRun) for production, because > when you shutdown or restart Solr, the embedded zookeeper also goes down. > > With three machines in the zookeeper ensemble, you can have one of them > go down and everything keeps working perfectly. > > If you want to know why an odd number is recommended, consider a > scenario with four zookeepers instead of three. In order to have > quorum, you need to have half the servers plus one available. On a > four-server ensemble, that works out so that three of them have to be > running. You are no better off than if you have three servers, because > in either scenario you can only have one failure. On top of that, you > have an extra possible point of failure and you're using more resources, > like switchports and power. With five servers, two can go down and > quorum will be maintained. > > If you only have two zookeepers, they both must be operational in order > to have quorum. If one of them were to fail, quorum would be lost and > SolrCloud would stop working correctly. > > SolrCloud itself is also designed to deal with a failure of a single > machine. A replicationFactor of at least two is required for that to > work correctly. > > Thanks, > Shawn > >
Re: sorting using org.apache.solr.client.solrj.SolrQuery not working
: In the log i see the sorting column as "LAST_NAM".
: Is there a difference between "LAST_NAM asc" and "LAST_NAM+asc"...I see only
: this diff?

The log message you are looking at is showing you the request params received by the handler from the client, with URL escaping -- so the "+" you see is the URL escaping of the " " sent by the client.

Can you show us the declaration for the handler name you are using? If you are seeing the results sorted by a different field than the one you specified in the client, then it has to be specified somewhere -- I'm guessing, since it's not explicit in that log message, that it's "/select", but it could also be whatever you have configured as the default="true" handler. My best guess is that the requestHandler has some init params that set the sort option as an invariant, so that you can't override it.

: "params={sort=LAST_NAM+asc&start=0&q=*:*&wt=javabin&fq=(LAST_NAM:*D*)+AND++-CLAI_RISK_MNGT_FLG+:+Y+&version=2&rows=30}
: hits=196 status=0 QTime=2 "

One other thing to sanity check: try loading your requestHandler, with all of those params (except for "wt=javabin"), in a browser window, and double check which order the results come back in -- just to verify that the results really are getting sorted incorrectly on the Solr side, and that the problem isn't some other bit of Java code you have re-sorting the results that get returned.

If you load the URL in your browser, you can also add echoParams=all to see every param used in the request, even if it is an invariant specified in the requestHandler config.

-Hoss
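To make that guess concrete, the kind of handler declaration that would silently override a client-supplied sort looks roughly like this -- a purely hypothetical config, using a field name from this thread as the example invariant:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>
    <!-- If an invariants block like this exists, the sort sent by SolrJ is ignored. -->
    <lst name="invariants">
      <str name="sort">CLAI_CLM_NUM asc</str>
    </lst>
  </requestHandler>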
SOLR 4.2, slaves replicating reporting higher version number than master
Having a strange intermittent issue with my 1 master, 3 slave solr 4.2 setup. On occasion, after indexing the master and replicating across the three slaves, each slave will start reporting they are one generation ahead (525 vs. 524 on the master) and thus out of sync. Replication runs appear to do nothing, and it seems to not really be affecting performance, it's just tickling my admin nerves. Any suggestions of what to look at? Just upgrade solr perhaps? 4.2 might be getting rather old...
Re: SOLR 4.2, slaves replicating reporting higher version number than master
Sounds like perhaps you are getting confused by this...

https://issues.apache.org/jira/browse/SOLR-4661

...if that is the situation then it's not a bug you need to worry about, just a confusion in how the ReplicationHandler reports its stats -- the newer UI makes it more clear what numbers you are looking at.

If that doesn't look like the problem you are seeing, then more detail on how to reproduce what you are seeing would be helpful (replication configs, logs from master & slave, etc...)

-Hoss
Re: how soft-commit works
On 9/16/2013 7:01 AM, Matteo Grolla wrote: > Can anyone explain me the following things about soft-commit? > -For searches o access new documents I think a new searcher is opened after a > soft commit. > How does the near realtime requirement for soft commit match with the > potentially long time taken to warm up caches for the new searcher? > -Is it a good idea to set > openSearcher=false in auto commit > and rely on soft auto commit to see new data in searches? That is a very common way for installs requiring NRT updates to get configured. NRTCachingDirectoryFactory, which is the directory class used in the example since 4.0, is a wrapper around MMapDirectoryFactory, which is the old default in 3.x. For soft commits, the NRT directory keeps small commits in RAM rather than writing it to the disk, which makes the process of opening a new searcher happen a lot faster. http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/store/NRTCachingDirectory.html If your index rate is very fast or you index large amounts of data, the NRT directory doesn't gain you much over MMap, but because we made it the default in the example, it probably doesn't have any performance detriment. Thanks, Shawn
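For reference, the stock example solrconfig.xml from this era enables that directory implementation with a line along these lines:

  <directoryFactory name="DirectoryFactory"
                    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

So unless the directoryFactory has been overridden in a custom config, soft commits already get the NRT-friendly behaviour Shawn describes.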
Re: Slow query at first time
What is query time of your search? I mean as like that: QueryResponse solrResponse = query(solrParams); solrResponse.getQTime(); 2013/9/16 Sergio Stateri > Hi, > > I´m trying to make a search with Solr 4.4, but in the first time the search > is too slow. I have studied about pre-warm queries, but the query response > is the same after putting it. Can anyone help me? Here´s a piece of > solrconfig.xml: > > > > > codigoRoteiro:95240816 > 0 > 20 > > > > > in the schema.xml: > > required="true" multiValued="false" /> > > > codigoRoteiro > > When I start Solr, the following message is shown: > > $ java -server -Xms2048m -Xmx4096m -Dsolr.solr.home="./oracleCore/solr" > -jar start.jar > . > . > . > 8233 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore û > QuerySenderListener done. > 8235 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore û > [db] Registered new searcher > Searcher@30b6b67dmain{StandardDirectoryReader(segments_6:34 > _f(4.4):C420060)} > > And here´s my solrj sample code: > > SolrServer solrServer = new HttpSolrServer(solrServerUrl); > > SolrQuery query = new SolrQuery(); > query.setQuery("codigoRoteiro:95240816"); > > query.set("start", "0"); query.set("rows", "20"); > query.addField("codigoRoteiro"); query.addField("rowidString"); > query.addField("descricaoRoteiro"); query.addField("numeroDias"); > query.addField("numeroNoites"); query.addField("dataSaida"); > > Date initialTime = new Date(); QueryResponse rsp = server.query( query ); > SolrDocumentList docs = rsp.getResults(); Date finalTime = new Date(); > System.out.println("Total timel: " + > (finalTime.getTime()-initialTime.getTime()) + " ms"); > > > The response time is arround 200 ms. If I remove the prewarm query, the > response time doesn´t change. Shouldn´t the response time be minor when > using pre-warm query? > > > Thanks in advance, > > -- > Sergio Stateri Jr. > stat...@gmail.com >
Re: Slow query at first time
On 9/16/2013 7:15 AM, Sergio Stateri wrote: > I´m trying to make a search with Solr 4.4, but in the first time the search > is too slow. I have studied about pre-warm queries, but the query response > is the same after putting it. Can anyone help me? Here´s a piece of > solrconfig.xml: > > You've configured a firstSearcher. Basically what this means is that this query will be run when Solr first starts up, and never run again after that. Make it a newSearcher instead of firstSearcher, and it will get run every time a new searcher gets created, and it might solve your problem. For further troublsehooting if the change above doesn't help, how big is your index, and how much RAM does the machine have? We already know what your java heap is (2GB minimum, 4GB maximum). Thanks, Shawn
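A sketch of the change Shawn is suggesting, reusing the query values from the original message (the element names are assumed to follow the standard QuerySenderListener form, since the original XML was lost in the archive):

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">codigoRoteiro:95240816</str>
        <str name="start">0</str>
        <str name="rows">20</str>
      </lst>
    </arr>
  </listener>

With the event set to newSearcher, the warming query runs every time a commit opens a new searcher, not just once at startup.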
Dynamic row sizing for documents via UpdateCSV
Hello, I am using UpdateCSV to load data in solr. Currently I load this schema with a static set of values: userid,name,age,location john8322,John,32,CA tom22,Tom,30,NY But now I have this usecase where john8322 might have a state specific dynamic field for example: userid,name,age,location, ca_count_i john8322,John,32,CA, 7 And tom22 might have different dynamic fields: userid,name,age,location, ny_count_i,oh_count_i tom22,Tom,30,NY, 981,11 So is it possible to pass different columns sizes for each row, something like this: john8322,John,32,CA,ca_count_i:7 tom22,Tom,30,NY, ny_count_i:981,oh_count_i:11 I understand that the above syntax is not possible, but is there any other way of solving this problem? -- Thanks, -Utkarsh
dih delete doc per $deleteDocById
I am using DIH and want to delete indexed documents via an XML file of ids. I have seen $deleteDocById used in data-config.xml (the configuration snippet was lost in the archive). The XML file contains the ids to delete, e.g.:

2345
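For what it's worth, a hedged data-config.xml sketch of how $deleteDocById is typically wired up for a plain XML file of ids -- the file path and XML layout below are assumptions, e.g. a file shaped like <delete><id>2345</id></delete>:

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8"/>
    <document>
      <entity name="deletes"
              processor="XPathEntityProcessor"
              url="/path/to/deletes.xml"
              forEach="/delete/id"
              stream="true">
        <!-- Mapping the id text onto the special $deleteDocById column
             turns each row into a delete-by-id command. -->
        <field column="$deleteDocById" xpath="/delete/id"/>
      </entity>
    </document>
  </dataConfig>

The import still needs a commit before the deletes become visible to searches.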
Re: SOLR 4.2, slaves replicating reporting higher version number than master
Looks much like what I'm encountering. Guessing that will go away once I update solr, just wanted to make sure it wasn't a real bug. Entirely possible we are getting some "empty commits" given the nature of the index maintenance. Thanks for the pointer! On Mon, Sep 16, 2013 at 2:00 PM, Chris Hostetter wrote: > > Sounds like perhaps you are getting confused by this... > > https://issues.apache.org/jira/browse/SOLR-4661 > > ...if that is the situation then it's not a bug you need to worry about, > just a confusion in how the ReplicaitonHandler reports it's stats -- the > newer UI makes it more clear what numbers you are looking at. > > If that doesn't looke like the problem you are seeing, then more detail on > how to reproduce what you are seeing would be helpful (replicaiton > configs, logs from amster & slave, etc...) > > > -Hoss >
Re: Best configuration for 2 servers
At the moment I can't think of any reason why queries could not be served w/o ZK up and running. Otis -- Solr & ElasticSearch Support -- http://sematext.com/ Solr Performance Monitoring -- http://sematext.com/spm On Mon, Sep 16, 2013 at 4:58 PM, Branham, Jeremy [HR] wrote: > I may be interpreting this incorrectly, but shouldn't the cloud still serve > requests if ZK crashes? > > http://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble > > " The problem with example B is that while there are enough Solr servers to > survive any one of them crashing, there is only one zookeeper server that > contains the state of the cluster. If that zookeeper server crashes, > distributed queries will still work since the solr servers remember the state > of the cluster last reported by zookeeper. The problem is that no new servers > or clients will be able to discover the cluster state, and no changes to the > cluster state will be possible." > > > > > > Jeremy D. Branham > Performance Technologist II > Sprint University Performance Support > Fort Worth, TX | Tel: **DOTNET > Office: +1 (972) 405-2970 | Mobile: +1 (817) 791-1627 > http://JeremyBranham.Wordpress.com > http://www.linkedin.com/in/jeremybranham > > > -Original Message- > From: Shawn Heisey [mailto:s...@elyograg.org] > Sent: Friday, September 13, 2013 2:40 PM > To: solr-user@lucene.apache.org > Subject: Re: Best configuration for 2 servers > > On 9/13/2013 12:50 PM, Branham, Jeremy [HR] wrote: >> Does this sound appropriate then? [assuming no 3rd server] >> >> Server A: >> Zoo Keeper >> SOLR with 1 shard >> >> Server B: >> SOLR with ZK Host parameter set to Server A > > Yes, that will work, but if the ZK on server A goes down, the entire cloud is > down. > > When you create a collection with replicationFactor=2, one replica will be on > server A and one replica will be on server B. > > If you want to break the index up into multiple shards, you can, you'll also > need the maxShardsPerNode parameter when you create the collection, and all > shards will have replicas on both machines. > > A note about zookeeper and redundancy, and an explanation about why 3 hosts > are required: To form a quorum, zookeeper must have the votes of a majority > of the hosts in the ensemble. If there are only two hosts, it's not possible > for there to be a majority unless both hosts are up, so two hosts is actually > worse than one. You need to either have one ZK node or at least three, > preferably an odd number. > > Thanks, > Shawn > > > > > > This e-mail may contain Sprint proprietary information intended for the sole > use of the recipient(s). Any use by others is prohibited. If you are not the > intended recipient, please contact the sender and delete all copies of the > message. >
RE: Best configuration for 2 servers
I may be interpreting this incorrectly, but shouldn't the cloud still serve requests if ZK crashes? http://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble " The problem with example B is that while there are enough Solr servers to survive any one of them crashing, there is only one zookeeper server that contains the state of the cluster. If that zookeeper server crashes, distributed queries will still work since the solr servers remember the state of the cluster last reported by zookeeper. The problem is that no new servers or clients will be able to discover the cluster state, and no changes to the cluster state will be possible." Jeremy D. Branham Performance Technologist II Sprint University Performance Support Fort Worth, TX | Tel: **DOTNET Office: +1 (972) 405-2970 | Mobile: +1 (817) 791-1627 http://JeremyBranham.Wordpress.com http://www.linkedin.com/in/jeremybranham -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Friday, September 13, 2013 2:40 PM To: solr-user@lucene.apache.org Subject: Re: Best configuration for 2 servers On 9/13/2013 12:50 PM, Branham, Jeremy [HR] wrote: > Does this sound appropriate then? [assuming no 3rd server] > > Server A: > Zoo Keeper > SOLR with 1 shard > > Server B: > SOLR with ZK Host parameter set to Server A Yes, that will work, but if the ZK on server A goes down, the entire cloud is down. When you create a collection with replicationFactor=2, one replica will be on server A and one replica will be on server B. If you want to break the index up into multiple shards, you can, you'll also need the maxShardsPerNode parameter when you create the collection, and all shards will have replicas on both machines. A note about zookeeper and redundancy, and an explanation about why 3 hosts are required: To form a quorum, zookeeper must have the votes of a majority of the hosts in the ensemble. If there are only two hosts, it's not possible for there to be a majority unless both hosts are up, so two hosts is actually worse than one. You need to either have one ZK node or at least three, preferably an odd number. Thanks, Shawn This e-mail may contain Sprint proprietary information intended for the sole use of the recipient(s). Any use by others is prohibited. If you are not the intended recipient, please contact the sender and delete all copies of the message.
Atomic commit across shards?
Is a commit (hard or soft) atomic across shards? In other words, can I guarantee that any given search on a multi-shard collection will hit the same index generation of each shard?

Thanks,
Damien
Re: Re-Ranking results based on DocValues with custom function.
: dissimilarity functions). What I want to do is to search using common
: text search and then (optionally) re-rank using some custom function
: like
:
: http://localhost:8983/solr/select?q=*:*&sort=myCustomFunction(var1) asc

Can you describe what you want your custom function to look like? It may already be possible using the existing functions provided out of the box -- you just need to combine them to build up the math expression...

https://wiki.apache.org/solr/FunctionQuery

...if you really want to write your own, just implement ValueSourceParser and register it in solrconfig.xml...

https://wiki.apache.org/solr/SolrPlugins#ValueSourceParser

: I've seen that there are hooks in solrconfig.xml, but I did not find
: an example or some documentation. I'd be most grateful if anyone could
: either point me to one or give me a hint for another way to go :)

When writing a custom plugin like this, the best thing to do is look at the existing examples of that plugin. Almost all of the built-in ValueSourceParsers are really trivial, and can be found in tiny anonymous classes right inside ValueSourceParser.java... For example, the function to divide the results of two other functions...

    addParser("div", new ValueSourceParser() {
      @Override
      public ValueSource parse(FunctionQParser fp) throws SyntaxError {
        ValueSource a = fp.parseValueSource();
        ValueSource b = fp.parseValueSource();
        return new DivFloatFunction(a, b);
      }
    });

...or, if you were trying to bundle that up in your own plugin jar and register it in solrconfig.xml, you might write it something like...

    public class DivideValueSourceParser extends ValueSourceParser {
      public DivideValueSourceParser() { }
      public ValueSource parse(FunctionQParser fp) throws SyntaxError {
        ValueSource a = fp.parseValueSource();
        ValueSource b = fp.parseValueSource();
        return new DivFloatFunction(a, b);
      }
    }

and then register it in solrconfig.xml with a valueSourceParser element (see the sketch below).

Depending on your needs, you may also want to write a custom ValueSource implementation (ie: instead of DivFloatFunction above), in which case, again, the best examples to look at are all of the existing ValueSource functions...

https://lucene.apache.org/core/4_4_0/queries/org/apache/lucene/queries/function/ValueSource.html

-Hoss
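The registration element referenced above would be a one-liner in solrconfig.xml, something along these lines (the name and class are illustrative -- point the class attribute at wherever the custom parser actually lives):

  <valueSourceParser name="mydiv" class="com.example.solr.DivideValueSourceParser"/>

After that, the function is usable in sort or boost expressions as mydiv(a,b), just like the built-in functions.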
Re: requested url solr/update/extract not available on this server
: Is /solr/update working?

More importantly: does "/solr/" work in your browser and return anything useful? (Nothing you've told us yet gives us any way of knowing if Solr is even up and running.)

If 'http://localhost:8080/solr/' shows you the Solr admin UI, and you are using the stock Solr 4.2 example configs, then http://localhost:8080/solr/update/extract should not give you a 404 error. If however you are using some other configs, it might not work unless those configs register a handler with the path /update/extract.

Using the jetty setup provided with 4.2, and the example configs (from 4.2), I was able to index a sample PDF just fine using your curl command...

hossman@frisbee:~/tmp$ curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F "myfile=@stump.winners.san.diego.2013.pdf"
01839

: Check solrconfig to see that /update/extract is configured as in the standard
: Solr example.
:
: Does /solr/update/extract work for you using the standard Solr example?
:
: -- Jack Krupansky
:
: -Original Message- From: Nutan
: Sent: Sunday, September 15, 2013 2:37 AM
: To: solr-user@lucene.apache.org
: Subject: requested url solr/update/extract not available on this server
:
: I am working on Solr 4.2 on Windows 7. I am trying to index pdf files. I
: referred Solr Cookbook 4. Tomcat is using 8080 port number. I get this
: error: requested url solr/update/extract not available on this server
: When my curl is:
:
: curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" -F
: "myfile=@cookbook.pdf"
: There is no entry in log files. Please help.
:
: --
: View this message in context:
: http://lucene.472066.n3.nabble.com/requested-url-solr-update-extract-not-available-on-this-server-tp4090153.html
: Sent from the Solr - User mailing list archive at Nabble.com.

-Hoss
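In case the configs in play are not the stock example, the handler Hoss refers to is registered in the example solrconfig.xml along these lines (defaults trimmed; the <lib> paths depend on where the Solr distribution sits relative to the core's instanceDir, so treat them as placeholders):

  <lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
  <lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />

  <requestHandler name="/update/extract"
                  class="solr.extraction.ExtractingRequestHandler"
                  startup="lazy">
    <lst name="defaults">
      <str name="lowernames">true</str>
      <str name="fmap.content">text</str>
    </lst>
  </requestHandler>

If no handler is registered at that path, a 404 on /update/extract is the expected result.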
TokenizerFactory from 4.2.0 to 4.3.0
TokenizerFactory changed, incompatibly with subclasses, from 4.2.0 to 4.3.0. Subclasses must now implement a different overload of create, and may not implement the old one. Has anyone got any devious strategies other than multiple copies of code to deal with this when supporting multiple versions of Solr?
CREATEALIAS does not work with more than one collection (Error 503: no servers hosting shard)
Hello Solr experts, For some strange reason, collection alias does not work in my Solr instance when more than one collection is used. I would appreciate your help. # Here is my setup, which is quite simple: Zookeeper: 3.4.5 (used to upconfig/linkconfig collections and configs for c1 and c2) Solr: version 4.4.0, with two collections c1 and c2 (solr.xml included) created using remote core API calls # Symptoms: 1. Solr queries to each individual collection works fine: http://localhost:8983/solr/c1/select?q=*:* http://localhost:8983/solr/c2/select?q=*:* 2. CREATEALIAS name=cx for c1 or c2 alone (e.g. 1-1 mapping) works fine: http://localhost:8983/solr/cx/select?q=*:* 3. CREATEALIAS name=cx for c1 and c2 does not work: # Solr request/response to the collection alias (success): http://localhost:8983/solr/cx/select?q=*:* 5032*:*no servers hosting shard: 503 # Solr query using the alias fails with Error 503: "no servers hosting shard" curl -s "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=cx&collections=c1,c2"; 0134 # Solr logs: 3503223 [qtp724646150-11] ERROR org.apache.solr.core.SolrCore ? org.apache.solr.common.SolrException: no servers hosting shard: at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 3503224 [qtp724646150-11] INFO org.apache.solr.core.SolrCore ? [c1] webapp=/solr path=/select params={q=*:*} status=503 QTime=2 # solr.xml # zookeeper alias (same from solr/cloud UI): [zk: localhost:2181(CONNECTED) 10] get /myroot/aliases.json {"collection":{ "cx":"c1,c2"}} cZxid = 0x110d ctime = Fri Sep 13 17:25:18 PDT 2013 mZxid = 0x18d1 mtime = Mon Sep 16 16:31:21 PDT 2013 pZxid = 0x110d cversion = 0 dataVersion = 19 aclVersion = 0 ephemeralOwner = 0x0 dataLength = 119 numChildren = 0 BTW, I've spent a lot of time figuring out how to make zookeeper and solr work together. The commands are not complex, but making them work sometimes requires a lot of digging online, to figure out missing jars for zkCli.sh, etc. I know a lot of things are changing since Solr 4.0, but I really hope the Solr documentation can be better maintained, so that people won't have to spend tons of hours figuring out simple steps (albeit complex under the hood) like this. Thanks! -- Regards, HaiXin = AIM : tivohtie Work : 408.914.9835 Mobile : 408.368.9289 Schedule : http://htie-linux/ = This email and any attachments may contain confidential and privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments) by others is prohibited. If you are not the intended recipient, please contact the sender immediately and permanently delete this email and any attachments. No employee or agent of TiVo Inc. is authorized to conclude any binding agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo Inc. may only be made by a signed written agreement.
Updated: CREATEALIAS does not work with more than one collection (Error 503: no servers hosting shard)
Sorry, I've fixed some typos; updated text below:

Hello Solr experts,

For some strange reason, a collection alias does not work in my Solr instance when more than one collection is used. I would appreciate your help.

# Here is my setup, which is quite simple:

Zookeeper: 3.4.5 (used to upconfig/linkconfig collections and configs for c1 and c2)
Solr: version 4.4.0, with two collections c1 and c2 (solr.xml included), created using remote core API calls

# Symptoms:

1. Solr queries to each individual collection work fine:
http://localhost:8983/solr/c1/select?q=*:*
http://localhost:8983/solr/c2/select?q=*:*

2. CREATEALIAS name=cx for c1 or c2 alone (i.e. a 1-1 mapping) works fine:
http://localhost:8983/solr/cx/select?q=*:*

3. CREATEALIAS name=cx for c1 and c2 does not work:

# Solr request/response to the collection alias (success):
curl -s "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=cx&collections=c1,c2"
0134

# Solr query using the alias fails with Error 503: "no servers hosting shard"
http://localhost:8983/solr/cx/select?q=*:*
5032*:*no servers hosting shard: 503

# Solr logs:
3503223 [qtp724646150-11] ERROR org.apache.solr.core.SolrCore ? org.apache.solr.common.SolrException: no servers hosting shard:
    at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149)
    at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
3503224 [qtp724646150-11] INFO org.apache.solr.core.SolrCore ? [c1] webapp=/solr path=/select params={q=*:*} status=503 QTime=2

# solr.xml

# zookeeper alias (same from solr/cloud UI):
[zk: localhost:2181(CONNECTED) 10] get /myroot/aliases.json
{"collection":{
    "cx":"c1,c2"}}
cZxid = 0x110d
ctime = Fri Sep 13 17:25:18 PDT 2013
mZxid = 0x18d1
mtime = Mon Sep 16 16:31:21 PDT 2013
pZxid = 0x110d
cversion = 0
dataVersion = 19
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 119
numChildren = 0

BTW, I've spent a lot of time figuring out how to make ZooKeeper and Solr work together. The commands are not complex, but making them work sometimes requires a lot of digging online, e.g. to figure out the missing jars for zkCli.sh. I know a lot of things have been changing since Solr 4.0, but I really hope the Solr documentation can be better maintained, so that people won't have to spend tons of hours figuring out simple steps (albeit complex under the hood) like this.

Thanks!
--
Regards,
HaiXin
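One cross-check that may help localize this kind of failure, though it is not shown in the thread, is to issue the same distributed query without going through the alias, using the SolrCloud "collection" request parameter. The host, port, and collection names below are simply the ones from the post; this is only a sketch, assuming a SolrCloud deployment where that parameter is honored. If this form also returns "no servers hosting shard", the problem is in the distributed request itself rather than in alias resolution.

    # Query both collections directly, bypassing the alias (sketch; names
    # taken from the post above).
    curl -s "http://localhost:8983/solr/c1/select?q=*:*&collection=c1,c2"

    # And re-check what the alias resolves to in ZooKeeper, as in the post:
    #   zkCli.sh: get /myroot/aliases.json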
SPLITSHARD failure right before publishing the new sub-shards
Hi Solr experts,

I am using Solr 4.4 with ZK 3.4.5, trying to split "shard1" of a collection named "body". There is only one core on one machine for this collection. When I call SPLITSHARD on this collection, Solr is able to create the two sub-shards, but it fails with an NPE in SolrCore.java while publishing the new shards. It seems that either the updateHandler or its updateLog is null, though they work fine in the original shard:

SolrCore.java:
      if (cc != null && cc.isZooKeeperAware()
          && Slice.CONSTRUCTION.equals(cd.getCloudDescriptor().getShardState())) {
        // set update log to buffer before publishing the core
862:    getUpdateHandler().getUpdateLog().bufferUpdates();
        cd.getCloudDescriptor().setShardState(null);
        cd.getCloudDescriptor().setShardRange(null);
      }

Here are the details. Any pointers to aid in debugging this issue are greatly appreciated!

# curl request/response to split the shard:
curl -s "http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=body&shard=shard1"

5002688
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error CREATEing SolrCore 'body_shard1_0_replica1': Unable to create core: body_shard1_0_replica1 Caused by: null
org.apache.solr.common.SolrException: org.apache.solr.common.SolrException: SPLTSHARD failed to create subshard leaders
SPLTSHARD failed to create subshard leaders
500
SPLTSHARD failed to create subshard leaders
org.apache.solr.common.SolrException: SPLTSHARD failed to create subshard leaders
    at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:171)
    at org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:322)
    at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:136)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:611)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:218)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:662)
500

# Full solr log for the split shard call:
384779 [qtp334936591-10] INFO org.apache.solr.handler.admin.CollectionsHandler ? Splitting shard : shard=shard1&action=SPLITSHARD&collection=body
384791 [main-EventThread] INFO org.apache.solr.cloud.DistributedQueue ? Watcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged
384791 [main-EventThread] INFO org.apache.solr.cloud.DistributedQueue ? Watcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged
384797 [main-EventThread] INFO org.apache.solr.cloud.Di
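One detail worth checking, though the thread does not confirm it: the NPE happens on getUpdateHandler().getUpdateLog(), and getUpdateLog() returns null when the transaction log is not configured for the core. Shard splitting depends on the update log, so a solrconfig.xml section along the following lines is the first thing to verify. This is only a sketch; the poster's actual solrconfig.xml was not included in the thread.

    <!-- Sketch: enable the transaction log in solrconfig.xml. If this block
         is absent, SolrCore.getUpdateLog() can return null, which would match
         the NPE at line 862 in the quoted code. -->
    <updateHandler class="solr.DirectUpdateHandler2">
      <updateLog>
        <str name="dir">${solr.ulog.dir:}</str>
      </updateLog>
    </updateHandler>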
how to make sure all the index docs flushed to the index files
Hi,

I'm using the DIH to import data from an Oracle database with Solr 4.4. In the end I get 2.7GB of index data and 4.1GB of tlog data, and the number of docs was 1090.

At first, I moved the 2.7GB of index data to another new Solr server running in Tomcat 7. After I started Tomcat, I found that the total number of docs was just half of the original number. So I thought that maybe the remaining docs had not been committed to the index files, and the tlog needed to be replayed.

Subsequently, I moved both the 2.7GB of index data and the 4.1GB of tlog data to the new Solr server in Tomcat 7. After I started Tomcat, an exception came up as in [1]. Then it halted; I could not access the Tomcat server URL. I noticed that the CPU utilization was high, using the command: top -d 1 | grep tomcatPid. I assumed Solr was replaying the update log. I waited a long time and it was still replaying, so I gave up.

So I want to make sure that after the DIH import process finishes, the whole index has been flushed into the index data files. Is there any step I missed? How can I make sure all of the indexed data has been committed to the index files?

[1]--
19380 [recoveryExecutor-6-thread-1] WARN org.apache.solr.update.UpdateLog ?.REPLAY_ERR: Exception replaying log
java.lang.UnsupportedOperationException
    at org.apache.lucene.queries.function.FunctionValues.longVal(FunctionValues.java:46)
    at org.apache.solr.update.VersionInfo.getVersionFromIndex(VersionInfo.java:200)
    at org.apache.solr.update.UpdateLog.lookupVersion(UpdateLog.java:736)
    at org.apache.solr.update.VersionInfo.lookupVersion(VersionInfo.java:183)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:672)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
    at org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1313)
    at org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1202)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
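For reference, a hard commit can also be issued explicitly after the import finishes, before the data directory is copied anywhere. This is only a sketch; the core name "collection1" is assumed here, matching the paths that appear later in this thread.

    # Force a hard commit so everything still sitting in the tlog is written
    # to the index segments (sketch; core name is an assumption).
    curl "http://localhost:8983/solr/collection1/update?commit=true"

    # Then verify the document count straight from the index:
    curl "http://localhost:8983/solr/collection1/select?q=*:*&rows=0"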
2 question about solr and lucene
Hi guys,

I have run into two questions about Solr and Lucene and would appreciate some help.

1. A payload query works on its own, but NOT when combined with a numerical field condition.

For example, I implemented my own request handler, following
http://hnagtech.wordpress.com/2013/04/19/using-payloads-with-solr-4-x/

I query in Solr: sinaTag:operate

Solr response:

    "numFound": 2,
    "start": 0,
    "maxScore": 99,
    "docs": [
      {
        "id": "1628209010",
        "followersCount": 752,
        "sinaTag": "operate|99 C2C|98 B2C|97 OnlineShopping|96 E-commercial|94",
        "score": 99
      },
      {
        "id": "1900546410",
        "followersCount": 1002,
        "sinaTag": "Startup|99 Benz|98 PublicRelation|97 operate|96 Activity|95 Media|94 AD|93 Vehicle|92 ",
        "score": 96
      }
    ]

This works well. But when the query is combined with another numerical condition, such as:

    sinaTag:operate and followersCount:[752 TO 752]

the response is:

    {
      "responseHeader": {
        "status": 0,
        "QTime": 40
      },
      "response": {
        "numFound": 0,
        "start": 0,
        "maxScore": 0,
        "docs": []
      }
    }

Given this dataset, the first record should be returned rather than NOT FOUND. I don't know why.

2. For string-field fuzzy match filtering, how do I get the score, and what is the formula? When I combine two or several string fuzzy matches, possibly with AND or OR, how is the score computed? And if I want to implement my own scoring formula, which interface or abstract class should I extend?

Thanks in advance.
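For the first question, the thread does not show how the combined query is actually parsed; a quick way to see that is debugQuery=true. The request below is only a sketch (the core name and handler path are assumptions, since the poster uses a custom request handler), and note that Lucene query syntax expects an upper-case AND, so a lower-case "and" may be treated as an ordinary term rather than an operator.

    # Sketch: ask Solr to show the parsed query and per-document score
    # explanations for the combined query.
    curl "http://localhost:8983/solr/collection1/select" \
         --data-urlencode "q=sinaTag:operate AND followersCount:[752 TO 752]" \
         --data-urlencode "debugQuery=true"

The same debugQuery output is also relevant to the second question: the "explain" section shows exactly how each matching document's score was computed.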
Re: how to make sure all the index docs flushed to the index files
On 9/16/2013 8:26 PM, YouPeng Yang wrote:
> I'm using the DIH to import data from oracle database with Solr4.4
> Finally I get 2.7GB index data and 4.1GB tlog data. And the number of
> docs was 1090.
>
> At first, I move the 2.7GB index data to another new Solr Server in
> tomcat7. After I start the tomcat, I find the total number of docs was just
> half of the orginal number.
> So I thought that maybe the left docs were not commited to index
> files, and the tlog needed to be replayed.

You need to turn on autoCommit in your solrconfig.xml so that there are hard commits happening on a regular basis that flush all indexed data to disk and start new transaction log files. I will give you a link with some information about that below.

> Sequently, I moved the 2.7GB index data and 4.1GB tlog data to the new
> Solr Server in tomcat7.
> After I start the tomcat, an exception comes up as [1].
> Then it halts. I can not access the tomcat server URL.
> I noticed that the CPU utilization was high by using the comand: top
> -d 1 | grep tomcatPid.
> I thought solr was replaying the updatelog. And I wait a long time and it
> still was replaying. As results, I give up.

I don't know what the exception was about, but it is likely that it WAS replaying the log. With 4.1GB of transaction log, that's going to take a LONG time, during which Solr will be unavailable. It always replays the entire transaction log. The key, as mentioned above, is in keeping that log small.

Here's a wiki page about the slow startup problem and an example of how to configure autoCommit to deal with it:

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup

There's a lot of other good information on that page.

Thanks,
Shawn
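The kind of autoCommit block Shawn is referring to looks roughly like the following. The values are illustrative only, not a recommendation from the thread; the linked wiki page discusses how to choose them.

    <!-- Sketch of an autoCommit configuration in solrconfig.xml. A hard
         commit flushes index data to disk and starts a new tlog;
         openSearcher=false keeps it from affecting query visibility.
         Values are illustrative. -->
    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxTime>60000</maxTime>
        <openSearcher>false</openSearcher>
      </autoCommit>
      <updateLog>
        <str name="dir">${solr.ulog.dir:}</str>
      </updateLog>
    </updateHandler>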
Re: how to make sure all the index docs flushed to the index files
Hi Shawn,

Thank you very much for your response.

I launch the full-import task on the Solr admin web page, and I do check the commit option, so the new docs should be committed after the operation. The commit option is different from autoCommit, right? If the import dataset is too large, does that lead to poor performance or other problems, such as [1]?

The exceptions in [1] indicate "Too many open files", which we thought was because of the ulimit.

[1]
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149d.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149e.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149f.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149g.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149h.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149i.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149j.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149k.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149l.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149m.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149n.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149o.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149p.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149q.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149r.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149s.fdx (Too many open files)

2013/9/17 Shawn Heisey

> On 9/16/2013 8:26 PM, YouPeng Yang wrote:
> > I'm using the DIH to import data from oracle database with Solr4.4
> > Finally I get 2.7GB index data and 4.1GB tlog data. And the number of
> > docs was 1090.
> >
> > At first, I move the 2.7GB index data to another new Solr Server in
> > tomcat7. After I start the tomcat, I find the total number of docs was just
> > half of the orginal number.
> > So I thought that maybe the left docs were not commited to index
> > files, and the tlog needed to be replayed.
>
> You need to turn on autoCommit in your solrconfig.xml so that there are
> hard commits happening on a regular basis that flush all indexed data to
> disk and start new transaction log files. I will give you a link with
> some information about that below.
>
> > Sequently, I moved the 2.7GB index data and 4.1GB tlog data to the new
> > Solr Server in tomcat7.
> > After I start the tomcat, an exception comes up as [1].
> > Then it halts. I can not access the tomcat server URL.
> > I noticed that the CPU utilization was high by using the comand: top
> > -d 1 | grep tomcatPid.
> > I thought solr was replaying the updatelog. And I wait a long time and it
> > still was replaying. As results, I give up.
>
> I don't know what the exception was about, but it is likely that it WAS
> replaying the log. With 4.1GB of transaction log, that's going to take
> a LONG time, during which Solr will be unavailable. It always replays
> the entire transaction log. The key, as mentioned above, is in keeping
> that log small.
>
> Here's a wiki page about the slow startup problem and an example of how
> to configure autoCommit to deal with it:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup
>
> There's a lot of other good information on that page.
>
> Thanks,
> Shawn
>
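If the "Too many open files" errors are indeed a ulimit issue, the usual checks look something like the following sketch. The user name, limit values, and file locations are assumptions and vary by Linux distribution.

    # Check the limit actually in effect for the running Tomcat process
    # (replace <tomcatPid> with the real PID).
    cat /proc/<tomcatPid>/limits | grep "open files"

    # Raise the limit for the service user, e.g. in /etc/security/limits.conf
    # (assuming the service user is "tomcat"):
    #   tomcat  soft  nofile  65536
    #   tomcat  hard  nofile  65536
    # Then restart Tomcat from a fresh login shell so the new limit applies.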
Problem with SynonymFilter and StopFilterFactory
Hi,

I have encountered a problem when applying StopFilterFactory and SynonymFilterFactory together. The problem is that SynonymFilter removes the position gaps that were previously introduced by the StopFilterFactory. I'm applying these filters at query time, because users need to change the synonym lists frequently.

This is my schema, and an example of the issue, for the string "documentacion para agentes":

org.apache.solr.analysis.WhitespaceTokenizerFactory {luceneMatchVersion=LUCENE_35}
position      1              2      3
term text     documentación  para   agentes
startOffset   0              14     19
endOffset     13             18     26

org.apache.solr.analysis.LowerCaseFilterFactory {luceneMatchVersion=LUCENE_35}
position      1              2      3
term text     documentación  para   agentes
startOffset   0              14     19
endOffset     13             18     26

org.apache.solr.analysis.StopFilterFactory {words=stopwords_intranet.txt, ignoreCase=true, enablePositionIncrements=true, luceneMatchVersion=LUCENE_35}
position      1              3
term text     documentación  agentes
startOffset   0              19
endOffset     13             26

org.apache.solr.analysis.SynonymFilterFactory {synonyms=sinonimos_intranet.txt, expand=true, ignoreCase=true, luceneMatchVersion=LUCENE_35}
position      1              2
term text     documentación  agente
              archivo        agentes
type          SYNONYM        SYNONYM
              SYNONYM        SYNONYM
startOffset   0              19
              0              19
endOffset     13             26
              13             26

As you can see, the positions should be 1 and 3, but SynonymFilter removes the gap and moves the token from position 3 to position 2.

I've got the same problem with Solr 3.5 and 4.0. I don't know whether it's a bug or an error in my configuration. In other schemas I have worked with, I had always put the SynonymFilter before the StopFilter, but in this one I preferred this order because of the large number of synonyms in the list (i.e. I don't want to generate a lot of synonyms for a word that I actually want to remove).

Thanks,

David Dávila Atienza
AEAT - Departamento de Informática Tributaria
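For comparison, the alternative ordering David mentions (synonyms applied before stopwords) would look roughly like this in the schema. The field type name is made up here; the filter attributes are simply copied from the analysis output above.

    <!-- Sketch: query-time chain with SynonymFilter ahead of StopFilter, the
         ordering David says he normally uses. Field type name is illustrative. -->
    <fieldType name="text_intranet" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="sinonimos_intranet.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" words="stopwords_intranet.txt"
                ignoreCase="true" enablePositionIncrements="true"/>
      </analyzer>
    </fieldType>

The trade-off he describes still applies, of course: with this ordering, synonyms are generated even for terms that the stopword list would later remove.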
Re: how to make sure all the index docs flushed to the index files
Hi,

Another weird problem. When we set up the autoCommit properties, we expect index files to be created on every commit, and we would like those index files to be reasonably large. We do not want to keep too many small files, as in [1]. How can we control the size of the index files?

[1]
...omitted
548KB  index/_28w_Lucene41_0.doc
289KB  index/_28w_Lucene41_0.pos
1.1M   index/_28w_Lucene41_0.tim
24K    index/_28w_Lucene41_0.tip
2.1M   index/_28w.fdt
766B   index/_28w.fdx
5KB    index/_28w.fnm
40K    index/_28w.nvd
79K    index/_28w.nvm
364B   index/_28w.si
518KB  index/_28x_Lucene41_0.doc
290KB  index/_28x_Lucene41_0.pos
1.2M   index/_28x_Lucene41_0.tim
28K    index/_28x_Lucene41_0.tip
2.1M   index/_28x.fdt
843B   index/_28x.fdx
5KB    index/_28x.fnm
40K    index/_28x.nvd
79K    index/_28x.nvm
386B   index/_28x.si
...omitted

2013/9/17 YouPeng Yang

> Hi Shawn
>
> Thank your very much for your reponse.
>
> I lauch the full-import task on the web page of solr/admin . And I do
> check the commit option.
> The new docs would be committed after the operation.
> The commit option is defferent with the autocommit, right? If the import
> datasets are too large that leads to poor performance or
> other problems, such as [1].
>
> The exception that indicate that -Too many open files-, we thought is
> because of the ulimit.
>
> [1]
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149d.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149e.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149f.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149g.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149h.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149i.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149j.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149k.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149l.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149m.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149n.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149o.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149p.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149q.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149r.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149s.fdx (Too many open files)
>
> 2013/9/17 Shawn Heisey
>
>> On 9/16/2013 8:26 PM, YouPeng Yang wrote:
>> > I'm using the DIH to import data from oracle database with Solr4.4
>> > Finally I get 2.7GB index data and 4.1GB tlog data. And the number of
>> > docs was 1090.
>> >
>> > At first, I move the 2.7GB index data to another new Solr Server in
>> > tomcat7. After I start the tomcat, I find the total number of docs was
>> just
>> > half of the orginal number.
>> > So I thought that maybe the left docs were not commited to index
>> > files, and the tlog needed to be replayed.
>>
>> You need to turn on autoCommit in your solrconfig.xml so that there are
>> hard commits happening on a regular basis that flush all indexed data to
>> disk and start new transaction log files. I will give you a link with
>> some information about that below.
>>
>> > Sequently, I moved the 2.7GB index data and 4.1GB tlog data to the new
>> > Solr Server in tomcat7.
>> > After I start the tomcat, an exception comes up as [1].
>> > Then it halts. I can not access the tomcat server URL.
>> > I noticed that the CPU utilization was high by using the comand: top
>> > -d 1 | grep tomcatPid.
>> > I thought solr was replaying the updatelog. And I wait a long time and it
>> > still was replaying. As results, I g
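Segment sizes are governed less by the commit settings themselves than by the indexing buffers and the merge policy. The snippet below is a sketch of the relevant solrconfig.xml knobs, with purely illustrative values, not something taken from this thread.

    <!-- Sketch: indexConfig settings that influence how many small segments
         accumulate. Larger RAM buffers mean fewer tiny flushed segments, and
         the merge policy decides how aggressively small segments are merged
         into bigger ones. Values are illustrative only. -->
    <indexConfig>
      <ramBufferSizeMB>128</ramBufferSizeMB>
      <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
        <int name="maxMergeAtOnce">10</int>
        <int name="segmentsPerTier">10</int>
      </mergePolicy>
    </indexConfig>

An explicit optimize (forceMerge) would also collapse the many small segments into a few large ones, at the cost of a heavy rewrite of the whole index.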