Max no of solr cores supported and how to restrict a query to a particular core?
I want to know the maximum number of cores supported by Solr. Thousands, or maybe millions, all under one Solr instance? Also, I want to know how to direct a particular query to a particular core. I'm querying Solr from Ajax, so I think there must be some request parameter that says which core we want to query, right? Can someone tell me how to do this? Any good pointers on the same will be helpful as well. Thank you. --kk
Re: Max no of solr cores supported and how to restrict a query to a particular core?
http://wiki.apache.org/solr/CoreAdmin Best regards, Shishir On Thu, May 14, 2009 at 1:58 PM, KK wrote: > I want to know the maximum no of cores supported by Solr. 1000s or may be > millions all under one solr instance ? > Also I want to know how to redirect a particular query to a particular > core. > Actually I'm querying solr from Ajax, so I think there must be some request > parameter that says which core we want to query, right? Can some one tell > me > how to do this, any good pointers on the same will be helpful as well. > Thank you. > > --kk >
Re: Max no of solr cores supported and how to restrict a query to a particular core?
There is no hard limit on the number of cores; it is limited by your system's ability to open files and by the available resources. Queries are automatically sent to the appropriate core if your URL is http://host:port/solr/<core-name>/select On Thu, May 14, 2009 at 1:58 PM, KK wrote: > I want to know the maximum no of cores supported by Solr. 1000s or may be > millions all under one solr instance ? > Also I want to know how to redirect a particular query to a particular core. > Actually I'm querying solr from Ajax, so I think there must be some request > parameter that says which core we want to query, right? Can some one tell me > how to do this, any good pointers on the same will be helpful as well. > Thank you. > > --kk > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
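As an illustration of the per-core URL scheme described above, a query against a hypothetical core named core0 could look like the following (the host, port, core name, and query are placeholders, not taken from this thread):

  http://localhost:8983/solr/core0/select?q=title:test&rows=10

Sending the same request with a different registered core name in the path directs the query to that core's index.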
Re: Delete documents from index with dataimport
Hi Yes I'd like the document deleted from Solr and yes, there is a unique document id field in Solr. Regards Andrew Andrew 2009/5/13 Fergus McMenemie : >>Hi >> >>Is it possible, through dataimport handler to remove an existing >>document from the Solr index? >> >>I import/update from my database where the active field is true. >>However, if the client then set's active to false, the document stays >>in the Solr index and doesn't get removed. >> >>Regards >>Andrew > > Yes but only in the latest trunk. If your "active" field is false > do you want to see the document deleted? Do you have another field > which is a unique ID for the document? > > Fergus > -- > > === > Fergus McMenemie Email:fer...@twig.me.uk > Techmore Ltd Phone:(UK) 07721 376021 > > Unix/Mac/Intranets Analyst Programmer > === >
UK Solr users meeting?
I was wondering if there is any interest in a UK (South East) Solr user group meeting. Please let me know if you are interested. I am happy to organize. Regards, Colin
Re: UK Solr users meeting?
>I was wondering if there is an interest in a UK (South East) solr user >group meeting > >Please let me know if you are interested. I am happy to organize. > >Regards, > >Colin Yes, very interested. I am in Lincolnshire. -- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer ===
Re: Delete documents from index with dataimport
>Hi > >Yes I'd like the document deleted from Solr and yes, there is a unique >document id field in Solr. > In that case try the following. Create a field in the entity:- Notes. 1) the entity is assumed to have name="jc". 2) the uniqueKey field is assumed to be called "id". 3) the entity needs to have transformer="RegexTransformer" > >2009/5/13 Fergus McMenemie : >>>Hi >>> >>>Is it possible, through dataimport handler to remove an existing >>>document from the Solr index? >>> >>>I import/update from my database where the active field is true. >>>However, if the client then set's active to false, the document stays >>>in the Solr index and doesn't get removed. >>> >>>Regards >>>Andrew >> >> Yes but only in the latest trunk. If your "active" field is false >> do you want to see the document deleted? Do you have another field >> which is a unique ID for the document? >> >> Fergus -- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer ===
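The mechanism being described is trunk DataImportHandler's special $deleteDocById column: when a row produces a value in that column, DIH deletes the document with that unique id instead of adding it. A rough sketch of such an entity, assuming the names from the notes above (the query, column names, and exact RegexTransformer attributes are illustrative assumptions, not the original configuration):

  <entity name="jc" transformer="RegexTransformer"
          query="select id, active, title from documents">
    <!-- illustrative assumption: when active is 'false', copy the row's id
         into $deleteDocById so DIH deletes that document -->
    <field column="$deleteDocById" sourceColName="active"
           regex="false" replaceWith="${jc.id}"/>
  </entity>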
Re: Max no of solr cores supported and how to restrict a query to a particular core?
Thank you very much. Got the point. One off-the-track question: can we automate the creation of new cores? [It requires manually editing the solr.xml file, as far as I know, and what about the location of the core index directory, do we need to point to that manually as well?] After going through the wiki, what I found is that we have to mention the names of the cores in solr.xml. I want to automate the process in such a way that when a user registers [on, say, my site for the service], we'll create a corresponding core for that user with a specific core id [unique to this user only], so that the user is given a search interface that redirects all of this user's searches to http://host:port/solr/<core-for-this-user>/select Will appreciate any ideas on this. Thanks, KK. 2009/5/14 Noble Paul നോബിള് नोब्ळ् > there is no hard limit on the no:of cores. it is limited by your > system's ability to open files and the resources. > the queries are automatically sent to appropriate core if your url is > > htt://host:port//select > > On Thu, May 14, 2009 at 1:58 PM, KK wrote: > > I want to know the maximum no of cores supported by Solr. 1000s or may be > > millions all under one solr instance ? > > Also I want to know how to redirect a particular query to a particular > core. > > Actually I'm querying solr from Ajax, so I think there must be some > request > > parameter that says which core we want to query, right? Can some one tell > me > > how to do this, any good pointers on the same will be helpful as well. > > Thank you. > > > > --kk > > > > > > -- > - > Noble Paul | Principal Engineer| AOL | http://aol.com >
RE: Autocommit blocking adds? AutoCommit Speedup?
Hi all, I am also facing the same issue where autocommit blocks all other requests. I having around 1,00,000 documents with average size of 100K each. It took more than 20 hours to index. I have currently set autocommit maxtime to 7 seconds, mergeFactor to 25. Do I need more configuration changes? Also I see that memory usage goes to peak level of heap specified(6 GB in my case). Looks like Solr spends most of the time in GC. According to my understanding, fix for Solr-1155 would be that commit will run in background and new documents will be queued in the memory. But I am afraid of the memory consumption by this queue if commit takes much longer to complete. Thanks, Siddharth -Original Message- From: jayson.minard [mailto:jayson.min...@gmail.com] Sent: Saturday, May 09, 2009 10:45 AM To: solr-user@lucene.apache.org Subject: Re: Autocommit blocking adds? AutoCommit Speedup? First cut of updated handler now in: https://issues.apache.org/jira/browse/SOLR-1155 Needs review from those that know Lucene better, and double check for errors in locking or other areas of the code. Thanks. --j jayson.minard wrote: > > Can we move this to patch files within the JIRA issue please. Will make > it easier to review and help out a as a patch to current trunk. > > --j > > > Jim Murphy wrote: >> >> >> >> Yonik Seeley-2 wrote: >>> >>> ...your code snippit elided and edited below ... >>> >> >> >> >> Don't take this code as correct (or even compiling) but is this the >> essence? I moved shared access to the writer inside the read lock and >> kept the other non-commit bits to the write lock. I'd need to rethink >> the locking in a more fundamental way but is this close to idea? >> >> >> >> public void commit(CommitUpdateCommand cmd) throws IOException { >> >> if (cmd.optimize) { >> optimizeCommands.incrementAndGet(); >> } else { >> commitCommands.incrementAndGet(); >> } >> >> Future[] waitSearcher = null; >> if (cmd.waitSearcher) { >> waitSearcher = new Future[1]; >> } >> >> boolean error=true; >> iwCommit.lock(); >> try { >> log.info("start "+cmd); >> >> if (cmd.optimize) { >> closeSearcher(); >> openWriter(); >> writer.optimize(cmd.maxOptimizeSegments); >> } >> finally { >> iwCommit.unlock(); >> } >> >> >> iwAccess.lock(); >> try >> { >> writer.commit(); >> } >> finally >> { >> iwAccess.unlock(); >> } >> >> iwCommit.lock(); >> try >> { >> callPostCommitCallbacks(); >> if (cmd.optimize) { >> callPostOptimizeCallbacks(); >> } >> // open a new searcher in the sync block to avoid opening it >> // after a deleteByQuery changed the index, or in between deletes >> // and adds of another commit being done. >> core.getSearcher(true,false,waitSearcher); >> >> // reset commit tracking >> tracker.didCommit(); >> >> log.info("end_commit_flush"); >> >> error=false; >> } >> finally { >> iwCommit.unlock(); >> addCommands.set(0); >> deleteByIdCommands.set(0); >> deleteByQueryCommands.set(0); >> numErrors.set(error ? 1 : 0); >> } >> >> // if we are supposed to wait for the searcher to be registered, then >> we should do it >> // outside of the synchronized block so that other update operations >> can proceed. 
>> if (waitSearcher!=null && waitSearcher[0] != null) { >>try { >> waitSearcher[0].get(); >> } catch (InterruptedException e) { >> SolrException.log(log,e); >> } catch (ExecutionException e) { >> SolrException.log(log,e); >> } >> } >> } >> >> >> >> > > -- View this message in context: http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--tp2 3435224p23457422.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: master/slave failure scenario
Ok so the VIP will point to the new master. but what makes a slave promoted to a master? Only the fact that it will receive add/update requests? And I suppose that this "hot" promotion is possible only if the slave is convigured as master also... 2009/5/14 Noble Paul നോബിള് नोब्ळ् > ideally , we don't do that. > you can just keep the master host behind a VIP so if you wish to > change the master make the VIP point to the new host > > On Wed, May 13, 2009 at 10:52 PM, nk 11 wrote: > > This is more interesting.Such a procedure would involve taking down and > > reconfiguring the slave? > > > > On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot >wrote: > > > >> Or ... > >> > >> 1. Promote existing slave to new master > >> 2. Add new slave to cluster > >> > >> > >> > >> > >> -Bryan > >> > >> > >> > >> > >> > >> On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote: > >> > >> - Migrate configuration files from old master (or backup) to new > master. > >>> - Replicate from a slave to the new master. > >>> - Resume indexing to new master. > >>> > >>> -Jay > >>> > >>> On Wed, May 13, 2009 at 4:26 AM, nk 11 wrote: > >>> > >>> Nice. > What if the master fails permanently (like a disk crash...) and the > new > master is a clean machine? > 2009/5/13 Noble Paul നോബിള് नोब्ळ् > > On Wed, May 13, 2009 at 12:10 PM, nk 11 > wrote: > > > >> Hello > >> > >> I'm kind of new to Solr and I've read about replication, and the > fact > >> > > that a > > > >> node can act as both master and slave. > >> I a replica fails and then comes back on line I suppose that it will > >> > > resyncs > > > >> with the master. > >> > > right > > > >> > >> But what happnes if the master fails? A slave that is configured as > >> > > master > > > >> will kick in? What if that slave is not yes fully sync'ed with the > >> > > failed > > > master and has old data? > >> > > if the master fails you can't index the data. but the slaves will > > continue serving the requests with the last index. You an bring back > > the master up and resume indexing. > > > > > >> What happens when the original master comes back on line? He will > >> > > remain > > > a > > > >> slave because there is another node with the master role? > >> > >> Thank you! > >> > >> > > > > > > -- > > - > > Noble Paul | Principal Engineer| AOL | http://aol.com > > > > > > >> > > > > > > -- > - > Noble Paul | Principal Engineer| AOL | http://aol.com >
Re: Solr vs Sphinx
On Wed, May 13, 2009 at 12:33 PM, Grant Ingersoll wrote: > I've contacted > others in the past who have done "comparisons" and after one round of > emailing it was almost always clear that they didn't know what best > practices are for any given product and thus were doing things > sub-optimally. While I agree, one should properly match & tune all apps they are testing (for a fair comparison), we in turn must set out-of-the-box defaults (in Lucene and Solr) that get you as close to the "best practices" as possible. We don't always do that, and I think we should do better. My most recent example of this is BooleanQuery's performance. It turns out, if you setAllowDocsOutOfOrder(true), it yields a sizable performance gain (27% on my most recent test) for OR queries. So why haven't we enabled this by default, already? (As far as I can tell it's functionally equivalent, as long as the Collector can accept out-of-order docs, which our core collectors can). We can't expect the "other camp" to discover that this obscure setting must be set, to maximize Lucene's OR query performance. Mike
Re: Solr vs Sphinx
> > > My most recent example of this is BooleanQuery's performance. It > turns out, if you setAllowDocsOutOfOrder(true), it yields a sizable > performance gain (27% on my most recent test) for OR queries. > Mike, Can you please point me to some information about allowDocsOutOfOrder? What is it, exactly? -- Andrew Klochkov
Query syntax
Hello List, I need to search for multiple values in the same field. I have the following options in mind, and I am leaning toward the first one. Can anyone tell me which one is the correct syntax? Q=+title:=test +site_id:="22 3000676 566644" Q=+title:=test +site_id:=22 3000676 566644 Q=+title:=test +site_id:=22 +site_id=:3000676 Thanks, Radha.C
Re: Autocommit blocking adds? AutoCommit Speedup?
20+ hours? I index 3 million records in 3 hours. Is your auto commit causing a snapshot? What do you have listed in the events. Jack On 5/14/09, Gargate, Siddharth wrote: > Hi all, > I am also facing the same issue where autocommit blocks all > other requests. I having around 1,00,000 documents with average size of > 100K each. It took more than 20 hours to index. > I have currently set autocommit maxtime to 7 seconds, mergeFactor to 25. > Do I need more configuration changes? > Also I see that memory usage goes to peak level of heap specified(6 GB > in my case). Looks like Solr spends most of the time in GC. > According to my understanding, fix for Solr-1155 would be that commit > will run in background and new documents will be queued in the memory. > But I am afraid of the memory consumption by this queue if commit takes > much longer to complete. > > Thanks, > Siddharth > > -Original Message- > From: jayson.minard [mailto:jayson.min...@gmail.com] > Sent: Saturday, May 09, 2009 10:45 AM > To: solr-user@lucene.apache.org > Subject: Re: Autocommit blocking adds? AutoCommit Speedup? > > > First cut of updated handler now in: > https://issues.apache.org/jira/browse/SOLR-1155 > > Needs review from those that know Lucene better, and double check for > errors > in locking or other areas of the code. Thanks. > > --j > > > jayson.minard wrote: >> >> Can we move this to patch files within the JIRA issue please. Will > make >> it easier to review and help out a as a patch to current trunk. >> >> --j >> >> >> Jim Murphy wrote: >>> >>> >>> >>> Yonik Seeley-2 wrote: ...your code snippit elided and edited below ... >>> >>> >>> >>> Don't take this code as correct (or even compiling) but is this the >>> essence? I moved shared access to the writer inside the read lock > and >>> kept the other non-commit bits to the write lock. I'd need to > rethink >>> the locking in a more fundamental way but is this close to idea? >>> >>> >>> >>> public void commit(CommitUpdateCommand cmd) throws IOException { >>> >>> if (cmd.optimize) { >>> optimizeCommands.incrementAndGet(); >>> } else { >>> commitCommands.incrementAndGet(); >>> } >>> >>> Future[] waitSearcher = null; >>> if (cmd.waitSearcher) { >>> waitSearcher = new Future[1]; >>> } >>> >>> boolean error=true; >>> iwCommit.lock(); >>> try { >>> log.info("start "+cmd); >>> >>> if (cmd.optimize) { >>> closeSearcher(); >>> openWriter(); >>> writer.optimize(cmd.maxOptimizeSegments); >>> } >>> finally { >>> iwCommit.unlock(); >>> } >>> >>> >>> iwAccess.lock(); >>> try >>> { >>> writer.commit(); >>> } >>> finally >>> { >>> iwAccess.unlock(); >>> } >>> >>> iwCommit.lock(); >>> try >>> { >>> callPostCommitCallbacks(); >>> if (cmd.optimize) { >>> callPostOptimizeCallbacks(); >>> } >>> // open a new searcher in the sync block to avoid opening it >>> // after a deleteByQuery changed the index, or in between > deletes >>> // and adds of another commit being done. >>> core.getSearcher(true,false,waitSearcher); >>> >>> // reset commit tracking >>> tracker.didCommit(); >>> >>> log.info("end_commit_flush"); >>> >>> error=false; >>> } >>> finally { >>> iwCommit.unlock(); >>> addCommands.set(0); >>> deleteByIdCommands.set(0); >>> deleteByQueryCommands.set(0); >>> numErrors.set(error ? 1 : 0); >>> } >>> >>> // if we are supposed to wait for the searcher to be registered, > then >>> we should do it >>> // outside of the synchronized block so that other update > operations >>> can proceed. 
>>> if (waitSearcher!=null && waitSearcher[0] != null) { >>>try { >>> waitSearcher[0].get(); >>> } catch (InterruptedException e) { >>> SolrException.log(log,e); >>> } catch (ExecutionException e) { >>> SolrException.log(log,e); >>> } >>> } >>> } >>> >>> >>> >>> >> >> > > -- > View this message in context: > http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--tp2 > 3435224p23457422.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Sent from my mobile device
Date field
Does anyone know if there is still a bug in date fields? I'm having a problem boosting documents by date in Solr 1.3. Thanks, Jack -- Sent from my mobile device
Re: Query syntax
On Thu, May 14, 2009 at 5:20 PM, Radha C. wrote: > I need to search the multiple values from the same field. I am having the > following syntax > > I am thinking of the first option. Can anyone tell me which one is correct > syntax? > > Q=+title:=test +site_id:="22 3000676 566644" > > Q=+title:=test +site_id:=22 3000676 566644 > > Q=+title:=test +site_id:=22 +site_id=:3000676 > > None of the above. ":=" is not valid syntax. The request parameter should be a lower-cased "q". The "+" character signifies "must occur", similar to a boolean AND. Must title:test match? Should all of "22", "3000676", etc. be present in site_id, or is a single match alright? -- Regards, Shalin Shekhar Mangar.
RE: Query syntax
Thanks for your reply. Yes, by mistake I added ":=" in place of ":". The title should match, and the site_id should match any of these: 23243455, 245, 3457676. _ From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Thursday, May 14, 2009 5:43 PM To: solr-user@lucene.apache.org; cra...@ceiindia.com Subject: Re: Query syntax On Thu, May 14, 2009 at 5:20 PM, Radha C. wrote: I need to search the multiple values from the same field. I am having the following syntax I am thinking of the first option. Can anyone tell me which one is correct syntax? Q=+title:=test +site_id:="22 3000676 566644" Q=+title:=test +site_id:=22 3000676 566644 Q=+title:=test +site_id:=22 +site_id=:3000676 None of the above. That ":=" is not a valid syntax. The request parameter should be a lower cased "q". The "+" character signifies "must occur" similar to a boolean AND. Should title:test must match? Should all of "22", "3000676" etc be present in site_id or just one match is alright? -- Regards, Shalin Shekhar Mangar.
Re: Query syntax
In that case, the following will work: q=+title:test +site_id:(23243455 245 3457676) On Thu, May 14, 2009 at 5:35 PM, Radha C. wrote: > Thanks for your reply. > > > > Yes by mistaken I added := in place of ":" . The title should match and the > site_id should match any of these 23243455 , 245, 3457676 . > > > > > > > > _ > > From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] > Sent: Thursday, May 14, 2009 5:43 PM > To: solr-user@lucene.apache.org; cra...@ceiindia.com > Subject: Re: Query syntax > > > > On Thu, May 14, 2009 at 5:20 PM, Radha C. wrote: > > I need to search the multiple values from the same field. I am having the > following syntax > > I am thinking of the first option. Can anyone tell me which one is correct > syntax? > > Q=+title:=test +site_id:="22 3000676 566644" > > Q=+title:=test +site_id:=22 3000676 566644 > > Q=+title:=test +site_id:=22 +site_id=:3000676 > > > > > None of the above. That ":=" is not a valid syntax. The request parameter > should be a lower cased "q". The "+" character signifies "must occur" > similar to a boolean AND. > > Should title:test must match? Should all of "22", "3000676" etc be > present in site_id or just one match is alright? > -- > Regards, > Shalin Shekhar Mangar. > > -- Regards, Shalin Shekhar Mangar.
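When that query is sent as part of a URL (for example from a browser or Ajax call), the literal '+' operators have to be escaped as %2B, while the spaces may be sent as '+'; a hypothetical request against a default setup might look like:

  http://localhost:8983/solr/select?q=%2Btitle%3Atest+%2Bsite_id%3A(23243455+245+3457676)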
Re: master/slave failure scenario
oh, so the configuration must be manualy changed? Can't something be passed at (re)start time? > > 2009/5/14 Noble Paul നോബിള് नोब्ळ् > >> On Thu, May 14, 2009 at 4:07 PM, nk 11 wrote: >> > Ok so the VIP will point to the new master. but what makes a slave >> promoted >> > to a master? Only the fact that it will receive add/update requests? >> > And I suppose that this "hot" promotion is possible only if the slave is >> > convigured as master also... >> right.. By default you can setup all slaves to be master also. It does >> not cost anything if it is not serving any requests. >> >> so , if you have such a setting you will have to disable that slave to >> be a slave and restart it and you will have to make the VIP point to >> this new slave as master. >> >> so hot promotion is still not possible. >> > >> > 2009/5/14 Noble Paul നോബിള് नोब्ळ् >> >> >> >> ideally , we don't do that. >> >> you can just keep the master host behind a VIP so if you wish to >> >> change the master make the VIP point to the new host >> >> >> >> On Wed, May 13, 2009 at 10:52 PM, nk 11 >> wrote: >> >> > This is more interesting.Such a procedure would involve taking down >> and >> >> > reconfiguring the slave? >> >> > >> >> > On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot >> >> > wrote: >> >> > >> >> >> Or ... >> >> >> >> >> >> 1. Promote existing slave to new master >> >> >> 2. Add new slave to cluster >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> -Bryan >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote: >> >> >> >> >> >> - Migrate configuration files from old master (or backup) to new >> >> >> master. >> >> >>> - Replicate from a slave to the new master. >> >> >>> - Resume indexing to new master. >> >> >>> >> >> >>> -Jay >> >> >>> >> >> >>> On Wed, May 13, 2009 at 4:26 AM, nk 11 >> wrote: >> >> >>> >> >> >>> Nice. >> >> What if the master fails permanently (like a disk crash...) and >> the >> >> new >> >> master is a clean machine? >> >> 2009/5/13 Noble Paul നോബിള് नोब्ळ् >> >> >> >> On Wed, May 13, 2009 at 12:10 PM, nk 11 >> >> wrote: >> >> > >> >> >> Hello >> >> >> >> >> >> I'm kind of new to Solr and I've read about replication, and the >> >> >> fact >> >> >> >> >> > that a >> >> > >> >> >> node can act as both master and slave. >> >> >> I a replica fails and then comes back on line I suppose that it >> >> >> will >> >> >> >> >> > resyncs >> >> > >> >> >> with the master. >> >> >> >> >> > right >> >> > >> >> >> >> >> >> But what happnes if the master fails? A slave that is configured >> as >> >> >> >> >> > master >> >> > >> >> >> will kick in? What if that slave is not yes fully sync'ed with >> the >> >> >> >> >> > failed >> >> >> >> > master and has old data? >> >> >> >> >> > if the master fails you can't index the data. but the slaves will >> >> > continue serving the requests with the last index. You an bring >> back >> >> > the master up and resume indexing. >> >> > >> >> > >> >> >> What happens when the original master comes back on line? He >> will >> >> >> >> >> > remain >> >> >> >> > a >> >> > >> >> >> slave because there is another node with the master role? >> >> >> >> >> >> Thank you! >> >> >> >> >> >> >> >> > >> >> > >> >> > -- >> >> > - >> >> > Noble Paul | Principal Engineer| AOL | http://aol.com >> >> > >> >> > >> >> >> >> >> >> >> > >> >> >> >> >> >> >> >> -- >> >> - >> >> Noble Paul | Principal Engineer| AOL | http://aol.com >> > >> > >> >> >> >> -- >> - >> Noble Paul | Principal Engineer| AOL | http://aol.com >> > >
Re: Solr vs Sphinx
Totally agree on optimizing out of the box experience, it's just never a one size fits all thing. And we have to be very careful about micro- benchmarks driving these settings. Currently, many of us use Wikipedia, but that's just one doc set and I'd venture to say most Solr users do not have docs that look anything like Wikipedia. One of the things the Open Relevance project (http://wiki.apache.org/lucene-java/OpenRelevance , see the discussion on gene...@lucene.a.o) should aim to do is bring in a variety of test collections, from lots of different genres. This will help both with relevance and with speed testing. -Grant On May 14, 2009, at 6:47 AM, Michael McCandless wrote: On Wed, May 13, 2009 at 12:33 PM, Grant Ingersoll wrote: I've contacted others in the past who have done "comparisons" and after one round of emailing it was almost always clear that they didn't know what best practices are for any given product and thus were doing things sub-optimally. While I agree, one should properly match & tune all apps they are testing (for a fair comparison), we in turn must set out-of-the-box defaults (in Lucene and Solr) that get you as close to the "best practices" as possible. We don't always do that, and I think we should do better. My most recent example of this is BooleanQuery's performance. It turns out, if you setAllowDocsOutOfOrder(true), it yields a sizable performance gain (27% on my most recent test) for OR queries. So why haven't we enabled this by default, already? (As far as I can tell it's functionally equivalent, as long as the Collector can accept out-of-order docs, which our core collectors can). We can't expect the "other camp" to discover that this obscure setting must be set, to maximize Lucene's OR query performance. Mike -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: master/slave failure scenario
yeah there is a hack https://issues.apache.org/jira/browse/SOLR-1154?focusedCommentId=12708316&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12708316 On Thu, May 14, 2009 at 6:07 PM, nk 11 wrote: > sorry for the mail. I wanted to hit reply :( > > On Thu, May 14, 2009 at 3:37 PM, nk 11 wrote: >> >> oh, so the configuration must be manualy changed? >> Can't something be passed at (re)start time? >> >> 2009/5/14 Noble Paul നോബിള് नोब्ळ् >>> >>> On Thu, May 14, 2009 at 4:07 PM, nk 11 wrote: >>> > Ok so the VIP will point to the new master. but what makes a slave >>> > promoted >>> > to a master? Only the fact that it will receive add/update requests? >>> > And I suppose that this "hot" promotion is possible only if the slave >>> > is >>> > convigured as master also... >>> right.. By default you can setup all slaves to be master also. It does >>> not cost anything if it is not serving any requests. >>> >>> so , if you have such a setting you will have to disable that slave to >>> be a slave and restart it and you will have to make the VIP point to >>> this new slave as master. >>> >>> so hot promotion is still not possible. >>> > >>> > 2009/5/14 Noble Paul നോബിള് नोब्ळ् >>> >> >>> >> ideally , we don't do that. >>> >> you can just keep the master host behind a VIP so if you wish to >>> >> change the master make the VIP point to the new host >>> >> >>> >> On Wed, May 13, 2009 at 10:52 PM, nk 11 >>> >> wrote: >>> >> > This is more interesting.Such a procedure would involve taking down >>> >> > and >>> >> > reconfiguring the slave? >>> >> > >>> >> > On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot >>> >> > wrote: >>> >> > >>> >> >> Or ... >>> >> >> >>> >> >> 1. Promote existing slave to new master >>> >> >> 2. Add new slave to cluster >>> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >> -Bryan >>> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >> On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote: >>> >> >> >>> >> >> - Migrate configuration files from old master (or backup) to new >>> >> >> master. >>> >> >>> - Replicate from a slave to the new master. >>> >> >>> - Resume indexing to new master. >>> >> >>> >>> >> >>> -Jay >>> >> >>> >>> >> >>> On Wed, May 13, 2009 at 4:26 AM, nk 11 >>> >> >>> wrote: >>> >> >>> >>> >> >>> Nice. >>> >> What if the master fails permanently (like a disk crash...) and >>> >> the >>> >> new >>> >> master is a clean machine? >>> >> 2009/5/13 Noble Paul നോബിള് नोब्ळ् >>> >> >>> >> On Wed, May 13, 2009 at 12:10 PM, nk 11 >>> >> wrote: >>> >> > >>> >> >> Hello >>> >> >> >>> >> >> I'm kind of new to Solr and I've read about replication, and >>> >> >> the >>> >> >> fact >>> >> >> >>> >> > that a >>> >> > >>> >> >> node can act as both master and slave. >>> >> >> I a replica fails and then comes back on line I suppose that it >>> >> >> will >>> >> >> >>> >> > resyncs >>> >> > >>> >> >> with the master. >>> >> >> >>> >> > right >>> >> > >>> >> >> >>> >> >> But what happnes if the master fails? A slave that is >>> >> >> configured as >>> >> >> >>> >> > master >>> >> > >>> >> >> will kick in? What if that slave is not yes fully sync'ed with >>> >> >> the >>> >> >> >>> >> > failed >>> >> >>> >> > master and has old data? >>> >> >> >>> >> > if the master fails you can't index the data. but the slaves >>> >> > will >>> >> > continue serving the requests with the last index. You an bring >>> >> > back >>> >> > the master up and resume indexing. >>> >> > >>> >> > >>> >> >> What happens when the original master comes back on line? 
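The gist of that hack, as a rough sketch (the exact property and parameter names are an assumption; check the linked issue for the authoritative syntax): declare both the master and slave sections of the ReplicationHandler, each guarded by a variable, and flip the variables in solrcore.properties when a node needs to change roles, e.g.

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <!-- assumption: role toggles resolved from solrcore.properties -->
      <str name="enable">${enable.master:false}</str>
      <str name="replicateAfter">commit</str>
    </lst>
    <lst name="slave">
      <str name="enable">${enable.slave:false}</str>
      <str name="masterUrl">http://master-host:8983/solr/replication</str>
    </lst>
  </requestHandler>

with enable.master and enable.slave set in solrcore.properties and swapped (plus a restart) when a slave is promoted.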
He >>> >> >> will >>> >> >> >>> >> > remain >>> >> >>> >> > a >>> >> > >>> >> >> slave because there is another node with the master role? >>> >> >> >>> >> >> Thank you! >>> >> >> >>> >> >> >>> >> > >>> >> > >>> >> > -- >>> >> > - >>> >> > Noble Paul | Principal Engineer| AOL | http://aol.com >>> >> > >>> >> > >>> >> >>> >> >> >>> >> > >>> >> >>> >> >>> >> >>> >> -- >>> >> - >>> >> Noble Paul | Principal Engineer| AOL | http://aol.com >>> > >>> > >>> >>> >>> >>> -- >>> - >>> Noble Paul | Principal Engineer| AOL | http://aol.com >> > > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Custom Servlet Filter, Where to put filter-mappings
I like Grant's suggestion as the simplest solution. As for XML merging and XSLT, I really wouldn't want to go that route personally, but one solution that comes close to that is to template web.xml with some substitution tags and use Ant's ability to replace tokens. So we could put in @FILTER@ and @FILTER_MAPPING@ placeholders in web.xml and pull in the replacements from fragment files. But even with all of these fancy options available, I'd still just use the alternate web.xml technique that Grant proposed. Erik On May 13, 2009, at 10:55 PM, Jacob Singh wrote: HI Grant, That's not a bad idea... I could try that. I was also looking at cactus: http://jakarta.apache.org/cactus/integration/ant/index.html It has an ant task to merge XML. Could this be a contrib-crawl add- on? Alternately, do you know of any xslt templates built for this? Could write one, but that's a fair bit of work to support everything. Perhaps an xslt task combined with a contrib-crawl would do the trick? Best, -J On Wed, May 13, 2009 at 6:07 PM, Grant Ingersoll wrote: Hmmm, maybe we need to think about someway to hook this into the build process or make it easier to just drop it into the conf or lib dirs. I'm no web.xml expert, but I'm sure you're not the first one to want to do this kind of thing. The easiest way _might_ be to patch build.xml to take a property for the location of the web.xml, defaulting to the current Solr one. Then, people who want to use their own version could just pass in - Dweb.xml=web.xml>. The downside to this is that it may cause problems for us devs when users ask questions about strange behavior and it turns out they have mucked up the web.xml FYI: dist-war is in build.xml, not common-build.xml. -Grant On May 12, 2009, at 5:52 AM, Jacob Singh wrote: Hi folks, I just wrote a Servlet Filter to handle authentication for our service. Here's what I did: 1. Created a dir in contrib 2. Put my project in there, I took the dataimporthandler build.xml as an example and modified it to suit my needs. Worked great! 3. ant dist now builds my jar and includes it I now need to modify web.xml to add my filter-mapping, init params, etc. How can I do this cleanly? Or do I need to manually open up the archive and edit it and then re-war it? In common-build I don't see a target for dist-war, so don't see how it is possible... Thanks! Jacob -- +1 510 277-0891 (o) +91 33 7458 (m) web: http://pajamadesign.com Skype: pajamadesign Yahoo: jacobsingh AIM: jacobsingh gTalk: jacobsi...@gmail.com -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search -- +1 510 277-0891 (o) +91 33 7458 (m) web: http://pajamadesign.com Skype: pajamadesign Yahoo: jacobsingh AIM: jacobsingh gTalk: jacobsi...@gmail.com
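A rough sketch of that token-replacement idea, assuming a web.xml template containing @FILTER@ and @FILTER_MAPPING@ placeholders and two fragment files holding the <filter> and <filter-mapping> XML (the file and property names are illustrative):

  <!-- load the XML fragments into properties, then expand the tokens while copying -->
  <loadfile property="filter.fragment" srcFile="filter.xml-fragment"/>
  <loadfile property="filter.mapping.fragment" srcFile="filter-mapping.xml-fragment"/>
  <copy file="web.template.xml" tofile="${dest}/WEB-INF/web.xml" overwrite="true">
    <filterset begintoken="@" endtoken="@">
      <filter token="FILTER" value="${filter.fragment}"/>
      <filter token="FILTER_MAPPING" value="${filter.mapping.fragment}"/>
    </filterset>
  </copy>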
Re: Max no of solr cores supported and how to restrict a query to a particular core?
Solr already supports this . please refer this http://wiki.apache.org/solr/CoreAdmin#head-7ca1b98a9df8b8ca0dcfbfc49940ed5ac98c4a08 ensure that your solr.xml is persistent http://wiki.apache.org/solr/CoreAdmin#head-7508c24c6e2dadad2dfea39b2fba045062481da8 On Thu, May 14, 2009 at 3:43 PM, KK wrote: > Thank you very much. Got the point. > One off the track question, can we automate the creation of new cores[it > requires manually editing the solr.xml file as I know, and what about the > location of core index directory, do we need to point that manually as > well]. > After going through the wiki what I found is we've to mention the names of > cores in solr.xml. I want to automate the process in such a way that when a > user registers[ on say my site for the service], we'll create a coresponding > core for the same user and with a specific core id[unique for this user > only] so that the user will be given a search interface that will redirect > all searches for this user to http://host:port/ user>/select > Will apprecite any ideas on this. > > Thanks, > KK. > > 2009/5/14 Noble Paul നോബിള് नोब्ळ् > >> there is no hard limit on the no:of cores. it is limited by your >> system's ability to open files and the resources. >> the queries are automatically sent to appropriate core if your url is >> >> htt://host:port//select >> >> On Thu, May 14, 2009 at 1:58 PM, KK wrote: >> > I want to know the maximum no of cores supported by Solr. 1000s or may be >> > millions all under one solr instance ? >> > Also I want to know how to redirect a particular query to a particular >> core. >> > Actually I'm querying solr from Ajax, so I think there must be some >> request >> > parameter that says which core we want to query, right? Can some one tell >> me >> > how to do this, any good pointers on the same will be helpful as well. >> > Thank you. >> > >> > --kk >> > >> >> >> >> -- >> - >> Noble Paul | Principal Engineer| AOL | http://aol.com >> > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
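As a concrete illustration of the CREATE action, registering a per-user core can be done with a single HTTP call (host, port, core name, and directories are placeholders):

  http://localhost:8983/solr/admin/cores?action=CREATE&name=user123&instanceDir=user123&config=solrconfig.xml&schema=schema.xml

With persistent="true" set on the <solr> element in solr.xml, the newly created core is written back to solr.xml, so no manual editing of that file is needed.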
Re: Solr vs Sphinx
On Thu, May 14, 2009 at 06:47:01AM -0400, Michael McCandless wrote: > While I agree, one should properly match & tune all apps they are > testing (for a fair comparison), we in turn must set out-of-the-box > defaults (in Lucene and Solr) that get you as close to the "best > practices" as possible. So, should Lucene use the non-compound file format by default because some idiot's sloppy benchmarks might run a smidge faster, even though that will cause many users to run out of file descriptors? Anyone doing comparative benchmarking who doesn't submit their code to the support list for the software under review is either a dolt or a propagandist. Good benchmarking is extremely difficult, like all experimental science. If there isn't ample evidence that the benchmarker appreciates that, their tests aren't worth a second thought. If you don't avail yourself of the help of experts when assembling your experiment, you are unserious. Richard Feynman: "...if you're doing an experiment, you should report everything that you think might make it invalid - not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you've eliminated by some other experiment, and how they worked - to make sure the other fellow can tell they have been eliminated." Marvin Humphrey
Re: master/slave failure scenario
wow! that was just a couple of days old! thanks as lot! 2009/5/14 Noble Paul നോബിള് नोब्ळ् > yeah there is a hack > > https://issues.apache.org/jira/browse/SOLR-1154?focusedCommentId=12708316&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12708316 > > On Thu, May 14, 2009 at 6:07 PM, nk 11 wrote: > > sorry for the mail. I wanted to hit reply :( > > > > On Thu, May 14, 2009 at 3:37 PM, nk 11 wrote: > >> > >> oh, so the configuration must be manualy changed? > >> Can't something be passed at (re)start time? > >> > >> 2009/5/14 Noble Paul നോബിള് नोब्ळ् > >>> > >>> On Thu, May 14, 2009 at 4:07 PM, nk 11 wrote: > >>> > Ok so the VIP will point to the new master. but what makes a slave > >>> > promoted > >>> > to a master? Only the fact that it will receive add/update requests? > >>> > And I suppose that this "hot" promotion is possible only if the slave > >>> > is > >>> > convigured as master also... > >>> right.. By default you can setup all slaves to be master also. It does > >>> not cost anything if it is not serving any requests. > >>> > >>> so , if you have such a setting you will have to disable that slave to > >>> be a slave and restart it and you will have to make the VIP point to > >>> this new slave as master. > >>> > >>> so hot promotion is still not possible. > >>> > > >>> > 2009/5/14 Noble Paul നോബിള് नोब्ळ् > >>> >> > >>> >> ideally , we don't do that. > >>> >> you can just keep the master host behind a VIP so if you wish to > >>> >> change the master make the VIP point to the new host > >>> >> > >>> >> On Wed, May 13, 2009 at 10:52 PM, nk 11 > >>> >> wrote: > >>> >> > This is more interesting.Such a procedure would involve taking > down > >>> >> > and > >>> >> > reconfiguring the slave? > >>> >> > > >>> >> > On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot > >>> >> > wrote: > >>> >> > > >>> >> >> Or ... > >>> >> >> > >>> >> >> 1. Promote existing slave to new master > >>> >> >> 2. Add new slave to cluster > >>> >> >> > >>> >> >> > >>> >> >> > >>> >> >> > >>> >> >> -Bryan > >>> >> >> > >>> >> >> > >>> >> >> > >>> >> >> > >>> >> >> > >>> >> >> On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote: > >>> >> >> > >>> >> >> - Migrate configuration files from old master (or backup) to new > >>> >> >> master. > >>> >> >>> - Replicate from a slave to the new master. > >>> >> >>> - Resume indexing to new master. > >>> >> >>> > >>> >> >>> -Jay > >>> >> >>> > >>> >> >>> On Wed, May 13, 2009 at 4:26 AM, nk 11 > >>> >> >>> wrote: > >>> >> >>> > >>> >> >>> Nice. > >>> >> What if the master fails permanently (like a disk crash...) and > >>> >> the > >>> >> new > >>> >> master is a clean machine? > >>> >> 2009/5/13 Noble Paul നോബിള് नोब्ळ् > >>> >> > >>> >> On Wed, May 13, 2009 at 12:10 PM, nk 11 < > nick.cass...@gmail.com> > >>> >> wrote: > >>> >> > > >>> >> >> Hello > >>> >> >> > >>> >> >> I'm kind of new to Solr and I've read about replication, and > >>> >> >> the > >>> >> >> fact > >>> >> >> > >>> >> > that a > >>> >> > > >>> >> >> node can act as both master and slave. > >>> >> >> I a replica fails and then comes back on line I suppose that > it > >>> >> >> will > >>> >> >> > >>> >> > resyncs > >>> >> > > >>> >> >> with the master. > >>> >> >> > >>> >> > right > >>> >> > > >>> >> >> > >>> >> >> But what happnes if the master fails? A slave that is > >>> >> >> configured as > >>> >> >> > >>> >> > master > >>> >> > > >>> >> >> will kick in? 
What if that slave is not yes fully sync'ed > with > >>> >> >> the > >>> >> >> > >>> >> > failed > >>> >> > >>> >> > master and has old data? > >>> >> >> > >>> >> > if the master fails you can't index the data. but the slaves > >>> >> > will > >>> >> > continue serving the requests with the last index. You an > bring > >>> >> > back > >>> >> > the master up and resume indexing. > >>> >> > > >>> >> > > >>> >> >> What happens when the original master comes back on line? He > >>> >> >> will > >>> >> >> > >>> >> > remain > >>> >> > >>> >> > a > >>> >> > > >>> >> >> slave because there is another node with the master role? > >>> >> >> > >>> >> >> Thank you! > >>> >> >> > >>> >> >> > >>> >> > > >>> >> > > >>> >> > -- > >>> >> > - > >>> >> > Noble Paul | Principal Engineer| AOL | http://aol.com > >>> >> > > >>> >> > > >>> >> > >>> >> >> > >>> >> > > >>> >> > >>> >> > >>> >> > >>> >> -- > >>> >> - > >>> >> Noble Paul | Principal Engineer| AOL | http://aol.com > >>> > > >>> > > >>> > >>> > >>> > >>> -- > >>> - > >>> Noble Paul | Principal Engineer| AOL | http://aol.com > >> > > > > > > > > -- >
RE: Autocommit blocking adds? AutoCommit Speedup?
Siddharth, The settings you have in your solrconfig for ramBufferSizeMB and maxBufferedDocs control how much memory may be used during indexing besides any overhead with the documents being "in-flight" at a given moment (deserialized into memory but not yet handed to lucene). There are streaming versions of the client/server that help with that as well by trying to process them as they arrive. The patch SOLR-1155 does not add more memory use, but rather lets the threads proceed through to Lucene without blocking within Solr as often. So instead of a stuck thread holding the documents in memory they will be moving threads doing the same. So the buffer sizes mentioned above along with the amount of documents you send at a time will push your memory footprint. Send smaller batches (less efficient) or stream; or make sure you have enough memory for the amount of docs you send at a time. For indexing I slow my commits down if there is no need for the documents to become available for query right away. For pure indexing, a long autoCommit time and large max document count ebfore auto committing helps. Committing isn't what flushes them out of memory, it is what makes the on-disk version part of the overall index. Over committing will slow you way down. Especially if you have any listeners on the commits doing a lot of work (i.e. Solr distribution). Also, if you are querying on the indexer that can eat memory and compete with the memory you are trying to reserve for indexing. So a split model of indexing and querying on different instances lets you tune each the best; but then you have a gap in time from indexing to querying as the trade-off. It is hard to say what is going on with GC without knowing what garbage collection settings you are passing to the VM, and what version of the Java VM you are using. Which garbage collector are you using and what tuning parameters? I tend to use Parallel GC on my indexers with GC Overhead limit turned off allowing for some pauses (which users don't see on a back-end indexer) but good GC with lower heap fragmentation. I tend to use concurrent mark and sweep GC on my query slaves with tuned incremental mode and pacing which is a low pause collector taking advantage of the cores on my servers and can incrementally keep up with the needs of a query slave. -- Jayson Gargate, Siddharth wrote: > > Hi all, > I am also facing the same issue where autocommit blocks all > other requests. I having around 1,00,000 documents with average size of > 100K each. It took more than 20 hours to index. > I have currently set autocommit maxtime to 7 seconds, mergeFactor to 25. > Do I need more configuration changes? > Also I see that memory usage goes to peak level of heap specified(6 GB > in my case). Looks like Solr spends most of the time in GC. > According to my understanding, fix for Solr-1155 would be that commit > will run in background and new documents will be queued in the memory. > But I am afraid of the memory consumption by this queue if commit takes > much longer to complete. > > Thanks, > Siddharth > > -- View this message in context: http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--tp23435224p23540569.html Sent from the Solr - User mailing list archive at Nabble.com.
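To make those knobs concrete, a minimal sketch of the relevant solrconfig.xml pieces for a back-end indexer might look like this (all values are illustrative and need tuning for the actual document sizes and hardware):

  <indexDefaults>
    <ramBufferSizeMB>128</ramBufferSizeMB>
    <mergeFactor>10</mergeFactor>
  </indexDefaults>

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>50000</maxDocs>
      <maxTime>300000</maxTime> <!-- 5 minutes, rather than committing every few seconds -->
    </autoCommit>
  </updateHandler>

One possible mapping of the garbage-collection choices described above onto JVM flags (again an assumption, not a prescription) is -XX:+UseParallelGC -XX:-UseGCOverheadLimit on the indexer and -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing on the query slaves.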
Re: Autocommit blocking adds? AutoCommit Speedup?
Indexing speed comes down to a lot of factors. The settings as talked about above, VM settings, the size of the documents, how many are sent at a time, how active you can keep the indexer (i.e. one thread sending documents lets the indexer relax whereas N threads keeps pressure on the indexer), how often you commit and of course the hardware you are running on. Disk I/O is a big factor along with having enough cores and memory to buffer and process the documents. Comparing two sets of numbers is tough. We have indexes that range from indexing a few million an hour up through 18-20M per hour in a indexing cluster for distributed search. --j Jack Godwin wrote: > > 20+ hours? I index 3 million records in 3 hours. Is your auto commit > causing a snapshot? What do you have listed in the events. > > Jack > > On 5/14/09, Gargate, Siddharth wrote: >> Hi all, >> I am also facing the same issue where autocommit blocks all >> other requests. I having around 1,00,000 documents with average size of >> 100K each. It took more than 20 hours to index. >> I have currently set autocommit maxtime to 7 seconds, mergeFactor to 25. >> Do I need more configuration changes? >> Also I see that memory usage goes to peak level of heap specified(6 GB >> in my case). Looks like Solr spends most of the time in GC. >> According to my understanding, fix for Solr-1155 would be that commit >> will run in background and new documents will be queued in the memory. >> But I am afraid of the memory consumption by this queue if commit takes >> much longer to complete. >> >> Thanks, >> Siddharth >> > > -- View this message in context: http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--tp23435224p23540643.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr vs Sphinx
On Thu, May 14, 2009 at 6:51 AM, Andrey Klochkov wrote: > Can you please point me to some information concerning allowDocsOutOfOrder? > What's this at all? There is this cryptic static setter (in Lucene): BooleanQuery.setAllowDocsOutOfOrder(boolean) It defaults to false, which means BooleanScorer2 will always be used to compute hits for a BooleanQuery. When set to true, BooleanScorer will instead be used, when possible. BooleanScorer gets better performance, but it collects docs out of order, which for some external collectors might cause a problem. All of Lucene's core collectors work fine with out-of-order collection (but I'm not sure about Solr's collectors). If you experiment with this, please post back with your results! Mike
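In code, the switch being discussed is a single static call made once before searching; a minimal sketch against the Lucene API of that era:

  import org.apache.lucene.search.BooleanQuery;

  // Let BooleanScorer be used for OR queries; hits may then be collected
  // out of docid order, which Lucene's core collectors accept.
  BooleanQuery.setAllowDocsOutOfOrder(true);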
Additional metadata when using Solr Cell
Hi. I am indexing a PDF document with the ExtractingRequestHandler. My curl post has a URL like: ../solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.literal.id=123&ext.literal.author=Somebody Sure enough I see in the server logs: params={ext.def.fl=text&ext.literal.id=123&ext.idx.attr=true&ext.literal.author=Somebody} I am trying to get my field back in the results from a query: ../solr/select?indent=on&version=2.2&q=hello&start=0&rows=10&fl=author%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl= I see the score in the results 'doc' but no reference to author. Can anyone advise on what I am forgetting to do, to get hold of this field? Thanks in advance for your help, -- Ross -- View this message in context: http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541256.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Max no of solr cores supported and how to restrict a query to a particular core?
Thank you very much. LOL, Its in the same wiki I was told to go through. I've a question regarding creating ofsolr cores on the fly. The wiki says, .Creates a new core and register it. If persistence is enabled (persist=true), the configuration for this new core will be saved in 'solr.xml'. If a core with the same name exists, while the "new" created core is initializing, the "old" one will continue to accept requests. Once it has finished, all new request will go to the "new" core, and the "old" core will be unloaded. So I've to wait for some time [say a couple of secs, may be less than that] before I start adding pages to that core. I think this is the way to handle it , otherwise some content which should have been indexed by the new core, will get indexed by the existing core[as the wiki says], which I don't want to happen. Any other ideas for handling the same. Thanks, KK. 2009/5/14 Noble Paul നോബിള് नोब्ळ् > Solr already supports this . > please refer this > > http://wiki.apache.org/solr/CoreAdmin#head-7ca1b98a9df8b8ca0dcfbfc49940ed5ac98c4a08 > > ensure that your solr.xml is persistent > > http://wiki.apache.org/solr/CoreAdmin#head-7508c24c6e2dadad2dfea39b2fba045062481da8 > > On Thu, May 14, 2009 at 3:43 PM, KK wrote: > > Thank you very much. Got the point. > > One off the track question, can we automate the creation of new cores[it > > requires manually editing the solr.xml file as I know, and what about the > > location of core index directory, do we need to point that manually as > > well]. > > After going through the wiki what I found is we've to mention the names > of > > cores in solr.xml. I want to automate the process in such a way that when > a > > user registers[ on say my site for the service], we'll create a > coresponding > > core for the same user and with a specific core id[unique for this user > > only] so that the user will be given a search interface that will > redirect > > all searches for this user to http://host:port/ this > > user>/select > > Will apprecite any ideas on this. > > > > Thanks, > > KK. > > > > 2009/5/14 Noble Paul നോബിള് नोब्ळ् > > > >> there is no hard limit on the no:of cores. it is limited by your > >> system's ability to open files and the resources. > >> the queries are automatically sent to appropriate core if your url is > >> > >> htt://host:port//select > >> > >> On Thu, May 14, 2009 at 1:58 PM, KK wrote: > >> > I want to know the maximum no of cores supported by Solr. 1000s or may > be > >> > millions all under one solr instance ? > >> > Also I want to know how to redirect a particular query to a particular > >> core. > >> > Actually I'm querying solr from Ajax, so I think there must be some > >> request > >> > parameter that says which core we want to query, right? Can some one > tell > >> me > >> > how to do this, any good pointers on the same will be helpful as well. > >> > Thank you. > >> > > >> > --kk > >> > > >> > >> > >> > >> -- > >> - > >> Noble Paul | Principal Engineer| AOL | http://aol.com > >> > > > > > > -- > - > Noble Paul | Principal Engineer| AOL | http://aol.com >
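One way to confirm that the freshly created core has finished initializing before documents are sent to it is to poll the CoreAdmin STATUS action (the core name is a placeholder):

  http://localhost:8983/solr/admin/cores?action=STATUS&core=user123

Once the response reports the new core with its startTime and index details, requests can safely be directed at that core's /update and /select handlers.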
Re: Custom Servlet Filter, Where to put filter-mappings
I found a very elegant (I think) solution to this. I'll post a patch today or tomorrow. Best, -Jacob On Thu, May 14, 2009 at 6:22 PM, Erik Hatcher wrote: > I like Grant's suggestion as the simplest solution. > > As for XML merging and XSLT, I really wouldn't want to go that route > personally, but one solution that comes close to that is to template web.xml > with some substitution tags and use Ant's ability to replace tokens. So we > could put in @FILTER@ and @FILTER_MAPPING@ placeholders in web.xml and pull > in the replacements from fragment files. But even with all of these fancy > options available, I'd still just use the alternate web.xml technique that > Grant proposed. > > Erik > > > On May 13, 2009, at 10:55 PM, Jacob Singh wrote: > >> HI Grant, >> >> That's not a bad idea... I could try that. I was also looking at cactus: >> http://jakarta.apache.org/cactus/integration/ant/index.html >> >> It has an ant task to merge XML. Could this be a contrib-crawl add-on? >> >> Alternately, do you know of any xslt templates built for this? Could >> write one, but that's a fair bit of work to support everything. >> Perhaps an xslt task combined with a contrib-crawl would do the trick? >> >> Best, >> -J >> >> On Wed, May 13, 2009 at 6:07 PM, Grant Ingersoll >> wrote: >>> >>> Hmmm, maybe we need to think about someway to hook this into the build >>> process or make it easier to just drop it into the conf or lib dirs. I'm >>> no >>> web.xml expert, but I'm sure you're not the first one to want to do this >>> kind of thing. >>> >>> The easiest way _might_ be to patch build.xml to take a property for the >>> location of the web.xml, defaulting to the current Solr one. Then, >>> people >>> who want to use their own version could just pass in -Dweb.xml=>> my >>> web.xml>. The downside to this is that it may cause problems for us devs >>> when users ask questions about strange behavior and it turns out they >>> have >>> mucked up the web.xml >>> >>> FYI: dist-war is in build.xml, not common-build.xml. >>> >>> -Grant >>> >>> On May 12, 2009, at 5:52 AM, Jacob Singh wrote: >>> Hi folks, I just wrote a Servlet Filter to handle authentication for our service. Here's what I did: 1. Created a dir in contrib 2. Put my project in there, I took the dataimporthandler build.xml as an example and modified it to suit my needs. Worked great! 3. ant dist now builds my jar and includes it I now need to modify web.xml to add my filter-mapping, init params, etc. How can I do this cleanly? Or do I need to manually open up the archive and edit it and then re-war it? In common-build I don't see a target for dist-war, so don't see how it is possible... Thanks! Jacob -- +1 510 277-0891 (o) +91 33 7458 (m) web: http://pajamadesign.com Skype: pajamadesign Yahoo: jacobsingh AIM: jacobsingh gTalk: jacobsi...@gmail.com >>> >>> -- >>> Grant Ingersoll >>> http://www.lucidimagination.com/ >>> >>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using >>> Solr/Lucene: >>> http://www.lucidimagination.com/search >>> >>> >> >> >> >> -- >> >> +1 510 277-0891 (o) >> +91 33 7458 (m) >> >> web: http://pajamadesign.com >> >> Skype: pajamadesign >> Yahoo: jacobsingh >> AIM: jacobsingh >> gTalk: jacobsi...@gmail.com > > -- +1 510 277-0891 (o) +91 33 7458 (m) web: http://pajamadesign.com Skype: pajamadesign Yahoo: jacobsingh AIM: jacobsingh gTalk: jacobsi...@gmail.com
Re: Additional metadata when using Solr Cell
what does /admin/luke show for fields and terms in the fields? On May 14, 2009, at 10:03 AM, rossputin wrote: Hi. I am indexing a PDF document with the ExtractingRequestHandler. My curl post has a URL like: ../solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.literal.id=123&ext.literal.author=Somebody Sure enough I see in the server logs: params={ext.def.fl=text&ext.literal.id=123&ext.idx.attr=true&ext.literal.author=Somebody} I am trying to get my field back in the results from a query: ../solr/select?indent=on&version=2.2&q=hello&start=0&rows=10&fl=author%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl= I see the score in the results 'doc' but no reference to author. Can anyone advise on what I am forgetting to do, to get hold of this field? Thanks in advance for your help, -- Ross -- View this message in context: http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541256.html Sent from the Solr - User mailing list archive at Nabble.com. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Additional metadata when using Solr Cell
There is no reference to the author field I am trying to set.. I am using the latest nightly download. -- Ross Grant Ingersoll-6 wrote: > > what does /admin/luke show for fields and terms in the fields? > > On May 14, 2009, at 10:03 AM, rossputin wrote: > >> >> Hi. >> >> I am indexing a PDF document with the ExtractingRequestHandler. My >> curl >> post has a URL like: >> >> ../solr/update/extract? >> ext >> .idx >> .attr >> =true&ext.def.fl=text&ext.literal.id=123&ext.literal.author=Somebody >> >> Sure enough I see in the server logs: >> >> params >> = >> {ext >> .def >> .fl >> = >> text&ext.literal.id=123&ext.idx.attr=true&ext.literal.author=Somebody} >> >> I am trying to get my field back in the results from a query: >> >> ../solr/select? >> indent=on&version=2.2&q=hello&start=0&rows=10&fl=author >> %2Cscore&qt=standard&wt=standard&explainOther=&hl.fl= >> >> I see the score in the results 'doc' but no reference to author. >> >> Can anyone advise on what I am forgetting to do, to get hold of this >> field? >> >> Thanks in advance for your help, >> >> -- Ross >> -- >> View this message in context: >> http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541256.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> > > -- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) > using Solr/Lucene: > http://www.lucidimagination.com/search > > > -- View this message in context: http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541857.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Additional metadata when using Solr Cell
Do you have an author field in your schema? On May 14, 2009, at 10:31 AM, rossputin wrote: There is no reference to the author field I am trying to set.. I am using the latest nightly download. -- Ross Grant Ingersoll-6 wrote: what does /admin/luke show for fields and terms in the fields? On May 14, 2009, at 10:03 AM, rossputin wrote: Hi. I am indexing a PDF document with the ExtractingRequestHandler. My curl post has a URL like: ../solr/update/extract? ext .idx .attr =true&ext.def.fl=text&ext.literal.id=123&ext.literal.author=Somebody Sure enough I see in the server logs: params = {ext .def .fl = text &ext.literal.id=123&ext.idx.attr=true&ext.literal.author=Somebody} I am trying to get my field back in the results from a query: ../solr/select? indent=on&version=2.2&q=hello&start=0&rows=10&fl=author %2Cscore&qt=standard&wt=standard&explainOther=&hl.fl= I see the score in the results 'doc' but no reference to author. Can anyone advise on what I am forgetting to do, to get hold of this field? Thanks in advance for your help, -- Ross -- View this message in context: http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541256.html Sent from the Solr - User mailing list archive at Nabble.com. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search -- View this message in context: http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541857.html Sent from the Solr - User mailing list archive at Nabble.com. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Additional metadata when using Solr Cell
There is now, thanks for your help. On the same topic.. is there a best practice for modifying schema, in a future-proof way ? -- Ross Grant Ingersoll-6 wrote: > > Do you have an author field in your schema? > > On May 14, 2009, at 10:31 AM, rossputin wrote: > >> >> There is no reference to the author field I am trying to set.. I am >> using the >> latest nightly download. >> >> -- Ross >> >> >> Grant Ingersoll-6 wrote: >>> >>> what does /admin/luke show for fields and terms in the fields? >>> >>> On May 14, 2009, at 10:03 AM, rossputin wrote: >>> Hi. I am indexing a PDF document with the ExtractingRequestHandler. My curl post has a URL like: ../solr/update/extract? ext .idx .attr =true&ext.def.fl=text&ext.literal.id=123&ext.literal.author=Somebody Sure enough I see in the server logs: params = {ext .def .fl = text &ext.literal.id=123&ext.idx.attr=true&ext.literal.author=Somebody} I am trying to get my field back in the results from a query: ../solr/select? indent=on&version=2.2&q=hello&start=0&rows=10&fl=author %2Cscore&qt=standard&wt=standard&explainOther=&hl.fl= I see the score in the results 'doc' but no reference to author. Can anyone advise on what I am forgetting to do, to get hold of this field? Thanks in advance for your help, -- Ross -- View this message in context: http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541256.html Sent from the Solr - User mailing list archive at Nabble.com. >>> >>> -- >>> Grant Ingersoll >>> http://www.lucidimagination.com/ >>> >>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) >>> using Solr/Lucene: >>> http://www.lucidimagination.com/search >>> >>> >>> >> >> -- >> View this message in context: >> http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541857.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> > > -- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) > using Solr/Lucene: > http://www.lucidimagination.com/search > > > -- View this message in context: http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23542620.html Sent from the Solr - User mailing list archive at Nabble.com.
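To summarize the thread above: a value passed as ext.literal.&lt;name&gt; only becomes retrievable if &lt;name&gt; is an actual stored field in schema.xml. A minimal declaration along these lines (assuming the "string" field type from the example schema) is enough:

  <field name="author" type="string" indexed="true" stored="true"/>

As for modifying the schema in a future-proof way, one common approach is a catch-all dynamic field so that new literals do not each require a schema change, for example:

  <dynamicField name="attr_*" type="string" indexed="true" stored="true" multiValued="true"/>

The attr_ prefix and type here are illustrative, not required. Note that schema changes only affect documents indexed after the change; existing documents need to be reindexed.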
AW: AW: Geographical search based on latitude and longitude
Hi Grant, thanks for the reply. Is the logic for a function query that calculates distances that Yonik mentioned (gdist(position,101.2,234.3)) already implemented? This could be either very inaccurate or load intense. If the logic isn't done until now maybe I can prepare it. Norman -Ursprüngliche Nachricht- Von: Grant Ingersoll [mailto:gsing...@apache.org] Gesendet: Dienstag, 12. Mai 2009 19:43 An: solr-user@lucene.apache.org Betreff: Re: AW: Geographical search based on latitude and longitude Yes, that is part of it, but there is more to it. See Yonik's comment about needs further down. On May 12, 2009, at 7:36 AM, Norman Leutner wrote: > So are you using boundary box to find results within a given range(km) > like mentioned here: > http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html > ? > > > Best regards > > Norman Leutner > all2e GmbH > > -Ursprüngliche Nachricht- > Von: Grant Ingersoll [mailto:gsing...@apache.org] > Gesendet: Dienstag, 12. Mai 2009 13:18 > An: solr-user@lucene.apache.org > Betreff: Re: Geographical search based on latitude and longitude > > See https://issues.apache.org/jira/browse/SOLR-773. In other words, > we're working on it and would love some help! > > -Grant > > On May 12, 2009, at 7:12 AM, Norman Leutner wrote: > >> Hi together, >> >> I'm new to Solr and want to port a geographical range search from >> MySQL to Solr. >> >> Currently I'm using some mathematical functions (based on GRS80 >> modell) directly within MySQL to calculate >> the actual distance from the locations within the database to a >> current location (lat and long are known): >> >> $query=SELECT street, zip, city, state, country, ". >> $radius."*ACOS(cos(RADIANS(latitude))*cos(". >> $theta.")*(sin(RADIANS(longitude))*sin(".$phi.") >> +cos(RADIANS(longitude))*cos(".$phi."))+sin(RADIANS(latitude))*sin(". >> $theta.")) AS Distance FROM ezgis_position WHERE ". >> $radius."*ACOS(cos(RADIANS(latitude))*cos(". >> $theta.")*(sin(RADIANS(longitude))*sin(".$phi.") >> +cos(RADIANS(longitude))*cos(".$phi."))+sin(RADIANS(latitude))*sin(". >> $theta.")) <= ".$range." ORDER BY Distance"; >> >> This works pretty fine and fast. Due to we want to include this >> within our Solr search result I would like to have a attribute like >> "actual_distance" within the result. Is there a way to use those >> functions like (radians, sin, acos,...) directly within Solr? >> >> Thanks in advance for any feedback >> Norman Leutner > > -- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) > using Solr/Lucene: > http://www.lucidimagination.com/search > -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
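At the time of this thread the gdist()-style function query was still being worked out under SOLR-773, so there was nothing to call yet. For anyone wanting to prepare the logic, the distance itself is an ordinary great-circle calculation; the sketch below uses the haversine form, which behaves better numerically for small distances than the ACOS formulation in the SQL above. It is plain illustrative Java, not Solr's API, and the coordinates in main() are arbitrary examples.

  // Illustrative great-circle (haversine) distance in kilometres.
  public final class GeoDistance {
      private static final double EARTH_RADIUS_KM = 6371.0;

      public static double distanceKm(double lat1, double lon1,
                                      double lat2, double lon2) {
          double dLat = Math.toRadians(lat2 - lat1);
          double dLon = Math.toRadians(lon2 - lon1);
          double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                   + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                     * Math.sin(dLon / 2) * Math.sin(dLon / 2);
          return EARTH_RADIUS_KM * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
      }

      public static void main(String[] args) {
          // Roughly 340 km: London to Paris.
          System.out.println(distanceKm(51.5074, -0.1278, 48.8566, 2.3522));
      }
  }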
Re: Solr vs Sphinx
Yonik Seeley-2 wrote: > > It's probably the case that every search engine out there is faster > than Solr at one thing or another, and that Solr is faster or better > at some other things. > > I prefer to spend my time improving Solr rather than engage in > benchmarking wars... and Solr 1.4 will have a ton of speed > improvements over Solr 1.3. > > -Yonik > http://www.lucidimagination.com > > Solr is very fast even with 1.3 and the developers have done an incredible job. However, maybe the next Solr improvement should be the creation of a configuration manager and/or automated tuning tool. I know that optimizing Solr performance can be time consuming and sometimes frustrating. -- View this message in context: http://www.nabble.com/Solr-vs-Sphinx-tp23524676p23544492.html Sent from the Solr - User mailing list archive at Nabble.com.
CommonsHttpSolrServer vs EmbeddedSolrServer
What is the difference between EmbeddedSolrServer and CommonsHttpSolrServer? Which is the preferred server to use? In some blog I read that EmbeddedSolrServer is 50% faster than CommonsHttpSolrServer, so why do we need to use CommonsHttpSolrServer at all? Can anyone please point me down the right path so that I pick the right implementation? Thanks in advance. --Sachin -- View this message in context: http://www.nabble.com/CommonsHttpSolrServer-vs-EmbeddedSolrServer-tp23545281p23545281.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Replication master+slave
https://issues.apache.org/jira/browse/SOLR-1167 -Bryan On May 13, 2009, at May 13, 7:20 PM, Otis Gospodnetic wrote: Bryan, maybe it's time to stick this in JIRA? http://wiki.apache.org/solr/HowToContribute Thanks, Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Bryan Talbot To: solr-user@lucene.apache.org Sent: Wednesday, May 13, 2009 10:11:21 PM Subject: Re: Replication master+slave I think the patch I included earlier covers solr core, but it looks like at least some other extensions (DIH) create and use their own XML parser. So, if this functionality is to extend to all XML files, those will need similar patches. Here's one for DIH: --- src/main/java/org/apache/solr/handler/dataimport/ DataImporter.java (revision 774137) +++ src/main/java/org/apache/solr/handler/dataimport/ DataImporter.java (working copy) @@ -148,8 +148,10 @@ void loadDataConfig(String configFile) { try { - DocumentBuilder builder = DocumentBuilderFactory.newInstance() - .newDocumentBuilder(); + DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); + dbf.setNamespaceAware(true); + dbf.setXIncludeAware(true); + DocumentBuilder builder = dbf.newDocumentBuilder(); Document document = builder.parse(new InputSource(new StringReader( configFile))); The only down side I can see to this is it doesn't offer very expressive conditional inclusion: the file is included if it's present otherwise fallback inclusions can be used. It's also specific to XML files and obviously won't work for other types of configuration files. However, it is simple and effective. -Bryan On May 13, 2009, at May 13, 6:36 PM, Otis Gospodnetic wrote: Coincidentally, from http://www.cloudera.com/blog/2009/05/07/what%E2%80%99s-new-in-hadoop-core-020/ : "Hadoop configuration files now support XInclude elements for including portions of another configuration file (HADOOP-4944). This mechanism allows you to make configuration files more modular and reusable." So "others are doing it, too". Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Bryan Talbot To: solr-user@lucene.apache.org Sent: Wednesday, May 13, 2009 11:26:41 AM Subject: Re: Replication master+slave I see that Nobel's final comment in SOLR-1154 is that config files need to be able to include snippets from external files. In my limited testing, a simple patch to enable XInclude support seems to work. --- src/java/org/apache/solr/core/Config.java (revision 774137) +++ src/java/org/apache/solr/core/Config.java (working copy) @@ -100,8 +100,10 @@ if (lis == null) { lis = loader.openConfig(name); } - javax.xml.parsers.DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder(); - doc = builder.parse(lis); + javax.xml.parsers.DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); + dbf.setNamespaceAware(true); + dbf.setXIncludeAware(true); + doc = dbf.newDocumentBuilder().parse(lis); DOMUtil.substituteProperties(doc, loader.getCoreProperties()); } catch (ParserConfigurationException e) { This allows a clause like this to include the contents of replication.xml if it exists. If it's not found an exception will be thrown. href="http://localhost:8983/solr/corename/admin/file/?file=replication.xml " xmlns:xi="http://www.w3.org/2001/XInclude";> If the file is optional and no exception should be thrown if the file is missing, simply include a fallback action: in this case the fallback is empty and does nothing. 
href="http://localhost:8983/solr/forum_en/admin/file/?file=replication.xml " xmlns:xi="http://www.w3.org/2001/XInclude";> -Bryan On May 12, 2009, at May 12, 8:05 PM, Jian Han Guo wrote: I was looking at the same problem, and had a discussion with Noble. You can use a hack to achieve what you want, see https://issues.apache.org/jira/browse/SOLR-1154 Thanks, Jianhan On Tue, May 12, 2009 at 5:13 PM, Bryan Talbot wrote: So how are people managing solrconfig.xml files which are largely the same other than differences for replication? I don't think it's a "good thing" to maintain two copies of the same file and I'd like to avoid that. Maybe enabling the XInclude feature in DocumentBuilders would make it possible to modularize configuration files to make this possible? http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setXIncludeAware(boolean) -Bryan On May 12, 2009, at May 12, 11:43 AM, Shalin Shekhar Mangar wrote: On Tue, May 12, 2009 at 10:42 PM, Bryan Talbot wrote: For replication in 1.4, the wiki at http://wiki.apache.org/solr/SolrReplication says that a node can be both the master and a slave: A node can act as both master and slave. In that case both
Powered by Solr
I was intending to make an entry to the 'Powered by Solr' page, so I created a Wiki account and logged in. When I go to that page, it shows it as being 'immutable', which I take as meaning I can't edit it. Is there someone I can send the information to who can do the edit? Or perhaps there is some sort of trick to editing that page? Thanks for your help, and apologies in advance if this is a silly question... Terence
Re: Powered by Solr
On Thu, May 14, 2009 at 1:54 PM, Terence Gannon wrote: > I was intending to make an entry to the 'Powered by Solr' page, so I > created a Wiki account and logged in. When I go to that page, it > shows it as being 'immutable', which I take as meaning I can't edit > it. Did you try hitting refresh on your browser after you logged in? -Yonik http://www.lucidimagination.com
Re: Solr vs Sphinx
On Thu, May 14, 2009 at 9:07 AM, Marvin Humphrey wrote: > Richard Feynman: > >"...if you're doing an experiment, you should report everything that you >think might make it invalid - not only what you think is right about it: >other causes that could possibly explain your results; and things you >thought of that you've eliminated by some other experiment, and how they >worked - to make sure the other fellow can tell they have been eliminated." Excellent quote! > So, should Lucene use the non-compound file format by default because some > idiot's sloppy benchmarks might run a smidge faster, even though that will > cause many users to run out of file descriptors? No, I don't think we should change that default. Nor (for example) can we switch to SweetSpotSimilarity by default, even though it seems to improve relevance, because it requires app-dependent configuration. Nor should we set IndexWriter's RAM buffer to 1 GB. Etc. But when there is a choice that has near zero downside and improves performance (like my example), we should make the switch. Making IndexReader.open return a readOnly reader is another example (... which we plan to do in 3.0). Every time Lucene or Solr has a default built-in setting, we should think carefully about how to set it. > Anyone doing comparative benchmarking who doesn't submit their code to the > support list for the software under review is either a dolt or a propagandist. > > Good benchmarking is extremely difficult, like all experimental science. If > there isn't ample evidence that the benchmarker appreciates that, their tests > aren't worth a second thought. If you don't avail yourself of the help of > experts when assembling your experiment, you are unserious. Agreed. Mike
Re: Solr memory requirements?
I don't know if field type has any impact on the memory usage - does it? Our use cases require complete matches, thus there is no need of any analysis in most cases - does it matter in terms of memory usage? Also, is there any default caching used by Solr if I comment out all the caches under query in solrconfig.xml? I also don't have any auto-warming queries. Thanks, -vivek On Wed, May 13, 2009 at 4:24 PM, Erick Erickson wrote: > Warning: I'm wy out of my competency range when I comment > on SOLR, but I've seen the statement that string fields are NOT > tokenized while text fields are, and I notice that almost all of your fields > are string type. > > Would someone more knowledgeable than me care to comment on whether > this is at all relevant? Offered in the spirit that sometimes there are > things > so basic that only an amateur can see them > > Best > Erick > > On Wed, May 13, 2009 at 4:42 PM, vivek sar wrote: > >> Thanks Otis. >> >> Our use case doesn't require any sorting or faceting. I'm wondering if >> I've configured anything wrong. >> >> I got total of 25 fields (15 are indexed and stored, other 10 are just >> stored). All my fields are basic data type - which I thought are not >> sorted. My id field is unique key. >> >> Is there any field here that might be getting sorted? >> >> > required="true" omitNorms="true" compressed="false"/> >> >> > compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > default="NOW/HOUR" compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > compressed="false"/> >> > compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > compressed="false"/> >> > compressed="false"/> >> > compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > compressed="false"/> >> > default="NOW/HOUR" omitNorms="true"/> >> >> >> >> > omitNorms="true" multiValued="true"/> >> >> Thanks, >> -vivek >> >> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic >> wrote: >> > >> > Hi, >> > Some answers: >> > 1) .tii files in the Lucene index. When you sort, all distinct values >> for the field(s) used for sorting. Similarly for facet fields. Solr >> caches. >> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will >> consume during indexing. There is no need to commit every 50K docs unless >> you want to trigger snapshot creation. >> > 3) see 1) above >> > >> > 1.5 billion docs per instance where each doc is cca 1KB? I doubt that's >> going to fly. :) >> > >> > Otis >> > -- >> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >> > >> > >> > >> > - Original Message >> >> From: vivek sar >> >> To: solr-user@lucene.apache.org >> >> Sent: Wednesday, May 13, 2009 3:04:46 PM >> >> Subject: Solr memory requirements? >> >> >> >> Hi, >> >> >> >> I'm pretty sure this has been asked before, but I couldn't find a >> >> complete answer in the forum archive. Here are my questions, >> >> >> >> 1) When solr starts up what does it loads up in the memory? Let's say >> >> I've 4 cores with each core 50G in size. When Solr comes up how much >> >> of it would be loaded in memory? >> >> >> >> 2) How much memory is required during index time? 
If I'm committing >> >> 50K records at a time (1 record = 1KB) using solrj, how much memory do >> >> I need to give to Solr. >> >> >> >> 3) Is there a minimum memory requirement by Solr to maintain a certain >> >> size index? Is there any benchmark on this? >> >> >> >> Here are some of my configuration from solrconfig.xml, >> >> >> >> 1) 64 >> >> 2) All the caches (under query tag) are commented out >> >> 3) Few others, >> >> a) true ==> >> >> would this require memory? >> >> b) 50 >> >> c) 200 >> >> d) >> >> e) false >> >> f) 2 >> >> >> >> The problem we are having is following, >> >> >> >> I've given Solr RAM of 6G. As the total index size (all cores >> >> combined) start growing the Solr memory consumption goes up. With 800 >> >> million documents, I see Solr already taking up all the memory at >> >> startup. After that the commits, searches everything become slow. We >> >> will be having distributed setup with multiple Solr instances (around >> >> 8) on four boxes, but our requirement is to have each Solr instance at >> >> least maintain around 1.5 billion documents. >> >> >> >> We are trying to see if we can somehow reduce the Solr memory >> >> footprint. If someone can provide a pointer on what parameters affect >> >> memory and what effects it has we can then decide whether we want that >> >> parameter or not. I'm not sure if there is any minimum Solr >> >> requirement for it to be able mainta
Re: Powered by Solr
> Did you try hitting refresh on your browser after you logged in? Wow, I really should have known that...thank you for your patient reply, Yonik. Regards...Terence
replication of lucene-write.lock file
When using Solr 1.4 replication, I see that the lucene-write.lock file is being replicated to slaves. I'm importing data from a db every 5 minutes using cron to trigger a DIH delta-import. Replication polls every 60 seconds and the master is configured to take a snapshot after each commit (replicateAfter=commit). Why should the lock file be replicated to slaves? The lock file isn't stale on the master and is absent unless the delta-import is in progress. I've not tried it yet, but with the lock file replicated, it seems like promoting a slave to a master in a failure-recovery scenario requires manual removal of the lock file. -Bryan
Re: CommonsHttpSolrServer vs EmbeddedSolrServer
CommonsHttpSolrServer is how you access Solr from a Java client via HTTP. You can connect to a Solr running anywhere EmbeddedSolrServer starts up Solr internally, and connects directly, all in a single JVM... Embedded may be faster, the jury is out, but you have to have your Solr server and your Solr client on the same box... Unless you really need it, I would start with CommonsHttpSolrServer, it's easier to configure and get going with and more flexible. Eric On May 14, 2009, at 1:30 PM, sachin78 wrote: What is the difference between EmbeddedSolrServer and CommonsHttpSolrServer. Which is the preferred server to use? In some blog i read that EmbeddedSolrServer is 50% faster than CommonsHttpSolrServer,then why do we need to use CommonsHttpSolrServer. Can anyone please guide me the right path/way.So that i pick the right implementation. Thanks in advance. --Sachin -- View this message in context: http://www.nabble.com/CommonsHttpSolrServer-vs-EmbeddedSolrServer-tp23545281p23545281.html Sent from the Solr - User mailing list archive at Nabble.com. - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: CommonsHttpSolrServer vs EmbeddedSolrServer
right -- which one you pick will depend more on your runtime environment then anything else. If you need to hit a server (on a different machine) CommonsHttpSolrServer is your only option. If you are running an embedded application -- where your custom code lives in the same JVM as solr -- you can use EmbeddedSolrServer. The nice thing is that since they are the same interface, you can change later. The performance comments on the wiki can be a bit misleading -- yes, in some cases embedded could be faster, but that may depend on how you are sending things -- are you sending 1000s of single document requests really fast? If so, try sending a bunch of documents together in one request. Also consider using the StreamingHttpSolrServer (https://issues.apache.org/jira/browse/SOLR-906 ) -- it has a few quirks, but can be much faster. In any case, as long as you program against the SolrServer interface, then you could swap the implementation as needed. ryan On May 14, 2009, at 3:35 PM, Eric Pugh wrote: CommonsHttpSolrServer is how you access Solr from a Java client via HTTP. You can connect to a Solr running anywhere EmbeddedSolrServer starts up Solr internally, and connects directly, all in a single JVM... Embedded may be faster, the jury is out, but you have to have your Solr server and your Solr client on the same box... Unless you really need it, I would start with CommonsHttpSolrServer, it's easier to configure and get going with and more flexible. Eric On May 14, 2009, at 1:30 PM, sachin78 wrote: What is the difference between EmbeddedSolrServer and CommonsHttpSolrServer. Which is the preferred server to use? In some blog i read that EmbeddedSolrServer is 50% faster than CommonsHttpSolrServer,then why do we need to use CommonsHttpSolrServer. Can anyone please guide me the right path/way.So that i pick the right implementation. Thanks in advance. --Sachin -- View this message in context: http://www.nabble.com/CommonsHttpSolrServer-vs-EmbeddedSolrServer-tp23545281p23545281.html Sent from the Solr - User mailing list archive at Nabble.com. - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
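Since both implementations share the SolrServer interface, the choice can stay out of application code, as Ryan notes. Below is a minimal sketch against the 1.3/1.4-era SolrJ API; the URL, core name and field names are illustrative only, and swapping in EmbeddedSolrServer (or StreamingHttpSolrServer) would only change how the server variable is constructed.

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class SolrClientSketch {
      public static void main(String[] args) throws Exception {
          // HTTP client: works against a Solr instance on any machine.
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

          // Embedded alternative (only when client and Solr share one JVM),
          // constructed from a CoreContainer instead, e.g.:
          //   SolrServer server = new EmbeddedSolrServer(coreContainer, "core0");

          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "123");
          doc.addField("text", "hello solr");
          server.add(doc);    // batch many documents per add() call for throughput
          server.commit();
      }
  }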
Re: Solr vs Sphinx
On 14-May-09, at 9:46 AM, gdeconto wrote: Solr is very fast even with 1.3 and the developers have done an incredible job. However, maybe the next Solr improvement should be the creation of a configuration manager and/or automated tuning tool. I know that optimizing Solr performance can be time consuming and sometimes frustrating. "Making Solr more self-service" has been a theme we have had and should strive to move toward. In some respects, extreme configurability is a liability, if considerable tweaking and experimentation is needed to achieve optimum results. You can't expect everyone to put in the investment to develop the expertise. That said, it is very difficult to come up with appropriate auto-tuning heuristics that don't fail. It almost calls for a layer above Solr to which you could hint what you want to do with a field (sort, facet, etc.) and which would generate the field definitions appropriately. The problem with such abstractions is that they are invariably leaky, and thus diagnosing problems requires much the same expertise as omitting the abstraction step in the first place. Getting this trade-off right is one of the central problems of computer science. -Mike
Re: Solr memory requirements?
Some update on this issue, 1) I attached jconsole to my app and monitored the memory usage. During indexing the memory usage goes up and down, which I think is normal. The memory remains around the min heap size (4 G) for indexing, but as soon as I run a search the tenured heap usage jumps up to 6G and remains there. Subsequent searches increases the heap usage even more until it reaches the max (8G) - after which everything (indexing and searching becomes slow). The search query is a very generic one in this case which goes through all the cores (4 of them - 800 million records), finds 400million matches and returns 100 rows. Does the Solr searcher holds up the reference to objects in memory? I couldn't find any settings that would tell me it does, but every search causing heap to go up is definitely suspicious. 2) I ran the jmap histo to get the top objects (this is on a smaller instance with 2 G memory, this is before running search - after running search I wasn't able to run jmap), num #instances #bytes class name -- 1: 3890855 222608992 [C 2: 3891673 155666920 java.lang.String 3: 3284341 131373640 org.apache.lucene.index.TermInfo 4: 3334198 106694336 org.apache.lucene.index.Term 5: 271 26286496 [J 6:16 26273936 [Lorg.apache.lucene.index.Term; 7:16 26273936 [Lorg.apache.lucene.index.TermInfo; 8:320512 15384576 org.apache.lucene.index.FreqProxTermsWriter$PostingList 9: 10335 11554136 [I I'm not sure what's the first one (C)? I couldn't profile it to know what all the Strings are being allocated by - any ideas? Any ideas on what Searcher might be holding on and how can we change that behavior? Thanks, -vivek On Thu, May 14, 2009 at 11:33 AM, vivek sar wrote: > I don't know if field type has any impact on the memory usage - does it? > > Our use cases require complete matches, thus there is no need of any > analysis in most cases - does it matter in terms of memory usage? > > Also, is there any default caching used by Solr if I comment out all > the caches under query in solrconfig.xml? I also don't have any > auto-warming queries. > > Thanks, > -vivek > > On Wed, May 13, 2009 at 4:24 PM, Erick Erickson > wrote: >> Warning: I'm wy out of my competency range when I comment >> on SOLR, but I've seen the statement that string fields are NOT >> tokenized while text fields are, and I notice that almost all of your fields >> are string type. >> >> Would someone more knowledgeable than me care to comment on whether >> this is at all relevant? Offered in the spirit that sometimes there are >> things >> so basic that only an amateur can see them >> >> Best >> Erick >> >> On Wed, May 13, 2009 at 4:42 PM, vivek sar wrote: >> >>> Thanks Otis. >>> >>> Our use case doesn't require any sorting or faceting. I'm wondering if >>> I've configured anything wrong. >>> >>> I got total of 25 fields (15 are indexed and stored, other 10 are just >>> stored). All my fields are basic data type - which I thought are not >>> sorted. My id field is unique key. >>> >>> Is there any field here that might be getting sorted? 
>>> >>> >> required="true" omitNorms="true" compressed="false"/> >>> >>> >> compressed="false"/> >>> >> omitNorms="true" compressed="false"/> >>> >> omitNorms="true" compressed="false"/> >>> >> omitNorms="true" compressed="false"/> >>> >> default="NOW/HOUR" compressed="false"/> >>> >> omitNorms="true" compressed="false"/> >>> >> omitNorms="true" compressed="false"/> >>> >> compressed="false"/> >>> >> compressed="false"/> >>> >> omitNorms="true" compressed="false"/> >>> >> omitNorms="true" compressed="false"/> >>> >> omitNorms="true" compressed="false"/> >>> >> omitNorms="true" compressed="false"/> >>> >> omitNorms="true" compressed="false"/> >>> >> compressed="false"/> >>> >> compressed="false"/> >>> >> compressed="false"/> >>> >> omitNorms="true" compressed="false"/> >>> >> compressed="false"/> >>> >> default="NOW/HOUR" omitNorms="true"/> >>> >>> >>> >>> >> omitNorms="true" multiValued="true"/> >>> >>> Thanks, >>> -vivek >>> >>> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic >>> wrote: >>> > >>> > Hi, >>> > Some answers: >>> > 1) .tii files in the Lucene index. When you sort, all distinct values >>> for the field(s) used for sorting. Similarly for facet fields. Solr >>> caches. >>> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will >>> consume during indexing. There is no need to commit every 50K docs unless >>> you want to trigger snapshot creation. >>> > 3) see 1) above >>> > >>> > 1.5 billion docs per instance where each doc is cca 1KB? I doubt that's >>> going to fly. :) >>> > >>> > Otis >>> > -- >>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >>> > >>> > >>> > >>> > - Original Messag
Re: Solr vs Sphinx
Michael McCandless wrote: So why haven't we enabled this by default, already? Why isn't Lucene done already :) - Mark
Search Query Questions
I have two questions: 1) How do I search for ALL items? For example, I provide a sort query parameter of "updated" and a rows query parameter of 10 to limit the query results. I still have to provide a search query, of course. What if I want to provide a list of ALL results that match this? Or, in this case, the most recent 10 updated documents? 2) How do I search for all documents with a field that has data? For example, I have a field "foo" that is optional and multi-valued. How do I search for documents that have this field set to anything. Thanks, Chris Miller ServerMotion www.servermotion.com
Re: Search Query Questions
Oh, one more question 3) Is there a way to effectively do a GROUP BY? For example, if I have a document that has a photoID attached to it, is there a way to return a set of results that does not duplicate the photoID field? Thanks, Chris Miller ServerMotion www.servermotion.com On May 14, 2009, at 7:46 PM, Chris Miller wrote: I have two questions: 1) How do I search for ALL items? For example, I provide a sort query parameter of "updated" and a rows query parameter of 10 to limit the query results. I still have to provide a search query, of course. What if I want to provide a list of ALL results that match this? Or, in this case, the most recent 10 updated documents? 2) How do I search for all documents with a field that has data? For example, I have a field "foo" that is optional and multi-valued. How do I search for documents that have this field set to anything. Thanks, Chris Miller ServerMotion www.servermotion.com
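For questions 1 and 2 above, the standard (Lucene) query parser has idioms for both: *:* matches every document, and an open-ended range matches any document where a field holds some value. Assuming "updated" is an indexed, single-valued field, requests along these lines should work (URLs abbreviated as elsewhere in this thread):

  ../solr/select?q=*:*&sort=updated%20desc&rows=10
  ../solr/select?q=foo:[*%20TO%20*]&rows=10

The [* TO *] form only matches documents where foo was actually indexed with a value, which is what question 2 asks for. Question 3 is addressed further down in the thread.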
Re: Additional metadata when using Solr Cell
rossputin wrote: Hi. I am indexing a PDF document with the ExtractingRequestHandler. My curl post has a URL like: ../solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.literal.id=123&ext.literal.author=Somebody Sure enough I see in the server logs: params={ext.def.fl=text&ext.literal.id=123&ext.idx.attr=true&ext.literal.author=Somebody} I am trying to get my field back in the results from a query: ../solr/select?indent=on&version=2.2&q=hello&start=0&rows=10&fl=author%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl= I see the score in the results 'doc' but no reference to author. Can anyone advise on what I am forgetting to do, to get hold of this field? Thanks in advance for your help, -- Ross Have you added author to the schema? If not, and if you are using the example config (that uses ext.ignore.und.fl=true), the field could just be ignored. Define it and it should be filled. -- - Mark http://www.lucidimagination.com
Re: Solr memory requirements?
800 million docs is on the high side for modern hardware. If even one field has norms on, your talking almost 800 MB right there. And then if another Searcher is brought up well the old one is serving (which happens when you update)? Doubled. Your best bet is to distribute across a couple machines. To minimize you would want to turn off or down caching, don't facet, don't sort, turn off all norms, possibly get at the Lucene term interval and raise it. Drop on deck searchers setting. Even then, 800 million...time to distribute I'd think. vivek sar wrote: Some update on this issue, 1) I attached jconsole to my app and monitored the memory usage. During indexing the memory usage goes up and down, which I think is normal. The memory remains around the min heap size (4 G) for indexing, but as soon as I run a search the tenured heap usage jumps up to 6G and remains there. Subsequent searches increases the heap usage even more until it reaches the max (8G) - after which everything (indexing and searching becomes slow). The search query is a very generic one in this case which goes through all the cores (4 of them - 800 million records), finds 400million matches and returns 100 rows. Does the Solr searcher holds up the reference to objects in memory? I couldn't find any settings that would tell me it does, but every search causing heap to go up is definitely suspicious. 2) I ran the jmap histo to get the top objects (this is on a smaller instance with 2 G memory, this is before running search - after running search I wasn't able to run jmap), num #instances #bytes class name -- 1: 3890855 222608992 [C 2: 3891673 155666920 java.lang.String 3: 3284341 131373640 org.apache.lucene.index.TermInfo 4: 3334198 106694336 org.apache.lucene.index.Term 5: 271 26286496 [J 6:16 26273936 [Lorg.apache.lucene.index.Term; 7:16 26273936 [Lorg.apache.lucene.index.TermInfo; 8:320512 15384576 org.apache.lucene.index.FreqProxTermsWriter$PostingList 9: 10335 11554136 [I I'm not sure what's the first one (C)? I couldn't profile it to know what all the Strings are being allocated by - any ideas? Any ideas on what Searcher might be holding on and how can we change that behavior? Thanks, -vivek On Thu, May 14, 2009 at 11:33 AM, vivek sar wrote: I don't know if field type has any impact on the memory usage - does it? Our use cases require complete matches, thus there is no need of any analysis in most cases - does it matter in terms of memory usage? Also, is there any default caching used by Solr if I comment out all the caches under query in solrconfig.xml? I also don't have any auto-warming queries. Thanks, -vivek On Wed, May 13, 2009 at 4:24 PM, Erick Erickson wrote: Warning: I'm wy out of my competency range when I comment on SOLR, but I've seen the statement that string fields are NOT tokenized while text fields are, and I notice that almost all of your fields are string type. Would someone more knowledgeable than me care to comment on whether this is at all relevant? Offered in the spirit that sometimes there are things so basic that only an amateur can see them Best Erick On Wed, May 13, 2009 at 4:42 PM, vivek sar wrote: Thanks Otis. Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong. I got total of 25 fields (15 are indexed and stored, other 10 are just stored). All my fields are basic data type - which I thought are not sorted. My id field is unique key. Is there any field here that might be getting sorted? 
Thanks, -vivek On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic wrote: Hi, Some answers: 1) .tii files in the Lucene index. When you sort, all distinct values for the field(s) used for sorting. Similarly for facet fields. Solr caches. 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume during indexing. There is no need to commit every 50K docs unless you want to trigger snapshot creation. 3) see 1) above 1.5 billion docs per instance where each doc is cca 1KB? I doubt that's going to fly. :) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: vivek sar To: solr-user@lucene.apache.org Sent: Wednesday, May 13, 2009 3:04:46 PM Subject: Solr memory requirements? Hi, I'm pretty sure this has been asked before, but I couldn't find a complete answer in the forum archive. Here are my questions, 1) When solr starts up what does it loads up in the memory? Let's say I've 4 cores with each core 50G in size. When Solr comes up how much of it would be loaded in memory? 2
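For reference, the knobs Mark mentions live in the two config files: omitNorms is a per-field attribute in schema.xml, the "on deck searchers" limit is maxWarmingSearchers in solrconfig.xml, and the term interval is Lucene's termIndexInterval IndexWriter setting, which newer solrconfig.xml versions expose under the index settings block (check whether your build supports it before relying on it). Field name and values below are illustrative only.

  <!-- schema.xml -->
  <field name="message" type="string" indexed="true" stored="true" omitNorms="true"/>

  <!-- solrconfig.xml -->
  <maxWarmingSearchers>1</maxWarmingSearchers>

  <!-- solrconfig.xml, index settings: a larger interval means a smaller
       in-memory term index (.tii) at the cost of slower term lookups;
       only if your Solr version exposes it -->
  <termIndexInterval>256</termIndexInterval>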
Re: Search Query Questions
I think you will want to look at the Field Collapsing patch for this. http://issues.apache.org/jira/browse/SOLR-236 . Thanks, Matt Weber eSr Technologies http://www.esr-technologies.com On May 14, 2009, at 5:52 PM, Chris Miller wrote: Oh, one more question 3) Is there a way to effectively do a GROUP BY? For example, if I have a document that has a photoID attached to it, is there a way to return a set of results that does not duplicate the photoID field? Thanks, Chris Miller ServerMotion www.servermotion.com On May 14, 2009, at 7:46 PM, Chris Miller wrote: I have two questions: 1) How do I search for ALL items? For example, I provide a sort query parameter of "updated" and a rows query parameter of 10 to limit the query results. I still have to provide a search query, of course. What if I want to provide a list of ALL results that match this? Or, in this case, the most recent 10 updated documents? 2) How do I search for all documents with a field that has data? For example, I have a field "foo" that is optional and multi- valued. How do I search for documents that have this field set to anything. Thanks, Chris Miller ServerMotion www.servermotion.com
Re: Solr memory requirements?
Thanks Mark. I checked all the items you mentioned, 1) I've omitnorms=true for all my indexed fields (stored only fields I guess doesn't matter) 2) I've tried commenting out all caches in the solrconfig.xml, but that doesn't help much 3) I've tried commenting out the first and new searcher listeners settings in the solrconfig.xml - the only way that helps is that at startup time the memory usage doesn't spike up - that's only because there is no auto-warmer query to run. But, I noticed commenting out searchers slows down any other queries to Solr. 4) I don't have any sort or facet in my queries 5) I'm not sure how to change the "Lucene term interval" from Solr - is there a way to do that? I've been playing around with this memory thing the whole day and have found that it's the search that's hogging the memory. Any time there is a search on all the records (800 million) the heap consumption jumps by 5G. This makes me think there has to be some configuration in Solr that's causing some terms per document to be loaded in memory. I've posted my settings several times on this forum, but no one has been able to pin point what configuration might be causing this. If someone is interested I can attach the solrconfig and schema files as well. Here are the settings again under Query tag, 1024 true 50 200 false 2 and schema, Any help is greatly appreciated. Thanks, -vivek On Thu, May 14, 2009 at 6:22 PM, Mark Miller wrote: > 800 million docs is on the high side for modern hardware. > > If even one field has norms on, your talking almost 800 MB right there. And > then if another Searcher is brought up well the old one is serving (which > happens when you update)? Doubled. > > Your best bet is to distribute across a couple machines. > > To minimize you would want to turn off or down caching, don't facet, don't > sort, turn off all norms, possibly get at the Lucene term interval and raise > it. Drop on deck searchers setting. Even then, 800 million...time to > distribute I'd think. > > vivek sar wrote: >> >> Some update on this issue, >> >> 1) I attached jconsole to my app and monitored the memory usage. >> During indexing the memory usage goes up and down, which I think is >> normal. The memory remains around the min heap size (4 G) for >> indexing, but as soon as I run a search the tenured heap usage jumps >> up to 6G and remains there. Subsequent searches increases the heap >> usage even more until it reaches the max (8G) - after which everything >> (indexing and searching becomes slow). >> >> The search query is a very generic one in this case which goes through >> all the cores (4 of them - 800 million records), finds 400million >> matches and returns 100 rows. >> >> Does the Solr searcher holds up the reference to objects in memory? I >> couldn't find any settings that would tell me it does, but every >> search causing heap to go up is definitely suspicious. 
>> >> 2) I ran the jmap histo to get the top objects (this is on a smaller >> instance with 2 G memory, this is before running search - after >> running search I wasn't able to run jmap), >> >> num #instances #bytes class name >> -- >> 1: 3890855 222608992 [C >> 2: 3891673 155666920 java.lang.String >> 3: 3284341 131373640 org.apache.lucene.index.TermInfo >> 4: 3334198 106694336 org.apache.lucene.index.Term >> 5: 271 26286496 [J >> 6: 16 26273936 [Lorg.apache.lucene.index.Term; >> 7: 16 26273936 [Lorg.apache.lucene.index.TermInfo; >> 8: 320512 15384576 >> org.apache.lucene.index.FreqProxTermsWriter$PostingList >> 9: 10335 11554136 [I >> >> I'm not sure what's the first one (C)? I couldn't profile it to know >> what all the Strings are being allocated by - any ideas? >> >> Any ideas on what Searcher might be holding on and how can we change >> that behavior? >> >> Thanks, >> -vivek >> >> >> On Thu, May 14, 2009 at 11:33 AM, vivek sar wrote: >> >>> >>> I don't know if field type has any impact on the memory usage - does it? >>> >>> Our use cases require complete matches, thus there is no need of any >>> analysis in most cases - does it matter in terms of memory usage? >>> >>> Also, is there any default caching used by Solr if I comment out all >>> the caches under query in solrconfig.xml? I also don't have any >>> auto-warming queries. >>> >>> Thanks, >>> -vivek >>> >>> On Wed, May 13, 2009 at 4:24 PM, Erick Erickson >>> wrote: >>> Warning: I'm wy out of my competency range when I comment on SOLR, but I've seen the statement that string fields are NOT tokenized while text fields are, and I notice that almost all of your fields are string type. Would someone more knowledgeable than me care to comment on whether t