Re: Boosting for most recent documents
Hi, A related question to "getting the latest records first". After trying a few of the suggested ways (function query, index-time boosting) of getting the latest records first, I settled for the simple "sort" parameter, sort=field+asc.

As per the wiki, http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), Lucene would cache "4 bytes * the number of documents" plus the unique terms for the sorted field in the FieldCache. This is done so subsequent sort requests can be served from the cache. So, for example, the memory usage if I have 1 billion records in one Indexer instance:

1) 1 billion records
2) sort on a timestamp field (rounded to the hour) - for 1 year that's 8760 unique terms (negligible)
3) total memory requirement for sorting on this single field would be around 1G * 4 = 4GB

So even if I run only one sort query a day, 4GB would still be required at all times. Is there any way to tell Solr/Lucene to release the memory once the query has been run? Basically I don't want caching. I've commented out all the cache parameters in solrconfig.xml, but I still see that the very first time I run the sort query the memory jumps by 4GB and stays there. Is there any way to keep Lucene/Solr from using so much memory for sorting, so that my application can scale (i.e. the sorting memory requirement isn't a function of the number of documents)?

Thanks,
-vivek

On Thu, Jul 16, 2009 at 3:10 PM, Chris Hostetter wrote:
>
> : Does anyone know if Solr supports sorting by internal document ids,
> : i.e, like Sort.INDEXORDER in Lucene? If so, how?
>
> It does not. In Solr the decision to make "score desc" the default
> sort meant there is no way to request simple docId ordering.
>
> : Also, if anyone has any insight on whether function query loads up unique
> : terms (like field sorts) in memory or not.
>
> It uses the exact same FieldCache as sorting.
>
>
>
>
> -Hoss
>
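For illustration, a rough sketch of the FieldCache arithmetic described above (the per-term overhead constant is an assumption; everything else comes from the numbers in the message):

public class FieldCacheEstimate {
    public static void main(String[] args) {
        long numDocs = 1000000000L;        // 1 billion documents
        long ordBytes = 4L * numDocs;      // one 4-byte entry per document
        long uniqueTerms = 365L * 24;      // hourly timestamps over a year: 8760
        long termBytes = uniqueTerms * 32; // assumed per-term overhead; negligible here
        System.out.println((ordBytes + termBytes) / 1e9 + " GB"); // ~4 GB
    }
}

The first term dominates: the cache scales with document count, not query rate, which is why the 4GB stays resident after a single sorted query.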
Replication over multi-core solr
Hi, We use a multi-core setup for Solr, where new cores are added dynamically to solr.xml. Only one core is active at a time. My question is how replication can be done for multi-core - so that every core is replicated on the slave?

I went over the wiki, http://wiki.apache.org/solr/SolrReplication, and have a few questions related to that:

1) How do we replicate solr.xml, where we have the list of cores? The wiki says, "Only files in the 'conf' dir of solr instance is replicated." Since solr.xml is in the home directory, how do we replicate that?

2) Solrconfig.xml in the slave takes a static core URL, http://localhost:port/solr/corename/replication. As in our case cores are created dynamically (a new core is created after the active one reaches some capacity), how can we define the master core dynamically for replication? The only way I see is using the "fetchIndex" command and passing the new core info there - is that right? If so, the slave application would have to write code to poll the Master periodically and fire the "fetchIndex" command, but how would the Slave know the Master core name - as the cores are created dynamically on the Master?

Thanks,
-vivek
Re: Replication over multi-core solr
Licinio, Please open a separate thread - as it's a different issue - and I can respond there. -vivek

2009/8/19 Licinio Fernández Maurelo :
> Hi Vivek,
> currently we want to add cores dynamically when the active one reaches
> some capacity,
> can you give me some hints to achieve such functionality? (Just
> wondering if you have used shell scripting or you have coded some 100%
> Java-based solution)
>
> Thx
>
>
> 2009/8/19 Noble Paul നോബിള് नोब्ळ् :
>> On Wed, Aug 19, 2009 at 2:27 AM, vivek sar wrote:
>>> Hi,
>>>
>>> We use a multi-core setup for Solr, where new cores are added
>>> dynamically to solr.xml. Only one core is active at a time. My
>>> question is how replication can be done for multi-core - so that every
>>> core is replicated on the slave?
>>
>> replication does not handle new core creation. You will have to issue
>> the core creation command to each slave separately.
>>>
>>> I went over the wiki, http://wiki.apache.org/solr/SolrReplication,
>>> and have a few questions related to that:
>>>
>>> 1) How do we replicate solr.xml, where we have the list of cores? The
>>> wiki says, "Only files in the 'conf' dir of solr instance is replicated."
>>> Since solr.xml is in the home directory, how do we replicate that?
>> solr.xml cannot be replicated. Even if you did, it is not reloaded.
>>>
>>> 2) Solrconfig.xml in the slave takes a static core URL,
>>>
>>> <str name="masterUrl">http://localhost:port/solr/corename/replication</str>
>>
>> put a placeholder like
>> <str name="masterUrl">http://localhost:port/solr/${solr.core.name}/replication</str>
>> so the corename is automatically replaced
>>
>>>
>>> As in our case cores are created dynamically (a new core is created after
>>> the active one reaches some capacity), how can we define the master core
>>> dynamically for replication? The only way I see is using the "fetchIndex"
>>> command and passing the new core info there - is that right? If so, the
>>> slave application would have to write code to poll the Master periodically
>>> and fire the "fetchIndex" command, but how would the Slave know the Master
>>> corename - as the cores are created dynamically on the Master?
>>>
>>> Thanks,
>>> -vivek
>>>
>>
>>
>>
>> --
>> -
>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>
>
>
>
> --
> Lici
>
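For readers following along, a sketch of how Noble's placeholder could sit in the slave-side solrconfig.xml, per the SolrReplication wiki (host, port and poll interval here are made-up values):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8080/solr/${solr.core.name}/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>

Since ${solr.core.name} resolves per core, one shared solrconfig.xml can serve every dynamically created core, as long as the slave core names mirror the master's.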
Re: Adding cores dynamically
Lici, We're doing a similar thing with multi-core - when a core reaches capacity (in our case 200 million records) we start a new core. We do this via a web service call (the CREATE web service), http://wiki.apache.org/solr/CoreAdmin

This is all done in Java code - before writing we check the number of records in the core; if it has reached its capacity we create a new core and then index there.

-vivek

2009/8/19 Licinio Fernández Maurelo :
> Hi there,
>
> currently we want to add cores dynamically when the active one reaches
> some capacity,
> can anyone give me some hints to achieve such functionality? (Just
> wondering if you have used shell scripting or you have coded some 100%
> Java-based solution)
>
> Thx
>
>
> --
> Lici
>
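A minimal sketch of that CREATE call from Java (the class, capacity constant, and host are illustrative, not vivek's actual code):

import java.net.HttpURLConnection;
import java.net.URL;

public class CoreCreator {
    // capacity figure taken from the message above
    static final long CORE_CAPACITY = 200000000L;

    static void createCore(String coreName, String instanceDir) throws Exception {
        URL url = new URL("http://localhost:8080/solr/admin/cores?action=CREATE"
                + "&name=" + coreName + "&instanceDir=" + instanceDir);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        if (conn.getResponseCode() != 200) {
            throw new IllegalStateException("CREATE failed: HTTP " + conn.getResponseCode());
        }
        conn.disconnect();
    }
}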
Re: Adding cores dynamically
There were two main reasons we went with a multi-core solution:

1) We found the indexing speed starts dipping once the index grows to a certain size - in our case around 50G. We don't optimize, but we have to maintain a consistent indexing speed. The only way we could do that was to keep creating new cores (on the same box, though we do use multiple boxes to scale horizontally as well) once a core reaches its capacity. The old core is not written to again once it reaches its capacity.

2) Be able to drop a whole core for pruning purposes. We didn't want to delete records from the index, so the best solution was to simply delete the complete core directory (we do maintain the time period for each core), which is much faster and easier to maintain.

So far things have been working fine. I'm not sure if there is any inherent problem with this architecture given the above limitations and requirements.

-vivek

On Tue, Aug 25, 2009 at 10:57 AM, Lance Norskog wrote:
> One problem is the IT logistics of handling the file set. At 200 million
> records you have at least 20G of data in one Lucene index. It takes hours to
> optimize this, and 10s of minutes to copy the optimized index around to
> query servers.
> Another problem is that indexing speed drops off after the index reaches a
> certain size. When making multiple indexes, you want to stop indexing before
> that size.
> Lance
>
> On Tue, Aug 25, 2009 at 10:44 AM, Chris Hostetter
> wrote:
>
>>
>> : We're doing a similar thing with multi-core - when a core reaches
>> : capacity (in our case 200 million records) we start a new core. We are
>> : doing this via a web service call (Create web service),
>>
>> this whole thread perplexes me ... while I can understand not wanting to
>> let an index grow without bound because of hardware limitations, I don't
>> understand what value you are gaining by creating a new core on the same
>> box -- you're using the same physical resources to search the same number
>> of documents, so making multiple cores for this actually seems like it would
>> take up *more* resources to search the same amount of content, because the
>> individual cores will be isolated and the term dictionaries can't be
>> shared (not to mention you have to do a multi-shard query to get results
>> from all the cores)
>>
>> are you doing something special with the old cores vs the new ones? (ie:
>> create the new cores on new machines, shut down cores after a certain
>> amount of time has expired, etc...)
>>
>>
>> : > Hi there,
>> : >
>> : > currently we want to add cores dynamically when the active one reaches
>> : > some capacity,
>> : > can anyone give me some hints to achieve such functionality? (Just
>> : > wondering if you have used shell scripting or you have coded some 100%
>> : > Java-based solution)
>> : >
>> : > Thx
>> : >
>> : >
>> : > --
>> : > Lici
>> : >
>> :
>>
>>
>>
>> -Hoss
>>
>>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
How does ReplicationHandler backup work?
Hi, As one of our requirements we need to back up the Master indexes to the Slave periodically. I've been able to successfully sync the index using the "fetchIndex" command,

http://localhost:9006/solr/audit_20090828_1/replication?command=fetchindex&masterUrl=http://localhost:8080/solr/audit_20090828_1/replication

Now I'm wondering how I do the backup. Looking at the wiki, http://wiki.apache.org/solr/SolrReplication, it seems there is a backup command, but that says backup on the Master. I tried replacing the command "fetchindex" with "backup", but that didn't work. How do I do a complete index backup (for a particular core) from Master to Slave?

Thanks,
-vivek
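A hedged reading of the SolrReplication wiki: the backup command snapshots the index of whichever core it is issued against, so once fetchindex has completed, the slave could snapshot its own copy with something like

http://localhost:9006/solr/audit_20090828_1/replication?command=backup

This is an assumption based on the ReplicationHandler documentation, not a confirmed recipe; if it works, the snapshot should land in a snapshot.* directory under that core's data dir.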
Partition index by time using Solr
Hi, I've used Lucene before, but I'm new to Solr. I've gone through the mailing list, but I'm unable to find any clear idea on how to partition Solr indexes. Here is what we want:

1) Be able to partition indexes by timestamp - basically a partition per day (create a new index directory every day).

2) Be able to search partitions based on timestamp. All our queries are time based, so instead of looking into all the partitions I want to go directly to the partitions where the data might be.

3) Be able to purge any data older than 6 months without bringing down the application. Since partitions would be marked by timestamp, we would just have to delete the old partitions.

This is going to be a distributed system with 2 boxes, each running an instance of Solr. I don't want to replicate data, but each box may have the same timestamp partition with different data. We would be indexing on average 20 million documents (each document = 500 bytes) with an estimated 10G in index size - evenly distributed across machines (each machine would get roughly 5G of index every day).

My questions:

1) Is this all possible using Solr? If not, should I just do this using Lucene, or is there any other out-of-the-box alternative?
2) If it's possible in Solr, how do we do this - configuration, setup, etc.?
3) How would I optimize the partitions - would that be required when using Solr?

Thanks,
-vivek
Re: Partition index by time using Solr
Thanks Otis for the response. I'm still not clear on a few things:

1) I thought Solr can work with only one index at a time. In order to have multiple indexes you need multiple instances of Solr - isn't that right? How can we make Solr read from and write to multiple indexes?

2) What does "partitioning outside of Solr" mean? If all the data is indexed by Solr into one index, how would one partition it outside Solr such that it is still searchable by Solr when needed?

Our main problem is scaling with Solr. Our indexes grow so big (like 10G-20G every day) that it's hard to optimize them and search on large indexes. That's why we are trying to partition them by time. We do need to keep up to 6 months of data. The only way I can think of limiting the index size is by running multiple Solr instances, but even then it's not a scalable solution if the indexes keep growing.

Thanks,
-vivek

On Wed, Mar 25, 2009 at 6:59 PM, Otis Gospodnetic wrote:
>
> Hi,
>
> Yes, you can use Solr for this, but index partitioning should be done outside
> of Solr. That is, your app will need to know where to send each doc based on
> its timestamp, when and where to create a new index (new Solr core), and so on.
> Similarly, deleting data older than N days is done by you, using a delete-by-query
> with a date-based open-ended range query. The Solr setup is really
> done the same as usual, since all the partitioning-related stuff lives
> outside of Solr. Of course, you could come up with a "Solr Proxy" component
> that abstracts some/all of this and pretends to be Solr.
>
>
> Otis --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message
>> From: vivek sar
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, March 25, 2009 3:52:11 PM
>> Subject: Partition index by time using Solr
>>
>> Hi,
>>
>> I've used Lucene before, but I'm new to Solr. I've gone through the
>> mailing list, but I'm unable to find any clear idea on how to partition
>> Solr indexes. Here is what we want:
>>
>> 1) Be able to partition indexes by timestamp - basically a partition
>> per day (create a new index directory every day).
>>
>> 2) Be able to search partitions based on timestamp. All our queries
>> are time based, so instead of looking into all the partitions I want
>> to go directly to the partitions where the data might be.
>>
>> 3) Be able to purge any data older than 6 months without bringing
>> down the application. Since partitions would be marked by timestamp,
>> we would just have to delete the old partitions.
>>
>>
>> This is going to be a distributed system with 2 boxes, each running
>> an instance of Solr. I don't want to replicate data, but each box may
>> have the same timestamp partition with different data. We would be
>> indexing on average 20 million documents (each document = 500 bytes)
>> with an estimated 10G in index size - evenly distributed across
>> machines (each machine would get roughly 5G of index every day).
>>
>> My questions:
>>
>> 1) Is this all possible using Solr? If not, should I just do this
>> using Lucene, or is there any other out-of-the-box alternative?
>> 2) If it's possible in Solr, how do we do this - configuration, setup, etc.?
>> 3) How would I optimize the partitions - would that be required when using
>> Solr?
>>
>> Thanks,
>> -vivek
>
>
Re: Partition index by time using Solr
Thanks again Otis. A few more questions:

1) My app currently is a stand-alone Java app (not part of the Solr JVM) that simply calls the update web service on Solr (running in a separate web container), passing 10K documents at once. In your example you mentioned getting a list of Indexers and adding documents to them manually - do you mean I should use Lucene directly in my app to do the indexing and use Solr just for search purposes? How can I simply write to different cores (using the Solr web service) without putting Lucene code in my app?

2) The MultiCore example on the Wiki shows pre-configured cores in solr.xml. How can I create cores on the fly from my app - is there a command (or web service) to tell Solr to load a new core? For example, every day I want to create a new core for that day on the fly and index into that core only. Also, would I be able to search on cores created on the fly?

Currently, I'm using the standard out-of-the-box request and response handlers for Solr. Would using multi-core require any custom handlers?

Thanks,
-vivek

On Thu, Mar 26, 2009 at 10:38 AM, Otis Gospodnetic wrote:
>
> Hi,
>
> 1) Look for "multicore" on the Solr Wiki
>
> 2) I meant to say you would not index it all in one index (that's what you
> wanted to do, no?). So in your app you'd do something like
> ts = doc.getTimestamp();
> indexer = getIndexer(ts); // gives you a different indexer based on the ts.
> You keep track of all the indexers (e.g. all instances of the solr client you
> have in your app, each of which points to a different solr server/core/index)
> indexer.index(doc);
>
>
> If your issue is large indices and search performance, then the solution is
> not so much to have multiple solr cores/indices per machine as distributed
> indexing (multiple servers). Look at the DistributedSearch page on the Wiki.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
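A sketch of Otis's getIndexer(ts) idea using plain Solrj, so no Lucene code enters the app: one client per daily core, chosen by document timestamp (the class and core-name scheme are assumptions):

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DailyCoreRouter {
    private final Map<String, SolrServer> servers = new HashMap<String, SolrServer>();
    private final SimpleDateFormat day = new SimpleDateFormat("yyyyMMdd");

    // returns the client for the core matching the document's day
    public synchronized SolrServer getIndexer(Date ts) throws Exception {
        String core = day.format(ts); // e.g. "20090402"
        SolrServer s = servers.get(core);
        if (s == null) {
            s = new CommonsHttpSolrServer("http://localhost:8080/solr/" + core);
            servers.put(core, s);
        }
        return s;
    }
}

Creating the day's core itself would be a separate CoreAdmin CREATE call, as discussed later in this thread.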
How to optimize Index Process?
Hi, We have a distributed Solr system (2-3 boxes, each running 2 instances of Solr, and each Solr instance can write to multiple cores). Our use case is high index volume - we can get up to 100 million records (1 record = 500 bytes) per day, but very low query traffic (only administrators may need to search for data - once an hour or so). So we need very fast index time. Here are the things I'm trying to find out in order to optimize our index process:

1) What's the optimum index size? I've noticed that as the index size grows, the indexing time starts increasing. In our tests, at less than 10G of index size we could index over 2K records/sec, but as it grows over 20G the index rate drops to 1400/sec and keeps dropping as the index grows. I'm trying to see whether we can partition (create a new SolrCore) after 10G.
 - related question: is there a way to find the SolrCore size (any web service for that?) - based on that information I could create a new core and freeze the one which has reached 10G.

2) In our test we noticed that after a few hours (after 8 hours of indexing) there is a period (3-4 hours long) where the indexing is very, very slow (like 500 records/sec), and after that period indexing returns to its normal rate (1500/sec). Does Solr run any optimize command on its own? How can we find that out? I'm not issuing any optimize command - should I be doing that after a certain time?

3) Every time I add new documents (10K at once) to the index, I see the searcher closing and then re-opening/re-warming (in Catalina.out) after the commit is done. I'm not sure if this is an expensive operation. Since our search volume is very low, can I configure Solr to not do this? Would it make indexing any faster?

Mar 26, 2009 11:59:45 PM org.apache.solr.search.SolrIndexSearcher close
INFO: Closing searc...@33d9337c main
Mar 26, 2009 11:59:52 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
Mar 26, 2009 11:59:52 PM org.apache.solr.search.SolrIndexSearcher
INFO: Opening searc...@46ba6905 main
Mar 26, 2009 11:59:52 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@46ba6905 main from searc...@5c5ffecd main

4) Anything else (any other configuration in Solr - I'm currently using all default settings in solrconfig.xml and the default handlers) that could help optimize my indexing process?

Thanks,
-vivek
OOM at MultiSegmentReader.norms
Hi, I have an index of size 50G (around 100 million documents) and growing - around 2000 records (1 rec = 500 bytes) are being written every second, continuously. If I make any search on this index I get an OOM. I'm using the default cache settings (512,512,256) in solrconfig.xml. The search is through the admin interface (returning 10 rows) with no sorting, faceting or highlighting. Max heap size is 1024m.

Mar 27, 2009 9:13:41 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.MultiSegmentReader.norms(MultiSegmentReader.java:335)
at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:69)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
at org.apache.lucene.search.Searcher.search(Searcher.java:126)
at org.apache.lucene.search.Searcher.search(Searcher.java:105)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:966)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)

What could be the problem?

Thanks,
-vivek
Re: How to optimize Index Process?
Thanks Otis. This is very useful. I'll try all your suggestions and post my findings (and improvements).

Thanks,
-vivek

On Fri, Mar 27, 2009 at 7:08 PM, Otis Gospodnetic wrote:
>
> Hi,
>
> Answers inlined.
>
>
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
>> We have a distributed Solr system (2-3 boxes, each running 2
>> instances of Solr, and each Solr instance can write to multiple cores).
>
> Is this really optimal? How many CPU cores do your boxes have vs. the number
> of Solr cores?
>
>> Our use case is high index volume - we can get up to 100 million
>> records (1 record = 500 bytes) per day, but very low query traffic
>> (only administrators may need to search for data - once an hour or
>> so). So we need very fast index time. Here are the things I'm trying
>> to find out in order to optimize our index process:
>
> It's starting to sound like you might be able to batch your data and use
> http://wiki.apache.org/solr/UpdateCSV -- it's the fastest indexing method, I
> believe.
>
>> 1) What's the optimum index size? I've noticed that as the index size grows,
>> the indexing time starts increasing. In our tests, at less than 10G of index
>> size we could index over 2K records/sec, but as it grows over 20G the index
>> rate drops to 1400/sec and keeps dropping as the index grows. I'm
>> trying to see whether we can partition (create a new SolrCore) after
>> 10G.
>
> That's likely due to Lucene's segment merging. You can make mergeFactor
> bigger to make segment merging less frequent, but don't make it too high or
> you'll run into open file descriptor limits (which you could raise, of
> course).
>
>> - related question: is there a way to find the SolrCore size (any
>> web service for that?) - based on that information I could create a new
>> core and freeze the one which has reached 10G.
>
> You can see the number of docs in an index via the Admin Statistics page (the
> response is actually XML; look at the source)
>
>> 2) In our test we noticed that after a few hours (after 8 hours of
>> indexing) there is a period (3-4 hours long) where the indexing is
>> very, very slow (like 500 records/sec), and after that period indexing
>> returns to its normal rate (1500/sec). Does Solr run any optimize
>> command on its own? How can we find that out? I'm not issuing any
>> optimize command - should I be doing that after a certain time?
>
> No, it doesn't run optimize on its own. It could be running auto-commit, but
> you should comment that out anyway. Try doing a thread dump to see what's
> going on, and watch the system with top and vmstat.
> No, you shouldn't optimize until you are completely done.
>
>> 3) Every time I add new documents (10K at once) to the index, I see the
>> searcher closing and then re-opening/re-warming (in Catalina.out)
>> after the commit is done. I'm not sure if this is an expensive operation.
>> Since our search volume is very low, can I configure Solr to not do
>> this? Would it make indexing any faster?
>
> Are you running the commit command after every 10K docs? No need to do that
> if you don't need your searcher to see the changes immediately.
>
>> Mar 26, 2009 11:59:45 PM org.apache.solr.search.SolrIndexSearcher close
>> INFO: Closing searc...@33d9337c main
>> Mar 26, 2009 11:59:52 PM org.apache.solr.update.DirectUpdateHandler2 commit
>> INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
>> Mar 26, 2009 11:59:52 PM org.apache.solr.search.SolrIndexSearcher
>> INFO: Opening searc...@46ba6905 main
>> Mar 26, 2009 11:59:52 PM org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming searc...@46ba6905 main from searc...@5c5ffecd main
>>
>> 4) Anything else (any other configuration in Solr - I'm currently
>> using all default settings in solrconfig.xml and the default handlers)
>> that could help optimize my indexing process?
>
> Increase ramBufferSizeMB as much as you can afford.
> Comment out maxBufferedDocs; it's deprecated.
> Increase mergeFactor slightly.
> Consider the CSV approach.
> Index with multiple threads (match the number of CPU cores).
> If you are using Solrj, use the Streaming version of SolrServer.
> Give the JVM more memory (you'll need it if you increase ramBufferSizeMB).
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
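A sketch of where Otis's indexing knobs live in solrconfig.xml (the values are illustrative starting points, not recommendations from the thread):

<indexDefaults>
  <ramBufferSizeMB>256</ramBufferSizeMB>
  <mergeFactor>20</mergeFactor>
  <!-- maxBufferedDocs is deprecated; omit it and let ramBufferSizeMB govern flushes -->
</indexDefaults>

On the Solrj side, the "Streaming version of SolrServer" refers to StreamingUpdateSolrServer (Solr 1.4), which pipelines documents over a fixed set of connections and threads instead of one blocking POST per batch.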
Re: OOM at MultiSegmentReader.norms
Thanks Otis and Mike. I'm indexing a total of 9 fields, with 5 having norms turned on. I think I may not need them and will try using omitNorms on those fields.

How do I make use of the RAM buffer in Solr? I couldn't find anything on this on the Wiki - any pointers?

Thanks,
-vivek

On Sat, Mar 28, 2009 at 1:09 AM, Michael McCandless wrote:
> Still, 1024M ought to be enough to load one field's norms (how many
> fields have norms?). If you do things requiring FieldCache, that'll
> also consume RAM.
>
> It's also possible you're hitting this bug (false OOME) in Sun's JRE:
>
> http://issues.apache.org/jira/browse/LUCENE-1566
>
> Feel free to go vote for it!
>
> Mike
>
> On Fri, Mar 27, 2009 at 10:11 PM, Otis Gospodnetic
> wrote:
>>
>> That's a tiny heap. Part of it is used for indexing, too. And the fact
>> that your heap is so small shows you are not really making use of that nice
>> ramBufferSizeMB setting. :)
>>
>> Also, use omitNorms="true" for fields that don't need norms (if their types
>> don't already do that).
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>>
>>
>> - Original Message
>>> From: vivek sar
>>> To: solr-user@lucene.apache.org
>>> Sent: Friday, March 27, 2009 6:15:59 PM
>>> Subject: OOM at MultiSegmentReader.norms
>>>
>>> Hi,
>>>
>>> I have an index of size 50G (around 100 million documents) and growing -
>>> around 2000 records (1 rec = 500 bytes) are being written every second,
>>> continuously. If I make any search on this index I get an OOM. I'm using
>>> the default cache settings (512,512,256) in solrconfig.xml. The search
>>> is through the admin interface (returning 10 rows) with no sorting,
>>> faceting or highlighting. Max heap size is 1024m.
>>>
>>> Mar 27, 2009 9:13:41 PM org.apache.solr.common.SolrException log
>>> SEVERE: java.lang.OutOfMemoryError: Java heap space
>>> at org.apache.lucene.index.MultiSegmentReader.norms(MultiSegmentReader.java:335)
>>> at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:69)
>>> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
>>> at org.apache.lucene.search.Searcher.search(Searcher.java:126)
>>> at org.apache.lucene.search.Searcher.search(Searcher.java:105)
>>> at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:966)
>>> at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
>>> at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
>>> at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
>>> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
>>> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
>>> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>>> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>
>>> What could be the problem?
>>>
>>> Thanks,
>>> -vivek
>>
>>
>
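Two notes tied to the answers above. First, a sketch of what turning norms off looks like in schema.xml (the field names here are placeholders):

<field name="timestamp" type="date" indexed="true" stored="true" omitNorms="true"/>
<field name="host" type="string" indexed="true" stored="true" omitNorms="true"/>

Second, on the RAM buffer question: the setting is ramBufferSizeMB, which lives under indexDefaults (or mainIndex) in solrconfig.xml, as in the sketch a few messages up; there is nothing to change on the client side.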
Merging Solr Indexes
Hi, As part of speeding up the index process I'm thinking of spawning multiple threads which will write to different temporary SolrCores. Once the index process is done I want to merge all the indexes in temporary cores to a master core. For ex., if I want one SolrCore per day then every index cycle I'll spawn 4 threads which will index into some temporary index and once they are done I want to merge all these into the day core. My questions, 1) I want to use the same schema and solrconfig.xml for all cores without duplicating them - how do I do that? 2) How do I merge the temporary Solr cores into one master core programmatically? I've read the wiki on "MergingSolrIndexes", but I want to do it programmatically (like in Lucene - writer.addIndexes(..)) once the temporary indices are done. 3) Can I remove the temporary indices once the merge process is done? 4) Is this the right strategy to speed up indexing? Thanks, -vivek
Defining DataDir in Multi-Core
Hi, I'm trying to set up cores dynamically. I want to use the same schema.xml and solrconfig.xml for all the created cores, so I plan to pass the same instance directory but a different data directory. I started from the default solr.xml (I didn't want to define any core there, but it looks like we have to have at least one core defined before we start Solr).

Now I run the following URL in the browser (as described on the wiki - http://wiki.apache.org/solr/CoreAdmin):

http://localhost:8080/solr/admin/cores?action=CREATE&name=20090331_1&instanceDir=/Users/opal/temp/chat/solr&dataDir=/Users/opal/temp/chat/solr/data/20090331_1

I get a response pointing at /Users/opal/temp/chat/solr/solr.xml. When I then check solr.xml, the new core is there, but NO dataDir is specified on it. When I check the status (http://localhost:8080/solr/admin/cores?action=STATUS) I see:

core0
/Users/opal/temp/afterchat/solr/./
/Users/opal/temp/afterchat/solr/./data/
...

20090331_2
/Users/opal/temp/afterchat/solr/
/Users/opal/temp/afterchat/solr/data/

Both cores are pointing to the same data directory. My question is how can I create cores on the fly and have them point to different data directories, so that each core writes its index in a different location?

Thanks,
-vivek
Re: Defining DataDir in Multi-Core
I'm using the latest released one - Solr 1.3. The wiki says passing dataDir to the CREATE action (web service) should work, but that doesn't seem to be working.

-vivek

2009/3/31 Noble Paul നോബിള് नोब्ळ् :
> which version of Solr are you using? if you are using one from trunk,
> you can pass the dataDir as an extra parameter.
>
> On Wed, Apr 1, 2009 at 7:41 AM, vivek sar wrote:
>> Hi,
>>
>> I'm trying to set up cores dynamically. I want to use the same
>> schema.xml and solrconfig.xml for all the created cores, so I plan to
>> pass the same instance directory but a different data directory. I
>> started from the default solr.xml (I didn't want to define any core
>> there, but it looks like we have to have at least one core defined
>> before we start Solr).
>>
>> Now I run the following URL in the browser (as described on the wiki -
>> http://wiki.apache.org/solr/CoreAdmin):
>>
>> http://localhost:8080/solr/admin/cores?action=CREATE&name=20090331_1&instanceDir=/Users/opal/temp/chat/solr&dataDir=/Users/opal/temp/chat/solr/data/20090331_1
>>
>> I get a response pointing at /Users/opal/temp/chat/solr/solr.xml. When
>> I then check solr.xml, the new core is there, but NO dataDir is
>> specified on it. When I check the status
>> (http://localhost:8080/solr/admin/cores?action=STATUS) I see:
>>
>> core0
>> /Users/opal/temp/afterchat/solr/./
>> /Users/opal/temp/afterchat/solr/./data/
>> ...
>>
>> 20090331_2
>> /Users/opal/temp/afterchat/solr/
>> /Users/opal/temp/afterchat/solr/data/
>>
>> Both cores are pointing to the same data directory. My question is how
>> can I create cores on the fly and have them point to different data
>> directories, so that each core writes its index in a different location?
>>
>> Thanks,
>> -vivek
>>
>
>
>
> --
> --Noble Paul
>
Re: Merging Solr Indexes
Thanks Otis. Could you write to the same core (same index) from multiple threads at the same time? I thought each writer would lock the index so others could not write at the same time. I'll try it, though.

Another reason for putting indexes in separate cores was to limit the index size. Our index can grow up to 50G a day, so I was hoping writing to smaller indexes in separate cores would be faster, and if needed I could merge them at a later point (like end of day). I want to keep daily cores. Isn't this a good idea? How else can I limit the index size (besides multiple instances or separate boxes)?

Thanks,
-vivek

On Tue, Mar 31, 2009 at 8:28 PM, Otis Gospodnetic wrote:
>
> Let me start with 4)
> Have you tried simply using multiple threads to send your docs to a single
> Solr instance/core? You should get about the same performance as what you
> are trying with your approach below, but without the headache of managing
> multiple cores and index merging (not yet possible to do programmatically).
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message
>> From: vivek sar
>> To: solr-user@lucene.apache.org
>> Sent: Tuesday, March 31, 2009 1:59:01 PM
>> Subject: Merging Solr Indexes
>>
>> Hi,
>>
>> As part of speeding up the index process I'm thinking of spawning
>> multiple threads which will write to different temporary SolrCores.
>> Once the index process is done I want to merge all the indexes in
>> the temporary cores into a master core. For example, if I want one
>> SolrCore per day, then every index cycle I'll spawn 4 threads which
>> will index into some temporary index, and once they are done I want
>> to merge all these into the day core. My questions:
>>
>> 1) I want to use the same schema and solrconfig.xml for all cores
>> without duplicating them - how do I do that?
>> 2) How do I merge the temporary Solr cores into one master core
>> programmatically? I've read the wiki on "MergingSolrIndexes", but I
>> want to do it programmatically (like in Lucene -
>> writer.addIndexes(..)) once the temporary indices are done.
>> 3) Can I remove the temporary indices once the merge process is done?
>> 4) Is this the right strategy to speed up indexing?
>>
>> Thanks,
>> -vivek
>
>
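Since merging isn't exposed through Solr itself (as Otis notes), one hedged option is to drop to the Lucene API underneath, roughly as below; the paths are placeholders, and the target core must not have an open IndexWriter (e.g. Solr stopped or the core unloaded) while this runs:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeCores {
    public static void main(String[] args) throws Exception {
        // open the day core's index and pull the temporary cores' segments in
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/solr/day/data/index"),
                new StandardAnalyzer(), false,
                IndexWriter.MaxFieldLength.UNLIMITED);
        writer.addIndexesNoOptimize(new Directory[] {
                FSDirectory.getDirectory("/solr/tmp1/data/index"),
                FSDirectory.getDirectory("/solr/tmp2/data/index") });
        writer.close();
    }
}

After a successful merge the temporary index directories can be deleted, which answers question 3; whether the whole scheme beats multi-threaded writes to a single core is exactly the trade-off Otis raises above.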
Re: Defining DataDir in Multi-Core
Thanks Shalin. Is it available in the latest nightly build? Is there any other way I can create cores dynamically (using CREATE service) which will use the same schema.xml and solrconfig.xml, but write to different data directories? Thanks, -vivek On Wed, Apr 1, 2009 at 1:55 AM, Shalin Shekhar Mangar wrote: > On Wed, Apr 1, 2009 at 1:48 PM, vivek sar wrote: >> I'm using the latest released one - Solr 1.3. The wiki says passing >> dataDir to CREATE action (web service) should work, but that doesn't >> seem to be working. >> > > That is a Solr 1.4 feature (not released yet). > > -- > Regards, > Shalin Shekhar Mangar. >
Re: Defining DataDir in Multi-Core
Hi, I tried the latest nightly build (04-01-09) - it takes the dataDir property now, but it's creating the data dir at the wrong location. I have a dataDir set for the core in solr.xml, but it always seems to create the solr/data directory in the cwd (where I started Tomcat from). Here is the log from Catalina.out:

Apr 1, 2009 10:47:21 AM org.apache.solr.core.SolrCore
INFO: [core2] Opening new SolrCore at /Users/opal/temp/chat/solr/, dataDir=./solr/data/
..
Apr 1, 2009 10:47:21 AM org.apache.solr.core.SolrCore initIndex
WARNING: [core2] Solr index directory './solr/data/index' doesn't exist. Creating new index...

I've also tried relative paths, but to no avail. Is this a bug?

Thanks,
-vivek

On Wed, Apr 1, 2009 at 9:45 AM, vivek sar wrote:
> Thanks Shalin.
>
> Is it available in the latest nightly build?
>
> Is there any other way I can create cores dynamically (using the CREATE
> service) which will use the same schema.xml and solrconfig.xml, but
> write to different data directories?
>
> Thanks,
> -vivek
>
> On Wed, Apr 1, 2009 at 1:55 AM, Shalin Shekhar Mangar
> wrote:
>> On Wed, Apr 1, 2009 at 1:48 PM, vivek sar wrote:
>>> I'm using the latest released one - Solr 1.3. The wiki says passing
>>> dataDir to the CREATE action (web service) should work, but that doesn't
>>> seem to be working.
>>>
>>
>> That is a Solr 1.4 feature (not released yet).
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
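One hedged workaround, judging from the dataDir=./solr/data/ in the log: give each core an absolute dataDir in solr.xml so nothing gets resolved against the container's working directory. A sketch mirroring the paths above:

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core2" instanceDir="/Users/opal/temp/chat/solr/"
          dataDir="/Users/opal/temp/chat/solr/data/core2"/>
  </cores>
</solr>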
Re: Runtime exception when adding documents using solrj
Hi, I'm trying to add a list of POJO objects (using annotations) via solrj, but "server.addBeans(...)" is throwing this exception:

org.apache.solr.common.SolrException: Bad Request
Bad Request
request: http://localhost:8080/solr/core0/update?wt=javabin&version=2.2

Note, I'm using multi-core. There is no other exception in the solr log.

Related question - I'm trying to upgrade solrj from the nightly build, but I get a class-not-found exception (java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory). What are all the dependencies for Solrj 1.4 (the wiki only has information up to 1.3)?

Thanks,
-vivek

On Wed, Apr 1, 2009 at 3:30 AM, Radha C. wrote:
>
> Thanks Paul, I resolved it. I missed one field declaration in schema.xml. Now
> I've added it, and it works.
>
> -Original Message-
> From: Noble Paul നോബിള് नोब्ळ् [mailto:noble.p...@gmail.com]
> Sent: Wednesday, April 01, 2009 3:52 PM
> To: solr-user@lucene.apache.org; cra...@ceiindia.com
> Subject: Re: Runtime exception when adding documents using solrj
>
> Can you take a look at the Solr logs and see what is happening?
>
> On Wed, Apr 1, 2009 at 3:19 PM, Radha C. wrote:
>>
>> Thanks Paul,
>>
>> I changed the URL but I am getting another error - Bad Request. Any help
>> will be appreciated.
>>
>> Exception in thread "main" org.apache.solr.common.SolrException: Bad
>> Request Bad Request
>> request: http://localhost:8080/solr/update?wt=javabin
>> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:428)
>> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
>> at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:243)
>> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
>> at SolrIndexTest.main(SolrIndexTest.java:47)
>> Java Result: 1
>>
>>
>>
>>
>> -Original Message-
>> From: Noble Paul നോബിള് नोब्ळ् [mailto:noble.p...@gmail.com]
>> Sent: Wednesday, April 01, 2009 2:26 PM
>> To: solr-user@lucene.apache.org; cra...@ceiindia.com
>> Subject: Re: Runtime exception when adding documents using solrj
>>
>> the url is wrong
>> try this
>> CommonsHttpSolrServer server = new
>> CommonsHttpSolrServer("http://localhost:8080/solr/");
>>
>> On Wed, Apr 1, 2009 at 2:04 PM, Radha C. wrote:
>>>
>>> Can anyone please tell me what is the issue with the below Java code.
>>>
>>> -Original Message-
>>> From: Radha C. [mailto:cra...@ceiindia.com]
>>> Sent: Wednesday, April 01, 2009 12:28 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: RE: Runtime exception when adding documents using solrj
>>>
>>>
>>> I am using Solr 1.3 version
>>>
>>> _
>>>
>>> From: Noble Paul നോബിള് नोब्ळ् [mailto:noble.p...@gmail.com]
>>> Sent: Wednesday, April 01, 2009 12:16 PM
>>> To: solr-user@lucene.apache.org; cra...@ceiindia.com
>>> Subject: Re: Runtime exception when adding documents using solrj
>>>
>>>
>>> which version of Solr are you using?
>>>
>>>
>>> On Wed, Apr 1, 2009 at 12:01 PM, Radha C. wrote:
>>>
>>>
>>> Hi All,
>>>
>>> I am trying to index documents by using the solrj client. I have written
>>> some simple code below,
>>>
>>> {
>>> CommonsHttpSolrServer server = new
>>> CommonsHttpSolrServer("http://localhost:8080/solr/update");
>>> SolrInputDocument doc1 = new SolrInputDocument();
>>> doc1.addField( "id", "id1", 1.0f );
>>> doc1.addField( "name", "doc1", 1.0f );
>>> doc1.addField( "price", 10 );
>>> SolrInputDocument doc2 = new SolrInputDocument();
>>> doc2.addField( "id", "id2", 1.0f );
>>> doc2.addField( "name", "doc2", 1.0f );
>>> doc2.addField( "price", 20 );
>>> Collection<SolrInputDocument> docs = new
>>> ArrayList<SolrInputDocument>();
>>> docs.add( doc1 );
>>> docs.add( doc2 );
>>> server.add(docs);
>>> server.commit();
>>> }
>>>
>>> But I am getting the below error. Can anyone tell me what is
>>> wrong with the above code?
>>>
>>> Exception in thread "main" java.lang.RuntimeException: Invalid
>>> version or the data in not in 'javabin' format
>>> at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:98)
>>> at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
>>> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470)
>>> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
>>> at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:243)
>>> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
>>> at SolrIndexTest.main(SolrIndexTest.java:46)
>>> Java Result: 1
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> --Noble Paul
>>>
>
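For context on the addBeans variant in the first message: Solrj binds POJOs with the @Field annotation, and a 400 Bad Request typically means an annotated field has no matching declaration in schema.xml. A minimal sketch (the field names are placeholders):

import org.apache.solr.client.solrj.beans.Field;

public class Record {
    @Field("id")
    public String id;

    @Field("name")
    public String name;

    @Field("price")
    public float price;
}

// usage: server.addBeans(records); server.commit();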
Re: Runtime exception when adding documents using solrj
Thanks Shalin. I added that in the solrconfig.xml, but now I get this exception:

org.apache.solr.common.SolrException: Not Found
Not Found
request: http://localhost:8080/solr/core0/update?wt=javabin&version=2.2

I do have "core0" under the solr.home, and solr.xml lists it. The core0 directory also contains the conf and data directories. Am I missing anything else?

Thanks,
-vivek

On Wed, Apr 1, 2009 at 1:02 PM, Shalin Shekhar Mangar wrote:
> On Thu, Apr 2, 2009 at 1:13 AM, vivek sar wrote:
>> Hi,
>>
>> I'm trying to add a list of POJO objects (using annotations) via
>> solrj, but "server.addBeans(...)" is throwing this exception:
>>
>> org.apache.solr.common.SolrException: Bad Request
>> Bad Request
>> request: http://localhost:8080/solr/core0/update?wt=javabin&version=2.2
>>
>> Note, I'm using multi-core. There is no other exception in the solr log.
>>
>
> Can you make sure all the cores' solrconfig.xml have the following line?
>
> <requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" />
>
> The above is needed for the binary update format to work. I don't think
> the multi-core example solrconfig.xml in the solr nightly builds contains
> this line.
>
>> Related question - I'm trying to upgrade solrj from the nightly build,
>> but I get a class-not-found exception
>> (java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory). What are
>> all the dependencies for Solrj 1.4 (the wiki only has information up to
>> 1.3)?
>>
>
> I think you need slf4j-api-1.5.5.jar and slf4j-jdk14-1.5.5.jar. Both
> can be found in solr's nightly downloads in the lib directory.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
java.lang.ClassCastException: java.lang.Long using Solrj
Hi, I'm using solrj (released v 1.3) to add my POJO objects (server.addBeans(...)), but I'm getting this exception:

java.lang.ClassCastException: java.lang.Long
at org.apache.solr.common.util.NamedListCodec.unmarshal(NamedListCodec.java:89)
at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:385)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:57)

I don't have any "Long" member variable in my Java object, so I'm not sure where this is coming from. I've checked the schema.xml to make sure the data types are ok. I'm adding 15K objects at a time - I'm assuming that should be ok.

Any ideas?

Thanks,
-vivek
Re: Runtime exception when adding documents using solrj
Hello Shalin, Looks like I was using an old version of solrconfig.xml (from Solr 1.2). After I updated to the latest solrconfig.xml (from 1.4) it seems to be working fine.

Another question: how would I search across multiple cores?

1) If I want to search for a word in two different cores?
2) If I want to search for a word in all the cores?
3) How would I search on multiple cores on multiple machines?

For a single core I'm able to search like, http://localhost:8080/solr/20090402/select?q=*:*

Thanks,
-vivek

--
Just in case this might be helpful to others who are trying to use Solr multicore, here is what I tried:

1) Created this directory structure - multicore/core0 (put the conf directory - with schema.xml and solrconfig.xml - under core0) and multicore/core1. Made multicore the solr.home and put solr.xml under there.

2) Added a couple of cores in solr.xml. Here core1 is using the instanceDir of core0 (so the same schema.xml and solrconfig.xml).

3) Started Solr.

4) The data/index directory is created under both cores.

5) Tried the following URLs,
a) http://localhost:8080/solr/admin/cores - admin interface for both cores
b) http://localhost:8080/solr/core0/admin/ - I see the single-core admin page
c) http://localhost:8080/solr/admin/cores?action=STATUS - same as a
d) http://localhost:8080/solr/admin/cores?action=STATUS&core=core0 - same as b
e) http://localhost:8080/solr/core0/select?q=*:* - shows the result xml

6) I then created a core dynamically using the CREATE service (this requires Solr 1.4),
http://localhost:8080/solr/admin/cores?action=CREATE&name=20090402&instanceDir=/Users/opal/temp/chat/solr/multicore/core0&dataDir=/Users/opal/temp/chat/solr/multicore/20090402/data
- this dynamically updated solr.xml and created the directory structure (20090402/data) on the file system.

7) Then used solrj to add beans to the newly created core.

On Wed, Apr 1, 2009 at 8:26 PM, Shalin Shekhar Mangar wrote:
> On Thu, Apr 2, 2009 at 2:34 AM, vivek sar wrote:
>> Thanks Shalin.
>>
>> I added that in the solrconfig.xml, but now I get this exception:
>>
>> org.apache.solr.common.SolrException: Not Found
>> Not Found
>> request: http://localhost:8080/solr/core0/update?wt=javabin&version=2.2
>>
>> I do have "core0" under the solr.home, and solr.xml lists it. The core0
>> directory also contains the conf and data directories.
>>
>
> Are you able to see the Solr admin dashboard at
> http://localhost:8080/solr/core0/admin/ ? Are there any exceptions in
> the Solr log?
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
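On the multi-core search questions: Solr's distributed search uses the shards request parameter, where each entry is host:port/path/corename without a protocol prefix. A minimal sketch for two local cores (core names are placeholders):

http://localhost:8080/solr/core0/select?shards=localhost:8080/solr/core0,localhost:8080/solr/core1&q=word&indent=true

The same parameter spans machines by listing remote hosts too, with the usual caveats: all shards need compatible schemas, and unique keys must not collide across shards.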
Re: java.lang.ClassCastException: java.lang.Long using Solrj
Thanks Noble. That helped - it turned out there was a field name mismatch in my bean.

2009/4/1 Noble Paul നോബിള് नोब्ळ् :
> The ClassCastException is misleading. It happens because the response
> itself was some error response.
>
> Debug it by setting the XMLResponseParser:
> http://wiki.apache.org/solr/Solrj#head-12c26b2d7806432c88b26cf66e236e9bd6e91849
>
> On Thu, Apr 2, 2009 at 4:21 AM, vivek sar wrote:
>> Hi,
>>
>> I'm using solrj (released v 1.3) to add my POJO objects
>> (server.addBeans(...)), but I'm getting this exception:
>>
>> java.lang.ClassCastException: java.lang.Long
>> at org.apache.solr.common.util.NamedListCodec.unmarshal(NamedListCodec.java:89)
>> at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
>> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:385)
>> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
>> at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
>> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
>> at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:57)
>>
>> I don't have any "Long" member variable in my Java object, so I'm not
>> sure where this is coming from. I've checked the schema.xml to make
>> sure the data types are ok. I'm adding 15K objects at a time - I'm
>> assuming that should be ok.
>>
>> Any ideas?
>>
>> Thanks,
>> -vivek
>>
>
>
>
> --
> --Noble Paul
>
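A sketch of Noble's debugging suggestion in code (the URL is a placeholder): switching the client to the XML parser makes an error response surface as a readable message instead of a javabin ClassCastException.

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

public class DebugClient {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8080/solr/core0");
        // parse responses as XML while debugging; switch back to the
        // default binary parser once the real error is found
        server.setParser(new XMLResponseParser());
    }
}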
Searching on mulit-core Solr
Hi, I have a multi-core system (one core per day), so there would be around 30 cores in a month on a box running one Solr instance. We have two boxes running Solr instances, and input data is fed to them in round-robin fashion. Each box can have up to 30 cores in a month. Here are my questions:

1) How would I search for a term in multiple cores on the same box?

For a single core I'm able to search like, http://localhost:8080/solr/20090402/select?q=*:*

2) How would I search for a term in multiple cores on both boxes at the same time?

3) Is it possible to have two Solr instances on one box, with one doing the indexing and the other performing only searches on that index? The idea is to have two JVMs, each doing its own task - I'm not sure whether the indexer process needs to know about the searcher process, i.e. whether they need to have the same solr.xml (for multicore etc.). We don't want to replicate the indexes either (we get very light search traffic, but very high indexing traffic), so they would need to use the same index.

Thanks,
-vivek
Solr 1.4 (nightly build) seem hung under load
Hi, I'm using Solr 1.4 (nightly build - 03/29/09). I'm stress testing my application with Solr. My app uses Solrj to write to a remote Solr (on the same box, but in a different JVM). The stress test sends over 2 million records (1 record = 500 bytes, with each record having 10 fields) within 5 minutes. All was working fine (with 2 million records processed - 2G index size), then all of a sudden Solr stopped responding - I call server.addBeans(...) passing 15K objects and don't get any response for over an hour (usually it returns in 5 sec).

I have 3 threads writing to the same index at the same time - not sure if that could cause any problem. I was told by Otis that it should be ok to have multiple threads write to the same index, so I'm assuming it's ok, though from the thread dump I do see a couple of "update" threads waiting on a ReadWriteLock and another thread (pool-6-thread-1) holding a lock on SolrWriter.

Attached is the thread dump of the Tomcat process where Solr is running. Any ideas?

Thanks,
-vivek
Re: Solr 1.4 (nightly build) seem hung under load
Just an update on this issue: Solr did come back after 80 min - so I'm not sure where it was stuck. I do use a RAM buffer of 64MB and have a heap size of 6G.

There is no error in the Solr log, and I had it running at WARNING level, so I missed the INFO output, if there was any, during that period. I'm also not running any "optimize" command. What could cause Solr to hang for 80 min?

Thanks,
-vivek

On Fri, Apr 3, 2009 at 1:55 PM, vivek sar wrote:
> Hi,
>
> I'm using Solr 1.4 (nightly build - 03/29/09). I'm stress testing my
> application with Solr. My app uses Solrj to write to a remote Solr (on
> the same box, but in a different JVM). The stress test sends over 2 million
> records (1 record = 500 bytes, with each record having 10 fields)
> within 5 minutes. All was working fine (with 2 million records
> processed - 2G index size), then all of a sudden Solr stopped responding
> - I call server.addBeans(...) passing 15K objects and don't get any
> response for over an hour (usually it returns in 5 sec).
>
> I have 3 threads writing to the same index at the same time - not sure
> if that could cause any problem. I was told by Otis that it should be
> ok to have multiple threads write to the same index, so I'm assuming it's
> ok, though from the thread dump I do see a couple of "update" threads
> waiting on a ReadWriteLock and another thread (pool-6-thread-1) holding a
> lock on SolrWriter.
>
> Attached is the thread dump of the Tomcat process where Solr is
> running. Any ideas?
>
> Thanks,
> -vivek
>
Re: Solr 1.4 (nightly build) seem hung under load
Hi, another update. It happened again, and this time I had INFO logged in the Solr log:

INFO: {add=[330274716, 330274717, 330274718, 330274719, 330274720, 330274721, 330274722, 330274723, ...(14992 more)]} 0 6041
Apr 3, 2009 10:38:01 PM org.apache.solr.core.SolrCore execute
INFO: [20090403] webapp=/solr path=/update params={wt=javabin} status=0 QTime=6041
Apr 3, 2009 10:38:11 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)

It's still hung at the commit even after 30 min. So it looks like it takes a long time to commit the records. I'm committing the records myself, but I also have auto-commit turned on in solrconfig.xml (maxDocs 1000, maxTime 100).

In a 15 min period I'm getting approximately 6 million documents/records. I've read earlier on this mailing list that we shouldn't commit very often, and now it seems that not committing in time makes the commit process take forever. I want the records to be searchable every 30 min, basically. So 30-min-old data is ok for searching, but indexing shouldn't slow down.

1) So, what's a good commit strategy?
2) How often (after how many records) should I do this?
3) Should I do it programmatically, or can I have it in solrconfig.xml?

Thanks,
-vivek

On Fri, Apr 3, 2009 at 2:27 PM, vivek sar wrote:
> Just an update on this issue: Solr did come back after 80 min - so I'm
> not sure where it was stuck. I do use a RAM buffer of 64MB and have a heap
> size of 6G.
>
> There is no error in the Solr log, and I had it running at WARNING level,
> so I missed the INFO output, if there was any, during that period. I'm also
> not running any "optimize" command. What could cause Solr to hang for 80
> min?
>
> Thanks,
> -vivek
>
> On Fri, Apr 3, 2009 at 1:55 PM, vivek sar wrote:
>> Hi,
>>
>> I'm using Solr 1.4 (nightly build - 03/29/09). I'm stress testing my
>> application with Solr. My app uses Solrj to write to a remote Solr (on
>> the same box, but in a different JVM). The stress test sends over 2 million
>> records (1 record = 500 bytes, with each record having 10 fields)
>> within 5 minutes. All was working fine (with 2 million records
>> processed - 2G index size), then all of a sudden Solr stopped responding
>> - I call server.addBeans(...) passing 15K objects and don't get any
>> response for over an hour (usually it returns in 5 sec).
>>
>> I have 3 threads writing to the same index at the same time - not sure
>> if that could cause any problem. I was told by Otis that it should be
>> ok to have multiple threads write to the same index, so I'm assuming it's
>> ok, though from the thread dump I do see a couple of "update" threads
>> waiting on a ReadWriteLock and another thread (pool-6-thread-1) holding a
>> lock on SolrWriter.
>>
>> Attached is the thread dump of the Tomcat process where Solr is
>> running. Any ideas?
>>
>> Thanks,
>> -vivek
>>
>
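A hedged sketch of an autoCommit block matching the stated goal (data searchable within 30 minutes, no client-side commits); the value is an assumption, not taken from the thread:

<autoCommit>
  <!-- commit on time only; 30 minutes in milliseconds -->
  <maxTime>1800000</maxTime>
</autoCommit>

For scale: with a document-count trigger like maxDocs=1000 at roughly 6,600 docs/sec, Solr would be attempting several commits per second, which is consistent with the stall described above.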
httpclient.ProtocolException using Solrj
Hi, I'm sending 15K records at once using Solrj (server.addBeans(...)) and have two threads writing to the same index. One thread goes fine, but the second thread always fails with:

org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:57)
at com.apple.afterchat.indexer.solr.handler.BeanIndexHandler.indexData(BeanIndexHandler.java:44)
at com.apple.afterchat.indexer.Indexer.indexData(Indexer.java:77)
at com.apple.afterchat.indexer.Indexer.run(Indexer.java:39)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:637)
Caused by: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:487)
at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:417)

Does anyone know what could be the problem?

Thanks,
-vivek
Re: Searching on multi-core Solr
Hi, Any help on this? I've looked at DistributedSearch on the Wiki, but that doesn't seem to be working for me on multi-core and multiple Solr instances on the same box. Scenario, 1) Two boxes (localhost, 10.4.x.x) 2) Two Solr instances on each box (8080 and 8085 ports) 3) Two cores on each instance (core0, core1) I'm not sure how to construct my search on the above setup if I need to search across all the cores on all the boxes. Here is what I'm trying, http://localhost:8080/solr/core0/select?shards=localhost:8080/solr/core0,localhost:8085/solr/core0,localhost:8080/solr/core1,localhost:8085/solr/core1,10.4.x.x:8080/solr/core0,10.4.x.x:8085/solr/core0,10.4.x.x:8080/solr/core1,10.4.x.x:8085/solr/core1&indent=true&q=vivek+japan I get a 404 error. Is this the right URL construction for my setup? How else can I do this? Thanks, -vivek On Fri, Apr 3, 2009 at 1:02 PM, vivek sar wrote: > Hi, > > I have a multi-core system (one core per day), so there would be around > 30 cores in a month on a box running one Solr instance. We have two > boxes running the Solr instance and input data is fed to them in > round-robin fashion. Each box can have up to 30 cores in a month. Here > are my questions: > > 1) How would I search for a term in multiple cores on the same box? > > On a single core I'm able to search like, > http://localhost:8080/solr/20090402/select?q=*:* > > 2) How would I search for a term in multiple cores on both boxes at > the same time? > > 3) Is it possible to have two Solr instances on one box, with one doing > the indexing and the other performing only searches on that index? The idea > is to have two JVMs with each doing its own task - I'm not sure whether > the indexer process needs to know about the searcher process - like do > they need to have the same solr.xml (for multicore etc). We don't want > to replicate the indexes either (we have very light search traffic, but > very high indexing traffic) so they need to use the same index. > > > Thanks, > -vivek >
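For reference, the same shards-style query can be issued through SolrJ instead of a hand-built URL; a sketch reusing the host/port/core layout described above (shortened to four shards for readability):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ShardsQuery {
        public static void main(String[] args) throws Exception {
            // The request goes to one core; that core fans it out to each shard listed.
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8080/solr/core0");
            SolrQuery q = new SolrQuery("vivek japan");
            q.set("shards",
                  "localhost:8080/solr/core0,localhost:8085/solr/core0," +
                  "localhost:8080/solr/core1,localhost:8085/solr/core1");
            QueryResponse rsp = server.query(q);
            System.out.println("hits: " + rsp.getResults().getNumFound());
        }
    }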
Re: httpclient.ProtocolException using Solrj
Hi, Any ideas on this issue? I ran into this again - once it starts happening it keeps happening. One of the threads keeps failing. Here are my SolrServer settings, int socketTO = 0; int connectionTO = 100; int maxConnectionPerHost = 10; int maxTotalConnection = 50; boolean followRedirects = false; boolean allowCompression = true; int maxRetries = 1; Note, I'm using two threads to simultaneously write to the same index. org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated. at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48) at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:57) Thanks, -vivek On Sat, Apr 4, 2009 at 1:07 AM, vivek sar wrote: > Hi, > > I'm sending 15K records at once using Solrj (server.addBeans(...)) > and have two threads writing to the same index. One thread goes fine, but > the second thread always fails with, > > > org.apache.solr.client.solrj.SolrServerException: > org.apache.commons.httpclient.ProtocolException: Unbuffered entity > enclosing request can not be repeated. > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470) > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) > at > org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259) > at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48) > at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:57) > at > com.apple.afterchat.indexer.solr.handler.BeanIndexHandler.indexData(BeanIndexHandler.java:44) > at com.apple.afterchat.indexer.Indexer.indexData(Indexer.java:77) > at com.apple.afterchat.indexer.Indexer.run(Indexer.java:39) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) > at java.lang.Thread.run(Thread.java:637) > Caused by: org.apache.commons.httpclient.ProtocolException: Unbuffered > entity enclosing request can not be repeated. > at > org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:487) > at > org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114) > at > org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096) > at > org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) > at > org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) > at > org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) > at > org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:417) > > Does anyone know what could be the problem? > > Thanks, > -vivek >
Re: Searching on multi-core Solr
>>Hi, Any help on this? I've looked at DistributedSearch on the Wiki, but that doesn't seem to be working for me on multi-core and multiple Solr >>instances on the same box. >> >>Scenario, >> >>1) Two boxes (localhost, 10.4.x.x) >>2) Two Solr instances on each box (8080 and 8085 ports) >>3) Two cores on each instance (core0, core1) >> >>I'm not sure how to construct my search on the above setup if I need >>to search across all the cores on all the boxes. Here is what I'm >>trying, >> >>http://localhost:8080/solr/core0/select?shards=localhost:8080/solr/core0,localhost:8085/solr/core0,localhost:8080/solr/core1,localhost:8085/solr/core1,10.4.x.x:8080/solr/core0,10.4.x.x:8085/solr/core0,10.4.x.x:8080/solr/core1,10.4.x.x:8085/solr/core1&indent=true&q=vivek+japan >> >>I get a 404 error. Is this the right URL construction for my setup? How >>else can I do this? >> >>Thanks, >>-vivek >> >>On Fri, Apr 3, 2009 at 1:02 PM, vivek sar wrote: >>> Hi, >>> >>> I have a multi-core system (one core per day), so there would be around >>> 30 cores in a month on a box running one Solr instance. We have two >>> boxes running the Solr instance and input data is fed to them in >>> round-robin fashion. Each box can have up to 30 cores in a month. Here >>> are my questions: >>> >>> 1) How would I search for a term in multiple cores on the same box? >>> >>> On a single core I'm able to search like, >>> http://localhost:8080/solr/20090402/select?q=*:* >>> >>> 2) How would I search for a term in multiple cores on both boxes at >>> the same time? >>> >>> 3) Is it possible to have two Solr instances on one box, with one doing >>> the indexing and the other performing only searches on that index? The idea >>> is to have two JVMs with each doing its own task - I'm not sure whether >>> the indexer process needs to know about the searcher process - like do >>> they need to have the same solr.xml (for multicore etc). We don't want >>> to replicate the indexes either (we have very light search traffic, but >>> very high indexing traffic) so they need to use the same index. >>> >>> >>> Thanks, >>> -vivek >>> > > -- > > === > Fergus McMenemie Email:fer...@twig.me.uk > Techmore Ltd Phone:(UK) 07721 376021 > > Unix/Mac/Intranets Analyst Programmer > === >
Re: httpclient.ProtocolException using Solrj
With a single thread everything works fine. Two threads are fine too for a while, and then all of a sudden the problem starts happening. I tried indexing using REST services as well (instead of Solrj), but with that too I get the following error after a while, 2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer - indexData()-> Failed to index java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) at java.io.FilterOutputStream.write(FilterOutputStream.java:80) at org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145) at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499) at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114) at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) Note, I'm using the "simple" lock type. I'd tried the "single" type before, but that once caused index corruption, so I switched to "simple". Thanks, -vivek 2009/4/8 Noble Paul നോബിള് नोब्ळ् : > do you see the same problem when you use a single thread? > > what is the version of SolrJ that you use? > > > > On Wed, Apr 8, 2009 at 1:19 PM, vivek sar wrote: >> Hi, >> >> Any ideas on this issue? I ran into this again - once it starts >> happening it keeps happening. One of the threads keeps failing. Here >> are my SolrServer settings, >> >> int socketTO = 0; >> int connectionTO = 100; >> int maxConnectionPerHost = 10; >> int maxTotalConnection = 50; >> boolean followRedirects = false; >> boolean allowCompression = true; >> int maxRetries = 1; >> >> Note, I'm using two threads to simultaneously write to the same index. >> >> org.apache.solr.client.solrj.SolrServerException: >> org.apache.commons.httpclient.ProtocolException: Unbuffered entity >> enclosing request can not be repeated. 
>>> at >>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470) >>> at >>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) >>> at >>> org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259) >>> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48) >>> at >>> org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:57) >>> at >>> com.apple.afterchat.indexer.solr.handler.BeanIndexHandler.indexData(BeanIndexHandler.java:44) >>> at com.apple.afterchat.indexer.Indexer.indexData(Indexer.java:77) >>> at com.apple.afterchat.indexer.Indexer.run(Indexer.java:39) >>> at >>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885) >>> at >>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
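For what it's worth, the settings listed above map onto CommonsHttpSolrServer setters roughly as below; note that in commons-httpclient a socket timeout of 0 means wait forever on reads, so a hung request never times out on the client side. A sketch of the mapping, not a recommendation of these particular values:

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class ClientSettings {
        public static CommonsHttpSolrServer configure(String url) throws Exception {
            CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
            server.setSoTimeout(0);                    // 0 = infinite read timeout
            server.setConnectionTimeout(100);
            server.setDefaultMaxConnectionsPerHost(10);
            server.setMaxTotalConnections(50);
            server.setFollowRedirects(false);
            server.setAllowCompression(true);
            server.setMaxRetries(1);
            return server;
        }
    }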
Re: Searching on multi-core Solr
Any help on this issue? Would distributed search on multi-core on the same Solr instance even work? Does it have to be different Solr instances altogether (separate shards)? I'm kind of stuck at this point right now. I keep getting one of the two errors (when running distributed search - single searches work fine) as mentioned earlier in this thread. Thanks, -vivek On Wed, Apr 8, 2009 at 1:57 AM, vivek sar wrote: > Thanks Fergus. I'm still having a problem with multicore search. > > I tried the following with two cores (they both share the same schema > and solrconfig.xml) on the same box on the same Solr instance, > > 1) http://10.4.x.x:8080/solr/core0/admin/ - works fine, shows all the > cores in admin interface > 2) http://10.4.x.x:8080/solr/admin/cores - works fine, I see all the cores in > xml > 3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine, > gives me top 10 records > 4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine, > gives me top 10 records > 5) > http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan > - this FAILS. I've seen two problems with this. > > a) When indexes are being committed I see, > > SEVERE: org.apache.solr.common.SolrException: > org.apache.solr.client.solrj.SolrServerException: > java.net.SocketException: Connection reset > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) > at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) > at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) > at > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) > at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) > at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) > at > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) > at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) > at > org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) > at > org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) > at > org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) > at java.lang.Thread.run(Thread.java:637) > > b) Other times I see this, > > SEVERE: java.lang.NullPointerException > at > org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432) > at > org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) > at > 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) > at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) > at > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) > at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) > at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) > at > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) > at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) > at > org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) > at > org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
Re: httpclient.ProtocolException using Solrj
Thanks Shalin and Paul. I'm not using MultipartRequest. I do share the same SolrServer between two threads. I'm not using MultiThreadedHttpConnectionManager. I'm simply using CommonsHttpSolrServer to create the SolrServer. I've also tried StreamingUpdateSolrServer, which works much faster, but does throw a "connection reset" exception once in a while. Do I need to use MultiThreadedHttpConnectionManager? I couldn't find anything on it on the Wiki. I was also thinking of using EmbeddedSolrServer - in what case would I be able to use it? Does my application and the Solr web app need to run in the same JVM for this to work? How would I use the EmbeddedSolrServer? Thanks, -vivek On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar wrote: > Vivek, do you share the same SolrServer instance between your two threads? > If so, are you using the MultiThreadedHttpConnectionManager when creating > the HttpClient instance? > > On Wed, Apr 8, 2009 at 10:13 PM, vivek sar wrote: > >> With a single thread everything works fine. Two threads are fine too for a >> while, and then all of a sudden the problem starts happening. >> >> I tried indexing using REST services as well (instead of Solrj), but >> with that too I get the following error after a while, >> >> 2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer - >> indexData()-> Failed to index >> java.net.SocketException: Broken pipe >> at java.net.SocketOutputStream.socketWrite0(Native Method) >> at >> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) >> at java.net.SocketOutputStream.write(SocketOutputStream.java:136) >> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) >> at java.io.FilterOutputStream.write(FilterOutputStream.java:80) >> at >> org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145) >> at >> org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499) >> at >> org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114) >> at >> org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096) >> at >> org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) >> at >> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) >> at >> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) >> at >> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) >> >> >> Note, I'm using the "simple" lock type. I'd tried the "single" type before, >> but that once caused index corruption, so I switched to "simple". >> >> Thanks, >> -vivek >> >
>> >> >> >> org.apache.solr.client.solrj.SolrServerException: >> >> org.apache.commons.httpclient.ProtocolException: Unbuffered entity >> >> enclosing request can not be repeated. >> >> at >> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470) >> >> at >> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) >> >> at >> org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259) >> >> at >> org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48) >> >> at >> org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:57) >> >> >>
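On the EmbeddedSolrServer question above: it runs Solr inside the client's own JVM, with no servlet container or HTTP in between, so it only applies when the indexer and the index live in the same process. A minimal sketch, assuming a solr home directory containing a core named "core0":

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.core.CoreContainer;

    public class EmbeddedExample {
        public static void main(String[] args) throws Exception {
            System.setProperty("solr.solr.home", "/path/to/solr/home"); // assumed path
            CoreContainer.Initializer init = new CoreContainer.Initializer();
            CoreContainer container = init.initialize();
            SolrServer server = new EmbeddedSolrServer(container, "core0");
            // addBeans/commit/query work exactly as with the HTTP-based servers
            container.shutdown();
        }
    }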
Re: Searching on multi-core Solr
Hi, I've gone through the mailing archive and have read contradictory remarks on this issue. Can someone please clear this up, as I'm not able to run distributed search on multiple cores? Is there any document on how I can search across multiple cores that share the same schema? Here are the various comments I've read on this mailing list, 1) http://www.nabble.com/multi-core-vs-multi-app-td15803781.html#a15803781 Don't think you can search against multiple cores "automatically" - i.e. got to make multiple queries, one for each core and combine results yourself. Yes, this will slow things down. - Otis 2) http://www.nabble.com/Search-in-SOLR-multi-cores-in-a-single-request-td20356173.html#a20356173 The idea behind multicore is that you will use them if you have completely different type of documents (basically multiple schemas). - Shalin 3) http://www.nabble.com/Distributed-search-td22036229.html#a22036229 That should work, yes, though it may not be a wise thing to do performance-wise, if the number of CPU cores that solr server has is lower than the number of Solr cores. - Otis My only motivation behind using multi-core is to keep the index size within limits. All my cores use the same schema. My index grows to over 30G within a day, and I need to keep up to a year of data. I couldn't find any other way of scaling using Solr. I've noticed that once the index grows above 10G the indexing process starts slowing down: commits take much longer and an optimize is hard to finish. So I'm trying to create a new core after every 10 million documents (equal to 10G in my case). I don't want to start a new Solr instance every 10G - that won't scale over a year. I'm going to use 3-4 servers to hold all these cores. Now, if someone could please tell me whether this is a wrong scaling architecture, I could re-think it. I want fast indexing and, at the same time, fast enough search. If I have to search each core separately and merge the results myself, the search performance is going to be awful. Is Solr the right tool for managing billions of records (I can get up to 100 million records every day - with 1Kb per record that's 100GB of index a day)? Most of the field values are pretty distinct (like 10 million email addresses), so the index size would be huge too. I would think it's a common problem to scale a huge index while keeping both indexing and search time acceptable. I'm not sure if this can be managed on just 4 servers - we don't have 100s of boxes for this project. Is there any other tool that might be more appropriate for this kind of case - like Katta or Lucene on Hadoop, or simply using Lucene with parallel search and partitioning the indexes by size? Thanks, -vivek On Wed, Apr 8, 2009 at 11:07 AM, vivek sar wrote: > Any help on this issue? Would distributed search on multi-core on the same > Solr instance even work? Does it have to be different Solr instances > altogether (separate shards)? > > I'm kind of stuck at this point right now. I keep getting one of the two > errors (when running distributed search - single searches work fine) > as mentioned earlier in this thread. > > Thanks, > -vivek > > On Wed, Apr 8, 2009 at 1:57 AM, vivek sar wrote: >> Thanks Fergus. I'm still having a problem with multicore search. 
>> >> I tried the following with two cores (they both share the same schema >> and solrconfig.xml) on the same box on the same Solr instance, >> >> 1) http://10.4.x.x:8080/solr/core0/admin/ - works fine, shows all the >> cores in admin interface >> 2) http://10.4.x.x:8080/solr/admin/cores - works fine, I see all the cores in >> xml >> 3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine, >> gives me top 10 records >> 4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine, >> gives me top 10 records >> 5) >> http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan >> - this FAILS. I've seen two problems with this. >> >> a) When indexes are being committed I see, >> >> SEVERE: org.apache.solr.common.SolrException: >> org.apache.solr.client.solrj.SolrServerException: >> java.net.SocketException: Connection reset >> at >> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) >> at >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) >> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) >> at >> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) >> at >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) >>
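Given the core-per-day layout described above, the shards parameter for a recent window can be assembled programmatically rather than hand-typed. A sketch assuming a plain yyyyMMdd core-naming convention (the thread's actual cores carry a numeric suffix, e.g. 20090407_2, which this deliberately ignores):

    import java.text.SimpleDateFormat;
    import java.util.Calendar;

    public class ShardListBuilder {
        // Builds "host/solr/20090409,host/solr/20090408,..." for the last n days.
        public static String shardsForLastDays(String[] hosts, int days) {
            SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
            Calendar cal = Calendar.getInstance();
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < days; i++) {
                String core = fmt.format(cal.getTime());
                for (String host : hosts) {
                    if (sb.length() > 0) sb.append(',');
                    sb.append(host).append("/solr/").append(core);
                }
                cal.add(Calendar.DAY_OF_MONTH, -1); // step back one day per core
            }
            return sb.toString();
        }

        public static void main(String[] args) {
            String[] hosts = { "10.4.x.x:8080", "10.4.x.x:8085" }; // placeholder hosts
            System.out.println(shardsForLastDays(hosts, 3));
        }
    }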
Re: Searching on multi-core Solr
Erik, Here is what I'd posted in this thread earlier, I tried the following with two cores (they both share the same schema and solrconfig.xml) on the same box on the same Solr instance, 1) http://10.4.x.x:8080/solr/core0/admin/ - works fine, shows all the cores in admin interface 2) http://10.4.x.x:8080/solr/admin/cores - works fine, I see all the cores in xml 3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine, gives me top 10 records 4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine, gives me top 10 records 5) http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan - this FAILS. I've seen two problems with this. a) This is the error most of the time, SEVERE: java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:637) b) When indexes are being committed I see this during search, SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:637) Any tips on how I can search on multiple cores on the same Solr instance? Thanks, -vivek On Thu, Apr 9, 2009 at 2:56 AM, Erik Hatcher wrote: > > On Apr 9, 2009, at 3:00 AM, vivek sar wrote: >> >> Can someone please clear this up, as I'm not >> able to run distributed search on multiple cores? > > What error or problem are you encountering when trying this? How are you > trying it? > > Erik > >
Re: Searching on multi-core Solr
Attached is the solr.xml - note, the schema and solrconfig are located in core0, and all other cores point to the same core0 instance for the schema. Searches on individual cores work fine, so I'm assuming the solr.xml is correct - I also get their status correctly. From the "NullPointerException" it seems it fails at, for (int i=resultSize-1; i>=0; i--) { ShardDoc shardDoc = (ShardDoc)queue.pop(); shardDoc.positionInResponse = i; // Need the toString() for correlation with other lists that must // be strings (like keys in highlighting, explain, etc) resultIds.put(shardDoc.id.toString(), shardDoc); } I have a unique field (required) in my documents, so I'm not sure whether that can be null - could the doc itself be null - how? The same search on the same cores individually works fine. Not sure if there is a way to debug this. I'm also not sure when I would get the "Connection reset" exception - would it be when indexing is happening at the same time at a high rate - would that cause problems? Thanks, -vivek On Thu, Apr 9, 2009 at 4:07 AM, Fergus McMenemie wrote: >>Any help on this issue? Would distributed search on multi-core on the same >>Solr instance even work? Does it have to be different Solr instances >>altogether (separate shards)? > > As best I can tell this works fine for me. Multiple cores on the one > machine. Very different schema and solrconfig.xml for each of the > cores. Distributed searching using shards works fine. But I am using > the trunk version. > > Perhaps you should post your solr.xml file. > >>I'm kind of stuck at this point right now. I keep getting one of the two >>errors (when running distributed search - single searches work fine) >>as mentioned earlier in this thread. >> >>Thanks, >>-vivek >> >>On Wed, Apr 8, 2009 at 1:57 AM, vivek sar wrote: >>> Thanks Fergus. I'm still having a problem with multicore search. >>> >>> I tried the following with two cores (they both share the same schema >>> and solrconfig.xml) on the same box on the same Solr instance, >>> >>> 1) http://10.4.x.x:8080/solr/core0/admin/ - works fine, shows all the >>> cores in admin interface >>> 2) http://10.4.x.x:8080/solr/admin/cores - works fine, I see all the cores >>> in xml >>> 3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine, >>> gives me top 10 records >>> 4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine, >>> gives me top 10 records >>> 5) >>> http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan >>> - this FAILS. I've seen two problems with this. 
>>> >>> a) When indexes are being committed I see, >>> >>> SEVERE: org.apache.solr.common.SolrException: >>> org.apache.solr.client.solrj.SolrServerException: >>> java.net.SocketException: Connection reset >>> at >>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) >>> at >>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) >>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) >>> at >>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) >>> at >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) >>> at >>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) >>> at >>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) >>> at >>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) >>> at >>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) >>> at >>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) >>> at >>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) >>> at >>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) >>> at >>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) >>> at >>> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) >>> at >>> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) >>
Re: httpclient.ProtocolException using Solrj
I'm inserting 10K in a batch (using the addBeans method). I read somewhere on the wiki that it's better to use the same instance of SolrServer for better performance. Would MultiThreadedHttpConnectionManager help? How do I use it? I also wanted to know how I can use EmbeddedSolrServer - does my app need to be running in the same JVM as the Solr webapp? Thanks, -vivek 2009/4/9 Noble Paul നോബിള് नोब्ळ् : > how many documents are you inserting ? > may be you can create multiple instances of CommonshttpSolrServer and > upload in parallel > > > On Thu, Apr 9, 2009 at 11:58 AM, vivek sar wrote: >> Thanks Shalin and Paul. >> >> I'm not using MultipartRequest. I do share the same SolrServer between >> two threads. I'm not using MultiThreadedHttpConnectionManager. I'm >> simply using CommonsHttpSolrServer to create the SolrServer. I've also >> tried StreamingUpdateSolrServer, which works much faster, but does >> throw a "connection reset" exception once in a while. >> >> Do I need to use MultiThreadedHttpConnectionManager? I couldn't find >> anything on it on the Wiki. >> >> I was also thinking of using EmbeddedSolrServer - in what case would I >> be able to use it? Does my application and the Solr web app need to >> run in the same JVM for this to work? How would I use the >> EmbeddedSolrServer? >> >> Thanks, >> -vivek >> >> >> On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar >> wrote: >>> Vivek, do you share the same SolrServer instance between your two threads? >>> If so, are you using the MultiThreadedHttpConnectionManager when creating >>> the HttpClient instance? >>> >>> On Wed, Apr 8, 2009 at 10:13 PM, vivek sar wrote: >>> >>>> With a single thread everything works fine. Two threads are fine too for a >>>> while, and then all of a sudden the problem starts happening. >>>> >>>> I tried indexing using REST services as well (instead of Solrj), but >>>> with that too I get the following error after a while, >>>> >>>> 2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer - >>>> indexData()-> Failed to index >>>> java.net.SocketException: Broken pipe >>>> at java.net.SocketOutputStream.socketWrite0(Native Method) >>>> at >>>> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) >>>> at java.net.SocketOutputStream.write(SocketOutputStream.java:136) >>>> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) >>>> at java.io.FilterOutputStream.write(FilterOutputStream.java:80) >>>> at >>>> org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145) >>>> at >>>> org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499) >>>> at >>>> org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114) >>>> at >>>> org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096) >>>> at >>>> org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) >>>> at >>>> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) >>>> at >>>> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) >>>> at >>>> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) >>>> >>>> >>>> Note, I'm using the "simple" lock type. I'd tried the "single" type before, >>>> but that once caused index corruption, so I switched to "simple". >>>> >>>> Thanks, >>>> -vivek >>>> >>>> 2009/4/8 Noble Paul നോബിള് नोब्ळ् : >>>> > do you see the same problem when you use a single thread? 
>>>> > >>>> > what is the version of SolrJ that you use? >>>> > >>>> > >>>> > >>>> > On Wed, Apr 8, 2009 at 1:19 PM, vivek sar wrote: >>>> >> Hi, >>>> >> >>>> >> Any ideas on this issue? I ran into this again - once it starts >>>> >> happening it keeps happening. One of the thread keeps failing. Here >>>> >> are my SolrServer settings, >>>> >> >>>> >>
Re: httpclient.ProtocolException using Solrj
Here is what I'm doing, SolrServer server = new StreamingUpdateSolrServer(url, 1000,5); server.addBeans(dataList); //where dataList is List with 10K elements I run two threads, each using the same server object, and then each calls server.addBeans(...). I'm able to get 50K/sec inserted using that, but the commit after that (after 100k records) takes 70 sec - which messes up the avg time. There are two problems here, 1) Once in a while I get a "connection reset" error, Caused by: java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78) at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106) Note: if I use CommonsHttpSolrServer I get the buffer error. 2) The commit takes way too long for every 100k (I may commit more often if this cannot be improved) I'm trying to fix this error problem, which happens only if I run two threads both calling addBeans (10k at a time). One thread works fine. I'm not sure how I can use the MultiThreadedHttpConnectionManager to create a StreamingUpdateSolrServer, and whether it would help. Thanks, -vivek 2009/4/9 Noble Paul നോബിള് नोब्ळ् : > using a single request is the fastest > > http://wiki.apache.org/solr/Solrj#head-2046bbaba3759b6efd0e33e93f5502038c01ac65 > > I could index at the rate of 10,000 docs/sec using this and > BinaryRequestWriter > > On Thu, Apr 9, 2009 at 10:36 PM, vivek sar wrote: >> I'm inserting 10K in a batch (using the addBeans method). I read somewhere >> on the wiki that it's better to use the same instance of SolrServer >> for better performance. Would MultiThreadedHttpConnectionManager help? How >> do I use it? >> >> I also wanted to know how I can use EmbeddedSolrServer - does my app >> need to be running in the same JVM as the Solr webapp? >> >> Thanks, >> -vivek >> >> 2009/4/9 Noble Paul നോബിള് नोब्ळ् : >>> how many documents are you inserting ? >>> may be you can create multiple instances of CommonshttpSolrServer and >>> upload in parallel >>> >>> >>> On Thu, Apr 9, 2009 at 11:58 AM, vivek sar wrote: >>>> Thanks Shalin and Paul. >>>> >>>> I'm not using MultipartRequest. I do share the same SolrServer between >>>> two threads. I'm not using MultiThreadedHttpConnectionManager. I'm >>>> simply using CommonsHttpSolrServer to create the SolrServer. I've also >>>> tried StreamingUpdateSolrServer, which works much faster, but does >>>> throw a "connection reset" exception once in a while. >>>> >>>> Do I need to use MultiThreadedHttpConnectionManager? I couldn't find >>>> anything on it on the Wiki. >>>> >>>> I was also thinking of using EmbeddedSolrServer - in what case would I >>>> be able to use it? Does my application and the Solr web app need to >>>> run in the same JVM for this to work? How would I use the >>>> EmbeddedSolrServer? >>>> >>>> Thanks, >>>> -vivek >>>> >>>> >>>> On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar >>>> wrote: >>>>> Vivek, do you share the same SolrServer instance between your two threads? >>>>> If so, are you using the MultiThreadedHttpConnectionManager when creating >>>>> the HttpClient instance? >>>>> >>>>> On Wed, Apr 8, 2009 at 10:13 PM, vivek sar wrote: >>>>> >>>>>> With a single thread everything works fine. Two threads are fine too for a >>>>>> while, and then all of a sudden the problem starts happening. 
>>>>>> >> >>>>>> I tried indexing using REST services as well (instead of Solrj), but >>>>>> with that too I get the following error after a while, >>>>>> >>>>>> 2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer - >>>>>> indexData()-> Failed to index >>>>>> java.net.SocketException: Broken pipe >>>>>> at java.net.SocketOutputStream.socketWrite0(Native Method) >>>>>> at >>>>>> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) >>>>>> at java.net.SocketOutputStream.write(SocketOutputStream.java:136) >>>>>> at >>>>>> java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
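One way to act on the earlier suggestion of creating multiple client instances: give each indexing thread its own StreamingUpdateSolrServer rather than sharing one object, so the threads never contend for the same HTTP connection. A sketch with illustrative queue/thread counts, and the same placeholder MyDoc bean as in the earlier sketch:

    import java.util.List;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;

    public class PerThreadIndexer implements Runnable {
        private final SolrServer server;
        private final List<MyDoc> batch; // MyDoc: placeholder bean with @Field annotations

        public PerThreadIndexer(String url, List<MyDoc> batch) throws Exception {
            // Each thread owns its client: 1000-doc queue, 5 background writers
            this.server = new StreamingUpdateSolrServer(url, 1000, 5);
            this.batch = batch;
        }

        public void run() {
            try {
                server.addBeans(batch);
            } catch (Exception e) {
                e.printStackTrace(); // real code would log and retry
            }
        }
    }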
Question on Solr Distributed Search
Hi, I've another thread on multi-core distributed search, but just wanted to put a simple question here on distributed search to get some response. I have a search query, http://etsx19.co.com:8080/solr/20090409_9/select?q=usa - returns 10 results. Now if I add the "shards" parameter to it, http://etsx19.co.com:8080/solr/20090409_9/select?shards=etsx19.co.com:8080/solr/20090409_9&q=usa - this fails with org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) at .. at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:637) Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:473) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422) .. Caused by: java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78) at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106) at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413) at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973) at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735) Attached is my solrconfig.xml. Do I need a special RequestHandler for sharding? I haven't been able to get any distributed search to work. Any help is appreciated. Note: I'm indexing using Solrj - not sure if that makes any difference to the search part. Thanks, -vivek [attachment: solrconfig.xml - the XML markup was stripped by the list archive, leaving only bare element values (index settings, lockType "single", cache settings, and a dismax handler), so the file is not reproduced here]
Re: Question on Solr Distributed Search
I think I've found the reason behind the "connection reset". Looking at the code, it points to QueryComponent.mergeIds(): resultIds.put(shardDoc.id.toString(), shardDoc); It looks like the doc unique id is returning null. I'm not sure how that is possible, as it's a required field. Right now my unique id is not stored (only indexed) - does it have to be stored for distributed search? HTTP Status 500 - null java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:637) On Thu, Apr 9, 2009 at 5:01 PM, vivek sar wrote: > Hi, > > I've another thread on multi-core distributed search, but just > wanted to put a simple question here on distributed search to get some > response. I have a search query, > > http://etsx19.co.com:8080/solr/20090409_9/select?q=usa - > returns 10 results > > Now if I add the "shards" parameter to it, > > http://etsx19.co.com:8080/solr/20090409_9/select?shards=etsx19.co.com:8080/solr/20090409_9&q=usa > - this fails with > > org.apache.solr.client.solrj.SolrServerException: > java.net.SocketException: Connection reset > org.apache.solr.common.SolrException: > org.apache.solr.client.solrj.SolrServerException: > java.net.SocketException: Connection reset at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) > at > .. > at > org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) > at > org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) > at java.lang.Thread.run(Thread.java:637) > Caused by: org.apache.solr.client.solrj.SolrServerException: > java.net.SocketException: Connection reset > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:473) > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) > at > org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422) > .. 
> Caused by: java.net.SocketException: Connection reset > at java.net.SocketInputStream.read(SocketInputStream.java:168) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) > at java.io.BufferedInputStream.read(BufferedInputStream.java:237) > at > org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78) > at > org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106) > at > org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116) > at > org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413) > at > org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973) > at > org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735) > > Attached is my solrconfig.xml. Do I need a special RequestHandler for > sharding? I haven't been able to get any distributed search to work. > Any help is appreciated. > > Note: I'm indexing using Solrj - not sure if that makes any difference > to the search part. > > Thanks, > -vivek >
Re: Question on Solr Distributed Search
Just an update. I changed the schema to store the unique id field, but I still get the connection reset exception. I did notice that if there is no data in the core then it returns 0 results (no exception), but if there is data and I search using the "shards" parameter I get the connection reset exception. Can anyone provide some tips on where I can look for this problem? Apr 10, 2009 3:16:04 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:637) Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:473) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422) at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:395) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) ... 
1 more Caused by: java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78) at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106) at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413) at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973) at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735) at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) On Thu, Apr 9, 2009 at 6:51 PM, vivek sar wrote: > I think I've found the reason behind the "connection reset". Looking at the > code, it points to QueryComponent.mergeIds(): > > resultIds.put(shardDoc.id.toString(), shardDoc); > > It looks like the doc unique id is returning null. I'm not sure how that is > possible, as it's a required field. Right now my unique id is not stored > (only indexed) - does it have to be stored for distributed search? > > HTTP Status 500 - null java.lang.NullPointerException at > org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432) > at > org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276)
Re: Question on Solr Distributed Search
Yes - these are all new indexes. I can search them individually, but adding "shards" throws a "Connection reset" error. Is there any way I can debug this, or any other pointers? -vivek On Fri, Apr 10, 2009 at 4:49 AM, Shalin Shekhar Mangar wrote: > On Fri, Apr 10, 2009 at 7:50 AM, vivek sar wrote: > >> Just an update. I changed the schema to store the unique id field, but >> I still get the connection reset exception. I did notice that if there >> is no data in the core then it returns 0 results (no exception), >> but if there is data and I search using the "shards" parameter I get the >> connection reset exception. Can anyone provide some tips on where I can >> look for this problem? >> >> > Did you re-index after changing the field to stored? > -- > Regards, > Shalin Shekhar Mangar. >
Question on StreamingUpdateSolrServer
Hi, I was using CommonsHttpSolrServer for indexing, but having two threads writing (10K batches) at the same time was throwing, "ProtocolException: Unbuffered entity enclosing request can not be repeated." I switched to StreamingUpdateSolrServer (using addBeans) and I don't see the problem anymore. The speed is very fast - I'm getting around 25k/sec (single thread) - but I'm facing another problem. When the indexer using StreamingUpdateSolrServer is running, I'm not able to send any URL request from the browser to the Solr web app - I just get a blank page. I can't even get to the admin interface. I'm also not able to shut down the Tomcat running the Solr webapp while the Indexer is running - I have to first stop the Indexer app and then stop Tomcat. I don't have this problem when using CommonsHttpSolrServer. Here is how I'm creating it, server = new StreamingUpdateSolrServer(url, 1000,3); I simply call server.addBeans(...) on it. Is there anything else I need to do to make use of StreamingUpdateSolrServer? Why does Tomcat become unresponsive when the Indexer using StreamingUpdateSolrServer is running (though indexing happens fine)? Thanks, -vivek
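Worth noting when debugging this kind of thing: StreamingUpdateSolrServer pushes updates from background threads, so failures do not surface as exceptions in the thread that called addBeans. Overriding its error hook at least makes them visible; a sketch:

    import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;

    public class LoggingStreamingServer extends StreamingUpdateSolrServer {
        public LoggingStreamingServer(String url, int queueSize, int threads)
                throws Exception {
            super(url, queueSize, threads);
        }

        @Override
        public void handleError(Throwable ex) {
            // The default implementation only logs; hook alerting or retry here.
            System.err.println("background update failed: " + ex);
        }
    }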
Re: Question on StreamingUpdateSolrServer
I also noticed that the Solr app has over 6000 file handles open - "lsof | grep solr | wc -l" - shows 6455. I have 10 cores (using multi-core) managed by the same Solr instance. As soon as I start up Tomcat the open file count goes up to 6400. A few questions, 1) Why is Solr holding on to all the segments from all the cores - is it because of the auto-warmer? 2) How can I reduce the open file count? 3) Is there a way to stop the auto-warmer? 4) Could this be related to "Tomcat returning a blank page for every request"? Any ideas? Thanks, -vivek On Fri, Apr 10, 2009 at 1:48 PM, vivek sar wrote: > Hi, > > I was using CommonsHttpSolrServer for indexing, but having two > threads writing (10K batches) at the same time was throwing, > > "ProtocolException: Unbuffered entity enclosing request can not be repeated. > " > > I switched to StreamingUpdateSolrServer (using addBeans) and I don't > see the problem anymore. The speed is very fast - I'm getting around > 25k/sec (single thread) - but I'm facing another problem. When the > indexer using StreamingUpdateSolrServer is running, I'm not able to > send any URL request from the browser to the Solr web app - I just get a blank > page. I can't even get to the admin interface. I'm also not able to > shut down the Tomcat running the Solr webapp while the Indexer is > running - I have to first stop the Indexer app and then stop Tomcat. > I don't have this problem when using CommonsHttpSolrServer. > > Here is how I'm creating it, > > server = new StreamingUpdateSolrServer(url, 1000,3); > > I simply call server.addBeans(...) on it. Is there anything else I > need to do to make use of StreamingUpdateSolrServer? Why does Tomcat > become unresponsive when the Indexer using StreamingUpdateSolrServer is > running (though indexing happens fine)? > > Thanks, > -vivek >
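A small sketch for watching the descriptor count from inside the Solr/Tomcat JVM itself (for instance from a servlet or an MBean), instead of polling lsof; the com.sun cast below works on Sun/Unix JVMs only, which is the assumption here:

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;

    public class FdMonitor {
        public static void main(String[] args) {
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
                com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
                System.out.println("open fds: " + unix.getOpenFileDescriptorCount()
                    + " / max: " + unix.getMaxFileDescriptorCount());
            }
        }
    }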
Re: Question on StreamingUpdateSolrServer
Thanks Shalin. The problem is I don't see any error message in catalina.out. I don't even see the request coming in - I simply get a blank page in the browser. If I keep trying, the request eventually goes through and I get a response from Solr, but then it becomes unresponsive again, or sometimes throws a "connection reset" error. I'm not sure why it would work sometimes and not other times for the same query. As soon as I stop the Indexer process things start working fine. Any way I can debug this problem? -vivek On Fri, Apr 10, 2009 at 11:05 PM, Shalin Shekhar Mangar wrote: > On Sat, Apr 11, 2009 at 3:29 AM, vivek sar wrote: > >> I also noticed that the Solr app has over 6000 file handles open - >> >> "lsof | grep solr | wc -l" - shows 6455 >> >> I've 10 cores (using multi-core) managed by the same Solr instance. As >> soon as start up the Tomcat the open file count goes up to 6400. Few >> questions, >> >> 1) Why is Solr holding on to all the segments from all the cores - is >> it because of auto-warmer? > > > You have 10 cores, so Solr opens 10 indexes, each of which contains multiple > files. That is one reason. Apart from that, Tomcat will keep some file > handles for incoming connections. > > >> >> 2) How can I reduce the open file count? > > > Are they causing a problem? Tomcat will log messages when it cannot accept > incoming connections if it runs out of available file handles. But if you > experiencing issues, you can increase the file handle limit or you can set > useCompoundFile=true in solrconfig.xml. > > >> >> 3) Is there a way to stop the auto-warmer? >> 4) Could this be related to "Tomcat returning blank page for every >> request"? >> > > It could be. Check the Tomcat and Solr logs. > > -- > Regards, > Shalin Shekhar Mangar. >
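For reference, the compound-file option Shalin mentions is set in solrconfig.xml; a minimal sketch (in the stock config the same flag appears under both indexDefaults and mainIndex):

    <mainIndex>
      <!-- Pack each segment's many files into a single .cfs file: far fewer
           open file descriptors, at the cost of some indexing speed. -->
      <useCompoundFile>true</useCompoundFile>
    </mainIndex>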
Re: Question on StreamingUpdateSolrServer
Thanks Shalin. I noticed a couple more things. As I index around 100 million records a day, my Indexer is running pretty much at all times throughout the day. Whenever I run a search query I usually get a "connection reset" while the commit is happening, and a "blank page" while the auto-warming of searchers is happening. Here are my questions, 1) Is this a coincidence or a known issue? Can't we search while a commit or auto-warming is happening? 2) How do I stop auto-warming? My search traffic is very low, so I'm trying to turn off auto-warming after the commit has happened - is there anything in solrconfig.xml to do that? 3) What would be the best strategy for searching in my scenario, where commits may be happening all the time (I commit every 50K records - so every 30-60 sec there is a commit happening, followed by auto-warming that takes 40 sec)? Search frequency is pretty low for us, but we want to make sure that whenever a search happens it is fast enough and returns results (instead of an exception or a blank screen). Thanks for all the help. -vivek On Sat, Apr 11, 2009 at 1:48 PM, Shalin Shekhar Mangar wrote: > On Sun, Apr 12, 2009 at 2:15 AM, vivek sar wrote: > >> >> The problem is I don't see any error message in the catalina.out. I >> don't even see the request coming in - I simply get blank page on >> browser. If I keep trying the request goes through and I get respond >> from Solr, but then it become unresponsive again or sometimes throws >> "connection reset" error. I'm not sure why would it work sometimes and >> not the other times for the same query. As soon as I stop the Indexer >> process things start working fine. Any way I can debug this problem? >> > > I'm not sure. I've never seen this issue myself. > > Could you try using the bundled jetty instead of Tomcat or on a different > box just to make sure this is not an environment specific issue? > > -- > Regards, > Shalin Shekhar Mangar. >
Re: Question on StreamingUpdateSolrServer
I index in 10K batches and commit after 5 index cycles (after 50K). Is there any limitation that I can't search during a commit or auto-warming? I've got 8 CPU cores and only 2 were showing busy (using top) - so it's unlikely that the CPU was pegged. 2009/4/12 Noble Paul നോബിള്‍ नोब्ळ् : > If you use StreamingUpdateSolrServer it POSTs all the docs in a single > request. 10 million docs may be a bit too much for a single request. I > guess you should batch it in multiple requests of smaller chunks, > > It is likely that the CPU is really hot when the autowarming is happening. > > getting a decent search perf w/o autowarming is not easy. > > autowarmCount is an attribute of a cache. see here > http://wiki.apache.org/solr/SolrCaching > > On Mon, Apr 13, 2009 at 3:32 AM, vivek sar wrote: >> Thanks Shalin. >> >> I noticed couple more things. As I index around 100 million records a >> day, my Indexer is running pretty much at all times throughout the >> day. Whenever I run a search query I usually get "connection reset" >> when the commit is happening and get "blank page" when the >> auto-warming of searchers is happening. Here are my questions, >> >> 1) Is this coincidence or a known issue? Can't we search while commit >> or auto-warming is happening? >> 2) How do I stop auto-warming? My search traffic is very low so I'm >> trying to turn off auto-warming after commit has happened - is there >> anything in the solrconfig.xml to do that? >> 3) What would be the best strategy for searching in my scenario where >> commits may be happening all the time (I commit every 50K records - so >> every 30-60 sec there is a commit happening followed by auto-warming >> that takes 40 sec)? >> >> Search frequency is pretty low for us, but we want to make sure that >> whenever it happens it is fast enough and returns result (instead of >> exception or a blank screen). >> >> Thanks for all the help. >> >> -vivek >> >> >> >> On Sat, Apr 11, 2009 at 1:48 PM, Shalin Shekhar Mangar >> wrote: >>> On Sun, Apr 12, 2009 at 2:15 AM, vivek sar wrote: >>> >>>> >>>> The problem is I don't see any error message in the catalina.out. I >>>> don't even see the request coming in - I simply get blank page on >>>> browser. If I keep trying the request goes through and I get respond >>>> from Solr, but then it become unresponsive again or sometimes throws >>>> "connection reset" error. I'm not sure why would it work sometimes and >>>> not the other times for the same query. As soon as I stop the Indexer >>>> process things start working fine. Any way I can debug this problem? >>>> >>> >>> I'm not sure. I've never seen this issue myself. >>> >>> Could you try using the bundled jetty instead of Tomcat or on a different >>> box just to make sure this is not an environment specific issue? >>> >>> -- >>> Regards, >>> Shalin Shekhar Mangar. >>> >> > > > > -- > --Noble Paul >
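As the SolrCaching wiki page explains, warming is configured per cache; a sketch of the relevant entries in the query section of solrconfig.xml with warming turned off (the sizes shown are just the stock example values):

    <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

With autowarmCount="0", a newly opened searcher no longer pre-populates each cache from the old one after a commit.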
Re: Question on StreamingUpdateSolrServer
Here is some more information about my setup, Solr - v1.4 (nightly build 03/29/09) Servlet Container - Tomcat 6.0.18 JVM - 1.6.0 (64 bit) OS - Mac OS X Server 10.5.6 Hardware Overview: Processor Name: Quad-Core Intel Xeon Processor Speed: 3 GHz Number Of Processors: 2 Total Number Of Cores: 8 L2 Cache (per processor): 12 MB Memory: 20 GB Bus Speed: 1.6 GHz JVM Parameters (for Solr): export CATALINA_OPTS="-server -Xms6044m -Xmx6044m -DSOLR_APP -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log -Dsun.rmi.dgc.client.gcInterval=360 -Dsun.rmi.dgc.server.gcInterval=360" Other: lsof|grep solr|wc -l 2493 ulimit -a open files (-n) 9000 Tomcat connector: <Connector ... connectionTimeout="2" maxThreads="100" /> Total Solr cores on same instance - 65 useCompoundFile - true The tests I ran, While the Indexer is running: 1) Go to "http://juum19.co.com:8080/solr" - returns a blank page (no error in catalina.out) 2) Try "telnet juum19.co.com 8080" - returns with "Connection closed by foreign host" Stop the Indexer program (Tomcat is still running with Solr): 3) Go to "http://juum19.co.com:8080/solr" - works ok, shows the list of all the Solr cores 4) Try telnet - able to telnet fine 5) Commented out all the caches in solrconfig.xml and tried the same tests, but Tomcat still doesn't respond. Is there a way to stop the auto-warmer? I commented out the caches in solrconfig.xml but still see the following log, INFO: autowarming result for Searcher@3aba3830 main fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} INFO: Closing Searcher@175dc1e2 main fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} 6) Changed the Indexer frequency so it runs every 2 min (instead of all the time). I noticed that once the commit is done I'm able to run my searches; during the commit and auto-warming period I just get a blank page. 7) Changed from Solrj to XML update - I still get the blank page whenever an update/commit is happening. Apr 13, 2009 6:46:18 PM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {add=[621094001, 621094002, 621094003, 621094004, 621094005, 621094006, 621094007, 621094008, ...(6992 more)]} 0 1948 Apr 13, 2009 6:46:18 PM org.apache.solr.core.SolrCore execute INFO: [20090413_12] webapp=/solr path=/update params={} status=0 QTime=1948 So it looks like it's not just StreamingUpdateSolrServer - whenever an update/commit is happening I'm not able to search. I don't know if it's related to using multi-core. In this test I was using only a single thread updating a single core on a single Solr instance. So it's clearly related to the index process (update, commit and auto-warming). As soon as the update/commit/auto-warming completes I'm able to run my queries again. 
Is there anything that could stop searching while the update process is in progress - like a lock or something? Any other ideas? Thanks, -vivek On Mon, Apr 13, 2009 at 12:14 AM, Shalin Shekhar Mangar wrote: > On Mon, Apr 13, 2009 at 12:36 PM, vivek sar wrote: > >> I index in 10K batches and commit after 5 index cyles (after 50K). Is >> there any limitation that I can't search during commit or >> auto-warming? I got 8 CPU cores and only 2 were showing busy (using >> top) - so it's unlikely that the CPU was pegged. >> >> > No, there is no such limitation. The old searcher will continue to serve > search requests until the new one is warmed and registered. > > So, CPU does not seem to be an issue. Does this happen only when you use > StreamingUpdateSolrServer? Which OS, file system? What JVM parameters are > you using? Which servlet container and version? > > -- > Regards, > Shalin Shekhar Mangar. >
Re: Question on StreamingUpdateSolrServer
Some more updates. As I mentioned earlier, we are using multi-core Solr (up to 65 cores in one Solr instance, with each core 10G). This was opening around 3000 file descriptors (lsof). I removed some cores, and after some trial and error I found that at 25 cores the system seems to work fine (around 1400 file descriptors). Tomcat is responsive even while indexing is happening in Solr (for 25 cores). But as soon as it goes to 26 cores, Tomcat becomes unresponsive again. The puzzling thing is that if I stop indexing I can search on even 65 cores, but while indexing is happening it seems to support only up to 25 cores. 1) Is there a limit on the number of cores a Solr instance can handle? 2) Does Solr do anything to the existing cores while indexing? I'm writing to only one core at a time. We are struggling to find out why Tomcat stops responding at a high number of cores while indexing is in progress. Any help is very much appreciated. Thanks, -vivek On Mon, Apr 13, 2009 at 10:52 AM, vivek sar wrote: > Here is some more information about my setup, > > Solr - v1.4 (nightly build 03/29/09) > Servlet Container - Tomcat 6.0.18 > JVM - 1.6.0 (64 bit) > OS - Mac OS X Server 10.5.6 > > Hardware Overview: > > Processor Name: Quad-Core Intel Xeon > Processor Speed: 3 GHz > Number Of Processors: 2 > Total Number Of Cores: 8 > L2 Cache (per processor): 12 MB > Memory: 20 GB > Bus Speed: 1.6 GHz > > JVM Parameters (for Solr): > > export CATALINA_OPTS="-server -Xms6044m -Xmx6044m -DSOLR_APP > -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log > -Dsun.rmi.dgc.client.gcInterval=360 > -Dsun.rmi.dgc.server.gcInterval=360" > > Other: > > lsof|grep solr|wc -l > 2493 > > ulimit -an > open files (-n) 9000 > > Tomcat > connectionTimeout="2" > maxThreads="100" /> > > Total Solr cores on same instance - 65 > > useCompoundFile - true > > The tests I ran, > > While Indexer is running > 1) Go to "http://juum19.co.com:8080/solr" - returns blank page (no > error in the catalina.out) > > 2) Try "telnet juum19.co.com 8080" - returns with "Connection closed > by foreign host" > > Stop the Indexer Program (Tomcat is still running with Solr) > > 3) Go to "http://juum19.co.com:8080/solr" - works ok, shows the list > of all the Solr cores > > 4) Try telnet - able to Telnet fine > > 5) Now comment out all the caches in solrconfig.xml. Try same tests, > but the Tomcat still doesn't response. > > Is there a way to stop the auto-warmer. 
I commented out the caches in > the solrconfig.xml but still see the following log, > > INFO: autowarming result for searc...@3aba3830 main > fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} > > INFO: Closing searc...@175dc1e2 > main > fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} > filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} > queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} > documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} > > > 6) Change the Indexer frequency so it runs every 2 min (instead of all > the time). I noticed once the commit is done, I'm able to run my > searches. During commit and auto-warming period I just get blank page. > > 7) Changed from Solrj to XML update - I still get the blank page > whenever update/commit is happening. > > Apr 13, 2009 6:46:18 PM > org.apache.solr.update.processor.LogUpdateProcessor finish > INFO: {add=[621094001, 621094002, 621094003, 621094004, 621094005, > 621094006, 621094007, 621094008, ...(6992 more)]} 0 1948 > Apr 13, 2009 6:46:18 PM org.apache.solr.core.SolrCore execute > INFO: [20090413_12] webapp=/solr path=/update params={} status=0 QTime=1948 > > > So, looks like it's not just StreamingUpdateSolrServer, but whenever > the update/commit is happening I'm not able to search. I don't know if > it's related to using multi-core. In this test I was using only single >
Re: Question on StreamingUpdateSolrServer
The machine's ulimit is set to 9000 and the OS has upper limit of 12000 on files. What would explain this? Has anyone tried Solr with 25 cores on the same Solr instance? Thanks, -vivek 2009/4/13 Noble Paul നോബിള് नोब्ळ् : > On Tue, Apr 14, 2009 at 7:14 AM, vivek sar wrote: >> Some more update. As I mentioned earlier we are using multi-core Solr >> (up to 65 cores in one Solr instance with each core 10G). This was >> opening around 3000 file descriptors (lsof). I removed some cores and >> after some trial and error I found at 25 cores system seems to work >> fine (around 1400 file descriptors). Tomcat is responsive even when >> the indexing is happening at Solr (for 25 cores). But, as soon as it >> goes to 26 cores the Tomcat becomes unresponsive again. The puzzling >> thing is if I stop indexing I can search on even 65 cores, but while >> indexing is happening it seems to support only up to 25 cores. >> >> 1) Is there a limit on number of cores a Solr instance can handle? >> 2) Does Solr do anything to the existing cores while indexing? I'm >> writing to only one core at a time. > There is no hard limit (it is Integer.MAX_VALUE) . But inreality your > mileage depends on your hardware and no:of file handles the OS can > open >> >> We are struggling to find why Tomcat stops responding on high number >> of cores while indexing is in-progress. Any help is very much >> appreciated. >> >> Thanks, >> -vivek >> >> On Mon, Apr 13, 2009 at 10:52 AM, vivek sar wrote: >>> Here is some more information about my setup, >>> >>> Solr - v1.4 (nightly build 03/29/09) >>> Servlet Container - Tomcat 6.0.18 >>> JVM - 1.6.0 (64 bit) >>> OS - Mac OS X Server 10.5.6 >>> >>> Hardware Overview: >>> >>> Processor Name: Quad-Core Intel Xeon >>> Processor Speed: 3 GHz >>> Number Of Processors: 2 >>> Total Number Of Cores: 8 >>> L2 Cache (per processor): 12 MB >>> Memory: 20 GB >>> Bus Speed: 1.6 GHz >>> >>> JVM Parameters (for Solr): >>> >>> export CATALINA_OPTS="-server -Xms6044m -Xmx6044m -DSOLR_APP >>> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log >>> -Dsun.rmi.dgc.client.gcInterval=360 >>> -Dsun.rmi.dgc.server.gcInterval=360" >>> >>> Other: >>> >>> lsof|grep solr|wc -l >>> 2493 >>> >>> ulimit -an >>> open files (-n) 9000 >>> >>> Tomcat >>> >> connectionTimeout="2" >>> maxThreads="100" /> >>> >>> Total Solr cores on same instance - 65 >>> >>> useCompoundFile - true >>> >>> The tests I ran, >>> >>> While Indexer is running >>> 1) Go to "http://juum19.co.com:8080/solr"; - returns blank page (no >>> error in the catalina.out) >>> >>> 2) Try "telnet juum19.co.com 8080" - returns with "Connection closed >>> by foreign host" >>> >>> Stop the Indexer Program (Tomcat is still running with Solr) >>> >>> 3) Go to "http://juum19.co.com:8080/solr"; - works ok, shows the list >>> of all the Solr cores >>> >>> 4) Try telnet - able to Telnet fine >>> >>> 5) Now comment out all the caches in solrconfig.xml. Try same tests, >>> but the Tomcat still doesn't response. >>> >>> Is there a way to stop the auto-warmer. 
I commented out the caches in >>> the solrconfig.xml but still see the following log, >>> >>> INFO: autowarming result for searc...@3aba3830 main >>> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} >>> >>> INFO: Closing searc...@175dc1e2 >>> main >>> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} >>> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} >>> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evi
Using CSV for indexing ... Remote Streaming disabled
Hi, I'm trying to use CSV indexing (Solr 1.4, 03/29), following the wiki (http://wiki.apache.org/solr/UpdateCSV). I've updated solrconfig.xml to have these lines, <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="20480" /> ... <requestHandler name="/update/csv" class="solr.CSVRequestHandler" startup="lazy" /> When I try to upload the csv, curl 'http://localhost:8080/solr/20090414_1/update/csv?commit=true&separator=%09&escape=%5c&stream.file=/Users/opal/temp/afterchat/data/csv/1239759267339.csv' I get the following response, HTTP Status 400 - Remote Streaming is disabled. type: Status report. message: Remote Streaming is disabled. description: The request sent by the client was syntactically incorrect (Remote Streaming is disabled.). Apache Tomcat/6.0.18 Why is it complaining about remote streaming if it's already enabled? Is there anything I'm missing? Thanks, -vivek
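One thing worth double-checking (an assumption on my part, not something confirmed in this thread): the requestParsers element only takes effect as a child of requestDispatcher in solrconfig.xml, so if it ended up anywhere else it is silently ignored - which would produce exactly this error. A sketch of the expected placement:

    <requestDispatcher handleSelect="true">
      <!-- enableRemoteStreaming must be true for stream.file/stream.url uploads -->
      <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="20480" />
    </requestDispatcher>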
Commits taking too long
Hi, I've an index where I commit every 50K records (using Solrj). Usually this commit takes 20 sec to complete, but every now and then the commit takes way too long - from 10 min to 30 min. I see more delays as the index size continues to grow - once it gets over 5G I start seeing long commit cycles more frequently. See this for example, Apr 15, 2009 12:04:13 AM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=false,waitFlush=false,waitSearcher=false) Apr 15, 2009 12:39:58 AM org.apache.solr.core.SolrDeletionPolicy onCommit INFO: SolrDeletionPolicy.onCommit: commits:num=2 commit{dir=/Users/vivek/demo/afterchat/solr/multicore/20090414_1/data/index,segFN=segments_fq,version=1239747075391,generation=566,filenames=[_19m.cfs, _jm.cfs, _1bk.cfs, _193.cfx, _19z.cfs, _1b8.cfs, _1bf.cfs, _10g.cfs, _2s.cfs, _1bf.cfx, _18x.cfx, _19c.cfx, _193.cfs, _18x.cfs, _1b7.cfs, _1aw.cfs, _1aq.cfs, _1bi.cfx, _1a6.cfs, _19l.cfs, _1ad.cfs, _1a6.cfx, _1as.cfs, _19l.cfx, _1aa.cfs, _1an.cfs, _19d.cfs, _1a3.cfx, _1a3.cfs, _19g.cfs, _b7.cfs, _19e.cfs, _19b.cfs, _1ab.cfs, _1b3.cfx, _19j.cfs, _190.cfs, _uu.cfs, _1b3.cfs, _1ak.cfs, _19p.cfs, _195.cfs, _194.cfs, _19i.cfx, _199.cfs, _19i.cfs, _19o.cfx, _196.cfs, _199.cfx, _196.cfx, _19o.cfs, _190.cfx, _xn.cfs, _1b0.cfx, _1at.cfs, _1av.cfs, _1ao.cfs, _1a9.cfx, _1b0.cfs, _5l.cfs, _1ao.cfx, _1ap.cfs, _1b6.cfx, _19a.cfs, _139.cfs, _1a1.cfs, _s1.cfs, _1b6.cfs, _1a9.cfs, _197.cfs, _1bd.cfs, _19n.cfs, _1au.cfx, _1au.cfs, _1a5.cfs, _1be.cfs, segments_fq, _1b4.cfs, _gt.cfs, _1ag.cfs, _18z.cfs, _162.cfs, _1a4.cfs, _198.cfs, _19x.cfs, _1ah.cfs, _1ai.cfs, _19q.cfs, _1a7.cfs, _1ae.cfs, _19h.cfs, _19x.cfx, _1a2.cfs, _1bj.cfs, _1bb.cfs, _1b1.cfs, _1ai.cfx, _19r.cfs, _18y.cfs, _19u.cfx, _1a8.cfs, _19u.cfs, _1aj.cfs, _19r.cfx, _1ac.cfs, _1az.cfs, _1ac.cfx, _19y.cfs, _1bc.cfx, _19s.cfs, _1ar.cfs, _1al.cfx, _1bg.cfs, _18v.cfs, _1ar.cfx, _1bc.cfs, _1a0.cfx, _1b2.cfs, _1af.cfs, _1bi.cfs, _1af.cfx, _19f.cfs, _1a0.cfs, _1bh.cfs, _19f.cfx, _19c.cfs, _e0.cfs, _1ax.cfx, _1b5.cfs, _191.cfs, _18w.cfs, _19t.cfs, _8e.cfs, _19v.cfs, _192.cfs, _1b9.cfs, _1ay.cfs, _p8.cfs, _19k.cfs, _1b9.cfx, _1ax.cfs, _1am.cfs, _1ba.cfs, _mf.cfs, _1al.cfs, _19w.cfs] commit{dir=/Users/vivek/demo/afterchat/solr/multicore/20090414_1/data/index,segFN=segments_fr,version=1239747075392,generation=567,filenames=[_jm.cfs, _1bo.cfs, _xn.cfs, segments_fr, _8e.cfs, _gt.cfs, _18v.cfs, _uu.cfs, _10g.cfs, _2s.cfs, _5l.cfs, _162.cfs, _p8.cfs, _139.cfs, _s1.cfs, _mf.cfs, _b7.cfs, _e0.cfs] Apr 15, 2009 12:39:58 AM org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: last commit = 1239747075392 Here are my default index settings, useCompoundFile: true, mergeFactor: 100, ramBufferSizeMB: 64, maxMergeDocs: 2147483647, 1, 1000, 1, lockType: single What am I doing wrong here? What's causing these delays? Thanks, -vivek
Re: Question on StreamingUpdateSolrServer
Thanks Otis. I did increase the number of file descriptors to 22K, but I still get this problem. I've noticed the following so far, 1) As soon as I get to around 1140 index segments (this is the total over multiple cores) I start seeing this problem. 2) When the problem starts, occasionally the index request (solrserver.commit) also fails with the following error, java.net.SocketException: Connection reset 3) Whenever the commit fails, I'm able to access Solr from the browser (http://ets11.co.com/solr). If the commit is successful and ongoing, I get a blank page in Firefox. Even telnet to 8080 fails with "Connection closed by foreign host." It does seem like there is some resource issue, as it happens only once we reach a breaking point (too many index segment files) - lsof at this point usually shows around 1400, but my ulimit is much higher than that. I already use the compound format for index files. I can also run optimize occasionally (though not preferred, as it blocks the whole index cycle for a long time). I do want to find out what resource limitation is causing this; it has to do with the Indexer committing records when there are a large number of segment files. Any other ideas? Thanks, -vivek On Wed, Apr 15, 2009 at 3:10 PM, Otis Gospodnetic wrote: > > One more thing. I don't think this was mentioned, but you can: > - optimize your indices > - use compound index format > > That will lower the number of open file handles. > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: vivek sar >> To: solr-user@lucene.apache.org >> Sent: Friday, April 10, 2009 5:59:37 PM >> Subject: Re: Question on StreamingUpdateSolrServer >> >> I also noticed that the Solr app has over 6000 file handles open - >> >> "lsof | grep solr | wc -l" - shows 6455 >> >> I've 10 cores (using multi-core) managed by the same Solr instance. As >> soon as start up the Tomcat the open file count goes up to 6400. Few >> questions, >> >> 1) Why is Solr holding on to all the segments from all the cores - is >> it because of auto-warmer? >> 2) How can I reduce the open file count? >> 3) Is there a way to stop the auto-warmer? >> 4) Could this be related to "Tomcat returning blank page for every request"? >> >> Any ideas? >> >> Thanks, >> -vivek >> >> On Fri, Apr 10, 2009 at 1:48 PM, vivek sar wrote: >> > Hi, >> > >> > I was using CommonsHttpSolrServer for indexing, but having two >> > threads writing (10K batches) at the same time was throwing, >> > >> > "ProtocolException: Unbuffered entity enclosing request can not be >> > repeated. >> " >> > >> > I switched to StreamingUpdateSolrServer (using addBeans) and I don't >> > see the problem anymore. The speed is very fast - getting around >> > 25k/sec (single thread), but I'm facing another problem. When the >> > indexer using StreamingUpdateSolrServer is running I'm not able to >> > send any url request from browser to Solr web app. I just get blank >> > page. I can't even get to the admin interface. I'm also not able to >> > shutdown the Tomcat running the Solr webapp when the Indexer is >> > running. I've to first stop the Indexer app and then stop the Tomcat. >> > I don't have this problem when using CommonsHttpSolrServer. >> > >> > Here is how I'm creating it, >> > >> > server = new StreamingUpdateSolrServer(url, 1000,3); >> > >> > I simply call server.addBeans(...) on it. Is there anything else I >> > need to do to make use of StreamingUpdateSolrServer? 
Why does Tomcat >> > become unresponsive when Indexer using StreamingUpdateSolrServer is >> > running (though, indexing happens fine)? >> > >> > Thanks, >> > -vivek >> > > >
Re: Using CSV for indexing ... Remote Streaming disabled
Any help on this? Could this error be because of something else (not remote streaming issue)? Thanks. On Wed, Apr 15, 2009 at 10:04 AM, vivek sar wrote: > Hi, > > I'm trying using CSV (Solr 1.4, 03/29) for indexing following wiki > (http://wiki.apache.org/solr/UpdateCSV). I've updated the > solrconfig.xml to have this lines, > > > multipartUploadLimitInKB="20480" /> > ... > > > startup="lazy" /> > > When I try to upload the csv, > > curl > 'http://localhost:8080/solr/20090414_1/update/csv?commit=true&separator=%09&escape=%5c&stream.file=/Users/opal/temp/afterchat/data/csv/1239759267339.csv' > > I get following response, > > HTTP Status 400 - Remote Streaming is > disabled.type Status > reportmessage Remote Streaming is > disabled.description The request sent by the > client was syntactically incorrect (Remote Streaming is > disabled.).Apache > Tomcat/6.0.18 > > Why is it complaining about the remote streaming if it's already > enabled? Is there anything I'm missing? > > Thanks, > -vivek >
Re: Solr Search Error
Hi, I'm using the Solr 1.4 (03/29 nightly build) and when searching on a large index (40G) I get the same exception as in this thread, HTTP Status 500 - 13724 java.lang.ArrayIndexOutOfBoundsException: 13724 at org.apache.lucene.search.TermScorer.score(TermScorer.java:74) at org.apache.lucene.search.TermScorer.score(TermScorer.java:61) at org.apache.lucene.search.IndexSearcher.doSearch(IndexSearcher.java:262) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:250) at org.apache.lucene.search.Searcher.search(Searcher.java:126) at org.apache.lucene.search.Searcher.search(Searcher.java:105) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1072) at ... The search url is, http://think2.co.com:8080/solr/20090415_1/select/?q=japan&version=2.2&start=0&rows=10&indent=on It would have millions of records matching this term, but I guess that shouldn't throw this exception. I saw a similar jira to ArrayOutOfBoundException, https://issues.apache.org/jira/browse/SOLR-450 (it's not the same though). I also see someone reported this same problem back in 2007 so I'm not sure whether it's a real bug or some configuration issue, http://www.nabble.com/ArrayIndexOutOfBoundsException-on-TermScorer-td11750899.html#a11750899 Any ideas? Thanks, -vivek On Fri, Mar 27, 2009 at 10:11 AM, Narayanan, Karthikeyan wrote: > Hi Otis, > Thanks for the recommendation. Will try with latest > nightly build.. I did couple of full data import and got this error at > few times while searching.. > > > Thanks. > > Karthik > > > -Original Message- > From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] > Sent: Friday, March 27, 2009 12:57 PM > To: solr-user@lucene.apache.org > Subject: Re: Solr Search Error > > > Hi Karthik, > > First thing I'd do is get the latest Solr nightly build. > If that doesn't fix thing, I'd grab the latest Lucene nightly build and > use it to replace Lucene jars that are in your version of Solr. > If that doesn't work I'd email the ML with a bit more info about the > type of search that causes this (e.g. Do all searches cause this or only > some? What do those that trigger this error look like or have in > common?) > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: "Narayanan, Karthikeyan" >> To: solr-user@lucene.apache.org >> Sent: Friday, March 27, 2009 11:42:12 AM >> Subject: Solr Search Error >> >> Hi All, >> I am intermittently getting this Exception when I do the > search. >> What could be the reason?. >> >> Caused by: org.apache.solr.common.SolrException: 11938 >> java.lang.ArrayIndexOutOfBoundsException: 11938 at >> org.apache.lucene.search.TermScorer.score(TermScorer.java:74) > at >> org.apache.lucene.search.TermScorer.score(TermScorer.java:61) > at >> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:137) > at >> org.apache.lucene.search.Searcher.search(Searcher.java:126) at >> org.apache.lucene.search.Searcher.search(Searcher.java:105) at >> > org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher. > java:966) >> at >> > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.j > ava:838) >> at >> > org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:2 > 69) at >> > org.apache.solr.handler.component.QueryComponent.process(QueryComponent. 
> java:160) >> at >> > org.apache.solr.handler.component.SearchHandler.handleRequestBody(Search > Handler.java:169) >> at >> > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerB > ase.java:131) >> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) > at >> > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.ja > va:303) >> at >> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.j > ava:232) >> at >> > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applica > tionFilterChain.java:215) >> at >> > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilt > erChain.java:188) >> at >> > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValv > e.java:210) >> at >> > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValv > e.java:174) >> at >> > org.apache.catalina.authenticator.AuthenticatorBase.invoke(Authenticator > Base.java:433) >> at >> > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java > :127) >> at >> > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java > :117) >> at >> > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve. > java:108) >> at >> > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:1 > 51) at >> > org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:87 > 0) at >> > org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.p
Multiple Solr-instance share same solr.home
Hi, Is it possible to have two Solr instances share the same solr.home? I've two Solr instances running on the same box and I was wondering if I can configure them to have the same solr.home. I tried it, but it looks like the second instance overwrites the first one's values in solr.xml (I'm using multicore for both instances). This is just for convenience, so I don't have to manage multiple Solr index directory locations - I can have all the indexes written into the same location and do the cleanup from one place. If this is not supported then it's not a big deal. Thanks, -vivek
Re: Multiple Solr-instance share same solr.home
Both Solr instances will be writing to separate indexes, but can they share the same solr.home? So, here is what I want, 1) solr.home = solr/multicore 2) There is a single solr.xml under the multicore directory 3) Each instance would use the same solr.xml, which will have entries for multiple cores 4) Each instance will write to a different core at a time - so one index will be written by only one writer at a time. Not sure if this is a supported configuration. Thanks. -vivek On Sun, Apr 19, 2009 at 5:55 AM, Otis Gospodnetic wrote: > > Vivek - no, unless you want trouble - only 1 writer can write to a specific > index at a time. > > > Otis -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: vivek sar >> To: solr-user@lucene.apache.org >> Sent: Sunday, April 19, 2009 4:33:00 AM >> Subject: Multiple Solr-instance share same solr.home >> >> Hi, >> >> Is it possible to have two solr instances share the same solr.home? >> I've two Solr instances running on the same box and I was wondering if >> I can configure them to have the same solr.home. I tried it, but looks >> like the second instance overwrites the first one's value in the >> solr.xml (I'm using multicore for both instances). This is just for >> convenience so I don't have to manage multiple solr index directory >> locations - I can have all the indexes written into the same location >> and do the clean up from one place itself. If this is not supported >> then it's not a big deal. >> >> Thanks, >> -vivek > >
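For reference, the shared solr.xml being proposed would look something like this (core names are made up). With persistent="true", each running instance rewrites this file with its own view of the core list, which is the overwriting behavior reported above:

    <solr persistent="true">
      <cores adminPath="/admin/cores">
        <core name="20090510_1" instanceDir="20090510_1" />
        <core name="20090510_2" instanceDir="20090510_2" />
      </cores>
    </solr>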
Control segment size
Hi, Is there any configuration to control the segment file size in Solr? Currently I've an index (70G) with 80 segment files, and one of the files is 24G. We noticed that in some cases a commit takes over 2 hours to complete (committing 50K records), whereas usually it finishes in 20 seconds. After further investigation it turned out the system was doing a lot of paging - the file system buffer was trying to write the big segment back to disk. I've got 20G of memory on the system, with 6G assigned to the Solr instance (running 2 instances). It seems that if I can cap the segment size at 4-5 GB I'll be ok. Is there any way to do so? I've got a merge factor of 100 - does that impact the size too? Why do different segments have different sizes? Thanks, -vivek
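For context, these are the two knobs involved, as they appear in solrconfig.xml (values illustrative; note, as the replies below discuss, that maxMergeDocs caps documents per merged segment, not bytes):

    <mainIndex>
      <!-- Merge fewer segments at a time, so each merge - and each merged file - is smaller. -->
      <mergeFactor>10</mergeFactor>
      <!-- No merge may produce a segment with more than this many documents. -->
      <maxMergeDocs>10000000</maxMergeDocs>
    </mainIndex>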
Using UUID for unique key
Hi, I've a distributed Solr setup. I'm using Java's UUID (UUID.randomUUID()) to generate the unique id for my documents. Before adding the unique key I was able to commit 50K records in 15 sec (pretty constant over the growing index); after adding the unique key it's taking over 35 sec for 50K, and the time increases as the index size grows. Here is my schema setting for the unique key, <field name="id" ... required="true" omitNorms="true" compressed="false"/> Why is commit taking so long? Should I not be using a UUID for the unique key? What are the other options - timestamp, etc.? Thanks, -vivek
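For context, a sketch of the id generation being described (the bean/field layout here is hypothetical):

    import java.util.UUID;
    import org.apache.solr.common.SolrInputDocument;

    public class UuidIdExample {
        public static SolrInputDocument newDoc(String body) {
            SolrInputDocument doc = new SolrInputDocument();
            // Random UUIDs are globally unique across distributed indexers, but
            // they arrive in random term order; the uniqueKey lookup done for
            // each add then touches the term dictionary at random spots, which
            // is one plausible explanation for the slowdown reported here.
            doc.addField("id", UUID.randomUUID().toString());
            doc.addField("body", body);
            return doc;
        }
    }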
Re: Using UUID for unique key
I did clean up the indexes and restarted the index process from scratch (new index file). As another test, if I use a simple numeric counter for the unique id, the index speed is fast (50K records commit within 20 sec). I'm thinking UUID might not be the way to go for the unique id - I'll look into using a sequence # instead. Thanks, -vivek On Tue, May 5, 2009 at 11:03 AM, Otis Gospodnetic wrote: > > You really had nothing in uniqueKey element in schema.xml at first? I'm not > looking at Solr code right now, but it could be the lack of the cost of that > lookup that made things faster. Now you have a lookup + generation + more > data to pass through analyzer + write out, though I can't imagine how that > would make things 2x slower. You didn't say whether you cleared the old > index after adding UUID key did you do that? > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: vivek sar >> To: solr-user@lucene.apache.org >> Sent: Tuesday, May 5, 2009 1:49:21 PM >> Subject: Using UUID for unique key >> >> Hi, >> >> I've a distributed Solr instances. I'm using Java's UUID >> (UUID.randomUUID()) to generate the unique id for my documents. Before >> adding unique key I was able to commit 50K records in 15sec (pretty >> constant over the growing index), after adding unique key it's taking >> over 35 sec for 50k and the time is increasing as the index size >> grows. Here is my schema setting for unique key, >> >> >> required="true" omitNorms="true" compressed="false"/> >> >> Why is commit taking so long? Should I not be using UUID key for >> unique keys? What are other options - timestamp etc.? >> >> Thanks, >> -vivek > >
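A minimal sketch of the counter alternative (illustrative only - a real deployment would have to persist the counter and partition ranges so distributed indexers never collide):

    import java.util.concurrent.atomic.AtomicLong;

    public class SequenceIds {
        private final AtomicLong next;

        // Seed each indexer instance with its own range, e.g. instance 0
        // starts at 0 and instance 1 at 1000000000, so ids never overlap.
        public SequenceIds(long start) {
            this.next = new AtomicLong(start);
        }

        public String nextId() {
            // Monotonically increasing keys keep uniqueKey terms clustered,
            // which is friendlier to the term dictionary than random UUIDs.
            return Long.toString(next.getAndIncrement());
        }
    }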
Delete complete core without stopping Solr
Hi, I'm using the multi-core feature of Solr. Each Solr instance maintains multiple cores, each core of size 100G. I would like to delete older core directories completely after 2 weeks (using file.delete). Currently Solr loads all the cores that are listed in solr.xml. I was thinking of the following, 1) Call the unload service to unload the core from Solr - would this remove the entry from solr.xml as well? 2) Delete the core directory Would this work? I'm hoping I don't have to restart Solr or do any individual document deletes. Thanks, -vivek
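A sketch of step 1 through SolrJ's core-admin API (the URL and core name are placeholders). One caveat stated as an assumption: UNLOAD removes the core from the running server - and, when solr.xml is marked persistent="true", from solr.xml as well - but it does not delete the index files, so the directory still has to be removed afterwards:

    import java.io.File;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;

    public class DropOldCore {
        public static void main(String[] args) throws Exception {
            // Core-admin requests go to the container URL, not to a core URL.
            CommonsHttpSolrServer admin =
                new CommonsHttpSolrServer("http://localhost:8080/solr");
            CoreAdminRequest.unloadCore("20090401_1", admin);

            // Once unloaded, the core's directory can be deleted from disk.
            deleteRecursively(new File("solr/multicore/20090401_1"));
        }

        static void deleteRecursively(File f) {
            File[] children = f.listFiles();
            if (children != null) {
                for (File c : children) {
                    deleteRecursively(c);
                }
            }
            f.delete();
        }
    }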
Re: Control segment size
Thanks Otis. I did set maxMergeDocs to 10M, but I still see a couple of index files over 30G, which doesn't match the max number of documents. Here are some numbers, 1) My total index size = 66GB 2) Number of total documents = 200M 3) 1M docs = 300MB 4) 10M docs should be roughly around 3-4GB. Under the index I see, -rw-r--r-- 1 dssearch staff 31771545312 May 6 14:15 _2tp.cfs -rw-r--r-- 1 dssearch staff 31932190573 May 7 08:13 _5ne.cfs -rw-r--r-- 1 dssearch staff 543118747 May 7 08:32 _5p2.cfs -rw-r--r-- 1 dssearch staff 543124452 May 7 08:53 _5qr.cfs -rw-r--r-- 1 dssearch staff 543100201 May 7 09:18 _5sg.cfs .. .. As you can see, a couple of files are huge. Are those documents or index files? How can I control the file size so that no single file grows beyond 10GB? Thanks, -vivek On Thu, Apr 23, 2009 at 10:26 AM, Otis Gospodnetic wrote: > > Hi, > > You are looking for maxMergeDocs, I believe. > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: vivek sar >> To: solr-user@lucene.apache.org >> Sent: Thursday, April 23, 2009 1:08:20 PM >> Subject: Control segment size >> >> Hi, >> >> Is there any configuration to control the segments' file size in >> Solr? Currently, I've an index (70G) with 80 segment files and one of >> the file is 24G. We noticed that in some cases commit takes over 2 >> hours to complete (committing 50K records), whereas usually it >> finishes in 20 seconds. After further investigation it turns out the >> system was doing lot of paging - the file system buffer was trying to >> write back the big segment back to disk. I got 20G memory on system >> with 6 G assigned to Solr instance (running 2 instances). >> >> It seems if I can control the segment size to max of 4-5 GB I'll be >> ok. Is there any way to do so? >> >> I got merging factor of 100 - does that impacts the size too? Why >> different segments have different size? >> >> Thanks, >> -vivek > >
Re: Control segment size
Shalin, Here is what I've read about maxMergeDocs: "While merging segments, Lucene will ensure that no segment with more than maxMergeDocs is created." Wouldn't that mean that no index file should contain more than that many documents? I guess the index files could also just contain index information, which is not limited by any property - is that true? Is there any workaround to limit the file size, besides limiting the index itself? Thanks, -vivek On Fri, May 8, 2009 at 10:02 PM, Shalin Shekhar Mangar wrote: > On Fri, May 8, 2009 at 1:30 AM, vivek sar wrote: > >> >> I did set the maxMergeDocs to 10M, but I still see couple of index >> files over 30G which do not match with max number of documents. Here >> are some numbers, >> >> 1) My total index size = 66GB >> 2) Number of total documents = 200M >> 3) 1M doc = 300MB >> 4) 10M doc should be roughly around 3-4GB. >> >> As you can see couple of files are huge. Are those documents or index >> files? How can I control the file size so no single file grows more >> than 10GB. >> > > No, there is no way to limit an individual file to a specific size. > > -- > Regards, > Shalin Shekhar Mangar. >
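For what it's worth, the Lucene version underneath Solr 1.4 (2.9.x) does have a byte-based merge policy - LogByteSizeMergePolicy, whose setMaxMergeMB excludes segments above a given size from further merging. As far as I know, stock Solr 1.4 only lets you pick the merge-policy class in solrconfig.xml, not set its properties, so this is a raw-Lucene sketch only (paths are placeholders):

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.LogByteSizeMergePolicy;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class CapSegmentSize {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter(
                    FSDirectory.open(new File("data/index")),
                    new StandardAnalyzer(Version.LUCENE_29),
                    IndexWriter.MaxFieldLength.UNLIMITED);
            // In Lucene 2.9 merge policies are constructed against the writer.
            LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy(writer);
            // Segments above ~5 GB stop being merge candidates, so existing
            // multi-GB .cfs files are no longer folded into ever larger ones.
            mp.setMaxMergeMB(5 * 1024);
            writer.setMergePolicy(mp);
            writer.close();
        }
    }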
Re: Commits taking too long
Hi, This problem is still haunting us. I've reduced the merge factor to 50, but as my index get fat (anything over 20G), the commit starts taking much longer. Some info, 1) Less than 20 G index size, 5000 records commit takes around 15sec 2) Over 20G the commit starts taking 50-70sec for 5K records 3) mergefactor = 50 4) Using multicore - each core is around 70G (currently there are 5 cores maintained by single Solr instance) 5) RAM = 6G 6) OS = OS X 10.5 7) JVM Options: export JAVA_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,address=3090,suspend=n \ -server -Xms${MIN_JVM_HEAP}m -Xmx${MAX_JVM_HEAP}m \ -XX:NewRatio=2 -XX:MaxPermSize=512m \ -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${AC_ROOT}/data/pmiJavaHeapDump.hprof \ -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log -Dsun.rmi.dgc.client.gcInterval=360 -Dsun.rmi.dgc.server.gcInterval=360 \ -Droot.dir=$AC_ROOT" export CATALINA_OPTS="-server -Xms${MIN_JVM_HEAP}m -Xmx${MAX_JVM_HEAP}m \ -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=50 -XX:-UseGCOverheadLimit" I also see following from GC log to coincide with commit slowness, 40387.691: [GC 40387.691: [ParNew (promotion failed): 132131K->149120K(149120K), 186.3768727 secs]40574.068: [CMSbailing out to foreground collection 40736.670: [CMS-concurrent-mark: 168.574/356.749 secs] [Times: user=276.41 sys=1192.51, real=356.77 secs] (concurrent mode failure): 6116976K->5908559K(6121088K), 174.0819842 secs] 6229178K->5908559K(6270208K), 360.4589949 secs] [Times: user=267.90 sys=1185.49, real=360.48 secs] 40748.155: [GC [1 CMS-initial-mark: 5908559K(6121088K)] 5910029K(6270208K), 0.0014832 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 40748.156: [CMS-concurrent-mark-start] 40748.513: [GC 40748.513: [ParNew: 127872K->21248K(149120K), 0.7482810 secs] 6036431K->6050277K(6270208K), 0.7483775 secs] [Times: user=1.66 sys=0.71, real=0.75 secs] 40749.613: [GC 40749.613: [ParNew: 149120K->149120K(149120K), 0.227 secs]40749.613: [CMS40784.961: [CMS-concurrent-mark: 36.055/36.805 secs] [Times: user=20.74 sys=31.41, real=36.81 secs] (concurrent mode failure): 6029029K->4899386K(6121088K), 44.2068275 secs] 6178149K->4899386K(6270208K), 44.2069457 secs] [Times: user=26.05 sys=30.21, real=44.21 secs] Few questions, 1) Should I lower the merge factor even more? Low merge factor seems to cause more frequent commit pauses. 2) Do I need more RAM to maintain large indexes? 3) Should I not have any core bigger than 20G? 4) Any other configuration (Solr or JVM) that can help with this? 5) Does search has to wait until commit completes? Right now the search doesn't return while the commit is happening. We are using Solr 1.4 (nightly build from 3/29/09). Thanks, -vivek On Wed, Apr 15, 2009 at 11:41 AM, Mark Miller wrote: > vivek sar wrote: >> >> Hi, >> >> I've index where I commit every 50K records (using Solrj). Usually >> this commit takes 20sec to complete, but every now and then the commit >> takes way too long - from 10 min to 30 min. I see more delays as the >> index size continues to grow - once it gets over 5G I start seeing >> long commit cycles more frequently. 
See this for ex., >> >> Apr 15, 2009 12:04:13 AM org.apache.solr.update.DirectUpdateHandler2 >> commit >> INFO: start commit(optimize=false,waitFlush=false,waitSearcher=false) >> Apr 15, 2009 12:39:58 AM org.apache.solr.core.SolrDeletionPolicy onCommit >> INFO: SolrDeletionPolicy.onCommit: commits:num=2 >> >> commit{dir=/Users/vivek/demo/afterchat/solr/multicore/20090414_1/data/index,segFN=segments_fq,version=1239747075391,generation=566,filenames=[_19m.cfs, >> _jm.cfs, _1bk.cfs, _193.cfx, _19z.cfs, _1b8.cfs, _1bf.cfs, _10g.cfs, _ >> 2s.cfs, _1bf.cfx, _18x.cfx, _19c.cfx, _193.cfs, _18x.cfs, _1b7.cfs, >> _1aw.cfs, _1aq.cfs, _1bi.cfx, _1a6.cfs, _19l.cfs, _1ad.cfs, _1a6.cfx, >> _1as.cfs, _19l.cfx, _1aa.cfs, _1an.cfs, _19d.cfs, _1a3.cfx, _1a3.cfs, >> _19g.cfs, _b7.cfs, _19 >> e.cfs, _19b.cfs, _1ab.cfs, _1b3.cfx, _19j.cfs, _190.cfs, _uu.cfs, >> _1b3.cfs, _1ak.cfs, _19p.cfs, _195.cfs, _194.cfs, _19i.cfx, _199.cfs, >> _19i.cfs, _19o.cfx, _196.cfs, _199.cfx, _196.cfx, _19o.cfs, _190.cfx, >> _xn.cfs, _1b0.cfx, _1at. >> cfs, _1av.cfs, _1ao.cfs, _1a9.cfx, _1b0.cfs, _5l.cfs, _1ao.cfx, >> _1ap.cfs, _1b6.cfx, _19a.cfs, _139.cfs, _1a1.cfs, _s1.cfs, _1b6.cfs, >> _1a9.cfs, _197.cfs, _1bd.cfs, _19n.cfs, _1au.cfx, _1au.cfs, _1a5.cfs, >> _1be.cfs, segments_fq, _1b4.cfs, _gt.cfs, _1ag.cfs, _18z.cfs, >> _162.cfs, _1a4.cfs, _198.cfs, _19x.cfs, _1ah.cfs, _1ai.cfs, _19q.cfs, >> _1a7.cfs, _1ae.cfs, _19h.cfs, _19x.cfx, _1a2.cfs, _1bj.cfs, _1bb.cfs, >> _1b1.cfs, _1ai.cfx, _19r.cfs, _18y.cfs, _19u.cfx, _1a8. >> cfs, _1
Solr memory requirements?
Hi, I'm pretty sure this has been asked before, but I couldn't find a complete answer in the forum archive. Here are my questions, 1) When Solr starts up, what does it load into memory? Let's say I've 4 cores, each core 50G in size. When Solr comes up, how much of it would be loaded in memory? 2) How much memory is required during index time? If I'm committing 50K records at a time (1 record = 1KB) using solrj, how much memory do I need to give to Solr? 3) Is there a minimum memory requirement for Solr to maintain a certain index size? Is there any benchmark on this? Here are some of my configurations from solrconfig.xml, 1) ramBufferSizeMB: 64 2) All the caches (under the query tag) are commented out 3) A few others, a) enableLazyFieldLoading: true ==> would this require memory? b) queryResultWindowSize: 50 c) queryResultMaxDocsCached: 200 d) HashDocSet e) useColdSearcher: false f) maxWarmingSearchers: 2 The problem we are having is the following. I've given Solr 6G of RAM. As the total index size (all cores combined) starts growing, Solr's memory consumption goes up. With 800 million documents, I see Solr already taking up all the memory at startup. After that, commits, searches - everything becomes slow. We will have a distributed setup with multiple Solr instances (around 8) on four boxes, but our requirement is to have each Solr instance maintain at least around 1.5 billion documents. We are trying to see if we can somehow reduce the Solr memory footprint. If someone can provide a pointer on which parameters affect memory and what effect each has, we can then decide whether we want that parameter or not. I'm not sure if there is any minimum Solr requirement for it to be able to maintain large indexes. I've used Lucene before and that didn't require anything by default - it used memory only during index and search times, not otherwise. Any help is very much appreciated. Thanks, -vivek
Re: Solr memory requirements?
Thanks Otis. Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong. I got total of 25 fields (15 are indexed and stored, other 10 are just stored). All my fields are basic data type - which I thought are not sorted. My id field is unique key. Is there any field here that might be getting sorted? Thanks, -vivek On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic wrote: > > Hi, > Some answers: > 1) .tii files in the Lucene index. When you sort, all distinct values for > the field(s) used for sorting. Similarly for facet fields. Solr caches. > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume > during indexing. There is no need to commit every 50K docs unless you want > to trigger snapshot creation. > 3) see 1) above > > 1.5 billion docs per instance where each doc is cca 1KB? I doubt that's > going to fly. :) > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: vivek sar >> To: solr-user@lucene.apache.org >> Sent: Wednesday, May 13, 2009 3:04:46 PM >> Subject: Solr memory requirements? >> >> Hi, >> >> I'm pretty sure this has been asked before, but I couldn't find a >> complete answer in the forum archive. Here are my questions, >> >> 1) When solr starts up what does it loads up in the memory? Let's say >> I've 4 cores with each core 50G in size. When Solr comes up how much >> of it would be loaded in memory? >> >> 2) How much memory is required during index time? If I'm committing >> 50K records at a time (1 record = 1KB) using solrj, how much memory do >> I need to give to Solr. >> >> 3) Is there a minimum memory requirement by Solr to maintain a certain >> size index? Is there any benchmark on this? >> >> Here are some of my configuration from solrconfig.xml, >> >> 1) 64 >> 2) All the caches (under query tag) are commented out >> 3) Few others, >> a) true ==> >> would this require memory? >> b) 50 >> c) 200 >> d) >> e) false >> f) 2 >> >> The problem we are having is following, >> >> I've given Solr RAM of 6G. As the total index size (all cores >> combined) start growing the Solr memory consumption goes up. With 800 >> million documents, I see Solr already taking up all the memory at >> startup. After that the commits, searches everything become slow. We >> will be having distributed setup with multiple Solr instances (around >> 8) on four boxes, but our requirement is to have each Solr instance at >> least maintain around 1.5 billion documents. >> >> We are trying to see if we can somehow reduce the Solr memory >> footprint. If someone can provide a pointer on what parameters affect >> memory and what effects it has we can then decide whether we want that >> parameter or not. I'm not sure if there is any minimum Solr >> requirement for it to be able maintain large indexes. I've used Lucene >> before and that didn't require anything by default - it used up memory >> only during index and search times - not otherwise. >> >> Any help is very much appreciated. >> >> Thanks, >> -vivek > >
Re: Solr memory requirements?
Otis, In that case, I'm not sure why Solr is taking up so much memory as soon as we start it up. I checked for .tii file and there is only one, -rw-r--r-- 1 search staff 20306 May 11 21:47 ./20090510_1/data/index/_3au.tii I have all the cache disabled - so that shouldn't be a problem too. My ramBuffer size is only 64MB. I read note on sorting, http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see something related to FieldCache. I don't see this as parameter defined in either solrconfig.xml or schema.xml. Could this be something that can load things in memory at startup? How can we disable it? I'm trying to find out if there is a way to tell how much memory Solr would consume and way to cap it. Thanks, -vivek On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic wrote: > > Hi, > > Sorting is triggered by the sort parameter in the URL, not a characteristic > of a field. :) > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: vivek sar >> To: solr-user@lucene.apache.org >> Sent: Wednesday, May 13, 2009 4:42:16 PM >> Subject: Re: Solr memory requirements? >> >> Thanks Otis. >> >> Our use case doesn't require any sorting or faceting. I'm wondering if >> I've configured anything wrong. >> >> I got total of 25 fields (15 are indexed and stored, other 10 are just >> stored). All my fields are basic data type - which I thought are not >> sorted. My id field is unique key. >> >> Is there any field here that might be getting sorted? >> >> >> required="true" omitNorms="true" compressed="false"/> >> >> >> compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> default="NOW/HOUR" compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> compressed="false"/> >> >> compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> compressed="false"/> >> >> compressed="false"/> >> >> compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> compressed="false"/> >> >> default="NOW/HOUR" omitNorms="true"/> >> >> >> >> >> omitNorms="true" multiValued="true"/> >> >> Thanks, >> -vivek >> >> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic >> wrote: >> > >> > Hi, >> > Some answers: >> > 1) .tii files in the Lucene index. When you sort, all distinct values for >> > the >> field(s) used for sorting. Similarly for facet fields. Solr caches. >> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will >> > consume >> during indexing. There is no need to commit every 50K docs unless you want >> to >> trigger snapshot creation. >> > 3) see 1) above >> > >> > 1.5 billion docs per instance where each doc is cca 1KB? I doubt that's >> > going >> to fly. :) >> > >> > Otis >> > -- >> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >> > >> > >> > >> > - Original Message >> >> From: vivek sar >> >> To: solr-user@lucene.apache.org >> >> Sent: Wednesday, May 13, 2009 3:04:46 PM >> >> Subject: Solr memory requirements? >> >> >> >> Hi, >> >> >> >> I'm pretty sure this has been asked before, but I couldn't find a >> >> complete answer in the forum archive. Here are my questions, >> >> >> >> 1) When solr starts up what does it loads up in the memory? 
Let's say >> >> I've 4 cores with each core 50G in size. When Solr comes up how much >> >> of it would be loaded in me
Re: Solr memory requirements?
Just an update on the memory issue - might be useful for others. I read the following,

http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)

and it looks like the first and new searcher listeners would populate the FieldCache. Commenting out these two listener entries seems to do the trick - at least the heap size is not growing as soon as Solr starts up.

I ran some searches and they all came out fine. The index rate is also pretty good. Would there be any impact of disabling these listeners?

Thanks,
-vivek

On Wed, May 13, 2009 at 2:12 PM, vivek sar wrote:
> Otis,
>
> In that case, I'm not sure why Solr is taking up so much memory as soon as we start it up. I checked for .tii files and there is only one,
>
> -rw-r--r-- 1 search staff 20306 May 11 21:47 ./20090510_1/data/index/_3au.tii
>
> I have all the caches disabled - so that shouldn't be a problem either. My ramBuffer size is only 64MB.
>
> I read the note on sorting, http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see something related to FieldCache. I don't see this as a parameter defined in either solrconfig.xml or schema.xml. Could this be something that can load things in memory at startup? How can we disable it?
>
> I'm trying to find out if there is a way to tell how much memory Solr would consume, and a way to cap it.
>
> Thanks,
> -vivek
>
> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic wrote:
>> Hi,
>>
>> Sorting is triggered by the sort parameter in the URL, not a characteristic of a field. :)
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> - Original Message
>>> From: vivek sar
>>> To: solr-user@lucene.apache.org
>>> Sent: Wednesday, May 13, 2009 4:42:16 PM
>>> Subject: Re: Solr memory requirements?
>>>
>>> Thanks Otis.
>>>
>>> Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong.
>>>
>>> I got a total of 25 fields (15 are indexed and stored, the other 10 are just stored). All my fields are basic data types - which I thought are not sorted. My id field is the unique key.
>>>
>>> Is there any field here that might be getting sorted?
>>>
>>> [~25 <field .../> definitions omitted - the opening tags were lost in archiving; all are plain string/date fields with omitNorms="true" and compressed="false"]
>>>
>>> Thanks,
>>> -vivek
>>>
>>> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic wrote:
>>> > Hi,
>>> > Some answers:
>>> > 1) The .tii files in the Lucene index. When you sort, all distinct values for the field(s) used for sorting are loaded too. Similarly for facet fields. Plus Solr's caches.
>>> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume during indexing. There is no need to commit every 50K docs unless you want to trigger snapshot creation.
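For reference, the two listener entries being commented out look like this in the stock example solrconfig.xml (a sketch based on the 1.3/1.4 example config - the warming queries shown are the example defaults, not necessarily the ones in vivek's setup):

  <!--
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst> <str name="q">solr</str> <str name="start">0</str> <str name="rows">10</str> </lst>
    </arr>
  </listener>
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst> <str name="q">fast_warm</str> <str name="start">0</str> <str name="rows">10</str> </lst>
    </arr>
  </listener>
  -->

Disabling them means the FieldCache and the Solr caches won't be primed at startup; the cost just moves to the first real query against each new searcher.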
Re: Solr memory requirements?
Disabling the first/new searchers did help with the initial load time, but after 10-15 min the heap memory starts climbing up again and reaches the max within 20 min. Now the GC is running all the time, which is slowing down the commit and search cycles.

It's still puzzling what Solr holds in memory and doesn't release. I haven't been able to profile it as the dump is too big. Would setting termIndexInterval help? I'm not sure how that can be set using Solr.

Some other query properties under solrconfig (only the values survived archiving - the enclosing element names were lost):

1024 true 50 200 false 2

Currently I've got 800 million documents and have specified an 8G heap size. Any other suggestions on what I can do to control Solr's memory consumption?

Thanks,
-vivek

On Wed, May 13, 2009 at 2:53 PM, vivek sar wrote:
> Just an update on the memory issue - might be useful for others. I read the following,
>
> http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)
>
> and it looks like the first and new searcher listeners would populate the FieldCache. Commenting out these two listener entries seems to do the trick - at least the heap size is not growing as soon as Solr starts up.
>
> I ran some searches and they all came out fine. The index rate is also pretty good. Would there be any impact of disabling these listeners?
>
> Thanks,
> -vivek
>
> On Wed, May 13, 2009 at 2:12 PM, vivek sar wrote:
>> Otis,
>>
>> In that case, I'm not sure why Solr is taking up so much memory as soon as we start it up. I checked for .tii files and there is only one,
>>
>> -rw-r--r-- 1 search staff 20306 May 11 21:47 ./20090510_1/data/index/_3au.tii
>>
>> I have all the caches disabled - so that shouldn't be a problem either. My ramBuffer size is only 64MB.
>>
>> I read the note on sorting, http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see something related to FieldCache. I don't see this as a parameter defined in either solrconfig.xml or schema.xml. Could this be something that can load things in memory at startup? How can we disable it?
>>
>> I'm trying to find out if there is a way to tell how much memory Solr would consume, and a way to cap it.
>>
>> Thanks,
>> -vivek
>>
>> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic wrote:
>>> Hi,
>>>
>>> Sorting is triggered by the sort parameter in the URL, not a characteristic of a field. :)
>>>
>>> Otis
>>> --
>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>
>>> - Original Message
>>>> From: vivek sar
>>>> To: solr-user@lucene.apache.org
>>>> Sent: Wednesday, May 13, 2009 4:42:16 PM
>>>> Subject: Re: Solr memory requirements?
>>>>
>>>> Thanks Otis.
>>>>
>>>> Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong.
>>>>
>>>> I got a total of 25 fields (15 are indexed and stored, the other 10 are just stored). All my fields are basic data types - which I thought are not sorted. My id field is the unique key.
>>>>
>>>> Is there any field here that might be getting sorted?
>>>>
>>>> [~25 <field .../> definitions omitted - the opening tags were lost in archiving; message truncated here]
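On the termIndexInterval question above - a sketch of how it can be set, assuming your Solr 1.4 build reads it from the index settings in solrconfig.xml (worth verifying against your version; the value is illustrative, not a recommendation):

  <indexDefaults>
    <!-- index every 256th term instead of the Lucene default of 128;
         a larger interval shrinks the in-memory .tii term index at the
         cost of slightly slower term lookups (illustrative value) -->
    <termIndexInterval>256</termIndexInterval>
  </indexDefaults>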
Re: Solr memory requirements?
I think maxBufferedDocs has been deprecated in Solr 1.4 - it's recommended to use ramBufferSizeMB instead. My ramBufferSizeMB=64, so this shouldn't be a problem, I think. There has to be something else that Solr is holding in memory. Anyone else?

Thanks,
-vivek

On Wed, May 13, 2009 at 4:01 PM, Jack Godwin wrote:
> Have you checked the maxBufferedDocs? I had to drop mine down to 1000 with 3 million docs.
> Jack
>
> On Wed, May 13, 2009 at 6:53 PM, vivek sar wrote:
>> Disabling the first/new searchers did help with the initial load time, but after 10-15 min the heap memory starts climbing up again and reaches the max within 20 min. Now the GC is running all the time, which is slowing down the commit and search cycles.
>>
>> It's still puzzling what Solr holds in memory and doesn't release.
>>
>> I haven't been able to profile it as the dump is too big. Would setting termIndexInterval help? I'm not sure how that can be set using Solr.
>>
>> Some other query properties under solrconfig (values only; the element names were lost in archiving):
>>
>> 1024 true 50 200 false 2
>>
>> Currently I've got 800 million documents and have specified an 8G heap size.
>>
>> Any other suggestions on what I can do to control Solr's memory consumption?
>>
>> Thanks,
>> -vivek
>>
>> On Wed, May 13, 2009 at 2:53 PM, vivek sar wrote:
>> > Just an update on the memory issue - might be useful for others. I read the following,
>> >
>> > http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)
>> >
>> > and it looks like the first and new searcher listeners would populate the FieldCache. Commenting out these two listener entries seems to do the trick - at least the heap size is not growing as soon as Solr starts up.
>> >
>> > I ran some searches and they all came out fine. The index rate is also pretty good. Would there be any impact of disabling these listeners?
>> >
>> > Thanks,
>> > -vivek
>> >
>> > On Wed, May 13, 2009 at 2:12 PM, vivek sar wrote:
>> >> Otis,
>> >>
>> >> In that case, I'm not sure why Solr is taking up so much memory as soon as we start it up. I checked for .tii files and there is only one,
>> >>
>> >> -rw-r--r-- 1 search staff 20306 May 11 21:47 ./20090510_1/data/index/_3au.tii
>> >>
>> >> I have all the caches disabled - so that shouldn't be a problem either. My ramBuffer size is only 64MB.
>> >>
>> >> I read the note on sorting, http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see something related to FieldCache. I don't see this as a parameter defined in either solrconfig.xml or schema.xml. Could this be something that can load things in memory at startup? How can we disable it?
>> >>
>> >> I'm trying to find out if there is a way to tell how much memory Solr would consume, and a way to cap it.
>> >>
>> >> Thanks,
>> >> -vivek
>> >>
>> >> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic wrote:
>> >>> Hi,
>> >>>
>> >>> Sorting is triggered by the sort parameter in the URL, not a characteristic of a field. :)
>> >>>
>> >>> Otis
>> >>> --
>> >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >>>
>> >>> - Original Message
>> >>>> From: vivek sar
>> >>>> To: solr-user@lucene.apache.org
>> >>>> Sent: Wednesday, May 13, 2009 4:42:16 PM
>> >>>> Subject: Re: Solr memory requirements?
>> >>>>
>> >>>> Thanks Otis.
>> >>>>
>> >>>> Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong.
>> >>>>
>> >>>> I got a total of 25 fields (15 are indexed and stored, the other 10 are just stored). All my fields are basic data types - which I thought are not sorted. My id field is the unique key.
>> >>>>
>> >>>> Is there any field here that might be getting sorted? [message truncated in the archive]
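A sketch of the 1.4-style setting, per the example solrconfig.xml (the 64 matches the ramBufferSizeMB mentioned above):

  <indexDefaults>
    <!-- flush the in-memory index buffer once it reaches 64 MB, instead of
         flushing after a fixed document count (the deprecated maxBufferedDocs) -->
    <ramBufferSizeMB>64</ramBufferSizeMB>
  </indexDefaults>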
Re: Solr memory requirements?
Otis,

We are not running a master-slave configuration. We get very few searches (admin only) in a day, so we didn't see the need for replication/snapshots. This problem is with one Solr instance managing 4 cores (each core 200 million records). Both indexing and searching are performed by the same Solr instance.

What are the .tii files used for? I see this file under only one core.

Still looking for what gets loaded into the heap by Solr (during load time, indexing, and searching) and stays there. I see most of these are tenured objects that aren't getting released by GC - will post profiling records tomorrow.

Thanks,
-vivek

On Wed, May 13, 2009 at 6:34 PM, Otis Gospodnetic wrote:
>
> There is constant mixing of indexing concepts and searching concepts in this thread. Are you having problems on the master (indexing) or on the slave (searching)?
>
> That .tii is only 20K and you said this is a large index? That doesn't smell right...
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
>> From: vivek sar
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, May 13, 2009 5:12:00 PM
>> Subject: Re: Solr memory requirements?
>>
>> Otis,
>>
>> In that case, I'm not sure why Solr is taking up so much memory as soon as we start it up. I checked for .tii files and there is only one,
>>
>> -rw-r--r-- 1 search staff 20306 May 11 21:47 ./20090510_1/data/index/_3au.tii
>>
>> I have all the caches disabled - so that shouldn't be a problem either. My ramBuffer size is only 64MB.
>>
>> I read the note on sorting, http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see something related to FieldCache. I don't see this as a parameter defined in either solrconfig.xml or schema.xml. Could this be something that can load things in memory at startup? How can we disable it?
>>
>> I'm trying to find out if there is a way to tell how much memory Solr would consume, and a way to cap it.
>>
>> Thanks,
>> -vivek
>>
>> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic wrote:
>> > Hi,
>> >
>> > Sorting is triggered by the sort parameter in the URL, not a characteristic of a field. :)
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> > - Original Message
>> >> From: vivek sar
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Wednesday, May 13, 2009 4:42:16 PM
>> >> Subject: Re: Solr memory requirements?
>> >>
>> >> Thanks Otis.
>> >>
>> >> Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong.
>> >>
>> >> I got a total of 25 fields (15 are indexed and stored, the other 10 are just stored). All my fields are basic data types - which I thought are not sorted. My id field is the unique key.
>> >>
>> >> Is there any field here that might be getting sorted?
>> >>
>> >> [~25 <field .../> definitions omitted - the opening tags were lost in archiving; message truncated here]
Re: Solr memory requirements?
I don't know if field type has any impact on the memory usage - does it?

Our use cases require complete matches, thus there is no need for any analysis in most cases - does it matter in terms of memory usage?

Also, is there any default caching used by Solr if I comment out all the caches under query in solrconfig.xml? I also don't have any auto-warming queries.

Thanks,
-vivek

On Wed, May 13, 2009 at 4:24 PM, Erick Erickson wrote:
> Warning: I'm way out of my competency range when I comment on SOLR, but I've seen the statement that string fields are NOT tokenized while text fields are, and I notice that almost all of your fields are string type.
>
> Would someone more knowledgeable than me care to comment on whether this is at all relevant? Offered in the spirit that sometimes there are things so basic that only an amateur can see them
>
> Best
> Erick
>
> On Wed, May 13, 2009 at 4:42 PM, vivek sar wrote:
>> Thanks Otis.
>>
>> Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong.
>>
>> I got a total of 25 fields (15 are indexed and stored, the other 10 are just stored). All my fields are basic data types - which I thought are not sorted. My id field is the unique key.
>>
>> Is there any field here that might be getting sorted?
>>
>> [~25 <field .../> definitions omitted - the opening tags were lost in archiving; all are plain string/date fields with omitNorms="true" and compressed="false"]
>>
>> Thanks,
>> -vivek
>>
>> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic wrote:
>> > Hi,
>> > Some answers:
>> > 1) The .tii files in the Lucene index. When you sort, all distinct values for the field(s) used for sorting are loaded too. Similarly for facet fields. Plus Solr's caches.
>> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume during indexing. There is no need to commit every 50K docs unless you want to trigger snapshot creation.
>> > 3) see 1) above
>> >
>> > 1.5 billion docs per instance where each doc is cca 1KB? I doubt that's going to fly. :)
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> > - Original Message
>> >> From: vivek sar
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Wednesday, May 13, 2009 3:04:46 PM
>> >> Subject: Solr memory requirements?
>> >>
>> >> Hi,
>> >>
>> >> I'm pretty sure this has been asked before, but I couldn't find a complete answer in the forum archive. Here are my questions,
>> >>
>> >> 1) When solr starts up what does it load up in memory? Let's say I've 4 cores with each core 50G in size. When Solr comes up how much of it would be loaded in memory?
>> >>
>> >> 2) How much memory is required during index time? If I'm committing 50K records at a time (1 record = 1KB) using solrj, how much memory do I need to give to Solr?
>> >>
>> >> 3) Is there a minimum memory requirement? [message truncated in the archive]
Re: Solr memory requirements?
Some update on this issue,

1) I attached jconsole to my app and monitored the memory usage. During indexing the memory usage goes up and down, which I think is normal. The memory remains around the min heap size (4G) for indexing, but as soon as I run a search the tenured heap usage jumps up to 6G and remains there. Subsequent searches increase the heap usage even more until it reaches the max (8G) - after which everything (indexing and searching) becomes slow.

The search query is a very generic one in this case which goes through all the cores (4 of them - 800 million records), finds 400 million matches and returns 100 rows.

Does the Solr searcher hold references to objects in memory? I couldn't find any setting that would tell me it does, but every search causing the heap to go up is definitely suspicious.

2) I ran the jmap histo to get the top objects (this is on a smaller instance with 2G memory; this is before running a search - after running a search I wasn't able to run jmap),

 num   #instances     #bytes  class name
 ---------------------------------------
   1:    3890855   222608992  [C
   2:    3891673   155666920  java.lang.String
   3:    3284341   131373640  org.apache.lucene.index.TermInfo
   4:    3334198   106694336  org.apache.lucene.index.Term
   5:        271    26286496  [J
   6:         16    26273936  [Lorg.apache.lucene.index.Term;
   7:         16    26273936  [Lorg.apache.lucene.index.TermInfo;
   8:     320512    15384576  org.apache.lucene.index.FreqProxTermsWriter$PostingList
   9:      10335    11554136  [I

I'm not sure what the first one ([C) is? I couldn't profile it to know what all the Strings are being allocated by - any ideas?

Any ideas on what the Searcher might be holding on to, and how can we change that behavior?

Thanks,
-vivek

On Thu, May 14, 2009 at 11:33 AM, vivek sar wrote:
> I don't know if field type has any impact on the memory usage - does it?
>
> Our use cases require complete matches, thus there is no need for any analysis in most cases - does it matter in terms of memory usage?
>
> Also, is there any default caching used by Solr if I comment out all the caches under query in solrconfig.xml? I also don't have any auto-warming queries.
>
> Thanks,
> -vivek
>
> On Wed, May 13, 2009 at 4:24 PM, Erick Erickson wrote:
>> Warning: I'm way out of my competency range when I comment on SOLR, but I've seen the statement that string fields are NOT tokenized while text fields are, and I notice that almost all of your fields are string type.
>>
>> Would someone more knowledgeable than me care to comment on whether this is at all relevant? Offered in the spirit that sometimes there are things so basic that only an amateur can see them
>>
>> Best
>> Erick
>>
>> On Wed, May 13, 2009 at 4:42 PM, vivek sar wrote:
>>> Thanks Otis.
>>>
>>> Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong.
>>>
>>> I got a total of 25 fields (15 are indexed and stored, the other 10 are just stored). All my fields are basic data types - which I thought are not sorted. My id field is the unique key.
>>>
>>> Is there any field here that might be getting sorted?
>>>
>>> [~25 <field .../> definitions omitted - the opening tags were lost in archiving; message truncated here]
Re: Solr memory requirements?
Thanks Mark.

I checked all the items you mentioned,

1) I've omitNorms=true for all my indexed fields (stored-only fields I guess don't matter)
2) I've tried commenting out all caches in the solrconfig.xml, but that doesn't help much
3) I've tried commenting out the first and new searcher listener settings in the solrconfig.xml - the only way that helps is that at startup time the memory usage doesn't spike up - that's only because there is no auto-warmer query to run. But I noticed commenting out the searchers slows down any other queries to Solr.
4) I don't have any sort or facet in my queries
5) I'm not sure how to change the "Lucene term interval" from Solr - is there a way to do that?

I've been playing around with this memory thing the whole day and have found that it's the search that's hogging the memory. Any time there is a search on all the records (800 million) the heap consumption jumps by 5G. This makes me think there has to be some configuration in Solr that's causing some terms per document to be loaded in memory.

I've posted my settings several times on this forum, but no one has been able to pinpoint what configuration might be causing this. If someone is interested I can attach the solrconfig and schema files as well. Here are the settings again under the query tag (values only; the element names were lost in archiving),

1024 true 50 200 false 2

and the schema, [field definitions lost in archiving]

Any help is greatly appreciated.

Thanks,
-vivek

On Thu, May 14, 2009 at 6:22 PM, Mark Miller wrote:
> 800 million docs is on the high side for modern hardware.
>
> If even one field has norms on, you're talking almost 800 MB right there. And then if another Searcher is brought up while the old one is serving (which happens when you update)? Doubled.
>
> Your best bet is to distribute across a couple machines.
>
> To minimize you would want to turn off or down caching, don't facet, don't sort, turn off all norms, possibly get at the Lucene term interval and raise it. Drop the on-deck searchers setting. Even then, 800 million...time to distribute I'd think.
>
> vivek sar wrote:
>>
>> Some update on this issue,
>>
>> 1) I attached jconsole to my app and monitored the memory usage. During indexing the memory usage goes up and down, which I think is normal. The memory remains around the min heap size (4G) for indexing, but as soon as I run a search the tenured heap usage jumps up to 6G and remains there. Subsequent searches increase the heap usage even more until it reaches the max (8G) - after which everything (indexing and searching) becomes slow.
>>
>> The search query is a very generic one in this case which goes through all the cores (4 of them - 800 million records), finds 400 million matches and returns 100 rows.
>>
>> Does the Solr searcher hold references to objects in memory? I couldn't find any setting that would tell me it does, but every search causing the heap to go up is definitely suspicious.
>>
>> 2) I ran the jmap histo to get the top objects (this is on a smaller instance with 2G memory; this is before running a search - after running a search I wasn't able to run jmap),
>>
>>  num   #instances     #bytes  class name
>>  ---------------------------------------
>>    1:    3890855   222608992  [C
>>    2:    3891673   155666920  java.lang.String
>>    3:    3284341   131373640  org.apache.lucene.index.TermInfo
>>    4:    3334198   106694336  org.apache.lucene.index.Term
>>    5:        271    26286496  [J
>>    6:         16    26273936  [Lorg.apache.lucene.index.Term;
>>    7:         16    26273936  [Lorg.apache.lucene.index.TermInfo;
>>    8:     320512    15384576  org.apache.lucene.index.FreqProxTermsWriter$PostingList
>>    9:      10335    11554136  [I
>>
>> I'm not sure what the first one ([C) is? I couldn't profile it to know what all the Strings are being allocated by - any ideas?
>>
>> Any ideas on what the Searcher might be holding on to, and how can we change that behavior?
>>
>> Thanks,
>> -vivek
>>
>> On Thu, May 14, 2009 at 11:33 AM, vivek sar wrote:
>>>
>>> I don't know if field type has any impact on the memory usage - does it?
>>>
>>> Our use cases require complete matches, thus there is no need for any analysis in most cases - does it matter in terms of memory usage?
>>>
>>> Also, is there any default caching used by Solr if I comment out all the caches under query in solrconfig.xml? [message truncated in the archive]
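The bare values above lost their enclosing tags in archiving. As a purely hypothetical reconstruction - the element names below are guesses that match the stock solrconfig.xml ordering, not vivek's confirmed settings - the <query> section may have read:

  <query>
    <maxBooleanClauses>1024</maxBooleanClauses>               <!-- guess for "1024" -->
    <!-- filterCache / queryResultCache / documentCache commented out -->
    <enableLazyFieldLoading>true</enableLazyFieldLoading>     <!-- guess for "true" -->
    <queryResultWindowSize>50</queryResultWindowSize>         <!-- guess for "50" -->
    <queryResultMaxDocsCached>200</queryResultMaxDocsCached>  <!-- guess for "200" -->
    <useColdSearcher>false</useColdSearcher>                  <!-- guess for "false" -->
    <maxWarmingSearchers>2</maxWarmingSearchers>              <!-- guess for "2" -->
  </query>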
Re: Solr memory requirements?
Some more info,

Profiling the heap dump shows "org.apache.lucene.index.ReadOnlySegmentReader" as the biggest object - taking up almost 80% of the total memory (6G) - see the attached screenshot for a smaller dump. There is some norms object - not sure where they are coming from, as I've omitNorms=true for all indexed fields.

I also noticed that if I run a query - let's say a generic query that hits 100 million records - and then follow up with a specific query that hits only 1 record, the second query causes the increase in heap. Looks like a few bytes are being loaded into memory for each document - I've checked the schema, all indexes have omitNorms=true, and all caches are commented out - still looking to see what else might put things in memory that don't get collected by GC.

I also saw https://issues.apache.org/jira/browse/SOLR- for Solr 1.4 (which I'm using). Not sure if that can cause any problem. I do use range queries for dates - would that have any effect?

Any other ideas?

Thanks,
-vivek

On Thu, May 14, 2009 at 8:38 PM, vivek sar wrote:
> Thanks Mark.
>
> I checked all the items you mentioned,
>
> 1) I've omitNorms=true for all my indexed fields (stored-only fields I guess don't matter)
> 2) I've tried commenting out all caches in the solrconfig.xml, but that doesn't help much
> 3) I've tried commenting out the first and new searcher listener settings in the solrconfig.xml - the only way that helps is that at startup time the memory usage doesn't spike up - that's only because there is no auto-warmer query to run. But I noticed commenting out the searchers slows down any other queries to Solr.
> 4) I don't have any sort or facet in my queries
> 5) I'm not sure how to change the "Lucene term interval" from Solr - is there a way to do that?
>
> I've been playing around with this memory thing the whole day and have found that it's the search that's hogging the memory. Any time there is a search on all the records (800 million) the heap consumption jumps by 5G. This makes me think there has to be some configuration in Solr that's causing some terms per document to be loaded in memory.
>
> I've posted my settings several times on this forum, but no one has been able to pinpoint what configuration might be causing this. If someone is interested I can attach the solrconfig and schema files as well. Here are the settings again under the query tag (values only; the element names were lost in archiving),
>
> 1024 true 50 200 false 2
>
> and the schema,
>
> [~25 <field .../> definitions omitted - the opening tags were lost in archiving; all are plain string/date fields with omitNorms="true" and compressed="false"]
>
> Any help is greatly appreciated.
>
> Thanks,
> -vivek
>
> On Thu, May 14, 2009 at 6:22 PM, Mark Miller wrote:
>> 800 million docs is on the high side for modern hardware.
>>
>> If even one field has norms on, you're talking almost 800 MB right there. And then if another Searcher is brought up while the old one is serving (which happens when you update)? Doubled.
>>
>> Your best bet is to distribute across a couple machines.
>>
>> To minimize you would want to turn off or down caching, don't facet, don't sort, turn off all norms, possibly get at the Lucene term interval and raise it. Drop the on-deck searchers setting. Even then, 800 million...time to distribute I'd think.
>>
>> vivek sar wrote:
>>>
>>> Some update on this issue, [message truncated in the archive]
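One hedged note on the norms sighting above: omitNorms only applies to documents indexed after it is set - segments written while norms were enabled keep them (and a merged segment keeps norms for a field if any of its docs had them), so norms can linger until the data is fully reindexed. The schema declaration itself looks like this (field name and type are illustrative, not from the lost schema):

  <field name="timestamp" type="date" indexed="true" stored="true"
         omitNorms="true" compressed="false"/>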
Re: Defining DataDir in Multi-Core
Yeah, it was some time back - it did work. Thanks for following up.

On Tue, May 19, 2009 at 12:34 AM, RaghavPrabhu wrote:
>
> Hi Vivek,
>
> Have you figured out the problem of creating the data dir in the wrong location?
>
> For me it's working...
>
> Just comment out the data dir (in the solrconfig.xml file) and create the core via a REST call. It should work!!!
>
> Thanks & regards
> Prabhu.K
>
> vivek sar wrote:
>>
>> Hi,
>>
>> I tried the latest nightly build (04-01-09) - it takes the dataDir property now, but it's creating the data dir at the wrong location. For ex., I've the following in solr.xml,
>>
>> [solr.xml snippet lost in archiving - a <core .../> entry with dataDir="/Users/opal/temp/afterchat/solr/data/core0"]
>>
>> but it always seems to be creating the solr/data directory in the cwd (where I started Tomcat from). Here is the log from Catalina.out,
>>
>> Apr 1, 2009 10:47:21 AM org.apache.solr.core.SolrCore
>> INFO: [core2] Opening new SolrCore at /Users/opal/temp/chat/solr/, dataDir=./solr/data/
>> ..
>> Apr 1, 2009 10:47:21 AM org.apache.solr.core.SolrCore initIndex
>> WARNING: [core2] Solr index directory './solr/data/index' doesn't exist. Creating new index...
>>
>> I've also tried relative paths, but to no avail.
>>
>> Is this a bug?
>>
>> Thanks,
>> -vivek
>>
>> On Wed, Apr 1, 2009 at 9:45 AM, vivek sar wrote:
>>> Thanks Shalin.
>>>
>>> Is it available in the latest nightly build?
>>>
>>> Is there any other way I can create cores dynamically (using the CREATE service) which will use the same schema.xml and solrconfig.xml, but write to different data directories?
>>>
>>> Thanks,
>>> -vivek
>>>
>>> On Wed, Apr 1, 2009 at 1:55 AM, Shalin Shekhar Mangar wrote:
>>>> On Wed, Apr 1, 2009 at 1:48 PM, vivek sar wrote:
>>>>> I'm using the latest released one - Solr 1.3. The wiki says passing dataDir to the CREATE action (web service) should work, but that doesn't seem to be working.
>>>>
>>>> That is a Solr 1.4 feature (not released yet).
>>>>
>>>> --
>>>> Regards,
>>>> Shalin Shekhar Mangar.
>
> --
> View this message in context: http://www.nabble.com/Defining-DataDir-in-Multi-Core-tp22818543p23611179.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Defining DataDir in Multi-Core
As for splitting the index, I simply start a new core once a core reaches a certain size - using CREATE - and then start writing to that new core. Note that Solr will maintain all the cores defined in the solr.xml.

As for reusing the same schema and solrconfig.xml - you can create a default core (say core0) and put them in the conf directory there. In the solr.xml, for every new core set the instanceDir to core0 and its dataDir to the new core's data directory.

Hope this helps.

-vivek

2009/5/19 Noble Paul നോബിള് नोब्ळ् :
> On Tue, May 19, 2009 at 2:32 PM, KK wrote:
>> I could not follow [is this mail a continuation of some old mail, part of which seems to be missing], but I want to.
>> Is it the case that CREATE is to be supported by Solr 1.4, i.e. currently Solr 1.3 does not support this? Correct me if I'm wrong.
> CREATE is supported in Solr1.3 also
>
> the dataDir attribute is a new feature in 1.4
>>
>> Vivek, could you please tell me how you fixed the problem of using a single schema and config file for all cores and having different data directories. I'm stuck at the same point as you were. Please help me out. Can you provide some specific examples that show the way you used the CREATE statement to register new cores on the fly. Thank you.
>>
>> --KK
>>
>> On Tue, May 19, 2009 at 1:17 PM, vivek sar wrote:
>>> Yeah, it was some time back - it did work. Thanks for following up.
>>>
>>> On Tue, May 19, 2009 at 12:34 AM, RaghavPrabhu wrote:
>>> > Hi Vivek,
>>> >
>>> > Have you figured out the problem of creating the data dir in the wrong location?
>>> >
>>> > For me it's working...
>>> >
>>> > Just comment out the data dir (in the solrconfig.xml file) and create the core via a REST call. It should work!!!
>>> >
>>> > Thanks & regards
>>> > Prabhu.K
>>> >
>>> > vivek sar wrote:
>>> >> Hi,
>>> >>
>>> >> I tried the latest nightly build (04-01-09) - it takes the dataDir property now, but it's creating the data dir at the wrong location. For ex., I've the following in solr.xml,
>>> >>
>>> >> [solr.xml snippet lost in archiving - a <core .../> entry with dataDir="/Users/opal/temp/afterchat/solr/data/core0"]
>>> >>
>>> >> but it always seems to be creating the solr/data directory in the cwd (where I started Tomcat from). Here is the log from Catalina.out,
>>> >>
>>> >> Apr 1, 2009 10:47:21 AM org.apache.solr.core.SolrCore
>>> >> INFO: [core2] Opening new SolrCore at /Users/opal/temp/chat/solr/, dataDir=./solr/data/
>>> >> ..
>>> >> Apr 1, 2009 10:47:21 AM org.apache.solr.core.SolrCore initIndex
>>> >> WARNING: [core2] Solr index directory './solr/data/index' doesn't exist. Creating new index...
>>> >>
>>> >> I've also tried relative paths, but to no avail.
>>> >>
>>> >> Is this a bug?
>>> >>
>>> >> Thanks,
>>> >> -vivek
>>> >>
>>> >> On Wed, Apr 1, 2009 at 9:45 AM, vivek sar wrote:
>>> >>> Thanks Shalin.
>>> >>>
>>> >>> Is it available in the latest nightly build?
>>> >>>
>>> >>> Is there any other way I can create cores dynamically (using the CREATE service) which will use the same schema.xml and solrconfig.xml, but write to different data directories?
>>> >>>
>>> >>> Thanks,
>>> >>> -vivek
>>> >>>
>>> >>> On Wed, Apr 1, 2009 at 1:55 AM, Shalin Shekhar Mangar wrote:
>>> >>>> On Wed, Apr 1, 2009 at 1:48 PM, vivek sar wrote:
>>> >>>>> I'm using the latest released one - Solr 1.3. The wiki says passing dataDir to the CREATE action (web service) should work, but that doesn't seem to be working.
>>> >>>>
>>> >>>> That is a Solr 1.4 feature (not released yet).
>>> >>>>
>>> >>>> --
>>> >>>> Regards,
>>> >>>> Shalin Shekhar Mangar.
>>> >
>>> > --
>>> > View this message in context: http://www.nabble.com/Defining-DataDir-in-Multi-Core-tp22818543p23611179.html
>>> > Sent from the Solr - User mailing list archive at Nabble.com.
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
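To make the shared-conf setup concrete, a sketch (core names and paths are illustrative; the CREATE URL relies on the Solr 1.4 dataDir support discussed above):

  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <!-- every core points at core0's instanceDir, so they share one
           schema.xml/solrconfig.xml, but each gets its own dataDir -->
      <core name="core0" instanceDir="core0" dataDir="/solr/data/core0"/>
      <core name="core1" instanceDir="core0" dataDir="/solr/data/core1"/>
    </cores>
  </solr>

and a new core can then be registered on the fly with something like:

  http://localhost:8983/solr/admin/cores?action=CREATE&name=core2&instanceDir=core0&dataDir=/solr/data/core2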
Servlet filter for Solr
Hi,

I have to intercept every request to Solr (search and update) and log some performance numbers. In order to do so I tried a servlet filter and added this to Solr's web.xml,

  <filter>
    <filter-name>IndexFilter</filter-name>
    <filter-class>com.xxx.index.filter.IndexRequestFilter</filter-class>
    <init-param>
      <param-name>test-param</param-name>
      <description>This parameter is for testing.</description>
    </init-param>
  </filter>

  <filter-mapping>
    <filter-name>IndexFilter</filter-name>
    <servlet-name>SolrUpdate</servlet-name>
    <servlet-name>SolrServer</servlet-name>
  </filter-mapping>

but this doesn't seem to be working. A couple of questions,

1) What's wrong with my web.xml setting?
2) Is there any easier way to intercept calls to Solr without changing its web.xml? Basically, can I just change the solrconfig.xml to do so (besides request handlers) so I don't have to customize the solr.war?

Thanks,
-vivek
Re: Servlet filter for Solr
I've tried both "url-pattern" (/*) and servlet-name in the filter mapping, but neither seems to intercept the call. If I put (/*), only requests up to /solr get intercepted. Since I'm using multicore, calls like /solr/core0 don't get intercepted. I want both select and update to be monitored. Any ideas?

Thanks,
-vivek

2009/6/9 Noble Paul നോബിള് नोब्ळ् :
> if you wish to intercept "read" calls, a filter is the only way.
>
> On Wed, Jun 10, 2009 at 6:35 AM, vivek sar wrote:
>> Hi,
>>
>> I have to intercept every request to Solr (search and update) and log some performance numbers. In order to do so I tried a servlet filter and added this to Solr's web.xml,
>>
>>   <filter>
>>     <filter-name>IndexFilter</filter-name>
>>     <filter-class>com.xxx.index.filter.IndexRequestFilter</filter-class>
>>     <init-param>
>>       <param-name>test-param</param-name>
>>       <description>This parameter is for testing.</description>
>>     </init-param>
>>   </filter>
>>
>>   <filter-mapping>
>>     <filter-name>IndexFilter</filter-name>
>>     <servlet-name>SolrUpdate</servlet-name>
>>     <servlet-name>SolrServer</servlet-name>
>>   </filter-mapping>
>
> I guess you cannot put servlets in the filter mapping
>>
>> but this doesn't seem to be working. A couple of questions,
>>
>> 1) What's wrong with my web.xml setting?
>> 2) Is there any easier way to intercept calls to Solr without changing its web.xml? Basically, can I just change the solrconfig.xml to do so (besides request handlers) so I don't have to customize the solr.war?
>>
>> Thanks,
>> -vivek
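For what it's worth, a sketch of the url-pattern route - servlet filters run in the order their <filter-mapping> elements appear in web.xml, so a logging filter mapped to /* would need to sit before Solr's own dispatch filter mapping (named SolrRequestFilter in the stock web.xml, which is what serves the multicore /corename/select and /corename/update paths):

  <filter-mapping>
    <filter-name>IndexFilter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>

  <!-- Solr's existing mapping, left in place after the one above -->
  <filter-mapping>
    <filter-name>SolrRequestFilter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>

The custom filter must also call chain.doFilter() so the request still reaches Solr.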
Boosting for most recent documents
Hi,

I'm trying to find a way to get the most recent entry for the searched word. For ex., say I have a document with a field named "user". If I search for user:vivek, I want to get the document that was indexed most recently. Two ways I could think of,

1) Sort by some time stamp field - but with millions of documents this becomes a huge memory problem, as we have seen OOMs with sorting before
2) Boost the most recent document - I'm not sure how to do this. Basically, we want to have the most recent document score higher than any other; then we can retrieve just 10 records and sort in the application by the time stamp field to get the most recent document matching the keyword.

Any suggestions on how this can be done?

Thanks,
-vivek
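On option 2, a sketch of an index-time document boost using Solr's XML update format (boost value and fields are illustrative). One caveat: index-time boosts are folded into the field norms, so they are lost on fields indexed with omitNorms="true":

  <add>
    <doc boost="2.0">
      <!-- newer documents get a larger boost so they score higher -->
      <field name="id">12345</field>
      <field name="user">vivek</field>
    </doc>
  </add>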
Re: Boosting for most recent documents
Thanks Otis. I've got a distributed index - using Solr multi-core. Basically, I've got 6 indexer instances running on 3 different boxes. A couple of questions,

1) Is it possible to sort on document id for multiple shards? How is that done?
2) How would I boost the most recent doc at index time?

Thanks,
-vivek

On Wed, Jul 8, 2009 at 7:47 PM, Otis Gospodnetic wrote:
>
> Sort by the internal Lucene document ID and pick the highest one. That might do the job for you.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
>> From: vivek sar
>> To: solr-user
>> Sent: Wednesday, July 8, 2009 8:34:16 PM
>> Subject: Boosting for most recent documents
>>
>> Hi,
>>
>> I'm trying to find a way to get the most recent entry for the searched word. For ex., say I have a document with a field named "user". If I search for user:vivek, I want to get the document that was indexed most recently. Two ways I could think of,
>>
>> 1) Sort by some time stamp field - but with millions of documents this becomes a huge memory problem, as we have seen OOMs with sorting before
>> 2) Boost the most recent document - I'm not sure how to do this. Basically, we want to have the most recent document score higher than any other; then we can retrieve just 10 records and sort in the application by the time stamp field to get the most recent document matching the keyword.
>>
>> Any suggestions on how this can be done?
>>
>> Thanks,
>> -vivek
Re: Boosting for most recent documents
How do we sort by internal doc id (say on one index only) using Solr? I saw a couple of threads saying it (Sort.INDEXORDER) was not supported in Solr,

http://www.nabble.com/sort-by-index-id-descending--td16124009.html#a16124009
http://www.nabble.com/Reverse-sorting-by-index-order-td1321032.html#a1321032

Has the index order support been added in Solr 1.4? How do we use it - any documentation?

Thanks,
-vivek

On Thu, Jul 9, 2009 at 2:21 PM, Otis Gospodnetic wrote:
>
> Ah, with multiple indices you can't rely on the max Lucene doc Id. I think you have to go with the timestamp approach.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
>> From: vivek sar
>> To: solr-user@lucene.apache.org
>> Sent: Thursday, July 9, 2009 1:13:54 PM
>> Subject: Re: Boosting for most recent documents
>>
>> Thanks Otis. I've got a distributed index - using Solr multi-core. Basically, I've got 6 indexer instances running on 3 different boxes. A couple of questions,
>>
>> 1) Is it possible to sort on document id for multiple shards? How is that done?
>> 2) How would I boost the most recent doc at index time?
>>
>> Thanks,
>> -vivek
>>
>> On Wed, Jul 8, 2009 at 7:47 PM, Otis Gospodnetic wrote:
>> >
>> > Sort by the internal Lucene document ID and pick the highest one. That might do the job for you.
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> > - Original Message
>> >> From: vivek sar
>> >> To: solr-user
>> >> Sent: Wednesday, July 8, 2009 8:34:16 PM
>> >> Subject: Boosting for most recent documents
>> >>
>> >> Hi,
>> >>
>> >> I'm trying to find a way to get the most recent entry for the searched word. For ex., say I have a document with a field named "user". If I search for user:vivek, I want to get the document that was indexed most recently. Two ways I could think of,
>> >>
>> >> 1) Sort by some time stamp field - but with millions of documents this becomes a huge memory problem, as we have seen OOMs with sorting before
>> >> 2) Boost the most recent document - I'm not sure how to do this. Basically, we want to have the most recent document score higher than any other; then we can retrieve just 10 records and sort in the application by the time stamp field to get the most recent document matching the keyword.
>> >>
>> >> Any suggestions on how this can be done?
>> >>
>> >> Thanks,
>> >> -vivek
Re: Boosting for most recent documents
Thanks Bill. A couple of questions,

1) Would the function query load all unique terms (for that field) in memory the way sort (FieldCache) does? If so, that wouldn't work for us, as we can have over 5 billion records spread across multiple shards (up to 10 indexer instances); that would surely kill the process if it were to load everything in memory.

2) Would the function query work on a multi-shard query? For ex., with recip(rord(creationDate),1,1000,1000), would it automatically apply the function to the combined result from all the shards, or would it run on each individual shard and get results from them?

I would still be interested in knowing if Solr supports Sort.INDEXORDER - if so, how?

Thanks,
-vivek

On Thu, Jul 9, 2009 at 8:27 PM, Bill Au wrote:
> With a time stamp you can use a function query to boost the score of newer documents:
> http://wiki.apache.org/solr/SolrRelevancyFAQ#head-b1b1cdedcb9cd9bfd9c994709b4d7e540359b1fd
>
> Bill
>
> On Thu, Jul 9, 2009 at 5:58 PM, vivek sar wrote:
>> How do we sort by internal doc id (say on one index only) using Solr? I saw a couple of threads saying it (Sort.INDEXORDER) was not supported in Solr,
>>
>> http://www.nabble.com/sort-by-index-id-descending--td16124009.html#a16124009
>> http://www.nabble.com/Reverse-sorting-by-index-order-td1321032.html#a1321032
>>
>> Has the index order support been added in Solr 1.4? How do we use it - any documentation?
>>
>> Thanks,
>> -vivek
>>
>> On Thu, Jul 9, 2009 at 2:21 PM, Otis Gospodnetic wrote:
>> >
>> > Ah, with multiple indices you can't rely on the max Lucene doc Id. I think you have to go with the timestamp approach.
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> > - Original Message
>> >> From: vivek sar
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Thursday, July 9, 2009 1:13:54 PM
>> >> Subject: Re: Boosting for most recent documents
>> >>
>> >> Thanks Otis. I've got a distributed index - using Solr multi-core. Basically, I've got 6 indexer instances running on 3 different boxes. A couple of questions,
>> >>
>> >> 1) Is it possible to sort on document id for multiple shards? How is that done?
>> >> 2) How would I boost the most recent doc at index time?
>> >>
>> >> Thanks,
>> >> -vivek
>> >>
>> >> On Wed, Jul 8, 2009 at 7:47 PM, Otis Gospodnetic wrote:
>> >> >
>> >> > Sort by the internal Lucene document ID and pick the highest one. That might do the job for you.
>> >> >
>> >> > Otis
>> >> > --
>> >> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >> >
>> >> > - Original Message
>> >> >> From: vivek sar
>> >> >> To: solr-user
>> >> >> Sent: Wednesday, July 8, 2009 8:34:16 PM
>> >> >> Subject: Boosting for most recent documents
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> I'm trying to find a way to get the most recent entry for the searched word. For ex., say I have a document with a field named "user". If I search for user:vivek, I want to get the document that was indexed most recently. Two ways I could think of,
>> >> >>
>> >> >> 1) Sort by some time stamp field - but with millions of documents this becomes a huge memory problem, as we have seen OOMs with sorting before
>> >> >> 2) Boost the most recent document - I'm not sure how to do this. Basically, we want to have the most recent document score higher than any other; then we can retrieve just 10 records and sort in the application by the time stamp field to get the most recent document matching the keyword.
>> >> >>
>> >> >> Any suggestions on how this can be done?
>> >> >>
>> >> >> Thanks,
>> >> >> -vivek
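For the function-query route from the FAQ link above, one way to wire the recency boost in is as a dismax boost function in solrconfig.xml (a sketch - the handler name and date field are illustrative):

  <requestHandler name="recent" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <!-- boost function: the newer the creationDate, the higher the score -->
      <str name="bf">recip(rord(creationDate),1,1000,1000)</str>
    </lst>
  </requestHandler>

The same expression can also be passed per-request as a bf parameter on a dismax query instead of baking it into a handler.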
Re: Boosting for most recent documents
Hi,

Does anyone know if Solr supports sorting by internal document ids, i.e., like Sort.INDEXORDER in Lucene? If so, how?

Also, does anyone have any insight on whether the function query loads up unique terms (like field sorts do) in memory or not?

Thanks,
-vivek

On Fri, Jul 10, 2009 at 10:26 AM, vivek sar wrote:
> Thanks Bill. A couple of questions,
>
> 1) Would the function query load all unique terms (for that field) in memory the way sort (FieldCache) does? If so, that wouldn't work for us, as we can have over 5 billion records spread across multiple shards (up to 10 indexer instances); that would surely kill the process if it were to load everything in memory.
>
> 2) Would the function query work on a multi-shard query? For ex., with recip(rord(creationDate),1,1000,1000), would it automatically apply the function to the combined result from all the shards, or would it run on each individual shard and get results from them?
>
> I would still be interested in knowing if Solr supports Sort.INDEXORDER - if so, how?
>
> Thanks,
> -vivek
>
> On Thu, Jul 9, 2009 at 8:27 PM, Bill Au wrote:
>> With a time stamp you can use a function query to boost the score of newer documents:
>> http://wiki.apache.org/solr/SolrRelevancyFAQ#head-b1b1cdedcb9cd9bfd9c994709b4d7e540359b1fd
>>
>> Bill
>>
>> On Thu, Jul 9, 2009 at 5:58 PM, vivek sar wrote:
>>> How do we sort by internal doc id (say on one index only) using Solr? I saw a couple of threads saying it (Sort.INDEXORDER) was not supported in Solr,
>>>
>>> http://www.nabble.com/sort-by-index-id-descending--td16124009.html#a16124009
>>> http://www.nabble.com/Reverse-sorting-by-index-order-td1321032.html#a1321032
>>>
>>> Has the index order support been added in Solr 1.4? How do we use it - any documentation?
>>>
>>> Thanks,
>>> -vivek
>>>
>>> On Thu, Jul 9, 2009 at 2:21 PM, Otis Gospodnetic wrote:
>>> >
>>> > Ah, with multiple indices you can't rely on the max Lucene doc Id. I think you have to go with the timestamp approach.
>>> >
>>> > Otis
>>> > --
>>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>> >
>>> > - Original Message
>>> >> From: vivek sar
>>> >> To: solr-user@lucene.apache.org
>>> >> Sent: Thursday, July 9, 2009 1:13:54 PM
>>> >> Subject: Re: Boosting for most recent documents
>>> >>
>>> >> Thanks Otis. I've got a distributed index - using Solr multi-core. Basically, I've got 6 indexer instances running on 3 different boxes. A couple of questions,
>>> >>
>>> >> 1) Is it possible to sort on document id for multiple shards? How is that done?
>>> >> 2) How would I boost the most recent doc at index time?
>>> >>
>>> >> Thanks,
>>> >> -vivek
>>> >>
>>> >> On Wed, Jul 8, 2009 at 7:47 PM, Otis Gospodnetic wrote:
>>> >> >
>>> >> > Sort by the internal Lucene document ID and pick the highest one. That might do the job for you.
>>> >> >
>>> >> > Otis
>>> >> > --
>>> >> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>> >> >
>>> >> > - Original Message
>>> >> >> From: vivek sar
>>> >> >> To: solr-user
>>> >> >> Sent: Wednesday, July 8, 2009 8:34:16 PM
>>> >> >> Subject: Boosting for most recent documents
>>> >> >>
>>> >> >> Hi,
>>> >> >>
>>> >> >> I'm trying to find a way to get the most recent entry for the searched word. For ex., say I have a document with a field named "user". If I search for user:vivek, I want to get the document that was indexed most recently. Two ways I could think of,
>>> >> >>
>>> >> >> 1) Sort by some time stamp field - but with millions of documents this becomes a huge memory problem, as we have seen OOMs with sorting before
>>> >> >> 2) Boost the most recent document - I'm not sure how to do this. Basically, we want to have the most recent document score higher than any other; then we can retrieve just 10 records and sort in the application by the time stamp field to get the most recent document matching the keyword.
>>> >> >>
>>> >> >> Any suggestions on how this can be done?
>>> >> >>
>>> >> >> Thanks,
>>> >> >> -vivek