Problem After Modifying CoreContainer
Hi,

I'm developing an application that requires a large number of cores, and since lazy loading / LRU caching won't be available until 1.5, I decided to modify CoreContainer to hold me over.

Another requirement is that multiple Solr instances can access the same cores (on NAS, for instance), so the approach I'm using is to maintain a local registry / load balancer that assigns "active" cores to different machines and, when a machine has exceeded its limit, unloads cores in LRU order.

The modifications to CoreContainer are as follows: to avoid having to issue "create" requests every time we need to load an inactive core, getCore will attempt to create/open any core that it doesn't find in the cores map, unless the name is "" or "admin", in which case it returns null as per the original implementation. The create function is overloaded to take a core name and creates a CoreDescriptor using some defaults added to solr.xml.

Everything works fine until I try to make a core unload request, at which point I see the following:

org.apache.solr.common.SolrException: Not Found

Not Found

request: /solr/NewUser0/admin/cores?action=UNLOAD&core=NewUser0&wt=javabin&version=2.2
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:343)

So my guess is that my modifications are keeping admin cores from being created properly. Or maybe I don't know what I'm talking about. I stepped through execution with a debugger and watched it bounce around SolrDispatchFilter.doFilter() before giving up and forwarding the request to the RequestDispatcher, but all I could tell from that was that it wanted something it couldn't find. Can anyone shed some light on what my mods might be keeping Solr from doing that would cause this problem?

Code pasted below. And don't mind the weird locking; this happens even with a single thread.
Thanks,
Dan

public SolrCore getCore(String name) {
    ReentrantReadWriteLock lock = getCoreLock(name);
    try {
        lock.readLock().lock();
        SolrCore core = cores.get(name);
        if (core != null) {
            core.open(); // increment the ref count while still synchronized
            return core;
        } else if ("".equals(name) || "admin".equals(name)) {
            return null;
        } else {
            try {
                lock.readLock().unlock();
                lock.writeLock().lock();
                SolrCore core1 = cores.get(name);
                if (core1 != null) {
                    return core1;
                }
                log.info("Autocreating core: '" + name + "'");
                core = create(name);
                cores.put(name, core);
                core.open();
                return core;
            } catch (IOException e) {
                log.error("Autocreating core '" + name + "'", e);
                return null;
            } catch (ParserConfigurationException e) {
                log.error("Autocreating core '" + name + "'", e);
                return null;
            } catch (SAXException e) {
                log.error("Autocreating core '" + name + "'", e);
                return null;
            } finally {
                lock.readLock().lock();
                lock.writeLock().unlock();
            }
        }
    } finally {
        lock.readLock().unlock();
    }
}

public SolrCore create(String coreName) throws IOException, ParserConfigurationException, SAXException {
    if (defaultConfigFile == null || defaultSchemaFile == null) {
        throw new RuntimeException("Cannot use autocreate unless both a default configuration file and a default schema file are specified");
    }
    CoreDescriptor dcore = new CoreDescriptor(this, coreName, getInstanceDir(coreName));
    dcore.setConfigName(defaultConfigFile);
    dcore.setSchemaName(defaultSchemaFile);
    return create(dcore);
}

// Eventually this will be overridden to do some intelligent management of a directory
// hierarchy so we don't have hundreds of thousands of cores in the same directory
public String getInstanceDir(String coreName) {
    return coreName;
}
Re: Problem After Modifying CoreContainer
And, re-examining the URL, this is clearly my fault for improper use of SolrJ. Please ignore.
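For anyone who finds this thread later: the URL in the stack trace shows the core-admin request being sent to /solr/NewUser0/admin/cores, i.e. through a SolrServer bound to the core itself, when core-admin requests have to go through the container root. A minimal SolrJ sketch of the corrected pattern, assuming the CoreAdminRequest helpers in your SolrJ version and an illustrative URL and core name:

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class UnloadExample {
    public static void main(String[] args) throws Exception {
        // Point the client at the container root, not at a per-core URL;
        // a core-bound server turns UNLOAD into /solr/NewUser0/admin/cores,
        // which is exactly the 404 in the stack trace above.
        CommonsHttpSolrServer adminServer =
            new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Issues /admin/cores?action=UNLOAD&core=NewUser0
        CoreAdminRequest.unloadCore("NewUser0", adminServer);
    }
}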
Faceted Search on Dynamic Fields?
I'm trying to perform a faceted query where the facet field is not declared explicitly in the schema but matches a dynamicField by its suffix. The query returns results, but for some reason the facet list is always empty. When I change the facet field to one that is explicitly named in the schema, I get the proper results. Is this expected behavior? I wasn't able to find anything in the docs about dynamic fields with respect to faceting.

One other thing I thought might be causing the problem is that the values in this field are mostly distinct (that won't be the case in the actual application; I'm just doing it this way now to see how faceted queries behave). However, when I performed the same query on a static field with lots of distinct values I just got an OutOfMemoryError, which leads me back to my original hypothesis.

So, is it the case that faceted queries are not permitted on dynamic fields, and if so, is there any workaround?
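For reference, this is the query pattern I mean, as a SolrJ sketch; the field name category_s and the matching dynamicField pattern *_s are made up for illustration:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DynamicFacetExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.addFacetField("category_s"); // only matches the dynamicField *_s, not an explicit field
        q.setFacetMinCount(1);         // suppress zero-count terms in the facet list

        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getFacetField("category_s").getValues());
    }
}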
Re: Faceted Search on Dynamic Fields?
Also, here is the field definition in the schema:
SolrException - Lock obtain timed out, no leftover locks
Hi,

I'm running Solr 1.3.0 in multicore mode and feeding it data from which the core name is inferred from a specific field. My service extracts the core name and, if it has not seen it before, issues a create request for that core before attempting to add the document (via SolrJ). I have a pool of MyIndexers that run in parallel, taking documents from a queue and adding them via the add method on the SolrServer instance corresponding to that core (exactly one exists per core). Each core is in a separate data directory. My timeouts are set as such:

<writeLockTimeout>15000</writeLockTimeout>
<commitLockTimeout>25000</commitLockTimeout>

I remove the index directories, start the server, check that no locks exist, and generate ~500 documents spread across 5 cores for the MyIndexers to handle. Each time, I see one or more exceptions with a message like:

Lock_obtain_timed_out_SimpleFSLockmulticoreNewUser3dataindexlucenebd4994617386d14e2c8c29e23bcca719writelock__orgapachelucenestoreLockObtainFailedException_Lock_obtain_timed_out_...

When the indexers have completed, no lock is left over. There is no discernible pattern as far as when the exception occurs (i.e., it does not tend to happen on the first or last or any particular document).

Interestingly, this problem does not happen when I have only a single MyIndexer, or if I have a pool of MyIndexers and am running in single core mode.

I've looked at the other posts from users getting this exception, but it always seemed to be a different case, such as the server having crashed previously and a lock file being left over.
Question About Solr Cores
Hi,

I'm building an application that dynamically instantiates a large number of Solr cores on a single machine (large would ideally be as high as I can get it, in the millions, if that is possible without significant performance degradation and/or system failure). I already tried this same use case as a single-core index and found that as my index grew large, performance became devastatingly slow, which I have so far not seen in the multicore setup.

What I have seen, however, is that the number of open FDs steadily increases with the number of cores opened and files indexed, until I hit whatever upper bound happens to be set (currently 100k). Raising machine-imposed limits, using the compound file format, etc. are only stopgaps. I was thinking it would be nice if I could keep some kind of MRU cache of cores such that Solr only keeps open resources for the cores in the cache, but I'm not sure if this is allowed. I saw that SolrCore has a close() function, but if my understanding is correct, that isn't exposed to the client. Would anyone know if there are any ways to deallocate and reallocate resources for different cores at runtime?

Thanks,
Dan
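P.S. To make the idea concrete, here is a minimal sketch of the kind of MRU/LRU cache I mean. It assumes container-side code where SolrCore.close() is reachable, and the cap on open cores is arbitrary:

import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.solr.core.SolrCore;

// Access-ordered LinkedHashMap: the eldest entry is the least recently used core.
// When the cap is exceeded, close the evicted core to release its file descriptors.
public class LruCoreCache extends LinkedHashMap<String, SolrCore> {
    private final int maxOpenCores;

    public LruCoreCache(int maxOpenCores) {
        super(16, 0.75f, true); // true = order by access, not insertion
        this.maxOpenCores = maxOpenCores;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, SolrCore> eldest) {
        if (size() > maxOpenCores) {
            eldest.getValue().close(); // decrements the ref count; resources freed at zero
            return true;
        }
        return false;
    }
}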
Problems Issuing Parallel Queries with SolrJ
I have a running Solr (1.3) server that I want to query with SolrJ, and I'm running a benchmark that uses a pool of 10 threads to issue 1000 random queries to the server. Each query executes 7 searches in parallel.

My first attempt was to use a single instance of CommonsHttpSolrServer with the default MultiThreadedHttpConnectionManager, but (as mentioned in SOLR-861) I quickly ran out of memory as every created thread blocked indefinitely on MultiThreadedHttpConnectionManager.

Then I tried creating a pool of CommonsHttpSolrServer instances in which each SolrServer receives a newly-instantiated SimpleHttpConnectionManager, but execution of my test resulted in the following:

Caused by: java.lang.IllegalStateException: Unexpected release of an unknown connection.
        at org.apache.commons.httpclient.SimpleHttpConnectionManager.releaseConnection(SimpleHttpConnectionManager.java:225)
        at org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179)
        at org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430)
        at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1186)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:394)

Looking into the httpclient code, I can see that this exception is only thrown when the connection manager attempts to release an HttpConnection that it is not currently referencing, but since I instantiate connection managers on a per-thread basis I'm not sure what would cause that.

I assume someone must be using SolrJ to execute parallel queries; is there something obvious (or not) that I'm missing?
Re: Problems Issuing Parallel Queries with SolrJ
Actually, it's obvious that the second case wouldn't work after looking at SimpleHttpConnectionManager. So my question boils down to being able to use a single CommonsHttpSolrServer in a multithreaded fashion.
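A sketch of the single shared-server setup, on the assumption that the default MultiThreadedHttpConnectionManager limits (2 connections per host, 20 total) are what the threads block on; the pool sizes below are illustrative, sized for 10 threads x 7 parallel searches:

import java.net.MalformedURLException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class SharedServerFactory {
    // Build one CommonsHttpSolrServer to be shared by all query threads.
    public static CommonsHttpSolrServer create(String url) throws MalformedURLException {
        MultiThreadedHttpConnectionManager mgr = new MultiThreadedHttpConnectionManager();
        // Raise the per-host and total connection caps to match the load,
        // so threads check out connections instead of queueing forever.
        mgr.getParams().setDefaultMaxConnectionsPerHost(70);
        mgr.getParams().setMaxTotalConnections(70);
        HttpClient client = new HttpClient(mgr);
        return new CommonsHttpSolrServer(url, client);
    }
}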
Re: SolrException - Lock obtain timed out, no leftover locks
Sorry, I thought I had removed this posting. I am running Solr over HTTP, but (as you surmised) I had a concurrency bug. Thanks for the response.

Dan

hossman wrote:
>
> My only guess here is that you are using SolrJ in an embedded sense, not
> via HTTP, and something about the code you have in your MyIndexers class
> causes two different threads to attempt to create two different cores (or
> perhaps the same core) using identical data directories at the same time.
>
> Either that, or maybe there is a bug in the CoreAdmin functionality for
> creating/opening a new core resulting from improper synchronization.
>
> It would help to have the full stack trace of the Lock timed out
> exception, and to know more details about how exactly your code goes about
> creating new cores on the fly.
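For anyone hitting the same thing, here is a sketch of the kind of guard that rules out the race hossman describes (two threads issuing CREATE for the same core), assuming one SolrServer per core and SolrJ's CoreAdminRequest helper; the class and method names are illustrative, not my actual code:

import java.util.concurrent.ConcurrentHashMap;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CoreRegistry {
    private final ConcurrentHashMap<String, SolrServer> servers =
        new ConcurrentHashMap<String, SolrServer>();
    private final String baseUrl;       // e.g. "http://localhost:8983/solr"
    private final SolrServer adminServer;

    public CoreRegistry(String baseUrl, SolrServer adminServer) {
        this.baseUrl = baseUrl;
        this.adminServer = adminServer;
    }

    // Serialize CREATE requests so two indexer threads can never race
    // to create the same core against the same data directory.
    public synchronized SolrServer serverFor(String coreName) throws Exception {
        SolrServer s = servers.get(coreName);
        if (s == null) {
            CoreAdminRequest.createCore(coreName, coreName, adminServer);
            s = new CommonsHttpSolrServer(baseUrl + "/" + coreName);
            servers.put(coreName, s);
        }
        return s;
    }
}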
Strange Behavior When Using CSVRequestHandler
The problem:

Not all of the documents that I expect to be indexed are showing up in the index.

The background:

I start off with an empty index based on a schema with a single field named 'query', marked as unique and using the following analyzer:

<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

My input is a utf-8 encoded file with one sentence per line. Its total size is about 60MB. I would like each line of the file to correspond to a single document in the Solr index. If I print the number of unique lines in the file (using cat | sort | uniq | wc -l), I get a little over 2M. Printing the total number of lines in the file gives me around 2.7M.

I use the following to start indexing:

curl 'http://localhost:8983/solr/update/csv?commit=true&separator=%09&stream.file=/home/gkropitz/querystage2map/file1&stream.contentType=text/plain;charset=utf-8&fieldnames=query&escape=\'

When this command completes, I see numDocs is approximately 470k (which is what I find strange) and maxDocs is approximately 890k (which is fine since I know I have around 700k duplicates). Even more confusing is that if I run this exact command a second time without performing any other operations, numDocs goes up to around 610k, and a third time brings it up to about 750k.

Can anyone tell me what might cause Solr not to index everything in my input file the first time, and why it would be able to index new documents the second and third times?

I also have this line in solrconfig.xml, if it matters:

<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048" />

Thanks,
Dan
Re: Strange Behavior When Using CSVRequestHandler
Erick - thanks very much, all of this makes sense. But the one thing I still find puzzling is the fact that re-adding the file a second, third, fourth, etc. time causes numDocs to increase, and ALWAYS by the same amount (141,645). Any ideas as to what could cause that?

Dan

Erick Erickson wrote:
>
> I think the root of your problem is that unique fields should NOT
> be multivalued. See
> http://wiki.apache.org/solr/FieldOptionsByUseCase?highlight=(unique)|(key)
>
> In this case, since you're tokenizing, your "query" field is
> implicitly multi-valued; I don't know what the behavior will be.
>
> But there's another problem: all the filters in your analyzer definition
> will mess up the correspondence between the Unix uniq and numDocs even
> if you got by the above. I.e.:
>
> StopFilter would make the lines "a problem" and "the problem" identical.
> WordDelimiter would do all kinds of interesting things.
> LowerCaseFilter would make "Myproblem" and "myproblem" identical.
> RemoveDuplicatesFilter would make "interesting interesting" and
> "interesting" identical.
>
> You could define a second field, make *that* one unique and NOT analyze
> it in any way...
>
> You could hash your sentences and define the hash as your unique key.
>
> You could ...
>
> HTH
> Erick
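As a footnote on the hashing suggestion, here is a minimal sketch of deriving a stable uniqueKey from each raw line, assuming a separate unanalyzed id field were added to the schema (the field and class names are made up):

import java.math.BigInteger;
import java.security.MessageDigest;

public class SentenceKey {
    // Hash each raw sentence so the analyzed 'query' field no longer has to
    // double as the uniqueKey; identical lines still collapse to one document.
    public static String keyFor(String line) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] digest = md5.digest(line.getBytes("UTF-8"));
        return String.format("%032x", new BigInteger(1, digest)); // 32 hex chars, zero-padded
    }

    public static void main(String[] args) throws Exception {
        System.out.println(keyFor("the quick brown fox"));
    }
}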