Problem After Modifying CoreContainer

2009-07-31 Thread danben

Hi,

I'm developing an application that requires a large number of cores, and
since lazy loading / LRU caching won't be available until 1.5, I decided to
modify CoreContainer to hold me over.

Another requirement is that multiple Solr instances can access the same
cores (on NAS, for instance), so the approach I'm using is to maintain a
local registry / load balancer that assigns "active" cores to different
machines and, when a machine has exceeded its limit, unloads cores in LRU
order.

The modifications to CoreContainer are as follows: to avoid having to issue
"create" requests every time we need to load an inactive core, getCore will
attempt to create/open any core that it doesn't find in the cores map,
unless the name is "" or "admin", in which case it returns null as per the
original implementation.  The create function is overloaded to take a core
name and create a CoreDescriptor using some defaults added to solr.xml.

Everything works fine until I try to make a core unload request, at which
point I see the following:
org.apache.solr.common.SolrException: Not Found

Not Found

request:
/solr/NewUser0/admin/cores?action=UNLOAD&core=NewUser0&wt=javabin&version=2.2
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:343)

So, my guess is that my modifications are keeping admin cores from being
created properly.  Or maybe I don't know what I'm talking about.  I stepped
through execution with a debugger and watched it bounce around
SolrDispatchFilter.doFilter() before giving up and forwarding the request
to the RequestDispatcher, but all I could tell from that was that it wanted
something it couldn't find.  Can anyone shed some light on what my mods
might be keeping Solr from doing that would cause this problem?

Code pasted below.  And don't mind the weird locking; this happens even with
a single thread.

Thanks,
Dan

public SolrCore getCore(String name) {
    ReentrantReadWriteLock lock = getCoreLock(name);
    try {
        lock.readLock().lock();
        SolrCore core = cores.get(name);
        if (core != null) {
            core.open();  // increment the ref count while still holding the lock
            return core;
        } else if ("".equals(name) || "admin".equals(name)) {
            return null;  // as in the original implementation
        } else {
            try {
                // Upgrade to the write lock; another thread may create the
                // core in the gap, so re-check the map afterwards.
                lock.readLock().unlock();
                lock.writeLock().lock();
                SolrCore core1 = cores.get(name);
                if (core1 != null) {
                    core1.open();  // the ref count must be incremented on this path too
                    return core1;
                }
                log.info("Autocreating core: '" + name + "'");
                core = create(name);
                cores.put(name, core);
                core.open();
                return core;
            } catch (IOException e) {
                log.error("Autocreating core '" + name + "'", e);
                return null;
            } catch (ParserConfigurationException e) {
                log.error("Autocreating core '" + name + "'", e);
                return null;
            } catch (SAXException e) {
                log.error("Autocreating core '" + name + "'", e);
                return null;
            } finally {
                // Downgrade: take the read lock before releasing the write
                // lock so the outer finally can release it safely.
                lock.readLock().lock();
                lock.writeLock().unlock();
            }
        }
    } finally {
        lock.readLock().unlock();
    }
}

public SolrCore create(String coreName) throws IOException,
        ParserConfigurationException, SAXException {
    if (defaultConfigFile == null || defaultSchemaFile == null) {
        throw new RuntimeException("Cannot use autocreate unless both a default"
                + " configuration file and a default schema file are specified");
    }
    CoreDescriptor dcore = new CoreDescriptor(this, coreName, getInstanceDir(coreName));
    dcore.setConfigName(defaultConfigFile);
    dcore.setSchemaName(defaultSchemaFile);
    return create(dcore);
}
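
(getCoreLock isn't shown above; it's not part of stock CoreContainer.  Mine is
roughly a per-name lock registry, something like this:)

private final ConcurrentHashMap<String, ReentrantReadWriteLock> coreLocks =
        new ConcurrentHashMap<String, ReentrantReadWriteLock>();

private ReentrantReadWriteLock getCoreLock(String name) {
    ReentrantReadWriteLock lock = coreLocks.get(name);
    if (lock == null) {
        // putIfAbsent guarantees all threads see the same lock for a name.
        ReentrantReadWriteLock created = new ReentrantReadWriteLock();
        lock = coreLocks.putIfAbsent(name, created);
        if (lock == null) {
            lock = created;
        }
    }
    return lock;
}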

// Eventually this will be overridden to do some intelligent management of a
// directory hierarchy so we don't have hundreds of thousands of cores in the
// same directory.
public String getInstanceDir(String coreName) {
    return coreName;
}
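
(Something like this, purely illustrative, is the kind of override I have in
mind: shard by a hash prefix so that no single directory holds more than a
few thousand cores.)

// Illustrative only: two levels of hash-derived subdirectories.
public String getInstanceDir(String coreName) {
    int h = coreName.hashCode();
    return Integer.toHexString(h & 0xff) + "/"
            + Integer.toHexString((h >>> 8) & 0xff) + "/" + coreName;
}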




Re: Problem After Modifying CoreContainer

2009-07-31 Thread danben

And, re-examining the URL, this is clearly my fault for improper use of
SolrJ.  Please ignore.
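
For the record: I had pointed my CommonsHttpSolrServer at the core URL, so the
CoreAdmin request went to /solr/NewUser0/admin/cores instead of
/solr/admin/cores.  A sketch of the corrected usage (assuming SolrJ's
CoreAdminRequest helper; host and core name are from my setup):

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class UnloadExample {
    public static void main(String[] args) throws Exception {
        // Core admin requests go to the container root, NOT to a core URL.
        // Wrong: new CommonsHttpSolrServer("http://localhost:8983/solr/NewUser0")
        CommonsHttpSolrServer adminServer =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        CoreAdminRequest.unloadCore("NewUser0", adminServer);
    }
}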






Faceted Search on Dynamic Fields?

2009-09-25 Thread danben

I'm trying to perform a faceted query where the facet field is not declared
explicitly in the schema but matches a dynamicField pattern by its suffix.
The query returns results, but for some reason the facet list is always
empty.  When I change the facet field to one that is explicitly named in the
schema I get the proper results.  Is this expected behavior?  I wasn't able
to find anything in the docs about dynamic fields with respect to faceting.
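
For concreteness, this is roughly what I'm running (field name hypothetical;
mine matches a *_s dynamicField):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DynamicFacetExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.addFacetField("category_s");  // hypothetical; matches a *_s dynamicField
        QueryResponse rsp = server.query(q);
        // This list comes back empty, even though the query itself matches docs.
        System.out.println(rsp.getFacetField("category_s").getValues());
    }
}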

One other thing I thought might have been causing the problem is that the
values in this field are mostly distinct (that won't be the case in the
actual application, I'm just doing it this way now to see how faceted
queries behave).  However, when I performed the same query with a static
field with lots of distinct values I just got an OutOfMemoryError, which
leads me back to my original hypothesis.

So, is it the case that faceted queries are not permitted on fields matching
a dynamicField, and if so, is there any workaround?



Re: Faceted Search on Dynamic Fields?

2009-09-25 Thread danben

Also, here is the field definition in the schema:







SolrException - Lock obtain timed out, no leftover locks

2009-07-08 Thread danben

Hi,

I'm running Solr 1.3.0 in multicore mode and feeding it documents whose
core name is inferred from a specific field.  My service extracts the core
name and, if it has not seen it before, issues a create request for that
core before attempting to add the document (via SolrJ).  I have a pool of
MyIndexers that run in parallel, taking documents from a queue and adding
them via the add method on the SolrServer instance corresponding to that
core (exactly one per core exists).  Each core is in a separate data
directory.  My timeouts are set as such:

  <writeLockTimeout>15000</writeLockTimeout>
  <commitLockTimeout>25000</commitLockTimeout>
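
Schematically, each MyIndexer does something like this (abbreviated sketch,
not my real code; the core_name field and URLs are placeholders):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.common.SolrInputDocument;

public class MyIndexer implements Runnable {
    private final BlockingQueue<SolrInputDocument> queue;
    private final ConcurrentHashMap<String, SolrServer> serversByCore;
    private final SolrServer adminServer;  // points at the /solr root

    public MyIndexer(BlockingQueue<SolrInputDocument> queue,
                     ConcurrentHashMap<String, SolrServer> serversByCore,
                     SolrServer adminServer) {
        this.queue = queue;
        this.serversByCore = serversByCore;
        this.adminServer = adminServer;
    }

    public void run() {
        try {
            while (true) {
                SolrInputDocument doc = queue.take();
                String coreName = (String) doc.getFieldValue("core_name");
                SolrServer server = serversByCore.get(coreName);
                if (server == null) {
                    // First sighting of this core: create it, then cache a
                    // server for it.  NOTE: this check-then-create is not
                    // synchronized across indexer threads.
                    CoreAdminRequest.createCore(coreName, coreName, adminServer);
                    server = new CommonsHttpSolrServer(
                            "http://localhost:8983/solr/" + coreName);
                    serversByCore.put(coreName, server);
                }
                server.add(doc);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}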

I remove the index directories, start the server, check that no locks exist,
and generate ~500 documents spread across 5 cores for the MyIndexers to
handle.  Each time, I see one or more exceptions with a message like 

Lock obtain timed out: SimpleFSLock@multicore/NewUser3/data/index/lucene-bd4994617386d14e2c8c29e23bcca719-write.lock
(org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out ...)

When the indexers have completed, no lock is left over.  There is no
discernible pattern as far as when the exception occurs (i.e., it does not
tend to happen on the first or last or any particular document).

Interestingly, this problem does not happen when I have only a single
MyIndexer, or if I have a pool of MyIndexers and am running in single core
mode.  

I've looked at the other posts from users getting this exception but it
always seemed to be a different case, such as the server having crashed
previously and a lock file being left over.




Question About Solr Cores

2009-07-10 Thread danben

Hi,

I'm building an application that dynamically instantiates a large number of
solr cores on a single machine (large would ideally be as high as I can get
it, in the millions, if it is possible to do so without significant
performance degradation and/or system failure).  I already tried this same
use case as a single-core index and found that as my index grew large,
performance became devastatingly slow, which I have so far not seen in the
multicore setup.

What I have seen, however, is that the number of open FDs steadily increases
with the number of cores opened and files indexed, until I hit whatever
upper bound happens to be set (currently 100k).  Raising machine-imposed
limits, using the compound file format, etc. are only stopgaps.  I was
thinking it would be nice if I could keep some kind of MRU cache of cores
such that Solr only keeps open resources for the cores in the cache, but I'm
not sure if this is allowed.  I saw that SolrCore has a close() function,
but if my understanding is correct, that isn't exposed to the client.
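
What I have in mind is something like this (sketch only; it assumes
SolrCore.close() were reachable from here, which is exactly what I'm asking
about, and it needs external synchronization):

import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.solr.core.SolrCore;

// Access-ordered map that closes the least-recently-used core once the
// cap is exceeded.
public class CoreCache extends LinkedHashMap<String, SolrCore> {
    private final int maxOpenCores;

    public CoreCache(int maxOpenCores) {
        super(16, 0.75f, true);  // true = access order, i.e. LRU eviction
        this.maxOpenCores = maxOpenCores;
    }

    protected boolean removeEldestEntry(Map.Entry<String, SolrCore> eldest) {
        if (size() > maxOpenCores) {
            eldest.getValue().close();  // release searchers, file descriptors, etc.
            return true;
        }
        return false;
    }
}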

Would anyone know if there are any ways to de/reallocate resources for
different cores at runtime?

Thanks,
Dan



Problems Issuing Parallel Queries with SolrJ

2009-07-16 Thread danben

I have a running Solr (1.3) server that I want to query with SolrJ, and I'm
running a benchmark that uses a pool of 10 threads to issue 1000 random
queries to the server.  Each query executes 7 searches in parallel.

My first attempt was to use a single instance of CommonsHttpSolrServer,
using the default MultiThreadedHttpConnectionManager, but (as mentioned in
SOLR-861), I quickly ran out of memory as every created thread blocked
indefinitely on MultiThreadedHttpConnectionManager.

Then I tried creating a pool of CommonsHttpSolrServer in which each
SolrServer receives a newly-instantiated SimpleHttpConnectionManager, but
execution of my test resulted in the following:

Caused by: java.lang.IllegalStateException: Unexpected release of an unknown
connection.
at
org.apache.commons.httpclient.SimpleHttpConnectionManager.releaseConnection(SimpleHttpConnectionManager.java:225)
at
org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179)
at
org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430)
at
org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1186)
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:394)

Looking into the httpclient code, I can see that this exception is only
thrown when the connection manager attempts to release an HttpConnection
that it is not currently referencing, but since I instantiate connection
managers on a per-thread basis I'm not sure what would cause that.

I assume that SolrJ must be used by someone to execute parallel queries; is
there something obvious (or not) that I'm missing?



Re: Problems Issuing Parallel Queries with SolrJ

2009-07-16 Thread danben

Actually, having looked at SimpleHttpConnectionManager, it's obvious why the
second case wouldn't work.  So my question boils down to how to use a single
CommonsHttpSolrServer in a multithreaded fashion.
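
In case it helps a later reader: commons-httpclient defaults to 2 connections
per host and 20 total, which would explain threads piling up.  As I understand
it, the usual approach is to raise those limits on a
MultiThreadedHttpConnectionManager and share a single CommonsHttpSolrServer;
a sketch (the limits are arbitrary):

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class SharedSolrServerFactory {
    public static CommonsHttpSolrServer create(String url) throws Exception {
        MultiThreadedHttpConnectionManager mgr =
                new MultiThreadedHttpConnectionManager();
        mgr.getParams().setDefaultMaxConnectionsPerHost(100);
        mgr.getParams().setMaxTotalConnections(100);
        // One shared server instance for all query threads.
        return new CommonsHttpSolrServer(url, new HttpClient(mgr));
    }
}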






Re: SolrException - Lock obtain timed out, no leftover locks

2009-07-22 Thread danben

Sorry, I thought I had removed this posting.  I am running Solr over HTTP,
but (as you surmised) I had a concurrency bug.  Thanks for the response.

Dan


hossman wrote:
> 
> 
> My only guess here is that you are using SolrJ in an embedded sense, not
> via HTTP, and something about the code you have in your MyIndexers class
> causes two different threads to attempt to create two different cores (or
> perhaps the same core) using identical data directories at the same time.
> 
> Either that, or maybe there is a bug in the CoreAdmin functionality for
> creating/opening a new core resulting from improper synchronization.
> 
> It would help to have the full stack trace of the Lock timed out
> exception, and to know more details about how exactly your code goes about
> creating new cores on the fly.
> 
> -Hoss




Strange Behavior When Using CSVRequestHandler

2010-01-06 Thread danben

The problem:

Not all of the documents that I expect to be indexed are showing up in the
index.

The background:

I start off with an empty index based on a schema with a single field named
'query', marked as unique and using the following analyzer:

  <!-- Reconstructed: the schema XML was mangled in transit.  The filter
       attributes are as posted; the tokenizer line is an assumption. -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>

My input is a utf-8 encoded file with one sentence per line.  Its total size
is about 60MB.  I would like each line of the file to correspond to a single
document in the solr index.  If I print the number of unique lines in the
file (using cat | sort | uniq | wc -l), I get a little over 2M.  Printing
the total number of lines in the file gives me around 2.7M.

I use the following to start indexing:

curl
'http://localhost:8983/solr/update/csv?commit=true&separator=%09&stream.file=/home/gkropitz/querystage2map/file1&stream.contentType=text/plain;charset=utf-8&fieldnames=query&escape=\'

When this command completes, I see numDocs is approximately 470k (which is
what I find strange) and maxDocs is approximately 890k (which is fine since
I know I have around 700k duplicates).  Even more confusing is that if I run
this exact command a second time without performing any other operations,
numDocs goes up to around 610k, and a third time brings it up to about 750k.

Can anyone tell me what might cause Solr not to index everything in my input
file the first time, and why it would be able to index new documents the
second and third times?

I also have this line in solrconfig.xml, if it matters:

  <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048" />

Thanks,
Dan




Re: Strange Behavior When Using CSVRequestHandler

2010-01-07 Thread danben

Erick - thanks very much, all of this makes sense.  But the one thing I still
find puzzling is the fact that re-adding the file a second, third, fourth,
etc. time causes numDocs to increase, and ALWAYS by the same amount
(141,645).  Any ideas as to what could cause that?

Dan


Erick Erickson wrote:
> 
> I think the root of your problem is that unique fields should NOT
> be multivalued. See
> http://wiki.apache.org/solr/FieldOptionsByUseCase?highlight=(unique)|(key)
> 
> In this case, since you're tokenizing, your "query" field is
> implicitly multi-valued, so I don't know what the behavior will be.
> 
> But there's another problem:
> All the filters in your analyzer definition will mess up the
> correspondence between the Unix uniq and numDocs even
> if you got past the above. I.e.:
> 
> StopFilter would make the lines "a problem" and "the problem" identical.
> WordDelimiter would do all kinds of interesting things
> LowerCaseFilter would make "Myproblem" and "myproblem" identical.
> RemoveDuplicatesFilter would make "interesting interesting" and
> "interesting" identical
> 
> You could define a second field, make *that* one unique and NOT analyze
> it in any way...
> 
> You could hash your sentences and define the hash as your unique key.
> 
> You could ...
> 
> HTH
> Erick
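
For what it's worth, a sketch of the hashing approach Erick suggests above
(file handling simplified; I'd then index with fieldnames=id,query and make
id the uniqueKey):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.PrintWriter;
import java.math.BigInteger;
import java.security.MessageDigest;

// Emit a TSV of (md5, sentence) so the hash can serve as the uniqueKey,
// leaving 'query' free to be analyzed however we like.
public class HashKeys {
    public static void main(String[] args) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        PrintWriter out = new PrintWriter(args[1], "UTF-8");
        String line;
        while ((line = in.readLine()) != null) {
            byte[] digest = md5.digest(line.getBytes("UTF-8"));
            out.println(new BigInteger(1, digest).toString(16) + "\t" + line);
        }
        out.close();
        in.close();
    }
}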
