Dataimport handler exception when migrating from 4.4 to 4.6. Help needed

2013-12-22 Thread William Pierce
Hello, all:

My configuration works nicely with solr 4.4. I am encountering a configuration 
error when I try to upgrade from 4.4 to 4.6.  All I did was the following:

a) Replaced the 4.4 solr.war file with the 4.6 solr.war in the tomcat/lib 
folder. I am using version 6.0.36 of tomcat.
b) I replaced the solr-dataimporthandler-4.4.0.jar and 
solr-dataimporthandler-extras-4.4.0.jar with the corresponding 4.6 counterparts 
in the collection/lib folder.

I restarted tomcat.   I get the following stack trace (full log is also given 
below) – there are no other warnings/errors in my log.  I have gone through the 
4.5 changes to see if I needed to add/modify my DIH configuration – but I am 
stymied.  Any help will be greatly appreciated.

ERROR - 2013-12-22 08:05:09.824; 
org.apache.solr.handler.dataimport.DataImportHandler; Exception while loading 
DataImporter
java.lang.NoSuchMethodError: 
org.apache.solr.core.SolrCore.getLatestSchema()Lorg/apache/solr/schema/IndexSchema;
at 
org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:103)
at 
org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:103)
at 
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:616)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:816)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

The full solr.log (until the exception) is as follows:

INFO  - 2013-12-22 08:05:08.261; org.apache.solr.servlet.SolrDispatchFilter; 
SolrDispatchFilter.init()
INFO  - 2013-12-22 08:05:08.277; org.apache.solr.core.SolrResourceLoader; Using 
JNDI solr.home: c:\tomcatweb\postingsmaster
INFO  - 2013-12-22 08:05:08.292; 
org.apache.solr.core.CoreContainer$Initializer; looking for solr config file: 
c:\tomcatweb\postingsmaster\solr.xml
INFO  - 2013-12-22 08:05:08.292; org.apache.solr.core.CoreContainer; New 
CoreContainer 20804623
INFO  - 2013-12-22 08:05:08.292; 
org.apache.solr.core.CoreContainer$Initializer; no solr.xml found. using 
default old-style solr.xml
INFO  - 2013-12-22 08:05:08.292; org.apache.solr.core.CoreContainer; Loading 
CoreContainer using Solr Home: 'c:\tomcatweb\postingsmaster\'
INFO  - 2013-12-22 08:05:08.292; org.apache.solr.core.SolrResourceLoader; new 
SolrResourceLoader for directory: 'c:\tomcatweb\postingsmaster\'
INFO  - 2013-12-22 08:05:08.605; 
org.apache.solr.handler.component.HttpShardHandlerFactory; Setting 
socketTimeout to: 0
INFO  - 2013-12-22 08:05:08.605; 
org.apache.solr.handler.component.HttpShardHandlerFactory; Setting urlScheme 
to: http://
INFO  - 2013-12-22 08:05:08.605; 
org.apache.solr.handler.component.HttpShardHandlerFactory; Setting connTimeout 
to: 0
INFO  - 2013-12-22 08:05:08.605; 
org.apache.solr.handler.component.HttpShardHandlerFactory; Setting 
maxConnectionsPerHost to: 20
INFO  - 2013-12-22 08:05:08.605; 
org.apache.solr.handler.component.HttpShardHandlerFactory; Setting corePoolSize 
to: 0
INFO  - 2013-12-22 08:05:08.605; 
org.apache.solr.handler.component.HttpShardHandlerFactory; Setting 
maximumPoolSize to: 2147483647
INFO  - 2013-12-22 08:05:08.605; 
org.apache.solr.handler.component.HttpShardHandlerFactory; Setting 
maxThreadIdleTime to: 5
INFO  - 2013-12-22 08:05:08.605; 
org.apache.solr.handler.component.HttpShardHandlerFactory; Setting sizeOfQueue 
to: -1
INFO  - 2013-12-22 08:05:08.605; 
org.apache.solr.handler.component.HttpShardHandlerFactory; Setting 
fairnessPolicy to: false
INFO  - 2013-12-22 08:05:08.621; 
org.apache.solr.client.solrj.impl.HttpClientUtil; Creating new http client, 
config:maxConnectionsPerHost=20&maxConnections=1&socketTimeout=0&connTimeout=0&retry=false
INFO  - 2013-12-22 08:05:08.761; org.apache.solr.core.CoreContainer; 
Registering Log Listener
INFO  - 2013-12-22 08:05:08.792; org.apache.solr.core.CoreContainer; Creating 
SolrCore 'collection1' using instanceDir: 
c:\tomcatweb\postingsmaster\collection1
INFO  - 2013-12-22 08:05:08.792; org.apache.solr.core.SolrResourceLoader; new 
SolrResourceLoader for directory: 'c:\tomcatweb\postingsmaster\collection1\'
INFO  - 2013-12-22 08:05:08.792; org.apache.solr.core.SolrResourceLoader; 
Adding 
'file:/c:/tomcatweb/postingsmaster/col

Solr 4.3 fails in startup when dataimporthandler declaration is included in solrconfig.xml

2013-05-08 Thread William Pierce
Hi, 

I have gotten solr 4.3 up and running on tomcat7/windows7.  I have added the 
two dataimport handler jars (found in the dist folder of my solr 4.3 download) 
to the tomcat/lib folder (where I also placed the solr.war).   

Then I added the following lines to my solrconfig.xml:

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config.xml</str>
  </lst>
</requestHandler>

When I start tomcat, I get the stack trace shown below (commenting out the 
above lines causes tomcat & solr to start up just fine).  

ERROR - 2013-05-08 10:43:48.185; org.apache.solr.core.CoreContainer; Unable to 
create core: collection1
org.apache.solr.common.SolrException: org/apache/solr/util/plugin/SolrCoreAware
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NoClassDefFoundError: 
org/apache/solr/util/plugin/SolrCoreAware
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(Unknown Source)
at java.security.SecureClassLoader.defineClass(Unknown Source)
at java.net.URLClassLoader.defineClass(Unknown Source)
at java.net.URLClassLoader.access$100(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at 
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1700)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:592)
at 
org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:154)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:758)
... 13 more
Caused by: java.lang.ClassNotFoundException: 
org.apache.solr.util.plugin.SolrCoreAware
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 40 more
ERROR - 2013-05-08 10:43:48.189; org.apache.solr.common.SolrException; 
null:org.apache.solr.common.SolrException: Unable to create core: collection1
at 
org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: 
org/apache/solr/util/plugin/SolrCoreAware
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
at 
org.ap

Re: Solr 4.3 fails in startup when dataimporthandler declaration is included in solrconfig.xml

2013-05-08 Thread William Pierce
Thanks, Alex.  I have tried placing the jars in a folder under solrhome/lib 
or under the instanceDir/lib with appropriate declarations in the 
solrconfig.xml.  I can see the jars being loaded in the logs.  But neither 
configuration seems to work.


Bill

-Original Message- 
From: Alexandre Rafalovitch

Sent: Wednesday, May 08, 2013 11:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.3 fails in startup when dataimporthandler declaration is 
included in solrconfig.xml


Could be classloader issue. E.g. the jars in tomcat/lib not visible to
whatever is trying to load DIH. Have you tried putting those jars
somewhere else and using "lib" directive in solrconfig.xml instead to
point to them?
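
For reference, the lib directive is just an element in solrconfig.xml; a rough
sketch (the paths and regex below are illustrative, not taken from your setup):

<!-- inside the <config> element of solrconfig.xml -->
<lib dir="../../dist/" regex="solr-dataimporthandler-.*\.jar" />
<!-- or simply point at a lib folder under the core's instanceDir -->
<lib dir="./lib" />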

Regards,
  Alex.
On Wed, May 8, 2013 at 2:07 PM, William Pierce  
wrote:
I have gotten solr 4.3 up and running on tomcat7/windows7.  I have added 
the two dataimport handler jars (found in the dist folder of my solr 4.3 
download) to the tomcat/lib folder (where I also placed the solr.war).


Then I added the following lines to my solrconfig.xml:

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config.xml</str>
  </lst>
</requestHandler>

When I start tomcat, I get the stack trace shown below (commenting out the 
above lines causes tomcat & solr to start up just fine).




Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book) 



Re: Solr 4.3 fails in startup when dataimporthandler declaration is included in solrconfig.xml

2013-05-08 Thread William Pierce
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
   at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)

   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
   ... 10 more
Caused by: java.lang.NoClassDefFoundError: 
org/apache/solr/util/plugin/SolrCoreAware

   at java.lang.ClassLoader.defineClass1(Native Method)
   at java.lang.ClassLoader.defineClass(Unknown Source)
   at java.security.SecureClassLoader.defineClass(Unknown Source)
   at java.net.URLClassLoader.defineClass(Unknown Source)
   at java.net.URLClassLoader.access$100(Unknown Source)
   at java.net.URLClassLoader$1.run(Unknown Source)
   at java.net.URLClassLoader$1.run(Unknown Source)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(Unknown Source)
   at java.lang.ClassLoader.loadClass(Unknown Source)
   at java.lang.ClassLoader.loadClass(Unknown Source)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Unknown Source)
   at 
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1700)

   at java.lang.ClassLoader.loadClass(Unknown Source)
   at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
   at java.lang.ClassLoader.loadClass(Unknown Source)
   at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
   at java.lang.ClassLoader.loadClass(Unknown Source)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Unknown Source)
   at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
   at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)

   at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
   at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:592)
   at 
org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:154)

   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:758)
   ... 13 more
Caused by: java.lang.ClassNotFoundException: 
org.apache.solr.util.plugin.SolrCoreAware

   at java.net.URLClassLoader$1.run(Unknown Source)
   at java.net.URLClassLoader$1.run(Unknown Source)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(Unknown Source)
   at java.lang.ClassLoader.loadClass(Unknown Source)
   at java.lang.ClassLoader.loadClass(Unknown Source)
   ... 40 more

Thanks,

Bill


-Original Message- 
From: Jan Høydahl

Sent: Wednesday, May 08, 2013 3:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.3 fails in startup when dataimporthandler declaration is 
included in solrconfig.xml


Why did you place solr.war in tomcat/lib?

Can you detail the specific errors you get when you place your DIH jars in 
solr-home/lib or instanceDir/lib?


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 8 May 2013, at 21:15, William Pierce wrote:

Thanks, Alex.  I have tried placing the jars in a folder under 
solrhome/lib or under the instanceDir/lib with appropriate declarations in 
the solrconfig.xml.  I can see the jars being loaded in the logs.  But 
neither configuration seems to work.


Bill

-Original Message- From: Alexandre Rafalovitch
Sent: Wednesday, May 08, 2013 11:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.3 fails in startup when dataimporthandler declaration 
is included in solrconfig.xml


Could be classloader issue. E.g. the jars in tomcat/lib not visible to
whatever is trying to load DIH. Have you tried putting those jars
somewhere else and using "lib" directive in solrconfig.xml instead to
point to them?

Regards,
 Alex.
On Wed, May 8, 2013 at 2:07 PM, William Pierce  
wrote:
I have gotten solr 4.3 up and running on tomcat7/windows7.  I have added 
the two dataimport handler jars (found in the dist folder of my solr 4.3 
download) to the tomcat/lib folder (where I also placed the solr.war).


Then I added the following lines to my solrconfig.xml:

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config.xml</str>
  </lst>
</requestHandler>

When I start tomcat, I get the stack trace shown below (commenting out 
the above lines causes tomcat & solr to start up just fine).




Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)




Re: Solr 4.3 fails in startup when dataimporthandler declaration is included in solrconfig.xml

2013-05-09 Thread William Pierce
I got this to work (thanks, Jan, and all).  It turns out that DIH jars need 
to be included explicitly by specifying in solrconfig.xml or placed in some 
default path under solr.home.  I placed these jars in instanceDir/lib and it 
worked.  Previously I had reported it as not working - this was because I 
had mistakenly left a copy of the jars under tomcat/lib.


Bill

-Original Message- 
From: Jan Høydahl

Sent: Thursday, May 09, 2013 2:58 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.3 fails in startup when dataimporthandler declaration is 
included in solrconfig.xml


My question was: When you move DIH libs to Solr's classloader (e.g. 
instanceDir/lib and refer from solrconfig.xml), and remove solr.war from 
tomcat/lib, what error msg do you then get?


Also make sure to delete the old tomcat/webapps/solr folder just to make 
sure you're starting from scratch


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 9 May 2013, at 01:54, William Pierce wrote:

The reason I placed the solr.war in tomcat/lib was -- I guess -- because 
that's way I had always done it since 1.3 days.  Our tomcat instance(s) 
run nothing other than solr - so that seemed as good a place as any.


The DIH jars that I placed in the tomcat/lib are: 
solr-dataimporthandler-4.3.0.jar and 
solr-dataimporthandler-extras-4.3.0.jar.  Are there any dependent jars 
that also need to be added that I am unaware of?


On the specific errors - I get a stack trace noted in the first email that 
began this thread but repeated here for convenience:


ERROR - 2013-05-08 10:43:48.185; org.apache.solr.core.CoreContainer; 
Unable to create core: collection1
org.apache.solr.common.SolrException: 
org/apache/solr/util/plugin/SolrCoreAware

  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
  at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)

  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
  at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
  at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
  at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
  at java.util.concurrent.FutureTask.run(Unknown Source)
  at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
  at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
  at java.util.concurrent.FutureTask.run(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NoClassDefFoundError: 
org/apache/solr/util/plugin/SolrCoreAware

  at java.lang.ClassLoader.defineClass1(Native Method)
  at java.lang.ClassLoader.defineClass(Unknown Source)
  at java.security.SecureClassLoader.defineClass(Unknown Source)
  at java.net.URLClassLoader.defineClass(Unknown Source)
  at java.net.URLClassLoader.access$100(Unknown Source)
  at java.net.URLClassLoader$1.run(Unknown Source)
  at java.net.URLClassLoader$1.run(Unknown Source)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(Unknown Source)
  at java.lang.ClassLoader.loadClass(Unknown Source)
  at java.lang.ClassLoader.loadClass(Unknown Source)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Unknown Source)
  at 
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1700)

  at java.lang.ClassLoader.loadClass(Unknown Source)
  at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
  at java.lang.ClassLoader.loadClass(Unknown Source)
  at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
  at java.lang.ClassLoader.loadClass(Unknown Source)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Unknown Source)
  at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
  at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)

  at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
  at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:592)
  at 
org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:154)

  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:758)
  ... 13 more
Caused by: java.lang.ClassNotFoundException: 
org.apache.solr.util.plugin.SolrCoreAware

  at java.net.URLClassLoader$1.run(Unknown Source)
  at java.net.URLClassLoader$1.run(Unknown Source)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(Unknown Source)
  at java.lang.ClassLoader.loadClass(Unknown Source)
  at java.lang.ClassLoader.loadClass(Unknown Source)
  ... 40 more
ERROR - 2013-05-08 10:43:48.189; org.apache.solr.common.SolrException; 
null:org.apache.solr.common.SolrException: Unable to create core: 
c

Re: indexing mysql database

2010-10-17 Thread William Pierce
Two suggestions:  a) Noticed that your dih spec in the solrconfig.xml seems 
to refer to "db-data-config.xml" but you said that your file was 
db-config.xml.   You may want to check this to make sure that your file 
names are correct.  b) What does your log say when you ran the import 
process?
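
For reference, a rough sketch of how the two files typically line up (the
connection details and field names below are illustrative, not taken from your
setup); the file named in solrconfig.xml must match the actual DIH config file:

<!-- solrconfig.xml -->
<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

<!-- db-data-config.xml (same name as referenced above) -->
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="dbuser" password="dbpass"/>
  <document>
    <entity name="item" query="select id, title from item">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>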


- Bill

-Original Message- 
From: do3do3

Sent: Sunday, October 17, 2010 8:29 AM
To: solr-user@lucene.apache.org
Subject: indexing mysql database


i try to index table in mysql database,
1st i create db-config.xml file which contain

followed by

and defining of table like


2nd i add this field in schema.xml file
and finally declare in solrconfig.xml file the db-config.xml file as

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>
i found the index folder contains only the segment.gen & segment_1 files,
and when i try to search i get no results.
can anybody help ???
thanks in advance

--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-mysql-database-tp1719883p1719883.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Advice on updating solr indexes

2009-08-15 Thread William Pierce

Folks:

In our app we index approx 50 M documents every so often.  One of the fields 
in each document is called "CompScore" which is a score that our back-end 
computes for each document.  The computation of this score is heavy-weight 
and is done only approximately once every few days.When documents are 
retrieved during a search we return results sorted by the Solr score first 
and then the CompScore.


The issue we have is this:  Every week or so when the back-end routines run to 
compute "CompScore"  we need to delete and insert these 50 M documents into 
the index.   This happens even though a majority of the documents have 
not changed.


I think there is no way in Solr to simply update a field in the index.

If others have encountered a similar issue,  I'd be interested in hearing 
about their solutions!


Best,

- Bill 



When to optimize?

2009-09-13 Thread William Pierce

Folks:

Are there good rules of thumb for when to optimize?  We have a large index 
consisting of approx 7M documents and we currently have it set to optimize 
once a day.  But sometimes there are very few changes that have been 
committed during a day and it seems like a waste to optimize (esp. since our 
servers are pretty well loaded).


So I was looking to get some good rules of thumb for when it makes sense to 
optimize:   Optimize when x% of the documents have been changed since the 
last optimize or some such.


Any ideas would be greatly appreciated!

-- Bill 



Tips on speeding up indexing needed...

2009-10-10 Thread William Pierce

Folks:

I have a corpus of approx 6 M documents each of approx 4K bytes. 
Currently, the way indexing is set up I read documents from a database and 
issue solr post requests in batches (batches are set up so that the 
maxPostSize of tomcat which is set to 2MB is adhered to).  This means that 
in each batch we write approx 600 or so documents to SOLR.  What I am seeing 
is that I am able to push about 2500 docs per minute or approx 40 or so per 
second.


I saw in Erik's talk on Friday that speeds of 250 docs/sec to 25000 docs/sec 
have been achieved.  Needless to say I am sure that performance numbers vary 
widely and are dependent on the domain, machine configurations, etc.


I am running on Windows 2003 server, with 4 GB RAM, dual core xeon.

Any tips on what I can do to speed this up?

Thanks,

Bill 



Re: Tips on speeding up indexing needed...

2009-10-10 Thread William Pierce
Oh and one more thing...For historical reasons our apps run using msft 
technologies, so using SolrJ would be next to impossible at the present 
time


Thanks in advance for your help!

-- Bill

--
From: "William Pierce" 
Sent: Saturday, October 10, 2009 5:47 PM
To: 
Subject: Tips on speeding up indexing needed...


Folks:

I have a corpus of approx 6 M documents each of approx 4K bytes. 
Currently, the way indexing is set up I read documents from a database and 
issue solr post requests in batches (batches are set up so that the 
maxPostSize of tomcat which is set to 2MB is adhered to).  This means that 
in each batch we write approx 600 or so documents to SOLR.  What I am 
seeing is that I am able to push about 2500 docs per minute or approx 40 
or so per second.


I saw in Erik's talk on Friday that speeds of 250 docs/sec to 25000 
docs/sec have been achieved.  Needless to say I am sure that performance 
numbers vary widely and are dependent on the domain, machine 
configurations, etc.


I am running on Windows 2003 server, with 4 GB RAM, dual core xeon.

Any tips on what I can do to speed this up?

Thanks,

Bill



Re: Tips on speeding up indexing needed...

2009-10-11 Thread William Pierce
Thanks, Lance.  I already commit at the end.  I will take a look at the data 
import handler.   Thanks again!


-- Bill

--
From: "Lance Norskog" 
Sent: Saturday, October 10, 2009 7:58 PM
To: 
Subject: Re: Tips on speeding up indexing needed...


A few things off the bat:
1) do not commit until the end (see the sketch just below this list).
2) use the DataImportHandler - it runs inside Solr and reads the
database. This cuts out the HTTP transfer/XML translation overheads.
3) examine your schema. Some of the text analyzers are quite slow.
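
As a rough illustration of point 1 (the URL and field names are made up for the
example), each batch goes to /update as a plain add message with no commit, and a
single commit message is posted only after the last batch:

<!-- POSTed to http://localhost:8080/solr/update for every batch; no commit yet -->
<add>
  <doc>
    <field name="id">doc-1</field>
    <field name="body">first document in this batch</field>
  </doc>
  <doc>
    <field name="id">doc-2</field>
    <field name="body">second document in this batch</field>
  </doc>
</add>

<!-- POSTed once, after the final batch -->
<commit/>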

Solr tips:
http://wiki.apache.org/solr/SolrPerformanceFactors

Lucene tips:
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

And, what you don't want to hear: for jobs like this, Solr/Lucene is
disk-bound. The Windows NTFS file system is much slower than what is
available for Linux or the Mac, and these numbers are for those
machines.

Good luck!

Lance Norskog


On Sat, Oct 10, 2009 at 5:57 PM, William Pierce  
wrote:

Oh and one more thing...For historical reasons our apps run using msft
technologies, so using SolrJ would be next to impossible at the present
time

Thanks in advance for your help!

-- Bill

------
From: "William Pierce" 
Sent: Saturday, October 10, 2009 5:47 PM
To: 
Subject: Tips on speeding up indexing needed...


Folks:

I have a corpus of approx 6 M documents each of approx 4K bytes.
Currently, the way indexing is set up I read documents from a database 
and

issue solr post requests in batches (batches are set up so that the
maxPostSize of tomcat which is set to 2MB is adhered to).  This means 
that
in each batch we write approx 600 or so documents to SOLR.  What I am 
seeing
is that I am able to push about 2500 docs per minute or approx 40 or so 
per

second.

I saw in Erik's talk on Friday that speeds of 250 docs/sec to 25000
docs/sec have been achieved.  Needless to say I am sure that performance
numbers vary widely and are dependent on the domain, machine 
configurations,

etc.

I am running on Windows 2003 server, with 4 GB RAM, dual core xeon.

Any tips on what I can do to speed this up?

Thanks,

Bill







--
Lance Norskog
goks...@gmail.com



Dynamically compute document scores...

2009-10-13 Thread William Pierce

Folks:

During query time, I want to dynamically compute a document score as 
follows:


  a) Take the SOLR score for the document -- call it S.
  b) Lookup the "business logic" score for this document.  Call it L.
  c) Compute a new score T = func(S, L)
  d) Return the documents sorted by T.

I have looked at function queries but in my limited/quick review of it,  I 
could not see a quick way of doing this.


Is this possible?

Thanks,

- Bill




Re: Tips on speeding up indexing needed...

2009-10-13 Thread William Pierce
Oops... My bad!  I didn't realize that by changing the subject line I was 
still "part" of the thread whose subject I changed!


Sorry folks!  Thanks, Hoss for pointing this out!

- Bill

--
From: "Chris Hostetter" 
Sent: Tuesday, October 13, 2009 11:07 AM
To: 
Subject: Re: Tips on speeding up indexing needed...



: References: <4acb30d2.2010...@umich.edu>
: <69de18140910070109m27e50d2sc82a7c7bdd683...@mail.gmail.com>
: <4acc95a3.5000...@umich.edu>
: 
: <4acfc943.4040...@umich.edu>
: In-Reply-To: <4acfc943.4040...@umich.edu>
: Subject: Tips on speeding up indexing needed...

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email.  Even if you change the
subject line of your email, other mail headers still track which thread
you replied to and your question is "hidden" in that thread and gets less
attention.   It makes following discussions in the mailing list archives
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking





-Hoss




Adding callback url to data import handler...Is this possible?

2009-10-14 Thread William Pierce
Folks:

I am pretty happy with DIH -- it seems to work very well for my situation.
Thanks!!!

The one issue I see has to do with the fact that I need to keep polling 
<>/dataimport to check if the data import completed successfully.   I need 
to know when/if the import is completed (successfully or otherwise) so that I 
can update appropriate structures in our app.  

What I would like is something like what Google Checkout API offers -- a 
callback URL.  That is, I should be able to pass along a URL to DIH.  Once it 
has completed the import, it can invoke the provided URL.  This provides a 
callback mechanism for those of us who don't have the liberty to change SOLR 
source code.  We can then do the needful upon receiving this callback.

If this functionality is already provided in some form/fashion, I'd love to 
know.

All in all, great functionality that has significantly helped me out!

Cheers,

- Bill

Re: Adding callback url to data import handler...Is this possible?

2009-10-14 Thread William Pierce
Thanks, Avlesh.  Yes, I did take a look at the event listeners.  As I 
mentioned this would require us to write Java code.


Our app(s) are entirely windows/asp.net/C# so while we could add Java in a 
pinch,  we'd prefer to stick to using SOLR using its convenient REST-style 
interfaces which makes no demand on our app environment.


Thanks again for your suggestion!

Cheers,

Bill

--
From: "Avlesh Singh" 
Sent: Wednesday, October 14, 2009 10:59 AM
To: 
Subject: Re: Adding callback url to data import handler...Is this possible?


Have you had a look at EventListeners in DIH?
http://wiki.apache.org/solr/DataImportHandler#EventListeners
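
The hook documented there is a pair of attributes on the document element of the
DIH config; roughly (the listener class names are illustrative):

<document onImportStart="com.example.ImportStartListener"
          onImportEnd="com.example.ImportEndListener">
  ...
</document>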

Cheers
Avlesh

On Wed, Oct 14, 2009 at 11:21 PM, William Pierce 
wrote:



Folks:

I am pretty happy with DIH -- it seems to work very well for my 
situation.

   Thanks!!!

The one issue I see has to do with the fact that I need to keep polling
<>/dataimport to check if the data import completed successfully. 
I
need to know when/if the import is completed (successfully or otherwise) 
so

that I can update appropriate structures in our app.

What I would like is something like what Google Checkout API offers -- a
callback URL.  That is, I should be able to pass along a URL to DIH. 
Once
it has completed the import, it can invoke the provided URL.  This 
provides

a callback mechanism for those of us who don't have the liberty to change
SOLR source code.  We can then do the needful upon receiving this 
callback.


If this functionality is already provided in some form/fashion, I'd love 
to

know.

All in all, great functionality that has significantly helped me out!

Cheers,

- Bill




Re: Adding callback url to data import handler...Is this possible?

2009-10-14 Thread William Pierce
If the JavaScript support enables me to invoke a URL,  it's really OK with 
me.


Cheers,

- Bill

--
From: "Avlesh Singh" 
Sent: Wednesday, October 14, 2009 11:01 PM
To: 
Subject: Re: Adding callback url to data import handler...Is this possible?



But a callback url is a very specific requirement. We plan to extend
javascript support to the EventListener callback.


I would say the latter is more specific than the former.

People who are comfortable writing JAVA wouldn't need any of these but the
second best thing for others would be a capability to handle it in their 
own

applications. A url can be the simplest way to invoke things in respective
application. Doing it via javascript sounds like a round-about way of 
doing

it.

Cheers
Avlesh

2009/10/15 Noble Paul നോബിള്‍ नोब्ळ् 


I can understand the concern that you do not wish to write Java code .
But a callback url is a very specific requirement. We plan to extend
javascript support to the EventListener callback . Will it help?

On Wed, Oct 14, 2009 at 11:47 PM, Avlesh Singh  wrote:
> Hmmm ... I think this is a valid use case and it might be a good idea 
> to

> support it in someway.
> I will post this thread on the dev-mailing list to seek opinion.
>
> Cheers
> Avlesh
>
> On Wed, Oct 14, 2009 at 11:39 PM, William Pierce wrote:
>
>> Thanks, Avlesh.  Yes, I did take a look at the event listeners.  As I
>> mentioned this would require us to write Java code.
>>
>> Our app(s) are entirely windows/asp.net/C# so while we could add Java
in a
>> pinch,  we'd prefer to stick to using SOLR using its convenient
REST-style
>> interfaces which makes no demand on our app environment.
>>
>> Thanks again for your suggestion!
>>
>> Cheers,
>>
>> Bill
>>
>> --
>> From: "Avlesh Singh" 
>> Sent: Wednesday, October 14, 2009 10:59 AM
>> To: 
>> Subject: Re: Adding callback url to data import handler...Is this
possible?
>>
>>
>>  Had a look at EventListeners in
>>> DIH?http://wiki.apache.org/solr/DataImportHandler#EventListeners
>>>
>>> Cheers
>>> Avlesh
>>>
>>> On Wed, Oct 14, 2009 at 11:21 PM, William Pierce <
evalsi...@hotmail.com
>>> >wrote:
>>>
>>>  Folks:
>>>>
>>>> I am pretty happy with DIH -- it seems to work very well for my
>>>> situation.
>>>>   Thanks!!!
>>>>
>>>> The one issue I see has to do with the fact that I need to keep
polling
>>>> <>/dataimport to check if the data import completed 
>>>> successfully.

I
>>>> need to know when/if the import is completed (successfully or
otherwise)
>>>> so
>>>> that I can update appropriate structures in our app.
>>>>
>>>> What I would like is something like what Google Checkout API 
>>>> offers --

a
>>>> callback URL.  That is, I should be able to pass along a URL to DIH.
Once
>>>> it has completed the import, it can invoke the provided URL.  This
>>>> provides
>>>> a callback mechanism for those of us who don't have the liberty to
change
>>>> SOLR source code.  We can then do the needful upon receiving this
>>>> callback.
>>>>
>>>> If this functionality is already provided in some form/fashion, I'd
love
>>>> to
>>>> know.
>>>>
>>>> All in all, great functionality that has significantly helped me 
>>>> out!

>>>>
>>>> Cheers,
>>>>
>>>> - Bill
>>>>
>>>
>>>
>



--
-
Noble Paul | Principal Engineer| AOL | http://aol.com





Using DIH's special commands....Help needed

2009-10-15 Thread William Pierce
Folks:

I see in the DIH wiki that there are special commands which according to the 
wiki 

"Special commands can be given to DIH by adding certain variables to the row 
returned by any of the components . "

In my use case,  my db contains rows that are marked "PendingDelete".   How do 
I use the $deleteDocByQuery special command to delete these rows using DIH?
In other words,  where/how do I specify this?  

Thanks,

- Bill

Re: Using DIH's special commands....Help needed

2009-10-15 Thread William Pierce
Thanks, Shalin.   I am sorry if I phrased it incorrectly.  Yes,  I want to 
know how to delete documents in the solr index using the $deleteDocByQuery 
special command.   I looked in the wiki doc and could not find out how to do 
this


Sorry if this is self-evident...

Cheers,

- Bill

--
From: "Shalin Shekhar Mangar" 
Sent: Thursday, October 15, 2009 10:03 AM
To: 
Subject: Re: Using DIH's special commandsHelp needed

On Thu, Oct 15, 2009 at 6:25 PM, William Pierce 
wrote:



Folks:

I see in the DIH wiki that there are special commands which according to
the wiki

"Special commands can be given to DIH by adding certain variables to the
row returned by any of the components . "

In my use case,  my db contains rows that are marked "PendingDelete". 
How

do I use the $deleteDocByQuery special command to delete these rows using
DIH?In other words,  where/how do I specify this?


The $deleteDocByQuery is for deleting Solr documents by a Solr query and 
not

DB rows.

--
Regards,
Shalin Shekhar Mangar.



Re: Using DIH's special commands....Help needed

2009-10-15 Thread William Pierce
Thanks for your help.  Here is my DIH config file... I'd appreciate any 
help/pointers you may give me.  No matter what I do the documents are not 
getting deleted from the index.  My db has rows whose 'IndexingStatus' field 
has values of either 1 (which means add it to solr), or 4 (which means 
delete the document with the primary key from SOLR index).  I have two 
transformers running.  Not sure what I am doing wrong.



<dataConfig>
  <script><![CDATA[
    function DeleteRow(row){
      var jis = row.get('IndexingStatus');
      var jid = row.get('Id');
      if ( jis == 4 ) {
        row.put('$deleteDocById', jid);
      }
      return row;
    }
  ]]></script>
  <dataSource ... />
  <document>
    <entity ... transformer="script:DeleteRow, ..."
            query=" select  Id, a, b, c, IndexingStatus from  prod_table 
                    where (IndexingStatus = 1 or IndexingStatus = 4) ">
      ...
    </entity>
  </document>
</dataConfig>

Thanks,

- Bill

--
From: "Shalin Shekhar Mangar" 
Sent: Thursday, October 15, 2009 11:03 AM
To: 
Subject: Re: Using DIH's special commandsHelp needed

On Thu, Oct 15, 2009 at 10:42 PM, William Pierce 
wrote:


Thanks, Shalin.   I am sorry if I phrased it incorrectly.  Yes,  I want 
to
know how to delete documents in the solr index using the 
$deleteDocByQuery
special command.   I looked in the wiki doc and could not find out how to 
do

this



Sorry, I misunderstood your intent. These special flag variables can be
emitted by Transformers. So what you can do is write a Transformer which
checks if the current row contains "PendingDelete" in the column and add a
key/value pair to the Map. The key should be "$deleteDocByQuery" and value
should be the Solr query to be used for deletion. You can write the
transformer in Java as well as Javascript.
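
Roughly, a javascript version would look something like this (the column, query
and field names are made up for the example):

<dataConfig>
  <script><![CDATA[
    function markDeletes(row) {
      var status = '' + row.get('Status');   // coerce the Java string to a JS string
      if (status == 'PendingDelete') {
        // the value is a Solr query; every document matching it gets deleted
        row.put('$deleteDocByQuery', 'id:' + row.get('Id'));
      }
      return row;
    }
  ]]></script>
  <document>
    <entity name="item" transformer="script:markDeletes"
            query="select Id, Status from my_table">
      <field column="Id" name="id"/>
    </entity>
  </document>
</dataConfig>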

--
Regards,
Shalin Shekhar Mangar.



Re: Using DIH's special commands....Help needed

2009-10-15 Thread William Pierce
Thanks for your reply!  I tried your suggestion.  No luck.  I have verified 
that I have version  1.6.0_05-b13 of java installed.  I am running with the 
nightly bits of October 7.  I am pretty much out of ideas at the present 
time... I'd appreciate any tips/pointers.


Thanks,

- Bill

--
From: "Shalin Shekhar Mangar" 
Sent: Thursday, October 15, 2009 1:42 PM
To: 
Subject: Re: Using DIH's special commandsHelp needed

On Fri, Oct 16, 2009 at 12:46 AM, William Pierce 
wrote:



Thanks for your help.  Here is my DIH config fileI'd appreciate any
help/pointers you may give me.  No matter what I do the documents are not
getting deleted from the index.  My db has rows whose 'IndexingStatus' 
field

has values of either 1 (which means add it to solr), or 4 (which means
delete the document with the primary key from SOLR index).  I have two
transformers running.  Not sure what I am doing wrong.


 <![CDATA[
  function DeleteRow(row){
  var jis = row.get('IndexingStatus');
  var jid = row.get('Id');
  if ( jis == 4 ) {
   row.put('$deleteDocById', jid);
   }
  return row;
  }
  ]]>

 
 
  
   
   
   
  
 



One thing I'd try is to use '4' for comparison rather than the number 4 
(the

type would depend on the sql type). Also, for javascript transformers to
work, you must use JDK 6 which has javascript support. Rest looks fine to
me.

--
Regards,
Shalin Shekhar Mangar.



Re: Using DIH's special commands....Help needed

2009-10-16 Thread William Pierce

Folks:

Continuing my saga with DIH and use of its special commands.  I have 
verified that the script functionality is indeed working. I also verified 
that '$skipRow' is working. But I don't think that '$deleteDocById' is 
working.


My script now looks as follows:


<script><![CDATA[
  function DeleteRow(row) {
    var jid = row.get('Id');
    var jis = row.get('IndexingStatus');
    if ( jis == 4 ) {
      row.put('$deleteDocById', jid);
      row.remove('Col1');
      row.put('Col1', jid);
    }
    return row;
  }
]]></script>

The theory is that rows whose 'IndexingStatus' value is 4 should be deleted 
from solr index.  Just to be sure that javascript syntax was correct and 
checked out,  I intentionally overwrite a field called 'Col1' in my schema 
with primary key of the document to be deleted.


On a clean and empty index, I import 47 rows from my dummy db.   Everything 
checks out correctly since IndexingStatus for each row is 1.  There are no 
rows to delete.I then go into the db and set one row with the 
IndexingStatus = 4.   When I execute the dataimport,  I find that all 47 
documents are imported correctly.   However,  for the row for which 
'IndexingStatus' was set to 4,  the Col1 value is set correctly by the 
script transformer to be the primary key value for that row/document. 
However, I should not be seeing that document since $deleteDocById 
should have deleted it from solr.


Could this be a bug in solr?  Or, am I misunderstanding how $deleteDocById 
works?


By the way, Noble, I tried to set the LogTransformer, and add logging per 
your suggestion.  That did not work either.  I set logLevel="debug", and 
also turned on solr logging in the admin console to be the max value 
(finest) and still no output.


Thanks,

- Bill



--
From: "Noble Paul ???  ??" 
Sent: Thursday, October 15, 2009 10:05 PM
To: 
Subject: Re: Using DIH's special commandsHelp needed


use  LogTransformer to see if the value is indeed set



this should print out the entire row after the transformations
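
A rough sketch of where those attributes go (the template text and entity details
below are illustrative):

<entity name="item"
        transformer="script:DeleteRow, LogTransformer"
        logTemplate="after transform, id is ${item.Id}"
        logLevel="info"
        query="select Id, IndexingStatus from prod_table">
  ...
</entity>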



On Fri, Oct 16, 2009 at 3:04 AM, William Pierce  
wrote:
Thanks for your reply!  I tried your suggestion.  No luck.  I have 
verified
that I have version  1.6.0_05-b13 of java installed.  I am running with 
the

nightly bits of October 7.  I am pretty much out of ideas at the present
timeI'd appreciate any tips/pointers.

Thanks,

- Bill

--
From: "Shalin Shekhar Mangar" 
Sent: Thursday, October 15, 2009 1:42 PM
To: 
Subject: Re: Using DIH's special commandsHelp needed


On Fri, Oct 16, 2009 at 12:46 AM, William Pierce
wrote:


Thanks for your help.  Here is my DIH config fileI'd appreciate any
help/pointers you may give me.  No matter what I do the documents are 
not

getting deleted from the index.  My db has rows whose 'IndexingStatus'
field
has values of either 1 (which means add it to solr), or 4 (which means
delete the document with the primary key from SOLR index).  I have two
transformers running.  Not sure what I am doing wrong.


 <![CDATA[
 function DeleteRow(row){
 var jis = row.get('IndexingStatus');
 var jid = row.get('Id');
 if ( jis == 4 ) {
  row.put('$deleteDocById', jid);
  }
 return row;
 }
 ]]>

 
 
 
  
  
  
 
 




One thing I'd try is to use '4' for comparison rather than the number 4
(the
type would depend on the sql type). Also, for javascript transformers to
work, you must use JDK 6 which has javascript support. Rest looks fine 
to

me.

--
Regards,
Shalin Shekhar Mangar.







--
-
Noble Paul | Principal Engineer| AOL | http://aol.com



Re: Using DIH's special commands....Help needed

2009-10-16 Thread William Pierce

Shalin,

Many thanks for your tip... But it did not seem to help!

Do you think I can use postDeleteImportQuery for this task?

Should I file a bug report?

Cheers,

Bill

--
From: "Shalin Shekhar Mangar" 
Sent: Friday, October 16, 2009 1:16 PM
To: 
Subject: Re: Using DIH's special commandsHelp needed

On Fri, Oct 16, 2009 at 5:54 PM, William Pierce 
wrote:



Folks:

Continuing my saga with DIH and use of its special commands.  I have
verified that the script functionality is indeed working.I also 
verified

that '$skipRow' is working.But I don't think that '$deleteDocById' is
working.

My script now looks as follows:


   <![CDATA[
   function DeleteRow(row) {
  var jid = row.get('Id');
   var jis = row.get('IndexingStatus');
   if ( jis == 4 ) {
  row.put('$deleteDocById', jid);
  row.remove('Col1');
  row.put('Col1', jid);
 }
  return row;
  }
]]>
 

The theory is that rows whose 'IndexingStatus' value is 4 should be 
deleted

from solr index.  Just to be sure that javascript syntax was correct and
checked out,  I intentionally overwrite a field called 'Col1' in my 
schema

with primary key of the document to be deleted.

On a clean and empty index, I import 47 rows from my dummy db. 
Everything
checks out correctly since IndexingStatus for each row is 1.  There are 
no

rows to delete.I then go into the db and set one row with the
IndexingStatus = 4.   When I execute the dataimport,  I find that all 47
documents are imported correctly.   However,  for the row for which
'IndexingStatus' was set to 4,  the Col1 value is set correctly by the
script transformer to be the primary key value for that row/document.
However,  I should not be seeing that document  since the '$deleteDocById
should have deleted this from solr.

Could this be a bug in solr?  Or, am I misunderstanding how 
$deleteDocById

works?



Would the row which has IndexingStatus=4 also create a document with the
same uniqueKey which you would delete using the transformer? If yes, that
can explain what is happening and you can avoid that by adding a $skipDoc
flag in addition to the $deleteDocById flag.
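
In script terms, that amounts to roughly this change to the function (still just a
sketch):

<script><![CDATA[
  function DeleteRow(row) {
    var jid = row.get('Id');
    var jis = row.get('IndexingStatus');
    if ( jis == 4 ) {
      row.put('$deleteDocById', jid);
      // also skip this row so the same uniqueKey is not re-added right after the delete
      row.put('$skipDoc', 'true');
    }
    return row;
  }
]]></script>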

I know this is a basic question but you are using Solr 1.4, aren't you?

--
Regards,
Shalin Shekhar Mangar.



Re: Using DIH's special commands....Help needed

2009-10-19 Thread William Pierce

Lance, Noble:

I set logLevel="debug" in my dihconfig.xml at the entity level.   Got no 
output!   I then gave up digging into this further because I was pressed for 
time to dig into how to increase the speed of importing into solr with 
dih...


Cheers,

- Bill
--
From: "Noble Paul നോബിള്‍  नोब्ळ्" 
Sent: Monday, October 19, 2009 1:05 AM
To: 
Subject: Re: Using DIH's special commandsHelp needed


The accepted logLevel values are
error, debug, warn, trace, info

2009/10/18 Noble Paul നോബിള്‍  नोब्ळ् :

On Sun, Oct 18, 2009 at 4:16 AM, Lance Norskog  wrote:

I had this problem also, but I was using the Jetty example. I fail at
logging configurations about 90% of the time, so I assumed it was my
fault.

did you set the logLevel attribute also in the entity? if you set
logLevel="severe" it should definitely be printed


2009/10/17 Noble Paul നോബിള്‍  नोब्ळ् :

It is strange that LogTransformer did not log the data. .

On Fri, Oct 16, 2009 at 5:54 PM, William Pierce  
wrote:

Folks:

Continuing my saga with DIH and use of its special commands.  I have
verified that the script functionality is indeed working.I also 
verified
that '$skipRow' is working.But I don't think that '$deleteDocById' 
is

working.

My script now looks as follows:


   <![CDATA[
   function DeleteRow(row) {
  var jid = row.get('Id');
   var jis = row.get('IndexingStatus');
   if ( jis == 4 ) {
  row.put('$deleteDocById', jid);
  row.remove('Col1');
  row.put('Col1', jid);
 }
  return row;
  }
]]>
 

The theory is that rows whose 'IndexingStatus' value is 4 should be 
deleted
from solr index.  Just to be sure that javascript syntax was correct 
and
checked out,  I intentionally overwrite a field called 'Col1' in my 
schema

with primary key of the document to be deleted.

On a clean and empty index, I import 47 rows from my dummy db. 
Everything
checks out correctly since IndexingStatus for each row is 1.  There 
are no

rows to delete.I then go into the db and set one row with the
IndexingStatus = 4.   When I execute the dataimport,  I find that all 
47

documents are imported correctly.   However,  for the row for which
'IndexingStatus' was set to 4,  the Col1 value is set correctly by the
script transformer to be the primary key value for that row/document.
However,  I should not be seeing that document  since the 
'$deleteDocById

should have deleted this from solr.

Could this be a bug in solr?  Or, am I misunderstanding how 
$deleteDocById

works?

By the way, Noble, I tried to set the LogTransformer, and add logging 
per
your suggestion.  That did not work either.  I set logLevel="debug", 
and

also turned on solr logging in the admin console to be the max value
(finest) and still no output.

Thanks,

- Bill



--
From: "Noble Paul ???  ??" 
Sent: Thursday, October 15, 2009 10:05 PM
To: 
Subject: Re: Using DIH's special commandsHelp needed


use  LogTransformer to see if the value is indeed set



this should print out the entire row after the transformations



On Fri, Oct 16, 2009 at 3:04 AM, William Pierce 


wrote:


Thanks for your reply!  I tried your suggestion.  No luck.  I have
verified
that I have version  1.6.0_05-b13 of java installed.  I am running 
with

the
nightly bits of October 7.  I am pretty much out of ideas at the 
present

timeI'd appreciate any tips/pointers.

Thanks,

- Bill

------
From: "Shalin Shekhar Mangar" 
Sent: Thursday, October 15, 2009 1:42 PM
To: 
Subject: Re: Using DIH's special commandsHelp needed


On Fri, Oct 16, 2009 at 12:46 AM, William Pierce
wrote:

Thanks for your help.  Here is my DIH config fileI'd 
appreciate any
help/pointers you may give me.  No matter what I do the documents 
are

not
getting deleted from the index.  My db has rows whose 
'IndexingStatus'

field
has values of either 1 (which means add it to solr), or 4 (which 
means
delete the document with the primary key from SOLR index).  I have 
two

transformers running.  Not sure what I am doing wrong.


 <![CDATA[
function DeleteRow(row){
var jis = row.get('IndexingStatus');
var jid = row.get('Id');
if ( jis == 4 ) {
 row.put('$deleteDocById', jid);
 }
return row;
}
]]>

 
 
 query=" select  Id, a, b, c, IndexingStatus from 

Re: Our new international sites powered by SOLR and wrapped by DOTNET are up and out! Yay!

2009-10-22 Thread William Pierce
Congratulations on this... What dotnet library did you use?   We are also 
using solr in our windows2003/C# environment but currently simply use HTTP 
to query and the Dataimport handler to update the indices...


- Bill

--
From: "Robert Petersen" 
Sent: Thursday, October 22, 2009 4:39 PM
To: 
Subject: Our new international sites powered by SOLR and wrapped by DOTNET 
are up and out!  Yay!



We are *very* happy with the lucid imagination distro of SOLR.  Here is
our official press release.  These were done with one core per
Language/Country btw...  :)

BUY.COM(r) LAUNCHES INTERNATIONAL ONLINE RETAILING WEBSITES

Just in Time for the Holidays, Leading Online Retailer Delivers
the Best Deals, Free Shipping to Shoppers in Canada and Europe


ALISO VIEJO, Calif., October 21, 2009 - Buy.com(r), The Internet
Superstore(tm), today announced its international expansion with
dedicated e-commerce sites for customers in Canada, France, Germany and
the United Kingdom, giving shoppers access to the best deals online and
free shipping on today's hottest consumer electronics and technology
products.

As part of its international expansion, Buy.com also will sell on eBay
sites in these four countries. By the end of the year, the company will
open e-commerce sites for Italy and Spain, with plans to eventually
increase its footprint worldwide and expand its product offerings with
new categories and the addition of Marketplace sellers.

Local, in-country e-commerce warehouses enable shoppers to take
advantage of speedier delivery time and customized product information
in local languages and currencies compared to shopping on Buy.com's U.S.
site.

"We plan to replicate our successful online retailing model throughout
the world," said Neel Grover, President and CEO, Buy.com.  "Like
Buy.com's U.S. roots, we're introducing our core offerings in
electronics first, but we plan to deliver robust product catalogs
featuring a variety of categories and the same value-added services to
our global customers."

Just in time for the holiday season, international shoppers have a new
destination for a great online shopping experience.  To help shoppers
make the best purchasing decisions, Buy.com offers user reviews, free
shipping and special deals and promotions.


In addition, Buy.com also provides an environmentally friendly shopping
alternative.  Carnegie Mellon University's Green Design Institute
recently conducted a retail environmental impact study, showing that
shopping online via Buy.com's virtual e-commerce model reduces energy
consumption and carbon emissions by 35 percent compared to shopping at
traditional brick-and-mortar outlets.

Shoppers can visit Buy.com's new international websites at ca.buy.com
(Canada); fr.buy.com (France); de.buy.com (Germany) and uk.buy.com
(United Kingdom).




About Buy.com
With more than 12 million customer accounts, Buy.com is a leading retail
marketplace, focused on providing its customers with a rewarding
shopping experience and a broad selection of high-quality technology and
entertainment retail goods at competitive prices. Buy.com offers
millions of products in a range of categories, including consumer
electronics, computer hardware and software, cell phones, books, music,
videos, games, toys, bags, fragrance, home and outdoor, baby, jewelry,
shoes, apparel and sporting goods. Founded in June of 1997, Buy.com is
headquartered in Aliso Viejo, California. Buy.com(r) and The Internet
Superstore(tm) are trademarks of Buy.com Inc. Buy.com currently competes
with a variety of companies that can be divided into two broad
categories: (i) retailers and ecommerce marketplaces such as Wal-Mart
and (ii) specialty retailers or manufacturers such as Barnes & Noble,
Best Buy and Dell.





Is optimized?

2009-10-23 Thread William Pierce
Folks:

If I issue two <optimize> requests with no intervening changes to the index,  
will the second optimize request be smart enough to not do anything?

Thanks,

Bill

DIH out of memory exception

2009-10-27 Thread William Pierce
Folks:

My db contains approx 6M records -- on average each is approx 1K bytes.   When 
I use the DIH,  I reliably get an OOM exception.   The machine has 4 GB ram,  
my tomcat is set to use max heap of 2G.  

The option of increasing memory is not tenable because as the number of documents 
grows I will be back in this situation.  

Is there a way to batch the documents?  I tried setting the batchsize parameter 
to 500 on the <dataSource> tag where I specify the jdbc parameters.   This had 
no effect.

Best,

- Bill

Re: DIH out of memory exception

2009-10-27 Thread William Pierce

Hi, Gilbert:

Thanks for your tip!  I just tried it.  Unfortunately, it does not work for 
me.  I still get the OOM exception.


How large was your dataset?  And what were your machine specs?

Cheers,

- Bill

--
From: "Gilbert Boyreau" 
Sent: Tuesday, October 27, 2009 11:54 AM
To: 
Subject: Re: DIH out of memory exception


Hi,

I got the same problem using DIH with a large dataset in MySql database.

Following : 
http://dev.mysql.com/doc/refman/5.1/en/connector-j-reference-implementation-notes.html,
and looking at the java code, it appears that DIH uses PreparedStatement in 
the JdbcDataSource.


I set the batchsize parameter to -1 and it solved my problem.
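
In the DIH config that amounts to setting batchSize on the JDBC dataSource, for
example (driver, url and credentials are illustrative):

<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb"
            user="dbuser" password="dbpass"
            batchSize="-1"/>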

Regards.
Gilbert.

William Pierce wrote:

Folks:

My db contains approx 6M records -- on average each is approx 1K bytes.
When I use the DIH,  I reliably get an OOM exception.   The machine has 4
GB ram,  my tomcat is set to use max heap of 2G.
The option of increasing memory is not tenable because, as the number of
documents grows, I will be back in this situation.
Is there a way to batch the documents?  I tried setting the batchSize
parameter to 500 on the dataSource tag where I specify the jdbc
parameters.   This had no effect.


Best,

- Bill






Re: data import with transformer

2009-10-29 Thread William Pierce
I'd recommend two ways.   The way I do it in my app is that I have written a
MySql function to transform the column as part of the select statement.   In
this approach, your select query would look like so:
  select  col1, col2, col3, spPrettyPrintCategory(category) as X, col4,
col5,  from table where 


 

The field element is used to map the column "X" into the solr field name,
which I am assuming is the same as your "category" name.


The second approach is to write the JavaScript transformer.  The relevant 
code is in the wiki:



      // get the second element of this array...do a trim if 
needed...
   var catname = pieces[1];
   row.remove('category');
   row.put('category', catname);
   return row;
   }
   ]]>
   
   query="select * from X">

   
   
   


- Bill
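
To make the second approach concrete, here is a fuller sketch of a
data-config.xml using a ScriptTransformer (the table, column and function
names are illustrative rather than Joel's actual schema, and the
split-on-space logic simply mirrors the fragment above):

   <dataConfig>
     <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                 url="jdbc:mysql://localhost:3306/mydb" user="solr" password="secret"/>
     <script><![CDATA[
         function prettyCategory(row) {
             var cat = row.get('category');
             if (cat != null) {
                 // values look like "prefix category suffix": keep the middle token
                 var pieces = new String(cat).split(' ');
                 if (pieces.length >= 2) {
                     row.remove('category');
                     row.put('category', pieces[1]);
                 }
             }
             return row;
         }
     ]]></script>
     <document>
       <entity name="item" transformer="script:prettyCategory"
               query="select * from X">
         <field column="category" name="category"/>
       </entity>
     </document>
   </dataConfig>

Note that the ScriptTransformer needs a Java 6 runtime, since it relies on the
JVM's built-in javascript engine.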

--
From: "Joel Nylund" 
Sent: Thursday, October 29, 2009 9:18 AM
To: 
Subject: data import with transformer


Hi, I have been reading the solr book and wiki, but I can't find any
similar examples to what I'm looking for.

I have a database field called category; this field needs some text
manipulation before it goes in the index.

Here is the java code for what I'm trying to do:

// categories look like this "prefix category suffix"
// I want to turn them into "category" remove prefix and suffix and
spaces before and after
 public static String getPrettyCategoryName(String categoryName)
 {
     String result;

     if (categoryName == null || categoryName.equals(""))
     {
         // nothing to do; just return what was passed in.
         result = categoryName;
     }
     else
     {
         result = categoryName.toLowerCase();

         if (result.startsWith(startString))
         {
             result = result.substring(startString.length());
         }

         if (result.endsWith(endString))
         {
             result = result.substring(0, (result.length() - endString.length()));
         }

         if (result.length() > 0)
         {
             result = Character.toUpperCase(result.charAt(0))
                     + result.substring(1);
         }
     }

     return result;
 }


Can I have a transformer call a java method?

It seems like I can, but how do I transform just one column? If
someone can point me to a complete example that transforms a column
using java or javascript, I'm sure I can figure this out.


thanks
Joel




Solr Internal exception on startup...

2009-11-09 Thread William Pierce
Folks:

I am encountering an internal exception running solr on an Ubuntu 9.04 box,  
running tomcat 6.  I have deposited the solr nightly bits (as of October 7) 
into the folder: /usr/share/tomcat6/lib

The exception from the log says:

Nov 9, 2009 8:26:13 PM org.apache.catalina.core.StandardContext filterStart
SEVERE: Exception starting filter SolrRequestFilter
org.apache.solr.common.SolrException: java.security.AccessControlException: 
access denied (java.io.FilePermission /home/ubuntu/apps/solr/tomcatweb/prod/lib 
read)
at 
org.apache.solr.servlet.SolrDispatchFilter.(SolrDispatchFilter.java:68)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at java.lang.Class.newInstance0(Class.java:355)
at java.lang.Class.newInstance(Class.java:308)
at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:255)
at 
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
at 
org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:108)
at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)


This is strange because the documentation says that the "lib" folder is 
optional.  (As a point of reference, I don't have a lib folder for my windows 
installation).   In any event, I created an empty "lib" folder and I am still 
getting this same exception.   (I gave the lib folder 777 permission.)


   


Under the folder /home/ubuntu/apps/solr/tomcatweb/prod are all solr folders 
(conf, data).  

Can anybody help me here with what looks like a basic configuration error on my 
part?

Thanks,

- Bill



Re: Solr Internal exception on startup...

2009-11-09 Thread William Pierce
Sorry, folks...I saw that there were two copies sent out....I've been having
some email snafus at my end, so I apologize in advance for the duplicate
email...


- Bill

--
From: "William Pierce" 
Sent: Monday, November 09, 2009 12:49 PM
To: 
Subject: Solr Internal exception on startup...


Folks:

I am encountering an internal exception running solr on an Ubuntu 9.04 
box,  running tomcat 6.  I have deposited the solr nightly bits (as of 
October 7) into the folder: /usr/share/tomcat6/lib


The exception from the log says:

Nov 9, 2009 8:26:13 PM org.apache.catalina.core.StandardContext 
filterStart

SEVERE: Exception starting filter SolrRequestFilter
org.apache.solr.common.SolrException: 
java.security.AccessControlException: access denied 
(java.io.FilePermission /home/ubuntu/apps/solr/tomcatweb/prod/lib read)
   at 
org.apache.solr.servlet.SolrDispatchFilter.(SolrDispatchFilter.java:68)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
   at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)

   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at java.lang.Class.newInstance0(Class.java:355)
   at java.lang.Class.newInstance(Class.java:308)
   at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:255)
   at 
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
   at 
org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:108)
   at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)



This is strange because the documentation says that the "lib" folder is 
optional.  (As a point of reference, I don't have a lib folder for my 
windows installation).   In any event, I created an empty "lib' folder and 
I am still getting this same exception.   (I gave the lib folder 777 
permission.)


debug="0" crossContext="true" >
  value="/home/ubuntu/apps/solr/tomcatweb/prod" override="true" />



Under the folder /home/ubuntu/apps/solr/tomcatweb/prod are all solr 
folders (conf, data).


Can anybody help me here with what looks like a basic configuration error 
on my part?


Thanks,

- Bill




Re: Solr Internal exception on startup...

2009-11-09 Thread William Pierce

All,

I realized that the stack trace I had sent in my previous email was
truncated and did not include the solr portions....here is the fuller stack
trace:


SEVERE: Exception starting filter SolrRequestFilter
org.apache.solr.common.SolrException: java.security.AccessControlException: 
access denied (java.io.FilePermission 
/home/ubuntu/apps/solr/tomcatweb/resumes/lib read)
   at 
org.apache.solr.servlet.SolrDispatchFilter.(SolrDispatchFilter.java:68)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
   at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)

   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at java.lang.Class.newInstance0(Class.java:355)
   at java.lang.Class.newInstance(Class.java:308)
   at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:255)
   at 
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
   at 
org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:108)
   at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)
   at 
org.apache.catalina.core.StandardContext.start(StandardContext.java:4450)
   at 
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
   at 
org.apache.catalina.core.ContainerBase.access$000(ContainerBase.java:123)
   at 
org.apache.catalina.core.ContainerBase$PrivilegedAddChild.run(ContainerBase.java:145)

   at java.security.AccessController.doPrivileged(Native Method)
   at 
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:769)
   at 
org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
   at 
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:630)
   at 
org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:556)
   at 
org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:491)
   at 
org.apache.catalina.startup.HostConfig.start(HostConfig.java:1206)
   at 
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:314)
   at 
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)

Nov 9, 2009 9:08:57 PM org.apache.catalina.core.StandardContext filterStart
SEVERE: Exception starting filter SolrRequestFilter
org.apache.solr.common.SolrException: java.security.AccessControlException: 
access denied (java.io.FilePermission 
/home/ubuntu/apps/solr/tomcatweb/resumes/lib read)
   at 
org.apache.solr.servlet.SolrDispatchFilter.(SolrDispatchFilter.java:68)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
   at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)

   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at java.lang.Class.newInstance0(Class.java:355)
   at java.lang.Class.newInstance(Class.java:308)
   at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:255)
   at 
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
   at 
org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:108)
   at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)


Cheers,

- Bill

--
From: "William Pierce" 
Sent: Monday, November 09, 2009 12:49 PM
To: 
Subject: Solr Internal exception on startup...


Folks:

I am encountering an internal exception running solr on an Ubuntu 9.04 
box,  running tomcat 6.  I have deposited the solr nightly bits (as of 
October 7) into the folder: /usr/share/tomcat6/lib


The exception from the log says:

Nov 9, 2009 8:26:13 PM org.apache.catalina.core.StandardContext 
filterStart

SEVERE: Exception starting filter SolrRequestFilter
org.apache.solr.common.SolrException: 
java.security.AccessControlException: access denied 
(java.io.FilePermission /home/ubuntu/apps/solr/tomcatweb/prod/lib read)
   at 
org.apache.solr.servlet.SolrDispatchFilter.(SolrDispatchFilter.java:68)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
   at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)

   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at java.lang.Class.newInstance0(Class.java:355)
   at java.lang.Class.newInstance(Clas

ext3 vs ext4 vs xfs for solr....recommendations needed...

2009-11-16 Thread William Pierce
Folks:

For those of you experienced linux-solr hands, I am seeking recommendations
for which file system you think would work best with solr.  We are currently
running with Ubuntu 9.04 on an amazon ec2 instance.  The default file system,
I think, is ext3.

I am seeking, of course, to ensure good performance with stability.
What I have been reading is that ext4 may be a little too "bleeding edge"...but
I defer to those of you who know more about this...

Thanks,

- Bill

Thought that masterUrl in slave solrconfig.xml is optional...

2009-11-30 Thread William Pierce
Folks:

Reading the wiki,  I saw the following statement:
  "Force a fetchindex on slave from master command : 
http://slave_host:port/solr/replication?command=fetchindex 

  It is possible to pass on extra attribute 'masterUrl' or other attributes
like 'compression' (or any other parameter which is specified in the <slave>
tag) to do a one time replication from a master. This obviates
the need for hardcoding the master in the slave. "
In my case, I cannot hardcode the masterurl in the config file.  I want a cron 
job to issue the replication commands for each of the slaves.

So I issued the following command:

http://localhost/postings/replication?command=fetchIndex&masterUrl=http%3a%2f%2flocalhost%2fpostingsmaster

I got the following exception:

HTTP Status 500 - Severe errors in solr configuration. Check your log files for 
more detailed information on what may be wrong. If you want solr to continue 
after configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError> in null 
- 

org.apache.solr.common.SolrException: 'masterUrl' is required for a slave at 
org.apache.solr.handler.SnapPuller.(SnapPuller.java:126) at 

other lines removed

Why is the error message asking me to specify the masterUrl in the config file
when the wiki states that this is optional?   Or am I understanding this
incorrectly?

Thanks,

- Bill




How to avoid hardcoding masterUrl in slave solrconfig.xml?

2009-11-30 Thread William Pierce
Folks:

I do not want to hardcode the masterUrl in the solrconfig.xml of my slave.  If 
the masterUrl tag is missing from the config file, I am getting an exception in 
solr saying that the masterUrl is required.  So I set it to some dummy value,  
comment out the poll interval element,  and issue a replication command 
manually like so:

http://localhost:port/postings/replication?command=fetchIndex&masterUrl=http://localhost:port/postingsmaster/replication

Now no internal exception,  solr responds with a status "OK" for the above 
request,  the tomcat logs show no error but the index is not replicated.  When 
I issue the details command to the slave,  I see that it ignored the masterUrl 
on the command line but instead complains that the master url in the config 
file (which I had set to a dummy value) is not correct.

(Just fyi, I have tried sending in the masterUrl to the above command with url 
encoding and also without.  in both cases, I got the same result.)

So how exactly do I avoid hardcoding the masterUrl in the config file?  
Any pointers/help will be greatly appreciated!

- Bill

Re: How to avoid hardcoding masterUrl in slave solrconfig.xml?

2009-11-30 Thread William Pierce

Folks:

Sorry for this repost!  It looks like this email went out twice

Thanks,

- Bill

--
From: "William Pierce" 
Sent: Monday, November 30, 2009 1:47 PM
To: 
Subject: How to avoid hardcoding masterUrl in slave solrconfig.xml?


Folks:

I do not want to hardcode the masterUrl in the solrconfig.xml of my slave. 
If the masterUrl tag is missing from the config file, I am getting an 
exception in solr saying that the masterUrl is required.  So I set it to 
some dummy value,  comment out the poll interval element,  and issue a 
replication command manually like so:


http://localhost:port/postings/replication?command=fetchIndex&masterUrl=http://localhost:port/postingsmaster/replication

Now no internal exception,  solr responds with a status "OK" for the above 
request,  the tomcat logs show no error but the index is not replicated. 
When I issue the details command to the slave,  I see that it ignored the 
masterUrl on the command line but instead complains that the master url in 
the config file (which I had set to a dummy value) is not correct.


(Just fyi, I have tried sending in the masterUrl to the above command with 
url encoding and also without.  in both cases, I got the same result.)


So how exactly do I avoid hardcoding the masterUrl in the config
file?  Any pointers/help will be greatly appreciated!


- Bill 




Re: How to avoid hardcoding masterUrl in slave solrconfig.xml?

2009-11-30 Thread William Pierce

Hi, Joe:

I tried with the "fetchIndex" all lower-cased, and still the same result. 
What do you specify for masterUrl in the solrconfig.xml on the slave?   it 
seems to me that if I remove the element,  I get the exception I wrote 
about.   If I set  it to some dummy url, then I get an invalid url message 
when I run the command=details on the slave replication handler.


What I am doing does not look out of the ordinary.   I want to control the 
masterurl and the time of replication by myself.  As such I want neither the 
masterUrl nor the polling interval in the config file.  Can you share 
relevant snippets of your config file and the exact url your code is 
generating?


Thanks,

- Bill

--
From: "Joe Kessel" 
Sent: Monday, November 30, 2009 3:45 PM
To: 
Subject: RE: How to avoid hardcoding masterUrl in slave solrconfig.xml?



I do something very similar and it works for me.  I noticed on your URL 
that you have a mixed case fetchIndex, which the request handler is 
checking for fetchindex, all lowercase.  If it is not that simple I can 
try to see the exact url my code is generating.




Hope it helps,

Joe


From: evalsi...@hotmail.com
To: solr-user@lucene.apache.org
Subject: Re: How to avoid hardcoding masterUrl in slave solrconfig.xml?
Date: Mon, 30 Nov 2009 13:48:38 -0800

Folks:

Sorry for this repost! It looks like this email went out twice

Thanks,

- Bill

------
From: "William Pierce" 
Sent: Monday, November 30, 2009 1:47 PM
To: 
Subject: How to avoid hardcoding masterUrl in slave solrconfig.xml?

> Folks:
>
> I do not want to hardcode the masterUrl in the solrconfig.xml of my 
> slave.

> If the masterUrl tag is missing from the config file, I am getting an
> exception in solr saying that the masterUrl is required. So I set it to
> some dummy value, comment out the poll interval element, and issue a
> replication command manually like so:
>
> 
http://localhost:port/postings/replication?command=fetchIndex&masterUrl=http://localhost:port/postingsmaster/replication
>
> Now no internal exception, solr responds with a status "OK" for the 
> above

> request, the tomcat logs show no error but the index is not replicated.
> When I issue the details command to the slave, I see that it ignored 
> the
> masterUrl on the command line but instead complains that the master url 
> in

> the config file (which I had set to a dummy value) is not correct.
>
> (Just fyi, I have tried sending in the masterUrl to the above command 
> with

> url encoding and also without. in both cases, I got the same result.)
>
> Show exactly do I avoid hardcoding the masterUrl in the config
> file? Any pointers/help will be greatly appreciated!
>
> - Bill







Re: How to avoid hardcoding masterUrl in slave solrconfig.xml?

2009-12-01 Thread William Pierce

Thanks, NobleThat did the trick!

- Bill

--
From: "Noble Paul നോബിള്‍  नोब्ळ्" 
Sent: Monday, November 30, 2009 10:20 PM
To: 
Subject: Re: How to avoid hardcoding masterUrl in slave solrconfig.xml?

remove the <slave> section from your solrconfig. It should be
fine


On Tue, Dec 1, 2009 at 6:59 AM, William Pierce  
wrote:

Hi, Joe:

I tried with the "fetchIndex" all lower-cased, and still the same result.
What do you specify for masterUrl in the solrconfig.xml on the slave? 
it

seems to me that if I remove the element,  I get the exception I wrote
about.   If I set  it to some dummy url, then I get an invalid url 
message

when I run the command=details on the slave replication handler.

What I am doing does not look out of the ordinary.   I want to control 
the
masterurl and the time of replication by myself.  As such I want neither 
the

masterUrl nor the polling interval in the config file.  Can you share
relevant snippets of your config file and the exact url your code is
generating?

Thanks,

- Bill

--
From: "Joe Kessel" 
Sent: Monday, November 30, 2009 3:45 PM
To: 
Subject: RE: How to avoid hardcoding masterUrl in slave solrconfig.xml?



I do something very similar and it works for me.  I noticed on your URL
that you have a mixed case fetchIndex, which the request handler is 
checking
for fetchindex, all lowercase.  If it is not that simple I can try to 
see

the exact url my code is generating.



Hope it helps,

Joe


From: evalsi...@hotmail.com
To: solr-user@lucene.apache.org
Subject: Re: How to avoid hardcoding masterUrl in slave solrconfig.xml?
Date: Mon, 30 Nov 2009 13:48:38 -0800

Folks:

Sorry for this repost! It looks like this email went out twice

Thanks,

- Bill

------
From: "William Pierce" 
Sent: Monday, November 30, 2009 1:47 PM
To: 
Subject: How to avoid hardcoding masterUrl in slave solrconfig.xml?

> Folks:
>
> I do not want to hardcode the masterUrl in the solrconfig.xml of my >
> slave.
> If the masterUrl tag is missing from the config file, I am getting an
> exception in solr saying that the masterUrl is required. So I set it 
> to

> some dummy value, comment out the poll interval element, and issue a
> replication command manually like so:
>
>
> 
http://localhost:port/postings/replication?command=fetchIndex&masterUrl=http://localhost:port/postingsmaster/replication
>
> Now no internal exception, solr responds with a status "OK" for the >
> above
> request, the tomcat logs show no error but the index is not 
> replicated.
> When I issue the details command to the slave, I see that it ignored 
>  >

> the
> masterUrl on the command line but instead complains that the master 
> url

> > in
> the config file (which I had set to a dummy value) is not correct.
>
> (Just fyi, I have tried sending in the masterUrl to the above command 
>  >

> with
> url encoding and also without. in both cases, I got the same result.)
>
> Show exactly do I avoid hardcoding the masterUrl in the 
> config

> file? Any pointers/help will be greatly appreciated!
>
> - Bill










--
-
Noble Paul | Principal Engineer| AOL | http://aol.com
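
For anyone finding this thread later, the working setup ends up looking
roughly like this (a sketch based on the exchange above, with placeholder
host and port): the slave's solrconfig.xml registers the replication handler
with no slave section at all,

   <!-- slave solrconfig.xml: no <lst name="slave"> block, so nothing polls
        automatically and no masterUrl has to be hardcoded -->
   <requestHandler name="/replication" class="solr.ReplicationHandler" />

and the cron job supplies the master per request (note the all-lowercase
fetchindex, per Joe's earlier comment):

   http://slavehost:8080/postings/replication?command=fetchindex&masterUrl=http://masterhost:8080/postingsmaster/replication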



Re: search on tomcat server

2009-12-04 Thread William Pierce

Have you gone through the solr tomcat wiki?

http://wiki.apache.org/solr/SolrTomcat

I found this very helpful when I did our solr installation on tomcat.

- Bill

--
From: "Jill Han" 
Sent: Friday, December 04, 2009 8:54 AM
To: 
Subject: RE: search on tomcat server

I went through all the links on 
http://wiki.apache.org/solr/#Search_and_Indexing

And still have no clue as to how to proceed.
1. Do I have to do some implementation in order to get solr to search docs
on the tomcat server?
2. If I have files, such as .doc, .docx, .pdf, .jsp, .html, etc. under
windows xp, in c:/tomcat/webapps/test1, /webapps/test2,

   what should I do to make solr search those directories?
3. Since I am using tomcat, instead of jetty, is there any demo that shows
the solr searching features, and real search results?


Thanks,
Jill


-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Monday, November 30, 2009 10:40 AM
To: solr-user@lucene.apache.org
Subject: Re: search on tomcat server

On Mon, Nov 30, 2009 at 9:55 PM, Jill Han  wrote:


I got solr running on the tomcat server,
http://localhost:8080/solr/admin/

After I enter a search word, such as, solr, then hit Search button, it
will go to

http://localhost:8080/solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on

 and display:

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">0</int>
      <lst name="params">
        <str name="rows">10</str>
        <str name="start">0</str>
        <str name="indent">on</str>
        <str name="q">solr</str>
        <str name="version">2.2</str>
      </lst>
    </lst>
    <result name="response" numFound="0" start="0"/>
  </response>

 My question is what is the next step to search files on tomcat server?




Looks like you have not added any documents to Solr. See the "Indexing
Documents" section at http://wiki.apache.org/solr/#Search_and_Indexing

--
Regards,
Shalin Shekhar Mangar.
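
As a concrete starting point, indexing means POSTing documents to the update
handler and then committing; a minimal example (field names here follow the
example schema, not Jill's setup) would be to POST

   <add>
     <doc>
       <field name="id">doc1</field>
       <field name="title">My first document</field>
     </doc>
   </add>

to http://localhost:8080/solr/update, followed by a <commit/>, after which the
document becomes searchable.  Rich files such as .doc and .pdf first have to be
parsed into plain fields like these before Solr can search them.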



Best way to handle bitfields in solr...

2009-12-04 Thread William Pierce
Folks:

In my db I currently have fields that represent bitmasks.   Thus, for example,
a mask value of 48 might represent "undergraduate" (value = 16) and
"graduate" (value = 32).   Currently,  the corresponding field in solr is a
multi-valued string field called "EdLevel" which will have
Undergraduate and Graduate as its two values
(for this example).   I do the conversion from the int into the list of values
as I do the indexing.

Ideally, I'd like solr to have bitwise operations so that I could store the int 
value, and then simply search by using bit operations.  However, given that 
this is not possible,  and that there have been recent threads speaking to 
performance issues with multi-valued fields,  is there something better I could 
do?

TIA,

- Bill
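
For reference, the multi-valued field described above is declared along these
lines in schema.xml (the field name comes from the post; the type and attribute
values are assumptions):

   <field name="EdLevel" type="string" indexed="true" stored="true" multiValued="true"/>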

Exception encountered during replication on slave....Any clues?

2009-12-07 Thread William Pierce
Folks:

I am seeing this exception in my logs that is causing my replication to fail.   
 I start with  a clean slate (empty data directory).  I index the data on the 
postingsmaster using the dataimport handler and it succeeds.  When the 
replication slave attempts to replicate it encounters this error. 

Dec 7, 2009 9:20:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
SEVERE: Master at: http://localhost/postingsmaster/replication is not 
available. Index fetch failed. Exception: Invalid version or the data in not in 
'javabin' format

Any clues as to what I should look for to debug this further?  

Replication is enabled as follows:

The postingsmaster solrconfig.xml looks as follows:



  
  commit
  
  

  

The postings slave solrconfig.xml looks as follows:




http://localhost/postingsmaster/replication 
 

00:05:00  
 
  


Thanks,

- Bill
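
For readability, the two handler definitions being described look roughly like
this (a sketch following the standard SolrReplication syntax, using the URL and
poll interval quoted above):

   <!-- postingsmaster solrconfig.xml -->
   <requestHandler name="/replication" class="solr.ReplicationHandler">
     <lst name="master">
       <str name="replicateAfter">commit</str>
     </lst>
   </requestHandler>

   <!-- postings (slave) solrconfig.xml -->
   <requestHandler name="/replication" class="solr.ReplicationHandler">
     <lst name="slave">
       <str name="masterUrl">http://localhost/postingsmaster/replication</str>
       <str name="pollInterval">00:05:00</str>
     </lst>
   </requestHandler>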




Re: Exception encountered during replication on slave....Any clues?

2009-12-07 Thread William Pierce

tck,

thanks for your quick response.  I am running on the default port (8080). 
If I copy that exact string given in the masterUrl and execute it in the 
browser I get a response from solr:



<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <str name="status">OK</str>
  <str name="message">No command</str>
</response>

So the masterUrl is reachable/accessible so far as I am able to tell

Thanks,

- Bill

--
From: "TCK" 
Sent: Monday, December 07, 2009 1:50 PM
To: 
Subject: Re: Exception encountered during replication on slaveAny clues?


are you missing the port number in the master's url ?

-tck



On Mon, Dec 7, 2009 at 4:44 PM, William Pierce 
wrote:



Folks:

I am seeing this exception in my logs that is causing my replication to
fail.I start with  a clean slate (empty data directory).  I index the
data on the postingsmaster using the dataimport handler and it succeeds.
 When the replication slave attempts to replicate it encounters this 
error.


Dec 7, 2009 9:20:00 PM org.apache.solr.handler.SnapPuller 
fetchLatestIndex

SEVERE: Master at: http://localhost/postingsmaster/replication is not
available. Index fetch failed. Exception: Invalid version or the data in 
not

in 'javabin' format

Any clues as to what I should look for to debug this further?

Replication is enabled as follows:

The postingsmaster solrconfig.xml looks as follows:


   
 
 commit
 
 
   
 

The postings slave solrconfig.xml looks as follows:


   
   
   http://localhost/postingsmaster/replication

   

   00:05:00

 


Thanks,

- Bill







Re: Exception encountered during replication on slave....Any clues?

2009-12-07 Thread William Pierce
Just to make doubly sure,  per tck's suggestion,  I went in and explicitly 
added in the port in the masterurl so that it now reads:


http://localhost:8080/postingsmaster/replication

Still getting the same exception...

I am running solr 1.4, on Ubuntu karmic, using tomcat 6 and Java 1.6.

Thanks,

- Bill

--
From: "William Pierce" 
Sent: Monday, December 07, 2009 2:03 PM
To: 
Subject: Re: Exception encountered during replication on slaveAny clues?


tck,

thanks for your quick response.  I am running on the default port (8080). 
If I copy that exact string given in the masterUrl and execute it in the 
browser I get a response from solr:



- 
- 
 0
 0
 
 OK
 No command
 

So the masterUrl is reachable/accessible so far as I am able to tell

Thanks,

- Bill

--
From: "TCK" 
Sent: Monday, December 07, 2009 1:50 PM
To: 
Subject: Re: Exception encountered during replication on slaveAny 
clues?



are you missing the port number in the master's url ?

-tck



On Mon, Dec 7, 2009 at 4:44 PM, William Pierce 
wrote:



Folks:

I am seeing this exception in my logs that is causing my replication to
fail.I start with  a clean slate (empty data directory).  I index 
the

data on the postingsmaster using the dataimport handler and it succeeds.
 When the replication slave attempts to replicate it encounters this 
error.


Dec 7, 2009 9:20:00 PM org.apache.solr.handler.SnapPuller 
fetchLatestIndex

SEVERE: Master at: http://localhost/postingsmaster/replication is not
available. Index fetch failed. Exception: Invalid version or the data in 
not

in 'javabin' format

Any clues as to what I should look for to debug this further?

Replication is enabled as follows:

The postingsmaster solrconfig.xml looks as follows:


   
 
 commit
 
 
   
 

The postings slave solrconfig.xml looks as follows:


   
   
   http://localhost/postingsmaster/replication

   

   00:05:00

 


Thanks,

- Bill









Re: Exception encountered during replication on slave....Any clues?

2009-12-08 Thread William Pierce

Hi, Noble:

When I hit the masterUrl from the slave box at

http://localhost:8080/postingsmaster/replication

I get the following xml response:


    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">0</int>
      </lst>
      <str name="status">OK</str>
      <str name="message">No command</str>
    </response>

And then when I look in the logs,  I see the exception that I mentioned.
What exactly does this error mean, that "replication is not available"?  By
the way, when I go to the admin url for the slave and click on replication,
I see a screen with the master url listed (as above) and the word
"unreachable" after it.  And, of course, the same exception shows up in
the tomcat logs.


Thanks,

- Bill

--
From: "Noble Paul നോബിള്‍  नोब्ळ्" 
Sent: Monday, December 07, 2009 9:20 PM
To: 
Subject: Re: Exception encountered during replication on slaveAny clues?


are you able to hit the
http://localhost:8080/postingsmaster/replication using a browser from
the slave box. if you are able to hit it what do you see?


On Tue, Dec 8, 2009 at 3:42 AM, William Pierce 
wrote:

Just to make doubly sure,  per tck's suggestion,  I went in and
explicitly
added in the port in the masterurl so that it now reads:

http://localhost:8080/postingsmaster/replication

Still getting the same exception...

I am running solr 1.4, on Ubuntu karmic, using tomcat 6 and Java 1.6.

Thanks,

- Bill

----------
From: "William Pierce" 
Sent: Monday, December 07, 2009 2:03 PM
To: 
Subject: Re: Exception encountered during replication on slaveAny
clues?


tck,

thanks for your quick response.  I am running on the default port
(8080).
If I copy that exact string given in the masterUrl and execute it in the
browser I get a response from solr:


- 
- 
 0
 0
 
 OK
 No command
 

So the masterUrl is reachable/accessible so far as I am able to tell

Thanks,

- Bill

--
From: "TCK" 
Sent: Monday, December 07, 2009 1:50 PM
To: 
Subject: Re: Exception encountered during replication on slaveAny
clues?


are you missing the port number in the master's url ?

-tck



On Mon, Dec 7, 2009 at 4:44 PM, William Pierce
wrote:


Folks:

I am seeing this exception in my logs that is causing my replication
to
fail.I start with  a clean slate (empty data directory).  I index
the
data on the postingsmaster using the dataimport handler and it
succeeds.
 When the replication slave attempts to replicate it encounters this
error.

Dec 7, 2009 9:20:00 PM org.apache.solr.handler.SnapPuller
fetchLatestIndex
SEVERE: Master at: http://localhost/postingsmaster/replication is not
available. Index fetch failed. Exception: Invalid version or the data
in
not
in 'javabin' format

Any clues as to what I should look for to debug this further?

Replication is enabled as follows:

The postingsmaster solrconfig.xml looks as follows:


  

commit


  
 

The postings slave solrconfig.xml looks as follows:


  
  
  http://localhost/postingsmaster/replication

  
  00:05:00
   
 


Thanks,

- Bill













--
-
Noble Paul | Systems Architect| AOL | http://aol.com



Solr 1.3 does not recognize Solr home...

2008-09-22 Thread William Pierce

Folks:

I have an odd situation that I am hoping someone can shed light on.

I have a solr app running under tomcat 6.0.14 (on a windows xp sp3
machine).

The app is declared in the tomcat config file as follows:

In file "merchant.xml" for the "merchant" app:


  


I have of course created the folders:  c:\tomcatweb\merchant, under which you
can find the "data", "conf" and "bin" folders.

Now this configuration worked with 1.2.  It also worked with the 1.3
solr.war but still using the older 1.2 config files that I have been using.

The problem comes when I use the 1.3 solr.war and use the new 1.3
solrconfig.xml files.  I used as a base the config files found in the
"example" folder of the 1.3 bits I downloaded.  I then modified these files
by including
the various fields, field types, and the request handler definitions, etc
that are particular to my configuration.

When I start up tomcat,  I notice that the home directory is set to
"Windows\system32\solr" and the requisite index files are being created
under that.

I have temporarily solved the problem by hardcoding the folders in the
dataDir element like so:

<dataDir>C:\tomcatweb\merchant\data</dataDir>   (in the solrconfig.xml)

Any ideas of what I am doing wrong?
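
For comparison, a per-application context fragment of the kind described above
normally looks like this (the docBase is a placeholder; the solr/home value
follows the folder named in the post):

   <!-- tomcat/conf/Catalina/localhost/merchant.xml -->
   <Context docBase="/path/to/solr.war" debug="0" crossContext="true">
     <Environment name="solr/home" type="java.lang.String"
                  value="c:\tomcatweb\merchant" override="true"/>
   </Context>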





Advice needed on master-slave configuration

2008-10-22 Thread William Pierce

Folks:

I have two instances of solr running, one on the master (U) and the other on
the slave (Q).  Q is used for queries only, while U is where updates/deletes
are done.   I am running on Windows so unfortunately I cannot use the
distribution scripts.


Every N hours when changes are committed and the index on U is updated,  I 
want to copy the files from the master to the slave.Do I need to halt 
the solr server on Q while the index is being updated?  If not,  how do I 
copy the files into the data folder while the server is running? Any 
pointers would be greatly appreciated!


Thanks!

- Bill 



Re: Advice needed on master-slave configuration

2008-10-22 Thread William Pierce

Otis,

Yes,  I had forgotten that Windows will not permit me to overwrite files 
currently in use.   So my copy scripts are failing.  Windows will not even 
allow a rename of a folder containing a file in use so I am not sure how to 
do this


I am going to dig around and see what I can come up with short of 
stopping/restarting tomcat...


Thanks,
- Bill


--
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
Sent: Wednesday, October 22, 2008 2:30 PM
To: 
Subject: Re: Advice needed on master-slave configuration

Normally you don't have to start Q, but only "reload" Solr searcher when 
the index has been copied.
However, you are on Windows, and its FS has the tendency not to let you 
delete/overwrite files that another app (Solr/java) has opened.  Are you 
able to copy the index from U to Q?  How are you doing it?  Are you 
deleting index files from the index dir on Q that are no longer in the 
index dir on U?



Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: William Pierce <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, October 22, 2008 5:24:28 PM
Subject: Advice needed on master-slave configuration

Folks:

I have two instances of solr running one on the master (U) and the other 
on
the slave (Q).  Q is used for queries only, while U is where 
updates/deletes

are done.   I am running on Windows so unfortunately I cannot use the
distribution scripts.

Every N hours when changes are committed and the index on U is updated, 
I

want to copy the files from the master to the slave.Do I need to halt
the solr server on Q while the index is being updated?  If not,  how do I
copy the files into the data folder while the server is running? Any
pointers would be greatly appreciated!

Thanks!

- Bill





Re: Advice needed on master-slave configuration

2008-10-23 Thread William Pierce
It appears that the only solution (outside of Noble Paul's suggestion for 
using solr's replication handler) is for me to restart tomcat.To 
minimize the effect of this downtime, I propose to do the following:


a) Generate the new index on the master periodically.   When the new index
has been created,  the apps that do the search will be made to point
their queries at this master.


b) Because queries are now going to the master, tomcat on the slave can be 
stopped,  the index file copied over and tomcat is restarted.


c) The apps that do search will be made to point their queries back at the 
slave.


Does anyone see a better approach?

Also, when will solr's replication handler appear in an official release?
Can it be released as a patch on 1.3?  It is terribly useful functionality,
and if there's a way to get it out sooner,  I'd sure appreciate it!


Thanks,
- Bill

--
From: "Noble Paul ??? ??" <[EMAIL PROTECTED]>
Sent: Wednesday, October 22, 2008 10:51 PM
To: 
Subject: Re: Advice needed on master-slave configuration


If you are using a nightly you can try the new SolrReplication feature
http://wiki.apache.org/solr/SolrReplication


On Thu, Oct 23, 2008 at 4:32 AM, William Pierce <[EMAIL PROTECTED]> 
wrote:

Otis,

Yes,  I had forgotten that Windows will not permit me to overwrite files
currently in use.   So my copy scripts are failing.  Windows will not 
even
allow a rename of a folder containing a file in use so I am not sure how 
to

do this

I am going to dig around and see what I can come up with short of
stopping/restarting tomcat...

Thanks,
- Bill


--
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
Sent: Wednesday, October 22, 2008 2:30 PM
To: 
Subject: Re: Advice needed on master-slave configuration


Normally you don't have to start Q, but only "reload" Solr searcher when
the index has been copied.
However, you are on Windows, and its FS has the tendency not to let you
delete/overwrite files that another app (Solr/java) has opened.  Are you
able to copy the index from U to Q?  How are you doing it?  Are you 
deleting
index files from the index dir on Q that are no longer in the index dir 
on

U?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 


From: William Pierce <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, October 22, 2008 5:24:28 PM
Subject: Advice needed on master-slave configuration

Folks:

I have two instances of solr running one on the master (U) and the 
other

on
the slave (Q).  Q is used for queries only, while U is where
updates/deletes
are done.   I am running on Windows so unfortunately I cannot use the
distribution scripts.

Every N hours when changes are committed and the index on U is updated, 
I
want to copy the files from the master to the slave.Do I need to 
halt
the solr server on Q while the index is being updated?  If not,  how do 
I
copy the files into the data folder while the server is running? 
Any

pointers would be greatly appreciated!

Thanks!

- Bill









--
--Noble Paul



Re: Advice needed on master-slave configuration

2008-10-23 Thread William Pierce

I tried the nightly build from 10/18 -- I did the following:

a) I downloaded the nightly build of 10/18 (the zip file).

b) I unpacked it and copied the war file to my tomcat lib folder.

c) I made the relevant changes in the config files per the instructions 
shown in the wiki.


When tomcat starts, I see the error message in tomcat logs...

Caused by: java.lang.ClassNotFoundException: solr.ReplicationHandler
	at 
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1358)
	at 
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1204)

at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
	at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:258)

... 36 more

Where do I get the nightly bits that will enable me to try this replication 
handler?


Thanks,
- Bill

--
From: "Noble Paul ??? ??" <[EMAIL PROTECTED]>
Sent: Wednesday, October 22, 2008 10:51 PM
To: 
Subject: Re: Advice needed on master-slave configuration


If you are using a nightly you can try the new SolrReplication feature
http://wiki.apache.org/solr/SolrReplication


On Thu, Oct 23, 2008 at 4:32 AM, William Pierce <[EMAIL PROTECTED]> 
wrote:

Otis,

Yes,  I had forgotten that Windows will not permit me to overwrite files
currently in use.   So my copy scripts are failing.  Windows will not 
even
allow a rename of a folder containing a file in use so I am not sure how 
to

do this

I am going to dig around and see what I can come up with short of
stopping/restarting tomcat...

Thanks,
- Bill


--
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
Sent: Wednesday, October 22, 2008 2:30 PM
To: 
Subject: Re: Advice needed on master-slave configuration


Normally you don't have to start Q, but only "reload" Solr searcher when
the index has been copied.
However, you are on Windows, and its FS has the tendency not to let you
delete/overwrite files that another app (Solr/java) has opened.  Are you
able to copy the index from U to Q?  How are you doing it?  Are you 
deleting
index files from the index dir on Q that are no longer in the index dir 
on

U?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 


From: William Pierce <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, October 22, 2008 5:24:28 PM
Subject: Advice needed on master-slave configuration

Folks:

I have two instances of solr running one on the master (U) and the 
other

on
the slave (Q).  Q is used for queries only, while U is where
updates/deletes
are done.   I am running on Windows so unfortunately I cannot use the
distribution scripts.

Every N hours when changes are committed and the index on U is updated, 
I
want to copy the files from the master to the slave.Do I need to 
halt
the solr server on Q while the index is being updated?  If not,  how do 
I
copy the files into the data folder while the server is running? 
Any

pointers would be greatly appreciated!

Thanks!

- Bill









--
--Noble Paul



Re: Advice needed on master-slave configuration

2008-10-27 Thread William Pierce

Folks:

The replication handler works wonderfully!  Thanks all!   Now can someone 
point me at a wiki so I can submit a jira issue lobbying for the inclusion 
of this replication functionality in a 1.3 patch?


Thanks,
- Bill

--
From: "Noble Paul ??? ??" <[EMAIL PROTECTED]>
Sent: Thursday, October 23, 2008 10:34 PM
To: 
Subject: Re: Advice needed on master-slave configuration


It was committed on 10/21

take the latest 10/23 build
http://people.apache.org/builds/lucene/solr/nightly/solr-2008-10-23.zip

On Fri, Oct 24, 2008 at 2:27 AM, William Pierce <[EMAIL PROTECTED]> 
wrote:

I tried the nightly build from 10/18 -- I did the following:

a) I downloaded the nightly build of 10/18 (the zip file).

b) I unpacked it and copied the war file to my tomcat lib folder.

c) I made the relevant changes in the config files per the instructions
shown in the wiki.

When tomcat starts, I see the error message in tomcat logs...

Caused by: java.lang.ClassNotFoundException: solr.ReplicationHandler
   at
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1358)
   at
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1204)
   at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:247)
   at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:258)
   ... 36 more

Where do I get the nightly bits that will enable me to try this 
replication

handler?

Thanks,
- Bill

--
From: "Noble Paul ??? ??" <[EMAIL PROTECTED]>
Sent: Wednesday, October 22, 2008 10:51 PM
To: 
Subject: Re: Advice needed on master-slave configuration


If you are using a nightly you can try the new SolrReplication feature
http://wiki.apache.org/solr/SolrReplication


On Thu, Oct 23, 2008 at 4:32 AM, William Pierce <[EMAIL PROTECTED]>
wrote:


Otis,

Yes,  I had forgotten that Windows will not permit me to overwrite 
files

currently in use.   So my copy scripts are failing.  Windows will not
even
allow a rename of a folder containing a file in use so I am not sure 
how

to
do this

I am going to dig around and see what I can come up with short of
stopping/restarting tomcat...

Thanks,
- Bill


--
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
Sent: Wednesday, October 22, 2008 2:30 PM
To: 
Subject: Re: Advice needed on master-slave configuration

Normally you don't have to start Q, but only "reload" Solr searcher 
when

the index has been copied.
However, you are on Windows, and its FS has the tendency not to let 
you
delete/overwrite files that another app (Solr/java) has opened.  Are 
you

able to copy the index from U to Q?  How are you doing it?  Are you
deleting
index files from the index dir on Q that are no longer in the index 
dir

on
U?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 


From: William Pierce <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, October 22, 2008 5:24:28 PM
Subject: Advice needed on master-slave configuration

Folks:

I have two instances of solr running one on the master (U) and the
other
on
the slave (Q).  Q is used for queries only, while U is where
updates/deletes
are done.   I am running on Windows so unfortunately I cannot use the
distribution scripts.

Every N hours when changes are committed and the index on U is 
updated,

I
want to copy the files from the master to the slave.Do I need to
halt
the solr server on Q while the index is being updated?  If not,  how 
do

I
copy the files into the data folder while the server is running? Any
pointers would be greatly appreciated!

Thanks!

- Bill









--
--Noble Paul







--
--Noble Paul



Re: Preferred Tomcat version on Windows 2003 (64 bits)

2008-11-06 Thread William Pierce
I am using tomcat 6.0.14 without any problems on windows 2003 R2 server.   I 
am also using the 1.3 patch (using the nightly build of 10/23) for 
master-slave replication... That's been working great!


-- Bill

--
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
Sent: Thursday, November 06, 2008 8:40 AM
To: 
Subject: Re: Preferred Tomcat version on Windows 2003 (64 bits)

I don't think there are preferences.  If going with the brand new setup 
why not go with Tomcat 6.0.
Also be aware that if you want master-slave setup Windows you will need to 
use post 1.3 version of Solr (nightly) that includes functionality from 
SOLR-561.



Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Jaco <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, November 6, 2008 11:32:04 AM
Subject: Preferred Tomcat version on Windows 2003 (64 bits)

Hello,

I am planning a brand new environment for Solr running on a Windows 2003
Server 64 bits platform. I want to use Tomcat, and was wondering whether
there is any preference in general for using Tomcat 5.5 or Tomcat 6.0 
with

Solr.

Any suggestions would be appreciated!

Thanks, bye,

Jaco.





Fatal exception in solr 1.3+ replication

2008-11-14 Thread William Pierce
Folks:

I am using the nightly build of 1.3 as of Oct 23 so as to use the replication 
handler.   I am running on windows 2003 server with tomcat 6.0.14.   Everything 
was running fine until I noticed that certain updated records were not showing
up on the slave.  Further investigation showed me that the failures have indeed
been occurring since early this morning with a fatal exception...here is a
segment of the tomcat log:
  INFO: Total time taken for download : 0 secs
  Nov 14, 2008 5:34:24 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
  INFO: Conf files are not downloaded or are in sync
  Nov 14, 2008 5:34:24 AM org.apache.solr.update.DirectUpdateHandler2 commit
  INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
  Nov 14, 2008 5:34:24 AM org.apache.solr.handler.ReplicationHandler doSnapPull
  SEVERE: SnapPull failed 
  org.apache.solr.common.SolrException: Snappull failed : 
   at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:278)
   at 
org.apache.solr.handler.ReplicationHandler.doSnapPull(ReplicationHandler.java:208)
   at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:121)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
   at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
   at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
   at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
   at java.lang.Thread.run(Thread.java:619)
  Caused by: java.lang.RuntimeException: 
org.apache.lucene.store.AlreadyClosedException: this Directory is closed
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1037)
   at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:350)
   at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:353)
   at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:265)
   ... 11 more
  Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory is 
closed
   at org.apache.lucene.store.Directory.ensureOpen(Directory.java:220)
   at org.apache.lucene.store.FSDirectory.list(FSDirectory.java:320)
   at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:533)
   at 
org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:366)
   at 
org.apache.lucene.index.DirectoryIndexReader.isCurrent(DirectoryIndexReader.java:188)
   at 
org.apache.lucene.index.DirectoryIndexReader.reopen(DirectoryIndexReader.java:124)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1016)
   ... 14 more
  Nov 14, 2008 5:38:52 AM org.apache.solr.update.DirectUpdateHandler2 commit

Any ideas, anyone?

-- Bill

Re: Fatal exception in solr 1.3+ replication

2008-11-15 Thread William Pierce

Mark,

Thanks for your response --- I do appreciate all you volunteers working to 
provide such a nice system!


Anyway,  I will try the trunk bits as you said.  The only problem is that 
the later the trunk I use from 1.3,  the more of post 1.3 capability I get. 
And I feel somewhat exposed running these bits in our production 
environment...


Not sure if there's another approach?

Thanks,
-Bill

--
From: "Mark Miller" <[EMAIL PROTECTED]>
Sent: Friday, November 14, 2008 8:43 PM
To: 
Subject: Re: Fatal exception in solr 1.3+ replication

Hey William, sorry about the trouble. I have to look at this further, but 
I think the issue is fixed if you grab the latest trunk build. Solr-465 
should inadvertently fix things - before that patch, a deprecated 
constructor for solrsearcher was being called - this constructor caused 
the underlying IndexReader to close its own Directory, and since 
IndexReaders are reopened, we don't want that.


Mark Miller wrote:
Looks like there might be an issue with the reopen - I'm not seeing what 
it could be offhand though. Have to find what could be closing a 
Directory unexpectedly...I'll try to take a further look over the 
weekend.


- Mark

William Pierce wrote:

Folks:

I am using the nightly build of 1.3 as of Oct 23 so as to use the 
replication handler.   I am running on windows 2003 server with tomcat 
6.0.14.   Everything was running fine until I noticed that certain 
updated records were not showing up on the slave.  Further investigation 
showed me that the failures have indeed been occurring since early this 
morning with a fatal exceptionhere is a segment of the tomcat log:

  INFO: Total time taken for download : 0 secs
  Nov 14, 2008 5:34:24 AM org.apache.solr.handler.SnapPuller 
fetchLatestIndex

  INFO: Conf files are not downloaded or are in sync
  Nov 14, 2008 5:34:24 AM org.apache.solr.update.DirectUpdateHandler2 
commit

  INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
  Nov 14, 2008 5:34:24 AM org.apache.solr.handler.ReplicationHandler 
doSnapPull
  SEVERE: SnapPull failed   org.apache.solr.common.SolrException: 
Snappull failed :at 
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:278)
   at 
org.apache.solr.handler.ReplicationHandler.doSnapPull(ReplicationHandler.java:208)

   at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:121)
   at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)

   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
   at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
   at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
   at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)

   at java.lang.Thread.run(Thread.java:619)
  Caused by: java.lang.RuntimeException: 
org.apache.lucene.store.AlreadyClosedException: this Directory is closed

   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1037)
   at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:350)

   at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:353)
   at 
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:265)

   ... 11 more
  Caused by: org.apache.lucene.store.AlreadyClosedException: this 
Directory is closed

   at org.apache.lucene.store.Directory.ensureOpen(Directory.java:220)
   at org.apache.lucene.store.FSDirectory.list(FSDirectory.java:320)
   at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:533)
   at 
org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:366)
   at 
org.apache.lucene.index.DirectoryIndexReader.isCurrent(DirectoryIndexReader.java:188)
   at 
org.apache.lucene.index.DirectoryIndexReader.reopen(DirectoryIndexReader.java:124)

   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1016)
   ... 14 more
  Nov 14, 2008 5:38:52 AM org.apache.solr.update.DirectUpdateHandler2 
commit


Any ideas, anyone?

-- Bill








Re: Fatal exception in solr 1.3+ replication

2008-11-15 Thread William Pierce

Mark,

Thanks for your help and lucid exposition of what may be going on... I will 
wait to hear from you before I plough ahead with the latest trunk bits.


Best regards,

-- Bill

--
From: "Mark Miller" <[EMAIL PROTECTED]>
Sent: Saturday, November 15, 2008 7:29 AM
To: 
Subject: Re: Fatal exception in solr 1.3+ replication

Hang on. The more I look at this, the more I am thinking that was not the 
problem. Directories are pretty strictly managed by Lucene at the moment, 
and it should actually be pretty difficult to have one closed out from 
under you. They are singletons and reference counted. The IndexReader 
would have to have been closed if it was responsible for closing the 
Directory, and in that case, we would not be trying to reopen it. The 
Searcher we get the IndexReader from has been inc ref'd to ensure it won't 
close. All I can think is that something is grabbing/stealing a Directory 
that didn't directly ask for it from FSDirectory.getDirectory, and is then 
closing it. I'm trying to hunt where that could be happening now. Hope to 
spot something, but it appears pretty mysterious at the moment. I suppose 
another option is a nasty little race condition - I've been trying to 
repeat the error by sending lots of update/search requests from multiple 
threads with no luck though. Looks like I may have to dig into the snap puller 
code (haven't looked too heavily into any of that before).


At worst, if/when a fix is discovered, you will probably be able to apply 
just the fix to the revision you're working with.


- Mark

William Pierce wrote:

Mark,

Thanks for your response --- I do appreciate all you volunteers working 
to provide such a nice system!


Anyway,  I will try the trunk bits as you said.  The only problem is that 
the later the trunk I use from 1.3,  the more of post 1.3 capability I 
get. And I feel somewhat exposed running these bits in our production 
environment...


Not sure if there's another approach?

Thanks,
-Bill

--
From: "Mark Miller" <[EMAIL PROTECTED]>
Sent: Friday, November 14, 2008 8:43 PM
To: 
Subject: Re: Fatal exception in solr 1.3+ replication

Hey William, sorry about the trouble. I have to look at this further, 
but I think the issue is fixed if you grab the latest trunk build. 
Solr-465 should inadvertently fix things - before that patch, a 
deprecated constructor for solrsearcher was being called - this 
constructor caused the underlying IndexReader to close its own 
Directory, and since IndexReaders are reopened, we don't want that.


Mark Miller wrote:
Looks like there might be an issue with the reopen - I'm not seeing 
what it could be offhand though. Have to find what could be closing a 
Directory unexpectedly...I'll try to take a further look over the 
weekend.


- Mark

William Pierce wrote:

Folks:

I am using the nightly build of 1.3 as of Oct 23 so as to use the 
replication handler.   I am running on windows 2003 server with tomcat 
6.0.14.   Everything was running fine until I noticed that certain 
updated records were not showing up on the slave.  Further 
investigation showed me that the failures have indeed been occurring 
since early this morning with a fatal exception... here is a segment 
of the tomcat log:

  INFO: Total time taken for download : 0 secs
  Nov 14, 2008 5:34:24 AM org.apache.solr.handler.SnapPuller 
fetchLatestIndex

  INFO: Conf files are not downloaded or are in sync
  Nov 14, 2008 5:34:24 AM org.apache.solr.update.DirectUpdateHandler2 
commit

  INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
  Nov 14, 2008 5:34:24 AM org.apache.solr.handler.ReplicationHandler 
doSnapPull
  SEVERE: SnapPull failed
  org.apache.solr.common.SolrException: Snappull failed :
   at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:278)
   at 
org.apache.solr.handler.ReplicationHandler.doSnapPull(ReplicationHandler.java:208)

   at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:121)
   at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)

   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
   at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
   at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
   at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)

   at java.lang.Thread.run(Thread.java:619)
  Caused by: java.lang.RuntimeExc

Re: Fatal exception in solr 1.3+ replication

2008-11-15 Thread William Pierce
Trunk may actually still hide the issue (possibly), but something really 
funky seems to have gone on and I can't find it yet. Do you have any 
custom code interacting with solr?


None whatsoever...I am using out-of-the-box solr 1.3 (build of 10/23).  I am 
using my C# app to send HTTP requests to my solr instance.


Is there something you want me to try at my end that might give you a clue? 
Let me know and I can try to help out.


Best,

- Bill

--
From: "Mark Miller" <[EMAIL PROTECTED]>
Sent: Saturday, November 15, 2008 10:59 AM
To: 
Subject: Re: Fatal exception in solr 1.3+ replication

Haven't given up, but this one really has me stuck so far. For every path, 
FSDirectory allows just one instance of FSDirectory to exist, and it keeps 
a ref count of how many have been returned from openDirectory for a given 
path. An FSDirectory will not actually be closed unless all references to 
it are released (it doesn't actually even do anything in close, other than 
drop the reference). So pretty much, the only way to get into trouble is 
to call close enough times to equal how many times you called 
openDirectory, and then try to use the FSDirectory again. This is what 
your stack trace indicates is happening.

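To make the bookkeeping described above concrete, here is a minimal, 
self-contained sketch of that kind of per-path reference counting (this is 
not Lucene's actual code, just an illustration of how one unmatched close 
can leave a still-in-use handle looking closed):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of per-path reference counting: open() bumps the count, close()
    // drops it, and the handle is only really "closed" when the count hits 0.
    public class RefCountedDirs {
        private static final Map<String, Integer> REFS = new HashMap<String, Integer>();

        public static synchronized void open(String path) {
            Integer n = REFS.get(path);
            REFS.put(path, n == null ? 1 : n + 1);
        }

        public static synchronized void close(String path) {
            Integer n = REFS.get(path);
            if (n == null) return;            // close without a matching open
            if (n == 1) REFS.remove(path);    // last reference dropped: really closed
            else REFS.put(path, n - 1);
        }

        public static synchronized void ensureOpen(String path) {
            if (!REFS.containsKey(path)) {
                throw new IllegalStateException("this Directory is closed");
            }
        }

        public static void main(String[] args) {
            open("/index");         // the IndexReader's reference
            close("/index");        // a close by code that never called open(): count hits 0 too early
            ensureOpen("/index");   // the still-in-use reader now fails, as in the stack trace above
        }
    }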

So we can get the call hierarchy for directory.close() in solr, and we 
find that everything is pretty matched up...at worst it looks like a 
reference might not be closed - but that doesn't hurt 
anything...FSDirectory will just think there is something out there 
holding onto a reference for that directory and allow you to continue 
using it, even though no such reference might exist. It's only when enough 
closes are called that an instance will be marked as closed (a further 
openDirectory will return a new open instance). So to get your error, 
something has to close a Directory that it did not get from openDirectory.


We know that the IndexReader that is trying to be reopened must have 
called openDirectory (or something called openDirectory for it) and we 
know that it hasn't called close (the IndexReader has already held up to 
an ensureOpen call on itself in that stack trace). Something else must 
have closed it. I can't find this happening. Nothing else calls close on 
Directory unless it called openDirectory (that I can find using all of 
Eclipse's magical goodness).


So how does the refcount on the Directory hit 0? I can't find or duplicate 
yet...


Trunk may actually still hide the issue (possibly), but something really 
funky seems to have gone on and I can't find it yet. Do you have any 
custom code interacting with solr?


- Mark

William Pierce wrote:

Mark,

Thanks for your help and lucid exposition of what may be going on... I 
will wait to hear from you before I plough ahead with the latest trunk 
bits.


Best regards,

-- Bill

--
From: "Mark Miller" <[EMAIL PROTECTED]>
Sent: Saturday, November 15, 2008 7:29 AM
To: 
Subject: Re: Fatal exception in solr 1.3+ replication

Hang on. The more I look at this, the more I am thinking that was not 
the problem. Directories are pretty strictly managed by Lucene at the 
moment, and it should actually be pretty difficult to have one closed 
out from under you. They are singletons and reference counted. The 
IndexReader would have to have been closed if it was responsible for 
closing the Directory, and in that case, we would not be trying to 
reopen it. The Searcher we get the IndexReader from has been inc ref'd 
to ensure it won't close. All I can think is that something is 
grabbing/stealing a Directory that didn't directly ask for it from 
FSDirectory.getDirectory, and is then closing it. I'm trying to hunt 
where that could be happening now. Hope to spot something, but it appears 
pretty mysterious at the moment. I suppose another option is a nasty 
little race condition - I've been trying to repeat the error by sending 
lots of update/search requests from multiple threads with no luck 
though. Looks like I may have to dig into the snap puller code (haven't 
looked too heavily into any of that before).


At worst, if/when a fix is discovered, you will probably be able to 
apply just the fix to the revision you're working with.


- Mark

William Pierce wrote:

Mark,

Thanks for your response --- I do appreciate all you volunteers working 
to provide such a nice system!


Anyway,  I will try the trunk bits as you said.  The only problem is 
that the later the trunk I use from 1.3,  the more of post 1.3 
capability I get. And I feel somewhat exposed running these bits in our 
production environment...


Not sure if there's another approach?

Thanks,
-Bill

--
From: "Mark Miller" <[EMAIL PROTECTED]>
Sent: Friday, November 14, 2008 8:43 PM
To: 
Subject: Re: Fatal exception in solr 1.3+ replica

Re: Fatal exception in solr 1.3+ replication

2008-11-15 Thread William Pierce

Mark,

That sounds great!  Good luck with the cleaning :-)

Let me know how I can get a patch --- I'd prefer not to do a solr build from 
source since we are not Java savvy here :-(


- Bill

--
From: "Mark Miller" <[EMAIL PROTECTED]>
Sent: Saturday, November 15, 2008 12:43 PM
To: 
Subject: Re: Fatal exception in solr 1.3+ replication

Okay, I'm fairly certain I've found it. As usual, take a walk and the 
solution pops into your head out of the blue.


It looks like Lucene's IndexReader reopen call is not very friendly with 
the FSDirectory implementation. If you call reopen and it returns a new 
IndexReader, it creates a new reference on the Directory - so if you 
reopen an IndexReader that was originally opened with a non-Directory 
parameter (String or File instead), both Readers (the reopened one and the 
one you're reopening on) will close the Directory when they close. That's not 
right. That's how we get to 0 faster than we should. So it's kind of a 
Lucene issue.

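For what it's worth, here is a rough sketch of the two open patterns being 
contrasted above (assuming the Lucene 2.4-era API and a placeholder index 
path; this is only an illustration, not the actual Solr patch):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class ReopenSketch {
        public static void main(String[] args) throws Exception {
            // Pattern described above as problematic: a reader opened from a String
            // path owns its Directory, and a reader returned by reopen() ends up
            // owning it too, so the Directory can be closed twice.
            IndexReader r1 = IndexReader.open("/data/index");
            IndexReader r2 = r1.reopen();       // may return a new reader if the index changed
            if (r2 != r1) {
                r1.close();                     // close the old reader once a new one exists
            }
            r2.close();

            // Safer pattern: give the reader an explicit Directory owned by the
            // application, so readers never close it out from under anyone else.
            Directory dir = FSDirectory.getDirectory("/data/index");
            IndexReader r3 = IndexReader.open(dir);
            IndexReader r4 = r3.reopen();
            if (r4 != r3) {
                r3.close();
            }
            r4.close();
            dir.close();                        // the owner releases its own reference
        }
    }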

My guess that this is hidden in the trunk was right, because I think we 
are no longer using String, File based IndexReader opens, which means our 
IndexReaders don't attempt to close their underlying Directories now.


I can probably send you a patch for the revision you're on to hide this as 
well, but I'm already in the doghouse on cleaning right now ; ) The way my 
brain works, I'll probably be back to this later though.


- Mark


William Pierce wrote:
Trunk may actually still hide the issue (possibly), but something really 
funky seems to have gone on and I can't find it yet. Do you have any 
custom code interacting with solr?


None whatsoever...I am using out-of-the-box solr 1.3 (build of 10/23).  I 
am using my C# app to send HTTP requests to my solr instance.


Is there something you want me to try at my end that might give you a 
clue? Let me know and I can try to help out.


Best,

- Bill

--
From: "Mark Miller" <[EMAIL PROTECTED]>
Sent: Saturday, November 15, 2008 10:59 AM
To: 
Subject: Re: Fatal exception in solr 1.3+ replication

Haven't given up, but this one really has me stuck so far. For every path, 
FSDirectory allows just one instance of FSDirectory to exist, and it 
keeps a ref count of how many have been returned from openDirectory for 
a given path. An FSDirectory will not actually be closed unless all 
references to it are released (it doesn't actually even do anything in 
close, other than drop the reference). So pretty much, the only way to 
get into trouble is to call close enough times to equal how many times 
you called openDirectory, and then try to use the FSDirectory again. 
This is what your stack trace indicates is happening.


So we can get the call hierarchy for directory.close() in solr, and we 
find that everything is pretty matched up...at worst it looks like a 
reference might not be closed - but that doesn't hurt 
anything...FSDirectory will just think there is something out there 
holding onto a reference for that directory and allow you to continue 
using it, even though no such reference might exist. It's only when 
enough closes are called that an instance will be marked as closed (a 
further openDirectory will return a new open instance). So to get your 
error, something has to close a Directory that it did not get from 
openDirectory.


We know that the IndexReader that is trying to be reopened must have 
called openDirectory (or something called openDirectory for it) and we 
know that it hasn't called close (the IndexReader has already held up to 
an ensureOpen call on itself in that stack trace). Something else must 
have closed it. I can't find this happening. Nothing else calls close on 
Directory unless it called openDirectory (that I can find using all of 
Eclipse's magical goodness).


So how does the refcount on the Directory hit 0? I can't find or 
duplicate yet...


Trunk may actually still hide the issue (possibly), but something really 
funky seems to have gone on and I can't find it yet. Do you have any 
custom code interacting with solr?


- Mark

William Pierce wrote:

Mark,

Thanks for your help and lucid exposition of what may be going on... I 
will wait to hear from you before I plough ahead with the latest trunk 
bits.


Best regards,

-- Bill

--
From: "Mark Miller" <[EMAIL PROTECTED]>
Sent: Saturday, November 15, 2008 7:29 AM
To: 
Subject: Re: Fatal exception in solr 1.3+ replication

Hang on. The more I look at this, the more I am thinking that was not 
the problem. Directories are pretty strictly managed by Lucene at the 
moment, and it should actually be pretty difficult to have one closed 
out from under you. They are singletons and reference counted. The 
IndexReader would have to have been closed if it was responsi

Re: Fatal exception in solr 1.3+ replication

2008-11-16 Thread William Pierce
Not easily, no... It has occurred twice on my machine but what triggers it I 
do not know.  Mark Miller has provided some explanations for what may be 
going on in Lucene that may be causing this... cf. his last email.


- Bill

--
From: "Noble Paul ??? ??" <[EMAIL PROTECTED]>
Sent: Saturday, November 15, 2008 11:40 PM
To: 
Subject: Re: Fatal exception in solr 1.3+ replication


Is this issue reproducible consistently? I mean, are you able to
reproduce it easily?

On Fri, Nov 14, 2008 at 11:15 PM, William Pierce <[EMAIL PROTECTED]> 
wrote:

Folks:

I am using the nightly build of 1.3 as of Oct 23 so as to use the 
replication handler.   I am running on windows 2003 server with tomcat 
6.0.14.   Everything was running fine until I noticed that certain 
updated records were not showing up on the slave.  Further investigation 
showed me that the failures have indeed been occurring since early this 
morning with a fatal exception... here is a segment of the tomcat log:

 INFO: Total time taken for download : 0 secs
 Nov 14, 2008 5:34:24 AM org.apache.solr.handler.SnapPuller 
fetchLatestIndex

 INFO: Conf files are not downloaded or are in sync
 Nov 14, 2008 5:34:24 AM org.apache.solr.update.DirectUpdateHandler2 
commit

 INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
 Nov 14, 2008 5:34:24 AM org.apache.solr.handler.ReplicationHandler 
doSnapPull

 SEVERE: SnapPull failed
 org.apache.solr.common.SolrException: Snappull failed :
  at 
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:278)
  at 
org.apache.solr.handler.ReplicationHandler.doSnapPull(ReplicationHandler.java:208)

  at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:121)
  at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
  at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)

  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
  at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
  at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
  at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)

  at java.lang.Thread.run(Thread.java:619)
 Caused by: java.lang.RuntimeException: 
org.apache.lucene.store.AlreadyClosedException: this Directory is closed

  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1037)
  at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:350)

  at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:353)
  at 
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:265)

  ... 11 more
 Caused by: org.apache.lucene.store.AlreadyClosedException: this 
Directory is closed

  at org.apache.lucene.store.Directory.ensureOpen(Directory.java:220)
  at org.apache.lucene.store.FSDirectory.list(FSDirectory.java:320)
  at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:533)
  at 
org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:366)
  at 
org.apache.lucene.index.DirectoryIndexReader.isCurrent(DirectoryIndexReader.java:188)
  at 
org.apache.lucene.index.DirectoryIndexReader.reopen(DirectoryIndexReader.java:124)

  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1016)
  ... 14 more
 Nov 14, 2008 5:38:52 AM org.apache.solr.update.DirectUpdateHandler2 
commit


Any ideas, anyone?

-- Bill




--
--Noble Paul



Re: Plans for 1.3.1?

2009-01-07 Thread William Pierce
That is fantastic!  Will the Java replication support be included in this 
release?


Thanks,

- Bill

--
From: "Ryan McKinley" 
Sent: Wednesday, January 07, 2009 11:42 AM
To: 
Subject: Re: Plans for 1.3.1?

There are plans for a regular release (1.4) later this month.  No plans 
for a bug fix release.


If there are critical bugs there would be a bug fix release, but not  for 
minor ones.



On Jan 7, 2009, at 11:06 AM, Jerome L Quinn wrote:



Hi, all.  Are there any plans for putting together a bugfix release?  I'm
not looking for particular bugs, but would like to know if bug fixes are
only going to be done mixed in with new features.

Thanks,
Jerry Quinn





Re: Plans for 1.3.1?

2009-01-07 Thread William Pierce

Thanks, Ryan!

It is great that Solr replication (SOLR-561) is included in this release. 
One thing I want to confirm (if Noble, Shalin et al) can help:


I had encountered an issue a while back (in late October I believe) with 
using SOLR-561.  I was getting an error (AlreadyClosedException) from the 
slave code which caused the replication to fail.  I was wondering if this 
had been fixed.


Mark Miller had helped diagnose the problem and suggested a source code 
change.


http://www.nabble.com/forum/ViewPost.jtp?post=20505307&framed=y

Thanks,

- Bill




Re: Plans for 1.3.1?

2009-01-07 Thread William Pierce

Hi, Mark:

Thanks for the update... Looking forward to 1.4!

Cheers,

- Bill

--
From: "Mark Miller" 
Sent: Wednesday, January 07, 2009 4:48 PM
To: 
Subject: Re: Plans for 1.3.1?


William Pierce wrote:

Thanks, Ryan!

It is great that Solr replication (SOLR-561) is included in this 
release. One thing I want to confirm (if Noble, Shalin et al) can help:


I had encountered an issue a while back (in late October I believe) 
with using SOLR-561.  I was getting an error (AlreadyClosedException) 
from the slave code which caused the replication to fail.  I was 
wondering if this had been fixed.


Mark Miller had helped diagnose the problem and suggested a source 
code change.


http://www.nabble.com/forum/ViewPost.jtp?post=20505307&framed=y

Thanks,

- Bill


Hey Bill, I'll update you on this. It was a bug in Lucene that is 
sidestepped in solr 1.4 (as I mentioned in the original thread, a patch 
switched to using Lucene methods that don't tickle the bug), so it will 
be fixed in 1.4. Also though, the original bug was fixed in Lucene, and 
solr 1.4 will contain a version of Lucene with that fix, so we should be 
doubly fixed here ;)


- Mark



Re: Google Commerce Search

2010-01-17 Thread William Pierce
I have used solr extensively for our sites (and for the clients I work 
with).  I think it is great!  If you do an item-by-item feature list 
comparison,  I think you will find that solr stacks up quite well.  And the 
price, of course, cannot be beat!


However, there are a few intangibles that make me recommend (somewhat 
heretically) the google solution:


First:  No one got fired for recommending Google :-)

Second and more important:  In my experience getting search done is about 
95% tuning and tweaking and semantic understanding.  Only 5% or so is the 
actual part of getting your intended feature list working.   (The exact 
numbers may vary and you may debate it but search is largely a semantic 
problem, and those who excel at semantic analysis and can map that to the 
problem domain quickly and efficiently will win.)   I think Google excels at 
these intangibles in ways that no one has been able to match.


Let me give you an example from my own personal experience.  We submit data 
feeds of products from my clients to various shopping engines:  Froogle 
(from Google), shopping.com, Yahoo Shopping, etc etc.   Each week we get 
sales reports.  The difference between Google and the others is breathtaking: 
where the others may generate a few hundred dollars in sales,  Froogle 
consistently outperforms them by a FACTOR (yes, that's right) of 10 or more. 
And neither shopping.com (owned by ebay) nor Yahoo are engineering slouches 
by any means!


The downsides of Google:  a) too much of your client's data is at google 
(adwords, product feeds, and now search patterns of their visitors).  b) 
cost.


Cheers,

- Bill

--
From: "mrbelvedr" 
Sent: Sunday, January 17, 2010 2:30 AM
To: 
Subject: Google Commerce Search



Our customer is a Fortune 5 big-time company. They have millions of
vendors/products they work with daily. They have a budget for whatever we
recommend, but we would like to use open source if it is a great alternative to
Google Search Appliance or Google Commerce Search.

Google has recently introduced "Google Commerce Search" which allows
ecommerce merchants to have their products indexed by Google and shoppers
may search for products easily.

Here is the URL of their new offering:

http://www.google.com/commercesearch/#utm_source=en-et-na-us-merchants&utm_medium=et&utm_campaign=merchants

Obviously this is a great solution. It offers all the great things like
spell checking, product synonyms, etc.  Is Solr able to provide these features:

* Index our MS Sql Server 2008 product table

* Spell check for product brand names - user enters brand "sharpee" and the
search engine will reply "Did you mean 'Sharpie'? "

* We have 2 million products stored in our MS Sql Server 2008, will Solr
handle that many products and give fast search results?

Please advise if Solr will work as well as Google product?

Thx!




Re: strategy for snapshottig Solr data directory on EC2

2010-01-24 Thread William Pierce

Our setup on ec2 is as follows:

a) mysql master on ebs volume.
b) solr master on its own ebs volume
c) solr slaves do not use ebs -- but rather use the ephemeral instance 
stores.  There is a small period of time where the solr slave has to re-sync 
the data from the solr master.


Cheers,

Bill

--
From: "athir nuaimi" 
Sent: Sunday, January 24, 2010 7:35 AM
To: 
Subject: strategy for snapshottig Solr data directory on EC2

We are running two Solr servers (master/slave) on EC2 and have the solr 
home directories on EBS drives that we snapshot every 12 hours.  While 
that will mean that we will lose at most 12 hours of data, I wondered if 
there was a way I could reduce the window of data loss.   With our mysql 
servers, we snapshot every 12 hours but also copy the binary logs to S3 
every 5 minutes.


We are doing commits every 10 minutes on the master and will be using the 
built-in java replication (today we are using snapshotting to replicate 
but are in the process of migrating from 1.3 to 1.4).


On a related note, are we doing the right thing in having our slave solr 
home directory on an EBS volume?  If the slave were to die and we had to 
create a fresh one, will it just resync the entire index from the master? 
is the reason to have the slave on an EBS volume so that the slave has 
less data to resync on startup?


thanks in advance
Athir 




Commit problems on Solr 1.2 with Tomcat

2008-05-13 Thread William Pierce
Hi,

I am having problems with Solr 1.2 running tomcat version 6.0.16 (I also tried 
6.0.14 but same problems exist).  Here is the situation:  I have an ASP.net 
application where I am trying to <add> and <commit/> a single document to an 
index.   After I add the document and issue the <commit/> I can see (in the 
solr stats page) that the commit count has been incremented but the docsPending 
is 1,  and my document is still not visible from a search perspective. 

When I issue another <commit/>,  the commit counter increments,  docsPending is 
now zero,  and my document is visible and searchable.

I saw that someone was observing problems with 6.0.16 tomcat,  so I reverted 
back to 6.0.14.  Same problem.

Can anyone help?

-- Bill

Re: Commit problems on Solr 1.2 with Tomcat

2008-05-13 Thread William Pierce

Thanks for the comments

The reason I am just adding one document followed by a commit is for this 
particular test --- in actuality,  I will be loading documents from a db. 
But thanks for the pointer on the ?commit=true on the add command.


Now on the <commit/> problem itself,  I am still confused:  Doesn't the 
commit count of 1 indicate that the commit is completed?


In any event,  just for testing purposes,  I started everything from scratch 
(deleted all documents, stopped/restarted tomcat).  I noticed that the only 
files in my index folder were:  segments.gen and segments_1.


Then I did the add followed by <commit/> and noticed that there were now 
three files:  segments.gen, segments_1 and write.lock.


Now it is 7 minutes later, and when I query the index using the 
"http://localhost:59575/splus1/admin/"; url, I still do not see the document.


Again, when I issue another <commit/> command everything seems to work. 
Why are TWO commit commands apparently required?


Thanks,

Sridhar

--
From: "Yonik Seeley" <[EMAIL PROTECTED]>
Sent: Tuesday, May 13, 2008 6:42 AM
To: 
Subject: Re: Commit problems on Solr 1.2 with Tomcat


By default, a commit won't return until a new searcher has been opened
and the results are visible.
So just make sure you wait for the commit command to return before 
querying.


Also, if you are committing every add, you can avoid a separate commit
command by putting ?commit=true in the URL of the add command.

-Yonik

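As a concrete illustration of the ?commit=true shortcut Yonik mentions, here 
is a minimal sketch of posting an add with the commit folded into the same 
request (in Java; the host, port, core path and field names are placeholders, 
not the actual setup being discussed):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class AddWithCommit {
        public static void main(String[] args) throws Exception {
            String addXml = "<add><doc>"
                    + "<field name=\"id\">SKU-1</field>"
                    + "<field name=\"name\">example product</field>"
                    + "</doc></add>";

            // commit=true asks Solr to commit as part of this update request,
            // so no separate <commit/> request is needed afterwards.
            URL url = new URL("http://localhost:8983/solr/update?commit=true");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
            OutputStream out = conn.getOutputStream();
            out.write(addXml.getBytes("UTF-8"));
            out.close();
            System.out.println("HTTP status: " + conn.getResponseCode());
        }
    }
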
On Tue, May 13, 2008 at 9:31 AM, Alexander Ramos Jardim
<[EMAIL PROTECTED]> wrote:

Maybe a delay in commit? How much time elapsed between commits?

 2008/5/13 William Pierce <[EMAIL PROTECTED]>:



 > Hi,
 >
 > I am having problems with Solr 1.2 running tomcat version 6.0.16 (I also
 > tried 6.0.14 but same problems exist).  Here is the situation:  I have an
 > ASP.net application where I am trying to <add> and <commit/> a single
 > document to an index.   After I add the document and issue the <commit/> I
 > can see (in the solr stats page) that the commit count has been incremented
 > but the docsPending is 1,  and my document is still not visible from a
 > search perspective.
 >
 > When I issue another <commit/>,  the commit counter increments,
 >  docsPending is now zero,  and my document is visible and searchable.
 >
 > I saw that someone was observing problems with 6.0.16 tomcat,  so I
 > reverted back to 6.0.14.  Same problem.
 >
 > Can anyone help?
 >
 > -- Bill




 --
 Alexander Ramos Jardim





Re: Commit problems on Solr 1.2 with Tomcat

2008-05-13 Thread William Pierce

Erik:  I am indeed issuing multiple Solr requests.

Here is my code snippet (deletexml and addxml are the strings that contain 
the <delete> and <add> strings for the items to be added or deleted).   For 
our simple example,  nothing is being deleted so "stufftodelete" is always 
false.


    // we are done...we now need to post the requests...
    if (stufftodelete)
    {
        SendSolrIndexingRequest(deletexml);
    }
    if (stufftoadd)
    {
        SendSolrIndexingRequest(addxml);
    }

    if (stufftodelete || stufftoadd)
    {
        SendSolrIndexingRequest("<commit waitSearcher=\"true\"/>");
    }

I am using the full form of the commit here just to see if the <commit/> 
was somehow not working.


The SendSolrIndexingRequest is the routine that takes the string argument 
and issues the POST request to the update URL.


Thanks,

Bill

--
From: "Erik Hatcher" <[EMAIL PROTECTED]>
Sent: Tuesday, May 13, 2008 7:40 AM
To: 
Subject: Re: Commit problems on Solr 1.2 with Tomcat

I'm not sure if you are issuing a separate <commit/> _request_ after your 
<add>, or putting a <commit/> into the same request.  Solr only supports 
one command (add or commit, but not both) per request.


Erik


On May 13, 2008, at 10:36 AM, William Pierce wrote:


Thanks for the comments

The reason I am just adding one document followed by a commit is  for 
this particular test --- in actuality,  I will be loading  documents from 
a db. But thanks for the pointer on the ?commit=true  on the add command.


Now on the <commit/> problem itself,  I am still confused:  Doesn't the 
commit count of 1 indicate that the commit is completed?


In any event,  just for testing purposes,  I started everything  from 
scratch (deleted all documents, stopped/restarted tomcat).  I  noticed 
that the only files in my index folder were:  segments.gen  and 
segments_1.


Then I did the add followed by <commit/> and noticed that there were 
now three files:  segments.gen, segments_1 and write.lock.


Now it is 7 minutes later, and when I query the index using the 
"http://localhost:59575/splus1/admin/"; url, I still do not see the 
document.


Again, when I issue another <commit/> command everything seems to work. 
Why are TWO commit commands apparently required?


Thanks,

Sridhar

--
From: "Yonik Seeley" <[EMAIL PROTECTED]>
Sent: Tuesday, May 13, 2008 6:42 AM
To: 
Subject: Re: Commit problems on Solr 1.2 with Tomcat


By default, a commit won't return until a new searcher has been  opened
and the results are visible.
So just make sure you wait for the commit command to return before 
querying.


Also, if you are committing every add, you can avoid a separate  commit
command by putting ?commit=true in the URL of the add command.

-Yonik

On Tue, May 13, 2008 at 9:31 AM, Alexander Ramos Jardim
<[EMAIL PROTECTED]> wrote:

Maybe a delay in commit? How much time elapsed between commits?

 2008/5/13 William Pierce <[EMAIL PROTECTED]>:



 > Hi,
 >
 > I am having problems with Solr 1.2 running tomcat version 6.0.16 (I also
 > tried 6.0.14 but same problems exist).  Here is the situation:  I have an
 > ASP.net application where I am trying to <add> and <commit/> a single
 > document to an index.   After I add the document and issue the <commit/> I
 > can see (in the solr stats page) that the commit count has been incremented
 > but the docsPending is 1,  and my document is still not visible from a
 > search perspective.
 >
 > When I issue another <commit/>,  the commit counter increments,
 >  docsPending is now zero,  and my document is visible and searchable.

 >
 > I saw that someone was observing problems with 6.0.16 tomcat,   so I
 > reverted back to 6.0.14.  Same problem.
 >
 > Can anyone help?
 >
 > -- Bill




 --
 Alexander Ramos Jardim






Some advice on scalability

2008-05-15 Thread William Pierce
Folks:

We are building a search capability into our web site and plan to use Solr.  While 
we have the initial prototype version up and running on Solr 1.2,  we are now 
turning our attention to sizing/scalability.  

Our app in brief:  We get merchant sku files (in either xml/csv) which we 
process and index and make available to our site visitors to search.   Our 
current plan calls for us to support approx 10,000 merchants each with an 
average of 50,000 sku's.   This will make a total of approx 500 Million SKUs.  
In addition,  we assume that on a daily basis approx 5-10% of the SKUs need to 
be updated (either added/deleted/modified).   (Assume each sku will be approx 
4K)

Here are a few questions that we are thinking about and would value any 
insights you all may have:

a) Should we have just one giant master index (containing all the sku's) and 
then have multiple slaves to handle the search queries?In this case, the 
master index will be approx 2 TB in size.  Not being an expert in solr/lucene,  
I am thinking that it may be a bad idea to let one index become so large.   
What size limit should we assume for each index?

b) Or, should we partition the 10,000 merchants into N buckets and have a 
master index for each of the N buckets?   We could partition the merchants 
depending on their type or some other simple algorithm.   Then,  we could have 
slaves setup for each of the N masters.  The trick here will be to partition 
the merchants carefully.  Ideally we would like a search for any product type 
to hit only one index but this may not be possible always.   For example, a 
search for "Harry Potter" may result in hits in "books", "dvds", "memorabilia", 
etc etc.  

With N masters we will have to plan for having a distributed search across the 
N indices (and then some mechanism for weighting the results across the results 
that come back).   Any recommendations for a distributed search solution?   I 
saw some references to Katta.  Is this viable?

In the extreme case, we could have one master for each of the merchants (if 
there are 10,000 merchants there will be 10,000 master indices).   The advantage 
here is that indices will have to be updated only for every merchant who 
submits a new data file.  The others remain unchanged.

c) By the way,  for those of you who have deployed solr on a production 
environment can you give me your hardware configuration and the rough number of 
search queries that can be handled per second by a single solr instance -- 
assuming a dedicated box?

d) Our plan is to release a beta version Spring 2009.  Should we plan on using 
Solr 1.2 or else move to solr 1.3 now?

Any insights/thoughts/whitepapers will be greatly appreciated!

Cheers,

Bill





Re: Some advice on scalability

2008-05-15 Thread William Pierce

Otis:

I will take a look at the DistributedSearch page on solr wiki.

Thanks,

Bill

--
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
Sent: Thursday, May 15, 2008 12:54 PM
To: 
Subject: Re: Some advice on scalability


Bill,

Quick feedback:

1) use 1.3-dev or 1.3 when it comes out, not 1.2

2) you did not mention Solr's distributed search functionality explicitly, 
so I get a feeling you are not aware of it.  See DistributedSearch page on 
the Solr wiki


3) you definitely don't want a single 500M docs index that's 2TB in size - 
think about the index size : RAM amount ratio


4) you can try logically sharding your index, but I suspect that will 
result in uneven term distribution that will not yield optimal 
relevancy-based ordering.  Instead, you may have to assign 
records/documents to shards in some more random fashion -- see the ML archives 
for some recent discussion on this (search for MD5 and SHA-1 -- Lance, 
want to put that on the Wiki?).  A rough sketch of this idea follows after point 5.



5) Hardware recommendations are hard to do.  While people may make 
suggestions, the only way to know how *your* hardware works with *your* 
data and *your* shards and *your* type of queries is by benchmarking.

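Along the lines of point 4, here is a minimal sketch of hash-based shard 
assignment (MD5 of the document's unique key, modulo the number of shards; 
the key format and shard count below are placeholders):

    import java.math.BigInteger;
    import java.security.MessageDigest;

    public class ShardAssigner {
        // Pick a shard for a document by hashing its unique key, so documents
        // spread roughly evenly across the N shard indexes regardless of merchant.
        public static int shardFor(String uniqueKey, int numShards) throws Exception {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(uniqueKey.getBytes("UTF-8"));
            // Treat the digest as a non-negative integer and take it modulo N.
            return new BigInteger(1, digest).mod(BigInteger.valueOf(numShards)).intValue();
        }

        public static void main(String[] args) throws Exception {
            // e.g. merchantId plus skuId as the unique key, assuming 8 shard indexes
            System.out.println(shardFor("merchant42-sku123456", 8));
        }
    }

At query time the same shards would then be listed in the shards request 
parameter, as described on the DistributedSearch wiki page.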

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 

From: William Pierce <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, May 15, 2008 12:23:03 PM
Subject: Some advice on scalability

Folks:

We are building a search capability into our web site and plan to use Solr. 
While we
have the initial prototype version up and running on Solr 1.2,  we are 
now

turning our attention to sizing/scalability.

Our app in brief:  We get merchant sku files (in either xml/csv) which we
process and index and make available to our site visitors to search. 
Our

current plan calls for us to support approx 10,000 merchants each with an
average of 50,000 sku's.   This will make a total of approx 500 Million 
SKUs.
In addition,  we assume that on a daily basis approx 5-10% of the SKUs 
need to
be updated (either added/deleted/modified).   (Assume each sku will be 
approx

4K)

Here are a few questions that we are thinking about and would value any 
insights

you all may have:

a) Should we have just one giant master index (containing all the sku's) 
and
then have multiple slaves to handle the search queries?In this case, 
the
master index will be approx 2 TB in size.  Not being an expert in 
solr/lucene,
I am thinking that this may be a bad idea to let one index become so 
large.

What size limit should we assume for each index?

b) Or, should we partition the 10,000 merchants into N buckets and have a 
master
index for each of the N buckets?   We could partition the merchants 
depending on
their type or some other simple algorithm.   Then,  we could have slaves 
setup
for each of the N masters.  The trick here will be to partition the 
merchants
carefully.  Ideally we would like a search for any product type to hit 
only one
index but this may not be possible always.   For example, a search for 
"Harry

Potter" may result in hits in "books", "dvds", "memorabilia", etc etc.

With N masters we will have to plan for having a distributed search 
across the N
indices (and then some mechanism for weighting the results across the 
results
that come back).   Any recommendations for a distributed search solution? 
I

saw some references to Katta.  Is this viable?

In the extreme case, we could have one master for each of the merchants 
(if
there are 10,000 merchants there will be 10,000 master 
advantage
here is that indices will have to be updated only for every merchant who 
submits

a new data file.  The others remain unchanged.

c) By the way,  for those of you who have deployed solr on a production
environment can you give me your hardware configuration and the rough 
number of
search queries that can be handled per second by a single solr 
instance -- 
assuming a dedicated box?


d) Our plan is to release a beta version Spring 2009.  Should we plan on 
using

Solr 1.2 or else move to solr 1.3 now?

Any insights/thoughts/whitepapers will be greatly appreciated!

Cheers,

Bill