Re: Indexing HTML and other doc types

2007-07-04 Thread Peter Manis

A coworker of mine posted the code that we used for adding PDF, DOC, XLS,
and similar documents to Solr.  You can find the files at the following location.

https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Just apply the patch and put the lib files in the lib directory, run
`ant compile`, yada yada and you should be good to go.  If the build fails,
update to revision 552853; that is the latest revision I have compiled with
the patch, so I know it works.  Usually if the build fails it is something
unrelated to Eric's code and will be fixed within the next few revisions.


Peter Manis

On 7/3/07, Teruhiko Kurosaka <[EMAIL PROTECTED]> wrote:


Solr looks very good for indexing and searching structured data.
But I noticed there is no tool in the Solr distribution with which
documents of other doc types can be indexed.  Are there other side projects
that develop Solr clients for indexing documents of other doc types?

Or is the generic full-text search really a wrong area to apply Solr, and
should I be using something like Nutch?
-kuro



java.lang.ExceptionInInitializerError - Can't find resource 'solrconfig.xml'

2007-07-04 Thread Karen Loughran


Hi there,

I have a standalone java/solr embedded application (based on Embedded Solr).
I can call it from the command prompt by passing solr.home as system property 
( -Dsolr.solr.home=/opt/all/solr ) and all works fine.

But if I put a web service in front of it, which essentially provides a webapp 
API to the standalone (deployed in Tomcat 5.5.23), and set up solr.home via 
JNDI, I get the exception below.   The trace indicates that it is correctly 
using JNDI solr.home "/opt/all/solr" which does have conf directory with 
solrconfig.xml, etc.  It is the same solr.home which works for testing 
standalone.  But the stack trace below reports that it can't find 
solrconfig.xml at this location.

Any ideas as to what is happening?

Many Thanks
Karen

PS. Using Solr 1.2



04-Jul-2007 11:16:39 org.apache.solr.core.Config getInstanceDir
INFO: Using JNDI solr.home: "/opt/all/solr"
04-Jul-2007 11:16:39 org.apache.solr.core.Config setInstanceDir
INFO: Solr home set to '"/opt/all/solr"/'
java.lang.ExceptionInInitializerError
at org.apache.solr.update.SolrIndexConfig.(SolrIndexConfig.java:36)
at org.apache.solr.core.SolrCore.(SolrCore.java:84)
at uk.ac.besc.prism.searcher.impl.searcher.(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.axis.utils.ClassUtils$1.run(ClassUtils.java:127)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.axis.utils.ClassUtils.forName(ClassUtils.java:122)
at org.apache.axis.utils.cache.ClassCache.lookup(ClassCache.java:85)
at org.apache.axis.providers.java.JavaProvider.getServiceClass(JavaProvider.java:428)
at org.apache.axis.providers.java.JavaProvider.initServiceDesc(JavaProvider.java:461)
at org.apache.axis.handlers.soap.SOAPService.getInitializedServiceDesc(SOAPService.java:286)
at org.apache.axis.deployment.wsdd.WSDDService.makeNewInstance(WSDDService.java:500)
at org.apache.axis.deployment.wsdd.WSDDDeployableItem.getNewInstance(WSDDDeployableItem.java:274)
at org.apache.axis.deployment.wsdd.WSDDDeployableItem.getInstance(WSDDDeployableItem.java:260)
at org.apache.axis.deployment.wsdd.WSDDDeployment.getService(WSDDDeployment.java:427)
at org.apache.axis.configuration.FileProvider.getService(FileProvider.java:231)
at org.apache.axis.AxisEngine.getService(AxisEngine.java:311)
at org.apache.axis.MessageContext.setTargetService(MessageContext.java:756)
at org.apache.axis.handlers.http.URLMapper.invoke(URLMapper.java:50)
at org.apache.axis.handlers.http.URLMapper.generateWSDL(URLMapper.java:58)
at org.apache.axis.strategies.WSDLGenStrategy.visit(WSDLGenStrategy.java:33)
at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
at org.apache.axis.SimpleChain.generateWSDL(SimpleChain.java:104)
at org.apache.axis.server.AxisServer.generateWSDL(AxisServer.java:446)
at org.apache.axis.transport.http.QSWSDLHandler.invoke(QSWSDLHandler.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.axis.transport.http.AxisServlet.processQuery(AxisServlet.java:1226)
at org.apache.axis.transport.http.AxisServlet.doGet(AxisServlet.java:249)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:690)
at org.apache.axis.transport.http.AxisServletBase.service(AxisServletBase.java:327)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:269)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.r

Can solr search any lucene index?

2007-07-04 Thread Saurabh Dani

Just like Luke, can Solr search any Lucene index just by changing
"something" in the configuration, or does Solr store specific information in
the indexes that must be present in order to search them with Solr?

Thanks
Saurabh


Re: Processor load

2007-07-04 Thread Yonik Seeley

On 7/3/07, Michael Thessel <[EMAIL PROTECTED]> wrote:

--
 208973 SEVERE: Error during auto-warming of
key:[EMAIL PROTECTED]:java
 208974 at
org.apache.lucene.search.ConjunctionScorer.init(ConjunctionScorer.java:97)
 208975 at


The Exception and exception message seem to be missing from the stack trace.
Could you check your log and see if they are all like this?

-Yonik


Re: Not enough space

2007-07-04 Thread Yonik Seeley

On 7/4/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:


: I set up Solr 1.2 to run snapshooter each time after a commit/optimize.
: It worked fine for a while, but later I got the error message below
: after sending the commit request. It seems JBoss (4.0.GA) had a problem
: running snapshooter. The index size is 290 MB, and the file system that the
: Solr data directory is on has 2 GB of free space. The swap space (/tmp) has 420 MB

The message from the native code used by your JVM to do the forkAndExec
seems to be a little misleading. Googling for "IOException: Not enough
space forkAndExec" turned up a bunch of examples where people seem to
have run into similar problems, and the root cause was not enough swap.

You may have 420MB of swap available, and your index size may only be
290MB, but the size of your index isn't the issue -- what matters is that
in order to exec snapshooter, the JVM has to fork first, which causes
your OS to make a copy of all the memory allocated to your JVM. So
whatever your heap size is, unless you have that much free memory at the time
the forkAndExec call is made, it will try to use swap; if that much swap
isn't available, it will fail with this error.


Right... except it's not as expensive as it sounds.  OSes like Linux
don't actually copy all the memory; they implement copy-on-write, so
all that physical RAM is not used (except for the extra page tables)
and swap is not actually used.  But because at the time of fork() the
kernel doesn't know whether it will actually be followed by exec(), it
checks whether enough memory+swap *would* be available if needed (to avoid
memory overcommit).
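
As a rough sketch of where such a failure would surface in Java code: launching any external script from the JVM goes through process creation, and an OS-level refusal (such as the "Not enough space" message discussed here) is reported to the caller as an IOException. The class and method names below are hypothetical and this is not the code Solr uses to run snapshooter; it only assumes a Unix-like system with `sh` on the PATH.

```java
import java.io.IOException;

public class RunExternalScript {

    // Hypothetical helper: start an external command and return its exit code.
    // Returns -1 if the process could not be started or the wait was interrupted.
    static int run(String... command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.inheritIO(); // let the child write to this JVM's stdout/stderr
        try {
            // Process creation happens inside start(); if the OS refuses to
            // create the child process, the failure arrives here as an IOException.
            Process p = pb.start();
            return p.waitFor();
        } catch (IOException e) {
            System.err.println("could not start process: " + e.getMessage());
            return -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return -1;
        }
    }

    public static void main(String[] args) {
        // "sh -c 'exit 0'" stands in for a real script such as snapshooter.
        System.out.println("exit code: " + run("sh", "-c", "exit 0"));
    }
}
```

Catching the IOException at the call site at least makes the failure visible in the application log, which helps distinguish a process-creation problem from a problem inside the script itself.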

-Yonik


Re: java.lang.ExceptionInInitializerError - Can't find resource 'solrconfig.xml'

2007-07-04 Thread Chris Hostetter

: But if I put a web service in front of it, which essentially provides a webapp
: API to the standalone (deployed in Tomcat 5.5.23), and set up solr.home via
: JNDI I get the exception below.   The trace indicates that it is correctly
: using JNDI solr.home "/opt/all/solr" which does have conf directory with

note the quote characters in the logging and read further down in the
stack trace...

: 04-Jul-2007 11:16:39 org.apache.solr.core.Config getInstanceDir
: INFO: Using JNDI solr.home: "/opt/all/solr"
: 04-Jul-2007 11:16:39 org.apache.solr.core.Config setInstanceDir
: INFO: Solr home set to '"/opt/all/solr"/'
...
: Caused by: java.lang.RuntimeException: Error in solrconfig.xml
:   at org.apache.solr.core.SolrConfig.(SolrConfig.java:90)
:   ... 49 more
: Caused by: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in
: classpath or '"/opt/all/solr"/conf/', cwd=/opt/all/apache-tomcat-5.5.23


...it looks like when you set solr home using JNDI, the mechanism you are
using is actually putting the quote characters in the value, so it's
trying to find a file with the path...
"/opt/all/solr"/conf/




-Hoss
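
To make the quoting problem above concrete, here is a minimal sketch in plain Java. It does not use any Solr classes; `stripQuotes` and `configPath` are hypothetical helpers written only for this illustration. It shows that a solr.home value containing literal quote characters is treated as a relative path (it starts with `"` rather than `/`), so the config file is looked up in the wrong place.

```java
import java.io.File;

public class QuotedSolrHome {

    // Hypothetical helper (not part of Solr): remove one pair of
    // surrounding double quotes from a configured value, if present.
    static String stripQuotes(String value) {
        String v = value.trim();
        if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
            return v.substring(1, v.length() - 1);
        }
        return v;
    }

    // Build the path at which solrconfig.xml would be looked for
    // under the given home directory.
    static String configPath(String solrHome) {
        return new File(new File(solrHome, "conf"), "solrconfig.xml").getPath();
    }

    public static void main(String[] args) {
        // The value as it appears in the log, with literal quote characters.
        String fromJndi = "\"/opt/all/solr\"";

        // With the quotes, the value is not an absolute path.
        System.out.println(new File(fromJndi).isAbsolute());
        System.out.println(configPath(fromJndi));

        // Without the quotes, the path points at the intended directory.
        System.out.println(configPath(stripQuotes(fromJndi)));
    }
}
```

The simpler fix is to remove the quotes from the configured value itself rather than to strip them in code.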



Re: different locations for config files and Data files if using Java System Properties

2007-07-04 Thread Chris Hostetter

: in solrconfig.xml I found this entry, which is now uncommented
:   <dataDir>${solr.data.dir:./solr/data}</dataDir>
: before it was
:   
:
: Don't know if this is the desired behaviour. How should I change the entry
: not to have the data in the working directory and not to uncomment the entry

that file is an *example* of what a solrconfig.xml might look like; it
serves as a good starting point for making your own customized
solrconfig.xml -- but when upgrading solr you should not throw out your
existing solrconfig.xml and use the latest example (any more than you
should throw away your schema.xml)

keep using your old solrconfig.xml and everything will be fine.



-Hoss



Re: different locations for config files and Data files if using Java System Properties

2007-07-04 Thread Ryan McKinley



in solrconfig.xml I found this entry, which is now uncommented
  <dataDir>${solr.data.dir:./solr/data}</dataDir>
before it was 
  

Don't know if this is the desired behaviour. How should I change the entry
not to have the data in the working directory and not to uncomment the entry


Just comment out the dataDir bit and the path will be relative to solr 
home.  If you leave it in, it will be relative to the running directory 
so use an absolute path.
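
For illustration, here is a small sketch of how a `${name:default}` expression such as the one in the example config could be evaluated, and why a relative default ends up under the working directory. The `substitute` method is a hypothetical re-implementation written for this note, not Solr's own substitution code; it only handles a whole-string expression.

```java
import java.io.File;

public class DataDirResolution {

    // Hypothetical sketch of "${name:default}" handling: use the system
    // property if it is set, otherwise fall back to the default after the colon.
    static String substitute(String expr) {
        if (!expr.startsWith("${") || !expr.endsWith("}")) {
            return expr; // not a property expression, use as-is
        }
        String body = expr.substring(2, expr.length() - 1);
        int colon = body.indexOf(':');
        String name = colon < 0 ? body : body.substring(0, colon);
        String fallback = colon < 0 ? "" : body.substring(colon + 1);
        return System.getProperty(name, fallback);
    }

    public static void main(String[] args) {
        String dataDir = substitute("${solr.data.dir:./solr/data}");
        // A relative path is resolved against the JVM's working directory,
        // which is why the data ends up wherever the server was started.
        System.out.println(new File(dataDir).getAbsolutePath());
    }
}
```

Starting the JVM with `-Dsolr.data.dir=/some/absolute/path` (the path here is just a placeholder) would make the result independent of the working directory.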


I ran into this problem a while ago... we talked about changing it, but 
I don't think anything has happened yet.


http://www.nabble.com/Re%3A-svn-commit%3A-r544356lucene-solr-trunk-example-solr-conf-solrconfig.xml-tf3905591.html#a11073210

ryan




Re: Can solr search any lucene index?

2007-07-04 Thread Ryan McKinley

Saurabh Dani wrote:

Just like Luke, can Solr search any Lucene index just by changing
"something" in the configuration, or does Solr store specific information in
the indexes that must be present in order to search them with Solr?



solr uses regular lucene indexes.  It can search an index created elsewhere.

The only hitch is to make sure the analyzers in schema.xml match the 
analyzers used to create the index.
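
To illustrate why index-time and query-time analysis need to agree, here is a toy sketch in plain Java. It deliberately does not use Lucene or Solr APIs: the two "analyzers" and the set-based "index" are hypothetical stand-ins that only mimic the idea of turning text into terms.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerMismatch {

    // Toy analyzer: split on whitespace and lowercase every term.
    static List<String> lowercaseAnalyzer(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // Toy analyzer: split on whitespace only, keeping the original case.
    static List<String> whitespaceAnalyzer(String text) {
        return Arrays.asList(text.split("\\s+"));
    }

    // Toy "index": the set of terms produced by the lowercasing analyzer.
    static Set<String> index(String text) {
        return new HashSet<>(lowercaseAnalyzer(text));
    }

    // A query matches only if every query term is present in the index.
    static boolean matches(Set<String> index, List<String> queryTerms) {
        return index.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        Set<String> idx = index("Apache Solr");
        // Same analysis on both sides: the term "solr" is found.
        System.out.println(matches(idx, lowercaseAnalyzer("Solr")));
        // Different analysis at query time: "Solr" is not in the index.
        System.out.println(matches(idx, whitespaceAnalyzer("Solr")));
    }
}
```

The same effect applies to stemming, stop words, and tokenization: if the search side produces terms that the indexing side never wrote, the documents are present but cannot be found.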


Re: java.lang.ExceptionInInitializerError - Can't find resource 'solrconfig.xml'

2007-07-04 Thread Karen Loughran

Chris,
When I remove the surrounding quotes from the Solr path in my web.xml, it works!
Thanks for your help
Karen


On Wednesday 04 July 2007 16:56:40 Chris Hostetter wrote:
> : But if I put a web service in front of it, which essentially provides a
> : webapp API to the standalone (deployed in Tomcat 5.5.23), and set up
> : solr.home via JNDI I get the exception below.   The trace indicates that
> : it is correctly using JNDI solr.home "/opt/all/solr" which does have conf
> : directory with
>
> note the quote characters in the logging and read further down in the
> stack trace...
>
> : 04-Jul-2007 11:16:39 org.apache.solr.core.Config getInstanceDir
> : INFO: Using JNDI solr.home: "/opt/all/solr"
> : 04-Jul-2007 11:16:39 org.apache.solr.core.Config setInstanceDir
> : INFO: Solr home set to '"/opt/all/solr"/'
>
>   ...
>
> : Caused by: java.lang.RuntimeException: Error in solrconfig.xml
> : at org.apache.solr.core.SolrConfig.(SolrConfig.java:90)
> : ... 49 more
> : Caused by: java.lang.RuntimeException: Can't find resource
> : 'solrconfig.xml' in classpath or '"/opt/all/solr"/conf/',
> : cwd=/opt/all/apache-tomcat-5.5.23
>
> ...it looks like when you set solr home using JNDI, the mechanism you are
> using is actaully putting the quote characters in the value, so it's
> trying to find a file with the path...
>   "/opt/all/solr"/conf/
>
>
>
>
> -Hoss




Re: Can solr search any lucene index?

2007-07-04 Thread Saurabh Dani

Thanks will give it a try.

Should we worry about disabling any index creation, update, or commit
operations that Solr runs on a schedule or at start-up, to avoid conflicts
with our own update code? Or must all index update operations be called
explicitly (in which case we would not have to worry about them)?

Thanks again.


On 7/4/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:


Saurabh Dani wrote:
> Just like Luke, can Solr search any Lucene index just by changing
> "something" in the configuration, or does Solr store specific information in
> the indexes that must be present in order to search them with Solr?
>

solr uses regular lucene indexes.  It can search an index created elsewhere.

The only hitch is to make sure the analyzers in schema.xml match the
analyzers used to create the index.