Re: Newbie question about search

2008-02-23 Thread x8nnn

I checked docsPending. I get the following:
commits : 0
autocommit maxDocs : 1
autocommit maxTime : 1000ms
autocommits : 0
optimizes : 0
docsPending : 0
deletesPending : 0
adds : 0
deletesById : 0
deletesByQuery : 0

Most surprising: once I add a document, I see numDocs and maxDoc increasing,
but I cannot search for it using the admin console.

x8nnn wrote:
> 
> Recently I installed Solr.
> 
> I made changes to schema.xml and added the following entries:
> 
> [field definitions stripped by the mail archive]
> 
> Now I post a document like this:
> [XML tags stripped by the mail archive; the document contained an id of
> 0A0A1BC3:01183F59ADDC:CBFA:008AEED0, the title "Interoperability
> Demonstration Project Report", and a content field holding 110 pages of
> text]
> 
> Once I post it I see the following entries in my catalina.out. However,
> when I go to the Solr search page and try to search for any token in the
> content section, I do not get anything returned. Basically:
> 
> [query and response stripped by the mail archive]
> 
> Am I missing something?
> 
> SimplePostTool: WARNING: Make sure your XML documents are encoded in
> UTF-8, other encodings are not currently supported
> Feb 21, 2008 11:14:45 PM org.apache.solr.handler.XmlUpdateRequestHandler
> update
> INFO: added id={0A0A1BC3:01183F59ADDC:CBFA:008AEED0} in 187ms
> Feb 21, 2008 11:14:45 PM org.apache.solr.core.SolrCore execute
> INFO: /update  0 202
> Feb 21, 2008 11:14:45 PM org.apache.solr.update.DirectUpdateHandler2
> commit
> INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
> Feb 21, 2008 11:14:45 PM org.apache.solr.update.DirectUpdateHandler2
> doDeletions
> INFO: DirectUpdateHandler2 deleting and removing dups for 1 ids
> Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher <init>
> INFO: Opening [EMAIL PROTECTED] DirectUpdateHandler2
> Feb 21, 2008 11:14:45 PM org.apache.solr.update.DirectUpdateHandler2
> doDeletions
> INFO: DirectUpdateHandler2 docs deleted=0
> Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher <init>
> INFO: Opening [EMAIL PROTECTED] main
> Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
>
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for [EMAIL PROTECTED] main
>
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
>
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for [EMAIL PROTECTED] main
>
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
>
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for [EMAIL PROTECTED] main
>
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Feb 21, 2008 11:14:45 PM org.apache.solr.core.SolrCore registerSearcher
> INFO: Registered new searcher [EMAIL PROTECTED] main
> Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher close
> INFO: Closing [EMAIL PROTECTED] main
>
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Feb 21, 2008 11:14:45 PM org.apache.solr.update.DirectUpdateHandler2
> commit
> INFO: end_commit_flush
> Feb 21, 2008 11:14:45 PM org.apache.solr.handler.XmlUpdateRequestHandler
> update
> INFO: commit 0 56
> Feb 21, 2008 11:14:45 PM org.apache.solr.core.SolrCore execute
> INFO: [remainder of the log truncated in the archive]

RE: Indexing very large files.

2008-02-23 Thread Jon Lehto
Dave

You may want to break large docs into chunks, say by chapter or other
logical segment.

This will help with:
 - relevance ranking - the term frequency of large docs will cause
   uneven weighting unless the relevance calculation does log normalization
 - finer granularity of retrieval - for example, a dictionary, thesaurus, and
   encyclopedia probably have what you want, but how do you get to it quickly?
 - post-processing - highlighting, for example, can be a performance killer,
   as the search/replace scans the entire large file for matching strings
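
For illustration, a minimal sketch of chapter-level chunking in Solr's XML
update format - the field names here are hypothetical, not from any schema
discussed in this thread.  Each chapter becomes its own document that carries
a reference back to its parent:

    <add>
      <doc>
        <field name="id">book-42-ch-01</field>
        <field name="parent_id">book-42</field>
        <field name="title">Chapter 1: Introduction</field>
        <field name="content">...text of chapter 1...</field>
      </doc>
      <doc>
        <field name="id">book-42-ch-02</field>
        <field name="parent_id">book-42</field>
        <field name="title">Chapter 2: Results</field>
        <field name="content">...text of chapter 2...</field>
      </doc>
    </add>

At query time the parent_id value lets you collapse chapter-level hits back
to the original document.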

Jon

-Original Message-
From: David Thibault [mailto:[EMAIL PROTECTED] 
Sent: Thursday, February 21, 2008 7:58 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing very large files.

All,
A while back I was running into an issue with a Java heap out-of-memory
error while indexing large files.  I figured out that it was my own error,
due to a misconfiguration of my NetBeans memory settings.

However, now that that's fixed, I have stumbled upon a new error.  When
trying to upload files which include a Solr TextField value of 32MB or more
in size, I get the following error (uploading with SimplePostTool):


Solr returned an error: error reading input, returned 0
javax.xml.stream.XMLStreamException: error reading input, returned 0
        at com.bea.xml.stream.MXParser.fillBuf(MXParser.java:3709)
        at com.bea.xml.stream.MXParser.more(MXParser.java:3715)
        at com.bea.xml.stream.MXParser.nextImpl(MXParser.java:1936)
        at com.bea.xml.stream.MXParser.next(MXParser.java:1333)
        at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:318)
        at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:195)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:117)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:902)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:280)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:237)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:613)

I suspect there's a setting somewhere that I'm overlooking that is causing
this, but after peering through the solrconfig.xml and schema.xml files I am
not seeing anything obvious (to me, anyway...=).  The second line of the
error shows it's crashing in MXParser.fillBuf, which implies that I'm
overloading the buffer (I assume due to too large a string).
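
For reference, the solrconfig.xml of this era does expose a maxFieldLength
setting that silently stops indexing a field after a fixed number of tokens;
it would not explain an XML parse failure like the one above, but it is worth
knowing about when indexing very large text values.  A sketch, assuming the
stock example config:

    <indexDefaults>
      <!-- default is 10000; tokens past this limit are silently dropped -->
      <maxFieldLength>2147483647</maxFieldLength>
    </indexDefaults>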

Thanks in advance for any assistance,
Dave



Re: Newbie question about search

2008-02-23 Thread Reece
Not sure what's wrong, but I'll share some of the debugging I did when
getting my implementation to work these past 2 weeks:

1) Change schema.xml to suit your needs.  I basically just changed the
fields to ones I needed and didn't touch the fieldtypes at first.

2) Stop Solr, delete the index directory, and start it again so it rebuilds
the index according to the changed schema.  Otherwise it won't recognize
the field names and will try to save your data in the "catch-all" fields,
which I don't really know much about.

3) Insert some data using an XML post, but be *sure* to check the response
code.  Anything except 200 usually means it didn't go through, so view the
response body for the actual error.  I ran into some weird errors - for
example, it complains that it doesn't recognize the encoding "utf-8" if you
specify it in the HTTP headers.  Instead, specify it in the XML declaration
of the document you post (see the example below).

4) Ensure the stats page says the document was added and committed.

5) Search on just the id, to verify you can find it at all (again, see the
example below).

Once you can get it to return something, you can build on that.
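
To make steps 3 and 5 concrete, here is a minimal round trip, assuming the
stock example setup (port 8983, the standard /update and /select handlers)
and a schema whose unique key is "id"; the "title" field is made up.  First
post a document, with the encoding declared in the XML itself:

    <?xml version="1.0" encoding="UTF-8"?>
    <add>
      <doc>
        <field name="id">test-1</field>
        <field name="title">hello world</field>
      </doc>
    </add>

then post a commit so a new searcher picks it up:

    <commit/>

and finally search on just the id:

    http://localhost:8983/solr/select?q=id:test-1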

-Reece



On Sat, Feb 23, 2008 at 8:05 AM, x8nnn <[EMAIL PROTECTED]> wrote:
>
>  I checked docsPending. I get following
>  commits : 0
>  autocommit maxDocs : 1
>  autocommit maxTime : 1000ms
>  autocommits : 0
>  optimizes : 0
>  docsPending : 0
>  deletesPending : 0
>  adds : 0
>  deletesById : 0
>  deletesByQuery : 0
>
>  Most surprising is Once I add document I see numDocs and maxDoc increasing.
>  But I can not search them using admin console.
>
>
>
> x8nnn wrote:
>  >
>  > Recently I installed Solr.
>  >
>  > I made changes to schema.xml, added following entries
>  >
>  > 
>  >
>  >
>  >
>  >
>  >
>  > Now I post a document like this:
>  > 0A0A1BC3:01183F59ADDC:CBFA:008AEED0
>  > 
>  >  Interoperability Demonstration Project Report
>  > 
>  > 
>  > 
>  >
>  > 110 page of text...
>  >
>  > 
>  > 
>  >
>  > Once I post it I see following entry in my catalina.out. However when I go
>  > to solr search page and try to search any token in content sectionI do not
>  > get any thing returned. basically
>  >
>  > 
>  >
>  >
>  > am I missing something?
>  >
>  > [catalina.out log snipped; it appears in full earlier in this thread]

will hardlinks work across partitions?

2008-02-23 Thread Brian Whitman
Will the hardlink snapshot scheme work across physical disk  
partitions? Can I snapshoot to a different partition than the one  
holding the live solr index?


Re: Apache web server logs in solr

2008-02-23 Thread Kim Pepper

Hi Andrew,

I thought the same thing. Any feedback from your question?

-- Kim



Re: Indexing very large files.

2008-02-23 Thread David Thibault
Thanks.  I'm trying to build a general-purpose secure enterprise search
system.  Specifically, it needs to be able to crawl web pages (which are
almost all small files) and filesystems (which may have widely varying file
sizes).  I realize other projects have done something similar, but none take
into account the original file permissions, index those too, and then limit
search results to documents that the searching party should have access to
(hiding results that the searcher should not see).  Since the types of files
are not known in advance, I can't exactly split them up into logical units.
I could possibly just limit my indexing to the first X MB of any file,
though.  I hadn't thought of the implications for relevance or
post-processing that you bring up.
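
For what it's worth, one common way to do that result trimming in Solr is to
index the allowed principals with each document and attach a filter query at
search time.  A hypothetical sketch (the field name and values are made up):

    <!-- schema.xml: principals allowed to see this document -->
    <field name="acl" type="string" indexed="true" stored="false"
           multiValued="true"/>

and at query time, restrict results to the searcher's user and groups:

    http://localhost:8983/solr/select?q=report&fq=acl:jdoe+OR+acl:staff

This only trims what comes back from Solr; it doesn't protect the stored
content itself.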
Thanks,
Dave

On 2/23/08, Jon Lehto <[EMAIL PROTECTED]> wrote:
>
> Dave
>
> You may want to break large docs into chunks, say by chapter or other
> logical segment.
>
> This will help in
>   - relevance ranking - the term frequency of large docs will cause
>uneven weighting unless the relevance calculation does log
> normalization
>   - finer granularity of retrieval - for example a dictionary, thesaurus,
> and
>Encyclopedia probably have what you want, but how to get it quickly?
>   - post-processing - like high-lighting, can be a performance killer, as
> the
>search/replace scans the entire large file for matching strings
>
>
> Jon
>
>
> -Original Message-
> From: David Thibault [mailto:[EMAIL PROTECTED]
> Sent: Thursday, February 21, 2008 7:58 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Indexing very large files.
>
> All,
> A while back I was running into an issue with a Java heap out of memory
> error while indexing large files.  I figured out that was my own error due
> to a misconfiguration of my Netbeans memory settings.
>
> However, now that is fixed and I have stumbled upon a new error.  When
> trying to upload files which include a Solr TextField value of 32MB or
> more
> in size, I get the following error (uploading with SimplePostTool):
>
>
> [stack trace snipped; it appears in full earlier in this thread]
>
> I suspect there's a setting somewhere that I'm overlooking that is causing
> this, but after peering through the solrconfig.xml and schema.xml files I
> am
> not seeing anything obvious (to me, anyway...=).  The second line of the
> error shows it's crashing in MXParser.fillBuf, which implies that I'm
> overloading the buffer (I assume due to too large of a string).
>
> Thanks in advance for any assistance,
> Dave
>
>