Re: sort problem

2007-09-02 Thread michael ravits
hello mike,

this is the field definition:


holds message id's, values range from 0 to 127132531
can I disable this cache?

Mike Klaas <[EMAIL PROTECTED]> wrote: On 28-Aug-07, at 6:19 AM, michael ravits 
wrote:

> hello solrs,
>
> i have an index with 30M records, weighing ~50GB. latest trunk  
> version. heap size 1024mb.
> queries work fine until I specify a field to sort results by. even  
> if the result set consists of only 2 documents,  the CPU jumps high  
> and after about 5 minutes I get the following exception:

Sorting requires a one-time generation of a fieldCache for the field,  
which occupies 1, 2, 4, or 8 bytes per doc (possibly also the sum of  
the size of the unique values in the field, if it is a string  
field).  What is the definition of the field you are trying to sort  
by, and what kinds of values are indexed therein?

-Mike
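
As a rough illustration of the per-document cost described above, here is a minimal sketch that multiplies the 30M document count from the original post by each of the quoted widths. The function name is hypothetical, and the result covers only the per-document array, not the extra space a string field needs for its unique values.

```python
# Rough fieldCache estimate: per-document width times document count.
# NUM_DOCS comes from the original post; everything else is illustrative.
NUM_DOCS = 30_000_000


def field_cache_bytes(num_docs: int, bytes_per_doc: int) -> int:
    """Size of the per-document array only (string values are not counted)."""
    return num_docs * bytes_per_doc


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    for width in (1, 2, 4, 8):
        size = field_cache_bytes(NUM_DOCS, width)
        print(f"{width} byte(s) per doc: about {to_mib(size):.0f} MiB")
```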

> Any idea?
> thanks
>
> Java heap space
>
> java.lang.OutOfMemoryError: Java heap space
> at org.apache.lucene.index.SegmentTermEnum.termInfo 
> (SegmentTermEnum.java:170)
> at org.apache.lucene.index.TermInfosReader.scanEnum 
> (TermInfosReader.java:166)
> at org.apache.lucene.index.TermInfosReader.get 
> (TermInfosReader.java:153)
> at org.apache.lucene.index.SegmentTermDocs.seek 
> (SegmentTermDocs.java:54)
> at org.apache.lucene.index.MultiTermDocs.termDocs 
> (MultiReader.java:429)
> at org.apache.lucene.index.MultiTermDocs.next(MultiReader.java: 
> 380)
> at org.apache.lucene.search.FieldCacheImpl$10.createValue 
> (FieldCacheImpl.java:383)
> at org.apache.lucene.search.FieldCacheImpl$Cache.get 
> (FieldCacheImpl.java:72)
> at org.apache.lucene.search.FieldCacheImpl.getStringIndex 
> (FieldCacheImpl.java:350)
> at org.apache.lucene.search.FieldSortedHitQueue.comparatorString 
> (FieldSortedHitQueue.java:266)
> at org.apache.lucene.search.FieldSortedHitQueue$1.createValue 
> (FieldSortedHitQueue.java:182)
> at org.apache.lucene.search.FieldCacheImpl$Cache.get 
> (FieldCacheImpl.java:72)
> at  
> org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator 
> (FieldSortedHitQueue.java:155)
> at org.apache.lucene.search.FieldSortedHitQueue.<init> 
> (FieldSortedHitQueue.java:56)
> at org.apache.solr.search.SolrIndexSearcher.getDocListNC 
> (SolrIndexSearcher.java:862)
> at org.apache.solr.search.SolrIndexSearcher.getDocListC 
> (SolrIndexSearcher.java:810)
> at org.apache.solr.search.SolrIndexSearcher.getDocList 
> (SolrIndexSearcher.java:703)
> at  
> org.apache.solr.handler.StandardRequestHandler.handleRequestBody 
> (StandardRequestHandler.java:125)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest 
> (RequestHandlerBase.java:78)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:723)
> at org.apache.solr.servlet.SolrDispatchFilter.execute 
> (SolrDispatchFilter.java:193)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter 
> (SolrDispatchFilter.java:161)
> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter 
> (ServletHandler.java:1089)
> at org.mortbay.jetty.servlet.ServletHandler.handle 
> (ServletHandler.java:365)
> at org.mortbay.jetty.security.SecurityHandler.handle 
> (SecurityHandler.java:216)
> at org.mortbay.jetty.servlet.SessionHandler.handle 
> (SessionHandler.java:181)
> at org.mortbay.jetty.handler.ContextHandler.handle 
> (ContextHandler.java:712)
> at org.mortbay.jetty.webapp.WebAppContext.handle 
> (WebAppContext.java:405)
> at org.mortbay.jetty.handler.ContextHandlerCollection.handle 
> (ContextHandlerCollection.java:211)
> at org.mortbay.jetty.handler.HandlerCollection.handle 
> (HandlerCollection.java:114)
> at org.mortbay.jetty.handler.HandlerWrapper.handle 
> (HandlerWrapper.java:139)
> at org.mortbay.jetty.Server.handle(Server.java:285)
>
>
>
> -
> Looking for a deal? Find great prices on flights and hotels with  
> Yahoo! FareChase.



   
-
Park yourself in front of a world of choices in alternative vehicles.
Visit the Yahoo! Auto Green Center.

Re: sort problem

2007-09-02 Thread Yonik Seeley
On 9/2/07, michael ravits <[EMAIL PROTECTED]> wrote:
> this is the field definition:
> required="true" />
>
> holds message id's, values range from 0 to 127132531
> can I disable this cache?

No, sorting wouldn't work without it.

The cache structure certainly isn't optimal for this (every doc
probably has a different value).
If you could live with a cap of 2B on message id, switching to type
"int" would decrease the memory usage to 4 bytes per doc (presumably
you don't need range queries?)

-Yonik
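
To make the trade-off above concrete, here is a back-of-the-envelope sketch. It checks that the largest message id fits under the signed 32-bit cap, then compares 4 bytes per doc for "int" against a lower bound for a string field. The string figure is an assumption: it counts a 4-byte ordinal per doc plus 2 bytes per character for up to 9 digits per unique id, and ignores Java object overhead, so the real cost would be higher.

```python
# Values taken from the thread; the string-cost model is an assumption.
INT32_MAX = 2**31 - 1          # the "cap of 2B"
MAX_MESSAGE_ID = 127_132_531   # largest id reported in the thread
NUM_DOCS = 30_000_000


def fits_signed_int32(value: int) -> bool:
    """True if the value can be stored in a signed 32-bit int."""
    return -(2**31) <= value <= INT32_MAX


def int_cache_bytes(num_docs: int) -> int:
    """4 bytes per document for an int field."""
    return num_docs * 4


def string_cache_lower_bound(num_docs: int, unique_values: int,
                             chars_per_value: int) -> int:
    """Ordinal array (4 bytes/doc) plus UTF-16 character data only.

    Object headers and array overhead are deliberately left out,
    so this underestimates the real footprint.
    """
    return num_docs * 4 + unique_values * chars_per_value * 2


if __name__ == "__main__":
    print("fits in int:", fits_signed_int32(MAX_MESSAGE_ID))
    print("int cache bytes:", int_cache_bytes(NUM_DOCS))
    # Assumes every doc has a distinct id of up to 9 digits.
    print("string lower bound bytes:",
          string_cache_lower_bound(NUM_DOCS, NUM_DOCS, 9))
```

Under these assumptions the string lower bound already takes up most of a 1024 MB heap, while the int version needs a fraction of it.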


updates on the server

2007-09-02 Thread James O'Rourke
Is there a way to pass the solr server a set of documents without all  
the fields present and only update the fields that are provided,  
leaving the remaining document fields intact, or do I need to pull  
those documents over the wire myself, do the update manually, and  
then add them back to the index?


James
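
The manual route described in the question (fetch, merge, re-add) could look roughly like the sketch below. It is only an in-memory illustration: the documents are plain dicts, the function name is hypothetical, and no Solr client call is shown. It also assumes every field of the existing document is stored, since a field that is indexed but not stored cannot be read back for the merge.

```python
def merge_partial_update(stored_doc: dict, partial_doc: dict) -> dict:
    """Return a full document: provided fields win, the rest are kept.

    stored_doc  -- the document as retrieved from the index (all fields stored)
    partial_doc -- only the fields that should change
    """
    merged = dict(stored_doc)   # copy so the fetched document is untouched
    merged.update(partial_doc)  # overwrite or add the provided fields
    return merged


if __name__ == "__main__":
    # Hypothetical field names, for illustration only.
    existing = {"id": "42", "title": "old title", "body": "unchanged text"}
    changes = {"id": "42", "title": "new title"}
    full_doc = merge_partial_update(existing, changes)
    print(full_doc)  # this full document would then be re-added to the index
```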



Multiple Values -Structured?

2007-09-02 Thread Bharani

Hi,

I have got two sets of documents

1) Primary Document
2) Occurrences of primary document

Since there is no such thing as "join" i can either 

a) Post the primary document with occurrences as multi valued field
 or
b) Post the primary document for every occurrences i.e. classic
de-normalized route

My problem with 

Option a) This works great as long as the occurrence is a single field, but
if i had a group of fields that describes the occurrence then the search
returns wrong results because of the nature of text search

i.e 1 Jan 2007
 review

 2 Jan 2007 
 revision

if i search for 2 Jan 2007 and  1 Jan 2007  i will get a hit
(which is wrong)  because there is no grouping of fields to associate date
and type as one unit. If i merge them as one entity then i can't use the
range queries for date

Option B) This would result in a large number of documents, and even if i
index only and do not store, i still have to deal with duplicate hits,
because all i want is the primary document
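
The two options can be sketched with a small in-memory model. This is not Solr code: the field names "date" and "type", the helper names, and the example query (a date from one occurrence paired with the type of another) are all assumptions made for illustration. Option a matches because each value is found somewhere in the document. Option b keeps date and type together per row, and duplicate hits are collapsed to the primary document id.

```python
# Hypothetical primary document with two occurrences stored as parallel
# multi-valued fields (option a).
primary = {
    "id": "doc1",
    "date": ["1 Jan 2007", "2 Jan 2007"],
    "type": ["review", "revision"],
}


def matches_multivalued(doc: dict, date: str, kind: str) -> bool:
    """Option a: each field is checked on its own, so pairing is lost."""
    return date in doc["date"] and kind in doc["type"]


def denormalize(doc: dict) -> list:
    """Option b: one row per occurrence, keeping date and type together."""
    return [
        {"primary_id": doc["id"], "date": d, "type": t}
        for d, t in zip(doc["date"], doc["type"])
    ]


def search_rows(rows: list, predicate) -> list:
    """Return each matching primary id once, in first-seen order."""
    seen = set()
    hits = []
    for row in rows:
        pid = row["primary_id"]
        if predicate(row) and pid not in seen:
            seen.add(pid)
            hits.append(pid)
    return hits


if __name__ == "__main__":
    rows = denormalize(primary)
    # Cross-paired query: the date of one occurrence, the type of the other.
    print(matches_multivalued(primary, "2 Jan 2007", "review"))
    print(search_rows(
        rows, lambda r: r["date"] == "2 Jan 2007" and r["type"] == "review"))
    # Both rows match here, but the primary document is reported once.
    print(search_rows(rows, lambda r: r["date"].endswith("2007")))
```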


Is there a better approach to the problem?

Thanks
Bharani


-- 
View this message in context: 
http://www.nabble.com/Multiple-Values--Structured--tf4370282.html#a12456399
Sent from the Solr - User mailing list archive at Nabble.com.



Re: sort problem

2007-09-02 Thread michael ravits
I'll try switching to int. Thanks.

Yonik Seeley <[EMAIL PROTECTED]> wrote: On 9/2/07, michael ravits  wrote:
> this is the field definition:
>
>
> holds message id's, values range from 0 to 127132531
> can I disable this cache?

No, sorting wouldn't work without it.

The cache structure certainly isn't optimal for this (every doc
probably has a different value).
If you could live with a cap of 2B on message id, switching to type
"int" would decrease the memory usage to 4 bytes per doc (presumably
you don't need range queries?)

-Yonik


   