hebrew stemmer

2008-08-25 Thread michael ravits
Is anyone working on, or has anyone worked on, something like this?
I would also like to hear ideas on where to begin.

Michael


  


solr 236 + facet

2007-07-26 Thread michael ravits
hello solrs!

When using the collapse feature from the SOLR-236 patch together with faceting,
the faceting happens after the collapse. Is there a way to facet before collapsing?


   

Fwd: Solr 1.3 HTTP server stops responding

2007-07-31 Thread michael ravits
hello again,

This happened while I was updating the index.
This time I saved the log, and this error appears in it a number of times:

SEVERE: Error during auto-warming of key:[EMAIL PROTECTED]:java.lang.OutOfMemoryError: Java heap

I have increased Xmx to 850mb.
But what else can I do?

thank you for your help

michael ravits <[EMAIL PROTECTED]> wrote:
Date: Tue, 31 Jul 2007 01:34:36 -0700 (PDT)
From: michael ravits <[EMAIL PROTECTED]>
Subject: Solr 1.3 HTTP server stops responding
To: solr-user@lucene.apache.org

 hello solrs,

I am facing a problem similar to the one Kevin Holmes described in a recent thread.

 I have created a thread dump, maybe this can help trace the problem?
 I am attaching the zipped dump to this email.
 
I am using Solr 1.3 with the SOLR-236 patch on a win2003/2GB machine
with Xms=512m and Xmx=512m. I didn't save the console output to a file, so I
can't tell whether there were PERFORMANCE or MEMORY exceptions, but next time
I'll have the log.

thanks for your help



sort problem

2007-08-28 Thread michael ravits
hello solrs,

i have an index with 30M records, weighing ~50GB, on the latest trunk version
with a heap size of 1024mb.
queries work fine until I specify a field to sort results by. even if the
result set consists of only 2 documents, the CPU usage jumps high and after
about 5 minutes I get the following exception:

Any idea?
thanks

Java heap space

java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.SegmentTermEnum.termInfo(SegmentTermEnum.java:170)
at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:166)
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:153)
at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:54)
at org.apache.lucene.index.MultiTermDocs.termDocs(MultiReader.java:429)
at org.apache.lucene.index.MultiTermDocs.next(MultiReader.java:380)
at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:383)
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:350)
at org.apache.lucene.search.FieldSortedHitQueue.comparatorString(FieldSortedHitQueue.java:266)
at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:182)
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:155)
at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:56)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:862)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:810)
at org.apache.solr.search.SolrIndexSearcher.getDocList(SolrIndexSearcher.java:703)
at org.apache.solr.handler.StandardRequestHandler.handleRequestBody(StandardRequestHandler.java:125)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:78)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:723)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:193)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:161)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)


   

Re: sort problem

2007-09-02 Thread michael ravits
hello mike,

this is the field definition:


It holds message IDs; values range from 0 to 127132531.
Can I disable this cache?

Mike Klaas <[EMAIL PROTECTED]> wrote:
On 28-Aug-07, at 6:19 AM, michael ravits wrote:

> hello solrs,
>
> i have an index with 30M records, weighing ~50GB, on the latest trunk
> version with a heap size of 1024mb.
> queries work fine until I specify a field to sort results by. even
> if the result set consists of only 2 documents, the CPU usage jumps high
> and after about 5 minutes I get the following exception:

Sorting requires a one-time generation of a fieldCache for the field,
which occupies 1, 2, 4, or 8 bytes per doc (plus, for a string field,
the total size of the unique values in the field). What is the definition
of the field you are trying to sort by, and what kinds of values are
indexed therein?

-Mike

> Any idea?
> thanks
>
> Java heap space
>
> java.lang.OutOfMemoryError: Java heap space
> at org.apache.lucene.index.SegmentTermEnum.termInfo(SegmentTermEnum.java:170)
> at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:166)
> at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:153)
> at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:54)
> at org.apache.lucene.index.MultiTermDocs.termDocs(MultiReader.java:429)
> at org.apache.lucene.index.MultiTermDocs.next(MultiReader.java:380)
> at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:383)
> at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
> at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:350)
> at org.apache.lucene.search.FieldSortedHitQueue.comparatorString(FieldSortedHitQueue.java:266)
> at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:182)
> at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
> at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:155)
> at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:56)
> at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:862)
> at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:810)
> at org.apache.solr.search.SolrIndexSearcher.getDocList(SolrIndexSearcher.java:703)
> at org.apache.solr.handler.StandardRequestHandler.handleRequestBody(StandardRequestHandler.java:125)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:78)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:723)
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:193)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:161)
> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
> at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
> at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
> at org.mortbay.jetty.Server.handle(Server.java:285)



   

Re: sort problem

2007-09-02 Thread michael ravits
I'll try switching to int. Thanks.

Yonik Seeley <[EMAIL PROTECTED]> wrote:
On 9/2/07, michael ravits wrote:
> this is the field definition:
>
>
> It holds message IDs; values range from 0 to 127132531.
> Can I disable this cache?

No, sorting wouldn't work without it.

The cache structure certainly isn't optimal for this (every doc
probably has a different value).
If you could live with a cap of 2B on message id, switching to type
"int" would decrease the memory usage to 4 bytes per doc (presumably
you don't need range queries?)

-Yonik
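
To put rough numbers on the replies in this thread, here is a back-of-the-envelope sketch in Python. The 30M document count comes from the original post. The per-string overhead (40 bytes plus 2 bytes per character) and the 9-character average ID length are assumptions for illustration, not measured values, and the function names are hypothetical.

```python
# Rough fieldCache size estimates for sorting. All constants other than the
# document count are assumptions, not measurements.
NUM_DOCS = 30_000_000   # from the original post
BYTES_PER_INT = 4       # one int per document


def int_sort_cache_bytes(num_docs):
    """An int field needs one 4-byte value per document."""
    return num_docs * BYTES_PER_INT


def string_sort_cache_bytes(num_docs, unique_values, avg_chars,
                            per_string_overhead=40, bytes_per_char=2):
    """A string field needs one 4-byte ordinal per document plus the unique
    values themselves (the overhead figures are assumed, not measured)."""
    ordinals = num_docs * BYTES_PER_INT
    values = unique_values * (per_string_overhead + bytes_per_char * avg_chars)
    return ordinals + values


def to_mib(num_bytes):
    return num_bytes / (1024 * 1024)


int_mib = to_mib(int_sort_cache_bytes(NUM_DOCS))
# Every document has its own message id, so unique_values == NUM_DOCS.
str_mib = to_mib(string_sort_cache_bytes(NUM_DOCS, NUM_DOCS, avg_chars=9))
print("int field:    %.1f MiB" % int_mib)
print("string field: %.1f MiB" % str_mib)
```

Under these assumptions the string version would not fit in a 1024 MB heap, while the int version needs roughly 114 MiB.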


   

can't post.sh/post.jar

2007-06-17 Thread michael ravits
hello solrs!

I get the following error on Windows when trying to index a ~60 MB XML file
with post.jar.

I also couldn't get post.sh to work. Has anyone successfully run it on Windows?

C:\solr\example\exampledocs>java -jar post.jar  flix.xml
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file flix.xml
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.io.ByteArrayOutputStream.write(Unknown Source)
at sun.net.www.http.PosterOutputStream.write(Unknown Source)
at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source)
at sun.nio.cs.StreamEncoder.implWrite(Unknown Source)
at sun.nio.cs.StreamEncoder.write(Unknown Source)
at java.io.OutputStreamWriter.write(Unknown Source)
at org.apache.solr.util.SimplePostTool.pipe(SimplePostTool.java:281)
at org.apache.solr.util.SimplePostTool.postData(SimplePostTool.java:247)
at org.apache.solr.util.SimplePostTool.postFile(SimplePostTool.java:213)
at org.apache.solr.util.SimplePostTool.postFiles(SimplePostTool.java:152)
at org.apache.solr.util.SimplePostTool.main(SimplePostTool.java:112)

C:\solr\example\exampledocs>
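
The trace shows the request body being accumulated in a ByteArrayOutputStream before it is sent, so the client heap has to hold the whole file. One possible workaround is to stream the file in chunks from a client that does not buffer it. The following is a minimal Python sketch, not the SimplePostTool implementation. The host, port, and path are taken from the console output above; `read_chunks`, `post_file`, and the chunk size are hypothetical, and the sketch assumes the server accepts chunked transfer encoding.

```python
import http.client


def read_chunks(fileobj, chunk_size=64 * 1024):
    """Yield the file's bytes in fixed-size pieces instead of reading it all."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


def post_file(path, host="localhost", port=8983, url="/solr/update"):
    """POST an XML file without holding it fully in memory (sketch only)."""
    conn = http.client.HTTPConnection(host, port)
    try:
        with open(path, "rb") as f:
            # With an iterable body and no Content-Length header, http.client
            # (Python 3.6+) sends the request using chunked transfer encoding.
            conn.request("POST", url, body=read_chunks(f),
                         headers={"Content-Type": "text/xml; charset=utf-8"})
            response = conn.getresponse()
            return response.status, response.read()
    finally:
        conn.close()
```

Calling `post_file("flix.xml")` would send the file from the post in 64 KB pieces; a separate commit message would still be needed afterwards.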

   

solr-user@lucene.apache.org

2007-06-19 Thread michael ravits
hi,

I am planning on reindexing from fresh on a daily basis, while keeping the 
search online.

What I do is:

1. *:*
2. ...
3. 

Works ok. But I've noticed that the faq recommends issuing  before 
reindexing. The problem is  also seems to commit changes, so the 
index is empty until I reindex and the search can't be online.

How will _not_ sending  affect performance/results?
Will sending  after reindexing help?
Is there another solution?

thanks
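
The archive has stripped the XML tags from the message above, so the exact commands are unknown. As a sketch only, assuming the three steps were a delete-by-query on `*:*`, an add, and a commit (an assumption, not something the post confirms), the sequence of update messages could be built like this in Python. The function name and the sample field are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def reindex_messages(docs):
    """Return the update messages for a full reindex, in sending order.

    docs is a list of dicts mapping field name to value. The delete/add/commit
    sequence is an assumed reconstruction, not taken from the post.
    """
    messages = ["<delete><query>*:*</query></delete>"]
    for doc in docs:
        fields = "".join(
            "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
            for name, value in doc.items()
        )
        messages.append("<add><doc>%s</doc></add>" % fields)
    # The commit goes last, after the delete and all adds.
    messages.append("<commit/>")
    return messages


for message in reindex_messages([{"mediaId": 6720}]):
    print(message)
```

Each message would be POSTed in order to the update URL. Uncommitted changes are not visible to searchers, so the old index keeps serving queries until the final commit, provided no auto-commit fires in between.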

   

delete by query multiple Ids

2007-06-26 Thread michael ravits
hello solrs
   
  is it possible to query multiple specific ids?
  something like this:
   
  mediaId:6720,6721,6722,8762,9754
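
Lucene query syntax does not take a comma-separated list, but it does allow several values for one field to be grouped with OR. As a hedged sketch in Python (the helper names are hypothetical; the field name and IDs come from the post), such a query and a matching delete-by-query message could be built as follows:

```python
from xml.sax.saxutils import escape


def ids_query(field, ids):
    """Build a query such as mediaId:(6720 OR 6721) from a list of integer IDs."""
    if not ids:
        raise ValueError("at least one id is required")
    return "%s:(%s)" % (field, " OR ".join(str(int(i)) for i in ids))


def delete_by_query_message(query):
    """Wrap a query in a Solr delete-by-query update message."""
    return "<delete><query>%s</query></delete>" % escape(query)


query = ids_query("mediaId", [6720, 6721, 6722, 8762, 9754])
print(query)
print(delete_by_query_message(query))
```

Very long ID lists may run into the configured limit on boolean clauses, so batching the IDs is worth considering.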

   

multiple indices

2007-06-26 Thread michael ravits
dear solrs - thanks for all your help.
   
  I have multiple applications (blogs/forums/video/etc.), each of which is
independent (no need to query across multiple indices).
  Would it be better to run multiple Solr/JVM instances, one per index, or a
single JVM instance serving all of them (maybe SOLR-215)?
   
  My main concerns are performance and load on the servers.

   