Re: Sorting in different languages

2008-05-28 Thread Uwe Klosa
I've been thinking about that along similar lines. I'll have a look into it.

Thanks
Uwe

On Tue, May 27, 2008 at 9:06 PM, Steven A Rowe <[EMAIL PROTECTED]> wrote:

> Hi Uwe,
>
> On 05/26/2008 at 8:43 AM, Uwe Klosa wrote:
> > We're using Solr 1.3 in our application and we have an index
> > that contains 3 different languages which are stored and
> > indexed in fields like 'title_sv', 'title_en', 'title_no'.
> >
> > Our problem in this case is we want to sort the search
> > results according to the different rules for each language.
> > If we have 'sort=title_sv asc' we want that the Swedish
> > rules are used. Is there a standard way to achieve this?
>
> Although the underlying Lucene Sort facility supports Locale-based field
> sorting[1], Solr does not expose this functionality.
>
> I think Solr should support syntax like "sort=title_sv asc locale:sv",
> minimally including the language, but possibly also the country component of
> the locale, e.g. "sort=title_de asc locale:de_CH" for Swiss German.
>
> Steve
>
> [1]
> http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/search/SortField.html#SortField(java.lang.String,%20java.util.Locale)
>


Re: Sorting in different languages

2008-05-28 Thread Uwe Klosa
Nice idea. I'll look into that, too.

Thanks
Uwe

On Tue, May 27, 2008 at 10:34 PM, Chris Hostetter <[EMAIL PROTECTED]>
wrote:

>
> : I think Solr should support syntax like "sort=title_sv asc locale:sv",
> : minimally including the language, but possibly also the country
> : component of the locale, e.g. "sort=title_de asc locale:de_CH" for Swiss
> : German.
>
> I'm not sure how I feel about that syntax, but a fairly straightforward
> way to get Locale-based sorting with Solr is to write a subclass of
> StrField that takes a "locale" init param and uses it in the
> getSortField method.  Then in your schema each <fieldtype> could declare
> what locale to use for sorting.
>
> That would work with Solr 1.2 without any modifications, but if someone
> wants to submit a patch so we can make Solr do this "out of the box" by
> making similar changes to the FieldType base class, that would be pretty cool.
>
> (one thing that might get tricky is making the new Locale option play nice
> with the sortMissingFirst and sortMissingLast options ... might need some
> creative SortComparators)
>
>
> -Hoss
>
>
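A minimal sketch of the StrField subclass Hoss describes, assuming Solr
1.2-era APIs; the class name and the "locale" init param are illustrative,
not an existing Solr class:

    import java.util.Locale;
    import java.util.Map;
    import org.apache.lucene.search.SortField;
    import org.apache.solr.schema.IndexSchema;
    import org.apache.solr.schema.SchemaField;
    import org.apache.solr.schema.StrField;

    public class LocaleStrField extends StrField {
      private Locale locale;

      @Override
      protected void init(IndexSchema schema, Map<String, String> args) {
        // picks up e.g. <fieldtype name="string_sv" class="LocaleStrField" locale="sv"/>
        String loc = args.remove("locale");
        locale = (loc == null) ? Locale.getDefault() : new Locale(loc);
        super.init(schema, args);
      }

      @Override
      public SortField getSortField(SchemaField field, boolean reverse) {
        // delegate to Lucene's Locale-based sorting [1]
        return new SortField(field.getName(), locale, reverse);
      }
    }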


Re: solr on ubuntu 8.04

2008-05-28 Thread Andrew Savory
Hi Jack,

2008/5/28 Jack Bates <[EMAIL PROTECTED]>:
> Thanks for your suggestions. I have now tried installing Solr on two
> different machines. On one machine I installed the Ubuntu solr-tomcat5.5
> package, and on the other I simply dropped "solr.war"
> into /var/lib/tomcat5.5/webapps
>
> Both machines are running Tomcat 5.5
>
> I get the same error message on both machines:
>
> SEVERE: Exception starting filter SolrRequestFilter
> java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.solr.core.SolrConfig
>
> The full error message is attached.

Can you check that solr.home is set correctly?
I don't have an Ubuntu box handy at the moment; I'll try to look into
it tonight.


Andrew.
--
[EMAIL PROTECTED] / [EMAIL PROTECTED]
http://www.andrewsavory.com/


Re: SolrTrunk start error

2008-05-28 Thread 黄芳
I am a beginner with Solr.
I am studying solr-1.2.0 now, ^_^

On 2008-5-28, Shalin Shekhar Mangar <[EMAIL PROTECTED]> wrote:
> Which version of Solr were you using previously?
>
> 2008/5/28 Eason. Lee <[EMAIL PROTECTED]>:
> > I updated my Solr to the latest trunk yesterday.
> > Some errors show up when I start it.
> > It seems that Lucene 2.4 is not compatible with the former one.
> > 2008-5-28 14:50:05 org.apache.solr.common.SolrException log
> > SEVERE: java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
> >  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:733)
> >  at org.apache.solr.core.SolrCore.(SolrCore.java:387)
> >  at org.apache.solr.core.MultiCore.create(MultiCore.java:255)
> >  at org.apache.solr.core.MultiCore.load(MultiCore.java:139)
> >  at
> > org.apache.solr.servlet.SolrDispatchFilter.initMultiCore(SolrDispatchFilter.java:147)
> >  at
> > org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:75)
> >  at
> > org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
> >  at
> > org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
> >  at
> > org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:108)
> >  at
> > org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
> >  at
> > org.apache.catalina.core.StandardContext.start(StandardContext.java:4356)
> >  at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
> >  at org.apache.catalina.core.StandardHost.start(StandardHost.java:719)
> >  at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
> >  at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
> >  at org.apache.catalina.core.StandardService.start(StandardService.java:516)
> >  at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
> >  at org.apache.catalina.startup.Catalina.start(Catalina.java:578)
> >  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >  at
> > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >  at
> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >  at java.lang.reflect.Method.invoke(Method.java:597)
> >  at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
> >  at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)
> > Caused by: java.lang.OutOfMemoryError: Java heap space
> >  at org.apache.lucene.store.IndexInput.readString(IndexInput.java:123)
> >  at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:306)
> >  at org.apache.lucene.index.FieldInfos.(FieldInfos.java:59)
> >  at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:300)
> >  at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:264)
> >  at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:232)
> >  at
> > org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:91)
> >  at
> > org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:649)
> >  at
> > org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:81)
> >  at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
> >  at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
> >  at
> > org.apache.solr.search.SolrIndexSearcher.(SolrIndexSearcher.java:93)
> >  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:724)
> >  at org.apache.solr.core.SolrCore.(SolrCore.java:387)
> >  at org.apache.solr.core.MultiCore.create(MultiCore.java:255)
> >  at org.apache.solr.core.MultiCore.load(MultiCore.java:139)
> >  at
> > org.apache.solr.servlet.SolrDispatchFilter.initMultiCore(SolrDispatchFilter.java:147)
> >  at
> > org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:75)
> >  at
> > org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
> >  at
> > org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
> >  at
> > org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:108)
> >  at
> > org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
> >  at
> > org.apache.catalina.core.StandardContext.start(StandardContext.java:4356)
> >  at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
> >  at org.apache.catalina.core.StandardHost.start(StandardHost.java:719)
> >  at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
> >  at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
> >  at org.apache.catalina.core.StandardService.start(StandardService.java:516)
> >  at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
> >  at org.apache.catalina.startup.Catalina.start(Catalina.java:578)
> >  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >  at
> > sun.refle

Re: SolrTrunk start error

2008-05-28 Thread Shalin Shekhar Mangar
OK, I see you're getting an OutOfMemoryError. That has nothing to do
with the Lucene changes. Try increasing your JVM heap size.
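For the example Jetty setup, the heap can be raised with the standard JVM
flags; 512M here is just a guess at a workable starting point:

    java -Xms512m -Xmx512m -jar start.jar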

2008/5/28 Shalin Shekhar Mangar <[EMAIL PROTECTED]>:
> Which version of Solr were you using previously?
>
> 2008/5/28 Eason. Lee <[EMAIL PROTECTED]>:
>> I updated my Solr to the latest trunk yesterday.
>> Some errors show up when I start it.
>> It seems that Lucene 2.4 is not compatible with the former one.
>> [stack trace identical to the one quoted in the first message above; snipped]

Re: SolrTrunk start error

2008-05-28 Thread Shalin Shekhar Mangar
Which version of Solr were you using previously?

2008/5/28 Eason. Lee <[EMAIL PROTECTED]>:
> I updated my Solr to the latest trunk yesterday.
> Some errors show up when I start it.
> It seems that Lucene 2.4 is not compatible with the former one.
> [first stack trace identical to the one quoted above; snipped]
>
> 2008-5-28 14:50:06 org.apache.solr.common.SolrException log
> SEVERE: java.lang.RuntimeException: java.io.IOException: read past EOF
>  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:733)
>  at org.apache.solr

Re: simple ui?

2008-05-28 Thread Grant Ingersoll
I think even just having a simple search box on the admin landing
page that showed facets would be cool.  The current admin landing page
is a bit silly.


On May 28, 2008, at 2:34 AM, Karl Wettin wrote:

It would be perfect if all I had to do was to define a couple of  
facet fields, a default text query field and some title/body/class  
type to render the results.


Is there such a formula 1A JSP/servlet (or PHP) user interface for  
Solr? Perhaps something in the example or admin that I missed? If  
none of the above, is there any commercial solution I can buy and have  
up and running today? I'm hysterically bad at UI.



 karl





Re: simple ui?

2008-05-28 Thread Erik Hatcher


On May 28, 2008, at 2:34 AM, Karl Wettin wrote:

It would be perfect if all I had to do was to define a couple of  
facet fields, a default text query field and some title/body/class  
type to render the results.


Is there such a formula 1A JSP/servlet (or PHP) user interface for  
Solr? Perhaps something in the example or admin that I missed? If  
none of the above, is there any commercial solution I can buy and have  
up and running today? I'm hysterically bad at UI.


Solr Flare to the rescue :)   (sort of)

http://wiki.apache.org/solr/Flare

It is a Ruby on Rails front-end to Solr, with faceted navigation,  
Ajax suggest, and paging, as well as integration with SIMILE Timeline  
and Exhibit. It is a bit dusty, and requires an older version of  
Rails, but it works nicely to prop up against Solr.  It does have  
some schema.xml requirements, like *_facet used for facet fields -  
that is the kind of thing that needs to be refactored out into  
configuration.


I haven't devoted any time to Flare in a long while, but it has  
served me quite well in the past for many very quick demos and as the  
basis for Blacklight (blacklight.rubyforge.org).


Erik



Re: Re[2]: "null" in admin page

2008-05-28 Thread Alexander Ramos Jardim
Well,

It surely comes with the example, as I've hit this problem every time I've
downloaded the example, and I have to remove the multicore.xml file or I get
the error.

2008/5/28 Chris Hostetter <[EMAIL PROTECTED]>:

>
> : I got this problem some time ago. Solr comes configured as multicore and
> you
> : running it as only one core. Just disable the multicore settings.
>
> FWIW: solr is not "configured as multicore" by default, you shouldn't need
> to disable anything -- if you have a multicore.xml file, then Solr will
> run with multicore support.  if not, then it will run a single core.
>
>
>
> -Hoss
>
>
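For reference, a multicore.xml on the 1.3 trunk looked roughly like this
(quoted from memory of the trunk example, so attribute names may differ in
your checkout), which is why deleting the file switches Solr back to
single-core mode:

    <multicore adminPath="/admin/multicore" persistent="false">
      <core name="core0" instanceDir="core0" />
      <core name="core1" instanceDir="core1" />
    </multicore>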


-- 
Alexander Ramos Jardim


new user: some questions about parameters and query syntax

2008-05-28 Thread Bram de Jong
hello all,


a new user here. I will be using Solr to power the search over at
freesound ( http://freesound.iua.upf.edu ).

I've been experimenting with Solr and it's damned cool :)

here we go:

1. I have a minor remark about the inconsistencies in the
query-parameter definitions... Sometimes multiple fields are given
with comma/space-separated values, sometimes they are given by
repeating the parameter:
sort=field1,field2 desc,field3
but
facet.field=field1&facet.field=field2
This is pretty confusing to first-time users! :-)

2. two questions about the query syntax:

is
   tag:(bass drum)
the same as:
   tag:bass OR tag:drum
or the same as:
   tag:bass AND tag:drum
or neither?

is "+bass +drum" essentially the same as "bass AND drum" ?


I'm working on a query/parameter builder in Python. It's a work in
progress, but you can see what's been done over here:
http://iua-share.upf.edu/svn/nightingale/trunk/sandbox/solr/solrquery.py
It's basically a wrapper around all the parameters you can pass via
the GET string to Solr, so you can do things like:

q = SolrQuery()
q.set_query("tag:bass")
q.set_query_options(start=0, rows=10, sort=["date desc"],
field_list=["id", "tag", "description"])
q.add_facet_field("author")
q.add_facet_field("tag")
q.set_global_facet_options(limit=20, count_missing=True)
q.set_facet_options("tag", mincount=5)
q.add_date_facet("date_written")
q.set_global_date_facet_options(start="-2YEARS", end="-1YEARS",
gap="+1MONTH", count_other=["before", "after"])
print q.get_query_string()

which results in:

sort=date+desc&facet.date.gap=%2B1MONTH&facet.field=author&facet.field=tag&facet.date=date_written&facet=true&facet.date.start=-2YEARS&facet.date.end=-2YEARS&fl=id%2Ctag%2Cdescription&facet.date.other=before&facet.date.other=after&rows=10&facet.missing=true&facet.limit=20&start=0&wt=json&q=tag%3Abass&f.tag.facet.mincount=5



 - bram

-- 
http://freesound.iua.upf.edu
http://www.smartelectronix.com
http://www.musicdsp.org


Re: new user: some questions about parameters and query syntax

2008-05-28 Thread Erik Hatcher


On May 28, 2008, at 10:41 AM, Bram de Jong wrote:

a new user here. I will be using solr to power the search over at
freesound ( http://freesound.iua.upf.edu ).


Welcome!  Cool use of Solr too.


1. I have a minor remark about the inconsistencies in the
query-parameter definitions... Sometimes multiple fields are given
with comma/space-separated values, sometimes they are given by
repeating the parameter:
sort=field1,field2 desc,field3
but
facet.field=field1&facet.field=field2
This is pretty confusing to first-hand users! :-)


Yeah, it is confusing.  But we have to be careful with order.  I  
don't believe you can rely on the order of same-named request  
parameters (right?), so sort needs to be a list where order matters.   
Whereas with facet.field, order does not matter.



2. two questions about the query syntax:

is
   tag:(bass drum)
the same as:
   tag:bass OR tag:drum
or the same as:
   tag:bass AND tag:drum
or neither?


Using the standard Solr/Lucene query parser, tag:(bass drum) will use  
the operator specified in schema.xml, and it defaults to OR.
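For reference, that default lives in schema.xml; a minimal example (OR is
also what you get if the element is omitted):

    <solrQueryParser defaultOperator="OR"/>

With that setting, tag:(bass drum) parses the same as tag:bass OR tag:drum.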



is "+bass +drum" essentially the same as "bass AND drum" ?


Yes, exactly the same.


I'm working on a query/parameter builder in python.


Cool deal.  Might be worth collaborating with the pysolr project,  
so as not to duplicate efforts.


Erik



Re: field normalization and omitNorms

2008-05-28 Thread Christian Vogler
On Wednesday 28 May 2008 01:37:57 Otis Gospodnetic wrote:
> If you have tokenized fields of variable size and you want the field length
> to affect the relevance score, then you do not want to omit norms. 
> Omitting norms is good for fields where length is of no importance (e.g.
> gender="Male" vs. gender="Female").  Omitting norms saves you heap/RAM, one
> byte per doc per field without norms, I believe.

I am also toying with the hypothesis that omitting the field norm may be a 
good idea for title fields in languages with compound words, which typically 
consist of only a few words. 

On our server we use a German language stemmer in conjunction with a compound 
word tokenizer, which inserts extra tokens into the stream. With typical 
short titles, such as:

Elterntagung mit Rekordbeteiligung,

which is tokenized as (before stemming):

elterntagung eltern tagung mit rekordbeteiligung rekord beteiligung, 

the title ends up having 7 tokens instead of 3 or even 5, which significantly 
affects the field norms. The reason for retaining the original compound token 
is that it forces compound word queries to return only hits on compound 
words.

In addition, we also have a copied field with just the 3 tokens that skips the 
compound tokenizer, in order to boost queries that match whole words. As a 
consequence, according to the "explain" parameter, the match score for the 
non-compound title fields is *way* out of proportion.

I will have to experiment a bit - one thing that I want to try is moving the 
non-compound field from the qf parameter to the bq parameter, but omitting 
the title field norms is also on my list of things to try.
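With a dismax handler (assuming it is registered as qt=dismax; the field
names and the boost value here are made up for illustration), the move would
look roughly like:

    current:     qt=dismax&q=elterntagung&qf=title_compound+title_plain
    experiment:  qt=dismax&q=elterntagung&qf=title_compound&bq=title_plain:elterntagung^2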

Best regards
- Christian


Re: Sorting in different languages

2008-05-28 Thread Alexander Ramos Jardim
Well,

One solution that I can see for this problem is having different indexes for
each language.

2008/5/28 Uwe Klosa <[EMAIL PROTECTED]>:

> Nice idea. I'll look into that, too.
>
> Thanks
> Uwe
>
> On Tue, May 27, 2008 at 10:34 PM, Chris Hostetter <
> [EMAIL PROTECTED]>
> wrote:
>
> > [Hoss's reply, quoted in full in the thread above; snipped]
>



-- 
Alexander Ramos Jardim


Re: solr sorting question

2008-05-28 Thread Alexander Ramos Jardim
If you want some categories to come before the others, with the remaining
categories sorted alphabetically, you could create another field in your
index, like cat_priority, holding numerical values, so you can sort by this
field first and secondarily by your category name, as in the example below.
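With a numeric cat_priority field in place, the sort parameter would look
something like this (keeping score as a final tiebreaker):

    sort=cat_priority asc,cat asc,score desc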

2008/5/27 anuvenk <[EMAIL PROTECTED]>:

>
> Question about sorting with Solr. I want to group results in a certain sort
> order so I can split them & display in tabs easily.
> I want to be able to have a custom sort order instead of
> sort=cat asc,score desc
> In the above-mentioned way, categories are grouped in ascending order. But I
> want certain categories to come up first in the sort order. I don't want
> them to be grouped in ascending order. Please shed some light, anyone. How
> do I do it? Is it possible?
> --
> View this message in context:
> http://www.nabble.com/solr-sorting-question-tp17498596p17498596.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Alexander Ramos Jardim


Re: Best way to represent a 1-N relation?

2008-05-28 Thread Alexander Ramos Jardim
What about using dynamic fields to get this job done? Of course, it would
make your queries a little more complex.

2008/5/27 sunil sarje <[EMAIL PROTECTED]>:

>
> Hi Alan,
>
> This is a very common problem that every app would face while enabling
> application search using any kind of search engine technology.
>
> I would create 2 indexes:
> 1. city, with all city-related information (crime rate, school districts,
> proximity to sports, etc.) + city code
> 2. real estate listing documents + city code for each listing
>
> With this you don't have to change the real estate index when city-related
> info changes.
>
> For search, you need to do 2 searches for a single search request.
>
> Implement either a custom filter (extend SolrDispatchFilter) or a custom
> response writer.
> a. Search for cities and bring back all cities matching the given
> city-related criteria.
> b. Search listings with the city codes from search result a. (max cities:
> 1024, the default max clause count in a query) -- see the sketch after
> this message.
>
> OR
>
> If you want to use out-of-the-box Solr features, then make multiple Ajax
> calls from the user interface.
> from user interface.
>
>
> hope this helps.
> -sunil
>
>
>
> alan. wrote:
> >
> > Hi all,
> >
> > What is the best way to represent a 1-N relation
> > between two tables in a DB as a Solr document?
> >
> > I'm trying to figure out the best way to represent objects in
> > my domain as Solr documents.  As a representative example,
> > I'll use the domain of real estate where there are cities
> > and each city has multiple listings (1-N relation)
> >
> > Let's say I'm searching for homes but the first thing I want to do is
> > identify the best city to live in.  A good city could be defined
> > by crime rate, school districts, proximity to sports, etc.
> >
> > As part of identifying the best city, I would like to exclude
> > cities that don't have listings in the price/size that I'm looking for.
> >
> > One approach is to create a 'listing' document that has
> > denormalized 'city' information.  We could search for
> > listings, use a 'city' facet to identify candidate cities,
> > and then order the 'city' facets.  (For lack of a better
> > term, I'm calling this a denormalized representation)
> >
> > Another approach is to create a 'city' document that has
> > 'listing' information attached as multiple-values or encoded
> > into a field.  We could search for cities, and
> > filter out cities that don't meet price/size requirements.
> > (Once again, for lack of a better term, I'm calling this an
> > aggregated representation)
> >
> > So to restate my question a little differently, what are
> > the advantages of using a denormalized vs an aggregated
> > representation?  Are there other approaches?
> >
> > Thanks
> >
> > Alan
> >
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Best-way-to-represent-a-1-N-relation--tp17475861p17493360.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Alexander Ramos Jardim
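Referring back to sunil's two-search flow (steps a and b above), a hedged
SolrJ sketch; the field names (crime_rate, city_code, price) and the query
values are hypothetical, and error handling is omitted:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class TwoPhaseSearch {
      public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");

        // a. find cities matching the city-level criteria
        QueryResponse cities =
            server.query(new SolrQuery("crime_rate:[* TO 20]").setRows(1024));

        // collect the city codes into one filter clause
        // (assumes at least one city matched)
        StringBuilder codes = new StringBuilder("city_code:(");
        for (SolrDocument city : cities.getResults()) {
          codes.append(city.getFieldValue("city_code")).append(' ');
        }
        codes.append(')');

        // b. restrict the listing search to those cities
        SolrQuery listings = new SolrQuery("price:[100000 TO 300000]");
        listings.addFilterQuery(codes.toString());
        System.out.println(server.query(listings).getResults().getNumFound());
      }
    }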


ClassCastException trying to use distributed search

2008-05-28 Thread Grégoire Neuville
Hi,

I've tried several times to put Solr's distributed search capabilities to work
but failed each time: the ant build of solr-1.3's trunk doesn't produce
any errors, but after deploying the resulting war in Tomcat and trying a
request on Solr with the 'shards' parameter on, I keep getting this cast
exception (see below).

Did I miss something during the build? Must I apply a patch to the trunk
version?

Thanks a lot for your answers.
-- 
Grégoire

*java.lang.ClassCastException: java.lang.String
org.apache.solr.common.SolrException: java.lang.ClassCastException:
java.lang.String at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:242)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:965) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:595) Caused by:
java.lang.ClassCastException: java.lang.String at
org.apache.solr.common.util.NamedListCodec.unmarshal(NamedListCodec.java:86)
at
org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:35)
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:350)
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:152)
at
org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:368)
at
org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:343)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269) at
java.util.concurrent.FutureTask.run(FutureTask.java:123) at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
... 1 more*


Iterating the entire dataset

2008-05-28 Thread Daniel Garcia
Is there a simple way to query the entire dataset? I want to be able to iterate 
through every document in the index. 

Thanks,
Daniel


  


Re: Iterating the entire dataset

2008-05-28 Thread Yonik Seeley
On Wed, May 28, 2008 at 6:26 PM, Daniel Garcia <[EMAIL PROTECTED]> wrote:
> Is there a simple way to query the entire dataset? I want to be able to 
> iterate through every document in the index.

q=*:*&start=0&rows=100
q=*:*&start=100&rows=100
etc

You could specify a very large rows value, but it's probably best to handle
it in pages/chunks.

-Yonik
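In SolrJ terms, the paging loop Yonik describes might look like this (the
rows value and the "id" field name are arbitrary choices, not requirements):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;

    public class DumpAll {
      public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("*:*");
        int rows = 100;
        for (int start = 0; ; start += rows) {
          query.setStart(start).setRows(rows);
          SolrDocumentList page = server.query(query).getResults();
          for (SolrDocument doc : page) {
            System.out.println(doc.getFieldValue("id"));
          }
          if (start + page.size() >= page.getNumFound()) break; // last page
        }
      }
    }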


Solr indexing configuration help

2008-05-28 Thread gaku113

Hi all Solr users/developers/experts,

I have the following scenario and I appreciate any advice for tuning my solr
master server.  

I have a field in my schema that would be indexed (but not stored) with about ~1
ids for each document.  This field is expected to govern the size of the
document.  Each id can contain up to 6 characters.  I figure that there are
two alternatives for this field: one is to use a multi-valued string field,
and the other would be to pass a whitespace-delimited string to Solr and
have Solr tokenize that string on whitespace (the text_ws fieldType).
The master server is expected to receive constant stream of updates.

The expected/estimated document size can range from 50k to 100k for a single
document.  (I know this is quite large). The number of documents is expected
to be around 200,000 on each master server, and there can be multiple master
servers (sharding).  I wish the master can handle more docs too if I can
figure a way out.  

Currently, I’m performing some basic stress tests to simulate the indexing
side on the master server.  This stress test would continuously add new
documents at the rate of about 10 documents every 30 seconds.  Autocommit is
being used (50 docs and 180 seconds constraints), but I have no idea if this
is the preferred way.  The goal is to keep adding new documents until we can
get at least 200,000 documents (or about 20GB of index) on the master (or
even more if the server can handle it)

What I experienced from the indexing stress test is that the master server
failed to respond after a while, e.g. becoming non-pingable when there are
about 30k documents.  When looking at the log, the errors are mostly:
java.lang.OutOfMemoryError: Java heap space
OR
Ping query caused exception: null (this is probably caused by the OOM
problem)

There were also a few cases that the java process even went away.

Questions:
1)  Is it better to use the multi-valued string field or the text_ws field
for this large field?
2)  Is it better to have more outstanding docs per commit or more frequent
commits, in terms of maximizing server resources?  What is the preferred way
to commit documents, assuming that the Solr master receives updates frequently?
How many updated docs should there be before issuing a commit? 
3)  How to avoid the OOM problem in my case? I’m already doing (-Xms1536M
-Xmx1536M) on a 2-GB machine. Is that not enough?  I’m concerned that adding
more Ram would just delay the OOM problem.  Any additional JVM option to
consider?
4)  Any recommendation for the master server configuration, in a sense that 
I
can maximize the number of indexed docs?
5)  How can I disable caching on the master altogether, as queries won't hit
the master?
6)  For an average doc size of 50k-100k, is that too large for Solr, or is
Solr even the right tool? If not, any alternative?  If we are able to reduce
the size of docs, can we expect to index more documents?

The followings are info related to software/hardware/configuration:

Solr version (solr nightly build on 5/23/2008)
Solr Specification Version: 1.2.2008.05.23.08.06.59
Solr Implementation Version: nightly
Lucene Specification Version: 2.3.2
Lucene Implementation Version: 2.3.2 652650
Jetty: 6.1.3

Schema.xml (the section that I think are relevant to the master server.)



  

  






id

Solrconfig.xml
  
false
10
500
50
5000
2
1000
1
   
org.apache.lucene.index.LogByteSizeMergePolicy
org.apache.lucene.index.ConcurrentMergeScheduler
single
  

  
false
50
10

500
5000
2
false
  
  

 
  50
  18 


  solr/bin/snapshooter
  .
  true

  

  
50



true

1
1


  
 user_id 0 1 
static newSearcher warming query from
solrconfig.xml
  


  
 fast_warm 0 10 
static firstSearcher warming query from
solrconfig.xml
  

false
4
  

Replication:
The snappuller is scheduled to run every 15 mins for now. 

Hardware:
AMD (2.1GHz) dual core with 2GB ram 160GB SATA harddrive

OS:
Fedora 8 (64-bit)

JVM version:
java version "1.7.0"
IcedTea Runtime Environment (build 1.7.0-b21)
IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)

Java options:
java  -Djetty.home=/path/to/solr/home -d64 -Xms1536M -Xmx1536M
-XX:+UseParallelGC -jar start.jar 


-- 
View this message in context: 
http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17524364.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: field normalization and omitNorms

2008-05-28 Thread Mike Klaas


On 27-May-08, at 3:16 PM, Phillip Farber wrote:



Hi all,

I've been looking without success for a simple explanation of the  
effect of omitNorms=false for a text field. Can someone point me to  
the relevant doc?


With omitNorms=true, the length of the field, as well as field and document  
boosts, will not affect the doc score.  This is usually a bad thing for  
fields of variable length.
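When norms are enabled, the length factor they encode in Lucene's
DefaultSimilarity (2.x-era source, quoted from memory -- check your Lucene
version) is:

    // shorter fields get a larger norm, so matches in them score higher
    public float lengthNorm(String fieldName, int numTerms) {
      return (float) (1.0 / Math.sqrt(numTerms));
    }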




What is the effect of omitNorms=false on index size and query  
performance for say 200K documents that have s single large text  
field that ranges from maybe 30K up to around 800K?  Do I want  
omitNorms=true or false?


About 200K bytes (one norm byte per document), and no effect on performance  
(unless you are livin' on a crazy memory edge).


-Mike


Re: Solr indexing configuration help

2008-05-28 Thread Yonik Seeley
Not sure why you would be getting an OOM from just indexing, and with
the 1.5G heap you've given the JVM.
Have you tried Sun's JVM?

-Yonik

On Wed, May 28, 2008 at 7:35 PM, gaku113 <[EMAIL PROTECTED]> wrote:
> [gaku113's original message quoted in full; snipped -- see above]

Re: Solr indexing configuration help

2008-05-28 Thread Gaku Mak

I used the admin GUI to get the java info.
java.vm.specification.vendor = Sun Microsystems Inc.

Any suggestion?  Thanks a lot for your help!!

-Gaku


Yonik Seeley wrote:
> 
> Not sure why you would be getting an OOM from just indexing, and with
> the 1.5G heap you've given the JVM.
> Have you tried Sun's JVM?
> 
> -Yonik
> 
> [rest of the quoted thread snipped -- see the original messages above]

Marking Elevation

2008-05-28 Thread Jon Baer

Hi,

Id like to use elevate.xml and the component to simulate a GSA  
"KeyMatch" function (ie type in A get B back always) but it seems the  
elements are not marked that they have been elevated.  Is there any  
other way to accomplish something like that w/o having to plug a Map  
into my SolrJ code?


Thanks.

- Jon


EnableLazyFieldLoading?

2008-05-28 Thread Dallan Quass
If I'm loading say 80-90% of the fields 80-90% of the time, and I don't have
any large compressed text fields, is it safe to say that I'm probably better
off to turn off lazy field loading?

Thanks,

--dallan



Re: Solr indexing configuration help

2008-05-28 Thread Yonik Seeley
On Wed, May 28, 2008 at 10:30 PM, Gaku Mak <[EMAIL PROTECTED]> wrote:
> I used the admin GUI to get the java info.
> java.vm.specification.vendor = Sun Microsystems Inc.
Well, your original email listed IcedTea... but that is mostly Sun
code,  so maybe that's why the vendor is still listed as Sun.

I'd recommend downloading 1.6.0_03 from java.sun.com and trying that.

Later versions (1.6.0_04+) have a JVM bug that bites Lucene, so stick
with 1.6.0_03 for now.

-Yonik


> Any suggestion?  Thanks a lot for your help!!
>
> -Gaku
>
>
> [rest of the quoted thread snipped -- see the original messages above]

Re: EnableLazyFieldLoading?

2008-05-28 Thread Yonik Seeley
On Wed, May 28, 2008 at 11:00 PM, Dallan Quass <[EMAIL PROTECTED]> wrote:
> If I'm loading say 80-90% of the fields 80-90% of the time, and I don't have
> any large compressed text fields, is it safe to say that I'm probably better
> off to turn off lazy field loading?

Yes, as long as the 10-20% aren't really big.

-Yonik
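For reference, the switch lives in the <query> section of solrconfig.xml; a
minimal example of turning it off:

    <enableLazyFieldLoading>false</enableLazyFieldLoading>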


Issuing queries during analysis?

2008-05-28 Thread Dallan Quass
I have a situation where it would be beneficial to issue queries in a filter
that is called during analysis.  In a nutshell, I have an index of places
that includes possible abbreviations.  And I want to query this index during
analysis to convert user-entered places to "standardized" places.  So if
someone enters "Chicago, IL" into a "place" field, I want to write a filter
that first issues a query on "IL" to find that the standardized name for IL
is Illinois, and then issues a query on places named "Chicago" located in
"Illinois" to find that the standardized name is "Chicago, Cook, Illinois",
and then returns this string in a token.  

I've tried having the filter factory implement SolrCoreAware, but that isn't
allowed for filter factories.  I've considered calling
SolrCore.getSolrCore(), but this function has been deprecated with a comment
that "if you are using multiple cores, this is not a function to use", and
I'd like to use multiple cores someday.  I looked at MultiCore.java but
couldn't find a way to return a specific core.  

Any ideas?  I could issue the queries to standardize the place fields in
each document before indexing it, and then send SOLR documents with
pre-standardized place fields, but it would sure be more convenient (and
probably better-performing) to issue the queries during analysis.

I'd appreciate suggestions!

--dallan
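
For what it's worth, once some way of reaching the places index is available,
the token-rewriting half is straightforward.  A rough sketch against the
Lucene 2.3-style token API -- the PlaceLookup interface here is hypothetical
and stands in for whatever core access ends up looking like:

import java.io.IOException;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

public class PlaceStandardizingFilter extends TokenFilter {

  /** Hypothetical lookup service, e.g. backed by queries on a places index. */
  public interface PlaceLookup {
    // "Chicago, IL" -> "Chicago, Cook, Illinois"
    String standardize(String userEnteredPlace);
  }

  private final PlaceLookup lookup;

  public PlaceStandardizingFilter(TokenStream input, PlaceLookup lookup) {
    super(input);
    this.lookup = lookup;
  }

  public Token next(Token reusableToken) throws IOException {
    Token t = input.next(reusableToken);
    if (t == null) return null;

    // Assumes the tokenizer emits the whole user-entered place as one token
    // (e.g. a keyword-style tokenizer on the "place" field).
    String raw = new String(t.termBuffer(), 0, t.termLength());
    String standardized = lookup.standardize(raw);
    if (standardized != null && !standardized.equals(raw)) {
      t.setTermBuffer(standardized.toCharArray(), 0, standardized.length());
    }
    return t;
  }
}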



Re: Solr indexing configuration help

2008-05-28 Thread Otis Gospodnetic
Gaku,

But what's this then:

>> JVM version:
>>java version "1.7.0"
>> IcedTea Runtime Environment (build 1.7.0-b21)
>> IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)


Get the JVM from Sun.  Also, why do you have autoCommit on if all you are 
testing is indexing?  I'd turn that off.  The Java process going away sounds 
bad and smells like a JVM problem more than a Solr problem.
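
For reference, the stanza to comment out lives in the <updateHandler> section
of solrconfig.xml; a sketch using the 50-doc/180-second limits described
above (maxTime is in milliseconds):

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- disabled while bulk indexing; commit explicitly from the client instead
    <autoCommit>
      <maxDocs>50</maxDocs>
      <maxTime>180000</maxTime>
    </autoCommit>
    -->
  </updateHandler>

With autocommit off, the indexing client can issue an explicit commit every
few thousand documents, e.g.:

  curl http://localhost:8983/solr/update -H 'Content-type:text/xml' --data-binary '<commit/>'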

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: Gaku Mak <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 28, 2008 10:30:39 PM
> Subject: Re: Solr indexing configuration help
> 
> 
> I used the admin GUI to get the java info.
> java.vm.specification.vendor = Sun Microsystems Inc.
> 
> Any suggestion?  Thanks a lot for your help!!
> 
> -Gaku
> 
> 
> Yonik Seeley wrote:
> > 
> > Not sure why you would be getting an OOM from just indexing, and with
> > the 1.5G heap you've given the JVM.
> > Have you tried Sun's JVM?
> > 
> > -Yonik
> > 
> > On Wed, May 28, 2008 at 7:35 PM, gaku113 wrote:
> >> [rest of gaku113's original message snipped; quoted in full earlier in
> >> this thread]

Re: ClassCastException trying to use distributed search

2008-05-28 Thread Otis Gospodnetic
Grégoire,

I took a quick look at SearchHandler:242, but it looks like figuring out where 
this class cast exception came from would require debugging (which you may have 
to do on your end).  There is no patch to apply for distributed search -- 
what's in the trunk already has distributed search functionality and it works 
(I've used it), though there are still improvements to be made (e.g. SOLR-502, 
etc.)
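
For reference, a two-shard request uses the standard syntax below (the hosts
and ports are just the usual example values):

  http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&q=solr

Since the coordinating node parses each shard response with the binary codec
(visible in the BinaryResponseParser frames of the trace below), a shard that
answers with anything else -- an older Solr, or an error page from the
container -- will break it while unmarshalling, so it is worth confirming
that every shard in the list runs the same trunk build.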

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: Grégoire Neuville <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 28, 2008 6:09:08 PM
> Subject: ClassCastException trying to use distributed search
> 
> Hi,
> 
> I've tried several times to get solr's distributed search capabilities
> working, but failed each time: the ant build of solr-1.3's trunk doesn't
> produce any errors, but after deploying the resulting war in tomcat and
> sending a request to solr with the 'shards' parameter, I keep getting this
> cast exception (see below).
> 
> Did I miss something during the build?  Must I apply a patch to the trunk
> version?
> 
> Thanks a lot for your answers.
> -- 
> Grégoire
> 
> *java.lang.ClassCastException: java.lang.String
> org.apache.solr.common.SolrException: java.lang.ClassCastException:
> java.lang.String at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:242)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:965) at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
> at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
> at
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
> at java.lang.Thread.run(Thread.java:595) Caused by:
> java.lang.ClassCastException: java.lang.String at
> org.apache.solr.common.util.NamedListCodec.unmarshal(NamedListCodec.java:86)
> at
> org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:35)
> at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:350)
> at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:152)
> at
> org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:368)
> at
> org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:343)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269) at
> java.util.concurrent.FutureTask.run(FutureTask.java:123) at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
> ... 1 more*



Re: ClassCastException trying to use distributed search

2008-05-28 Thread Noble Paul നോബിള്‍ नोब्ळ्
hi Grégoire,
I could not find an obvious problem.  This error is expected if the response
is not written by BinaryResponseWriter.
Could you apply the attached patch and see if you get the same error?
The patch is not a solution; it is just to diagnose the problem.
--Noble


On Thu, May 29, 2008 at 3:39 AM, Grégoire Neuville
<[EMAIL PROTECTED]> wrote:
> [message and stack trace snipped; identical to the one quoted in the
> previous reply]



-- 
--Noble Paul
Index: src/java/org/apache/solr/common/util/FastInputStream.java
===
--- src/java/org/apache/solr/common/util/FastInputStream.java	(revision 660433)
+++ src/java/org/apache/solr/common/util/FastInputStream.java	(working copy)
@@ -208,4 +208,8 @@
   public String readUTF() throws IOException {
 return new DataInputStream(this).readUTF();
   }
+
+public String toString() {
+return new String(buf);
+}
 }
Index: src/java/org/apache/solr/common/util/NamedListCodec.java
===
--- src/java/org/apache/solr/common/util/NamedListCodec.java	(revision 660433)
+++ src/java/org/apache/solr/common/util/NamedListCodec.java	(working copy)
@@ -18,6 +18,7 @@
 
 import org.apache.solr.common.SolrDocument;
 import org.apache.solr.common.SolrDocumentList;
+import org.apache.solr.common.SolrException;
 
 import java.io.*;
 import java.util.*;
@@ -83,7 +84,8 @@
   public NamedList unmarshal(InputStream is) throws IOException {
 FastInputStream dis = FastInputStream.wrap(is);
 byte version = dis.readByte();
-return (NamedList)readVal(dis);
+  if(version != VERSION) throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Invalid response : "+ dis);
+  return (NamedList)readVal(dis);
   }
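
Assuming the diff above is saved as namedlistcodec-diagnose.patch at the root
of the trunk checkout (the file name is just illustrative), it can be applied
and the war rebuilt with:

  cd solr-trunk
  patch -p0 < namedlistcodec-diagnose.patch
  ant dist

-p0 is needed because the Index: paths in the diff are already relative to
the trunk root.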
 
 


Re: Solr indexing configuration help

2008-05-28 Thread Gaku Mak

That's good to know; I will get the 1.6.0_3 version of the JDK and try it out.
Do you guys have any comments on my other questions?  Any help is greatly
appreciated!  Thanks!

-Gaku
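
While re-running the indexing test, it may also help to capture a heap dump
when the OOM hits, to see what is actually filling the heap.  Assuming the
stock Jetty start.jar from the nightly, something like:

  java -Xms1536m -Xmx1536m -XX:+HeapDumpOnOutOfMemoryError -jar start.jar

The resulting .hprof file can then be opened with a heap analyzer such as
jhat, which ships with JDK 6.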


Yonik Seeley wrote:
> 
> On Wed, May 28, 2008 at 10:30 PM, Gaku Mak <[EMAIL PROTECTED]> wrote:
>> I used the admin GUI to get the java info.
>> java.vm.specification.vendor = Sun Microsystems Inc.
> Well, your original email listed IcedTea... but that is mostly Sun
> code,  so maybe that's why the vendor is still listed as Sun.
> 
> I'd recommend downloading 1.6.0_3 from java.sun.com and trying that.
> 
> Later versions (1.6.0_04+) have a JVM bug that bites Lucene, so stick
> with 1.6.0_03 for now.
> 
> -Yonik
> 
> 
>> Any suggestion?  Thanks a lot for your help!!
>>
>> -Gaku
>>
>>
>> Yonik Seeley wrote:
>>>
>>> Not sure why you would be getting an OOM from just indexing, and with
>>> the 1.5G heap you've given the JVM.
>>> Have you tried Sun's JVM?
>>>
>>> -Yonik
>>>
>>> On Wed, May 28, 2008 at 7:35 PM, gaku113 <[EMAIL PROTECTED]> wrote:

 [rest of gaku113's original message snipped; quoted in full earlier in the
 thread]

Re: Chinese Language + Solr

2008-05-28 Thread j . L
On Thu, May 15, 2008 at 11:25 PM, Walter Underwood <[EMAIL PROTECTED]>
wrote:

> I've worked with the Basis products. Solid, good support.
> Last time I talked to them, they were working on hooking
> them into Lucene.
>

I don't know the Basis products, but I know Google uses them, and in China
google.cn is not doing better than Baidu.
We always use baidu.com to search for Chinese information.


>
> For really good quality results from any of these, you need
> to add terms to the user dictionary of the segmenter. These
> may be local jargon, product names, personal names, place
> names, etc.
>

Yes, I agree with your point.

Baidu's analyzer works this way, from what I have learned on the Internet.


>
> Baidu has different problems than the rest of us, because
> their code has to be scary fast. They might even trade
> lower quality for more speed.
>

Can you say more about that?
I think Baidu uses more cache servers and has an effective caching strategy.


>
> wunder
>
>


-- 
regards
j.L