Re: Type converters for DocumentObjectBinder

2009-11-13 Thread paulhyo

Hi Paul,

it's working for querying, but not for updating (adding a bean). The getter
method returns a Calendar (a GregorianCalendar instance).

On the indexer side, a toString() or something equivalent is applied, and an
error is thrown:

Caused by: java.text.ParseException: Unparseable date:
"java.util.GregorianCalendar:java.util.GregorianCalendar[time=1258100168327,areFieldsSet=
rue,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id="Europe/Berlin",offset=360,dstSavings=360,useDaylight=true,tran
itions=143,lastRule=java.util.SimpleTimeZone[id=Europe/Berlin,offset=360,dstSavings=360,useDaylight=true,startYear=0,startMode=2,startMo
th=2,startDay=-1,startDayOfWeek=1,startTime=360,startTimeMode=2,endMode=2,endMonth=9,endDay=-1,endDayOfWeek=1,endTime=360,endTimeMode=2]
,firstDayOfWeek=2,minimalDaysInFirstWeek=4,ERA=1,YEAR=2009,MONTH=10,WEEK_OF_YEAR=46,WEEK_OF_MONTH=2,DAY_OF_MONTH=13,DAY_OF_YEAR=317,DAY_OF_WEEK=
,DAY_OF_WEEK_IN_MONTH=2,AM_PM=0,HOUR=9,HOUR_OF_DAY=9,MINUTE=16,SECOND=8,MILLISECOND=327,ZONE_OFFSET=360,DST_OFFSET=0]"


public Calendar getValidFrom() {
    return validFrom;
}

public void setValidFrom(Calendar validFrom) {
    this.validFrom = validFrom;
}

@Field
public void setValidFrom(String validFrom) {
    Calendar cal = Calendar.getInstance();
    try {
        cal.setTime(dateFormat.parse(validFrom));
    } catch (ParseException e) {
        e.printStackTrace();
    }
    this.validFrom = cal;
}






Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
> 
> create a setter method for the field which takes a String and apply
> the annotation there
> 
> example
> 
> 
> private Calendar validFrom;
> 
> @Field
> public void setValidFrom(String s) {
> //convert to Calendar object and set the field
> }
> 
> 
> On Fri, Nov 13, 2009 at 12:24 PM, paulhyo  wrote:
>>
>> Hi,
>>
>> I would like to know if there is a way to add type converters when using
>> getBeans. I need conversion when updating (Calendar -> String) and when
>> searching (String -> Calendar)
>>
>>
>> The Bean class defines :
>> @Field
>> private Calendar validFrom;
>>
>> but the received type within the Query Response is a String (2009-11-13)...
>>
>> Actually I get this error :
>>
>> java.lang.RuntimeException: Exception while setting value : 2009-09-16 on
>> private java.util.Calendar
>> ch.mycompany.access.solr.impl.SoNatPersonImpl.validFrom
>>        at
>> org.apache.solr.client.solrj.beans.DocumentObjectBinder$DocField.set(DocumentObjectBinder.java:360)
>>        at
>> org.apache.solr.client.solrj.beans.DocumentObjectBinder$DocField.inject(DocumentObjectBinder.java:342)
>>        at
>> org.apache.solr.client.solrj.beans.DocumentObjectBinder.getBeans(DocumentObjectBinder.java:55)
>>        at
>> org.apache.solr.client.solrj.response.QueryResponse.getBeans(QueryResponse.java:324)
>>        at
>> ch.mycompany.access.solr.impl.result.NatPersonPartnerResultBuilder.buildBeanListResult(NatPersonPartnerResultBuilder.java:38)
>>        at
>> ch.mycompany.access.solr.impl.SoQueryManagerImpl.searchNatPersons(SoQueryManagerImpl.java:41)
>>        at
>> ch.mycompany.access.solr.impl.SolrQueryManagerTest.testQueryFamilyNameRigg(SolrQueryManagerTest.java:36)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>        at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at junit.framework.TestCase.runTest(TestCase.java:164)
>>        at junit.framework.TestCase.runBare(TestCase.java:130)
>>        at junit.framework.TestResult$1.protect(TestResult.java:106)
>>        at junit.framework.TestResult.runProtected(TestResult.java:124)
>>        at junit.framework.TestResult.run(TestResult.java:109)
>>        at junit.framework.TestCase.run(TestCase.java:120)
>>        at junit.framework.TestSuite.runTest(TestSuite.java:230)
>>        at junit.framework.TestSuite.run(TestSuite.java:225)
>>        at
>> org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
>>        at
>> org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>>        at
>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
>>        at
>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
>>        at
>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
>>        at
>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
>> Caused by: java.lang.IllegalArgumentException: Can not set
>> java.util.Calendar field
>> ch.mycompany.access.solr.impl.SoNatPersonImpl.validFrom to
>> java.lang.String
>>        at
>> sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFiel

Re: Type converters for DocumentObjectBinder

2009-11-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
You must have a corresponding getter which returns a String:
public String getValidFrom() {
    String s = null; // convert the Calendar to a String here
    return s;
}
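
For reference, a minimal sketch of a bean wiring both directions together (the
SimpleDateFormat pattern and the exception handling are assumptions based on
the plain "2009-11-13" strings seen in this thread, not something prescribed
by SolrJ):

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;

import org.apache.solr.client.solrj.beans.Field;

public class SoNatPerson {
    // assumption: the Solr field holds plain "yyyy-MM-dd" strings
    private static final SimpleDateFormat DATE_FORMAT = new SimpleDateFormat("yyyy-MM-dd");

    private Calendar validFrom;

    // used by SolrJ when the bean is added (Calendar -> String)
    public String getValidFrom() {
        return validFrom == null ? null : DATE_FORMAT.format(validFrom.getTime());
    }

    // used by SolrJ when a query response is bound (String -> Calendar)
    @Field("validFrom")
    public void setValidFrom(String s) {
        try {
            Calendar cal = Calendar.getInstance();
            cal.setTime(DATE_FORMAT.parse(s));
            this.validFrom = cal;
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + s, e);
        }
    }
}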

On Fri, Nov 13, 2009 at 2:01 PM, paulhyo  wrote:
>
> Hi Paul,
>
> it's working for querying, but not for updating (adding a bean). The getter
> method returns a Calendar (a GregorianCalendar instance).
>
> On the indexer side, a toString() or something equivalent is applied, and an
> error is thrown:
>
> Caused by: java.text.ParseException: Unparseable date:
> "java.util.GregorianCalendar:java.util.GregorianCalendar[time=1258100168327,areFieldsSet=
> rue,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id="Europe/Berlin",offset=360,dstSavings=360,useDaylight=true,tran
> itions=143,lastRule=java.util.SimpleTimeZone[id=Europe/Berlin,offset=360,dstSavings=360,useDaylight=true,startYear=0,startMode=2,startMo
> th=2,startDay=-1,startDayOfWeek=1,startTime=360,startTimeMode=2,endMode=2,endMonth=9,endDay=-1,endDayOfWeek=1,endTime=360,endTimeMode=2]
> ,firstDayOfWeek=2,minimalDaysInFirstWeek=4,ERA=1,YEAR=2009,MONTH=10,WEEK_OF_YEAR=46,WEEK_OF_MONTH=2,DAY_OF_MONTH=13,DAY_OF_YEAR=317,DAY_OF_WEEK=
> ,DAY_OF_WEEK_IN_MONTH=2,AM_PM=0,HOUR=9,HOUR_OF_DAY=9,MINUTE=16,SECOND=8,MILLISECOND=327,ZONE_OFFSET=360,DST_OFFSET=0]"
>
>
> public Calendar getValidFrom() {
>        return validFrom;
> }
>
> public void setValidFrom(Calendar validFrom) {
>        this.validFrom = validFrom;
> }
>
> @Field
> public void setValidFrom(String validFrom) {
>        Calendar cal = Calendar.getInstance();
>        try {
>                cal.setTime(dateFormat.parse(validFrom));
>        } catch (ParseException e) {
>                e.printStackTrace();
>        }
>        this.validFrom = cal;
> }
>
>
>
>
>
>
> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>
>> create a setter method for the field which takes a String and apply
>> the annotation there
>>
>> example
>>
>>
>> private Calendar validFrom;
>>
>> @Field
>> public void setValidFrom(String s) {
>>     //convert to Calendar object and set the field
>> }
>>
>>
>> On Fri, Nov 13, 2009 at 12:24 PM, paulhyo  wrote:
>>>
>>> Hi,
>>>
>>> I would like to know if there is a way to add type converters when using
>>> getBeans. I need conversion when updating (Calendar -> String) and when
>>> searching (String -> Calendar)
>>>
>>>
>>> The Bean class defines :
>>> @Field
>>> private Calendar validFrom;
>>>
>>> but the received type within the Query Response is a String (2009-11-13)...
>>>
>>> Actually I get this error :
>>>
>>> java.lang.RuntimeException: Exception while setting value : 2009-09-16 on
>>> private java.util.Calendar
>>> ch.mycompany.access.solr.impl.SoNatPersonImpl.validFrom
>>>        at
>>> org.apache.solr.client.solrj.beans.DocumentObjectBinder$DocField.set(DocumentObjectBinder.java:360)
>>>        at
>>> org.apache.solr.client.solrj.beans.DocumentObjectBinder$DocField.inject(DocumentObjectBinder.java:342)
>>>        at
>>> org.apache.solr.client.solrj.beans.DocumentObjectBinder.getBeans(DocumentObjectBinder.java:55)
>>>        at
>>> org.apache.solr.client.solrj.response.QueryResponse.getBeans(QueryResponse.java:324)
>>>        at
>>> ch.mycompany.access.solr.impl.result.NatPersonPartnerResultBuilder.buildBeanListResult(NatPersonPartnerResultBuilder.java:38)
>>>        at
>>> ch.mycompany.access.solr.impl.SoQueryManagerImpl.searchNatPersons(SoQueryManagerImpl.java:41)
>>>        at
>>> ch.mycompany.access.solr.impl.SolrQueryManagerTest.testQueryFamilyNameRigg(SolrQueryManagerTest.java:36)
>>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>        at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>        at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>>        at junit.framework.TestCase.runTest(TestCase.java:164)
>>>        at junit.framework.TestCase.runBare(TestCase.java:130)
>>>        at junit.framework.TestResult$1.protect(TestResult.java:106)
>>>        at junit.framework.TestResult.runProtected(TestResult.java:124)
>>>        at junit.framework.TestResult.run(TestResult.java:109)
>>>        at junit.framework.TestCase.run(TestCase.java:120)
>>>        at junit.framework.TestSuite.runTest(TestSuite.java:230)
>>>        at junit.framework.TestSuite.run(TestSuite.java:225)
>>>        at
>>> org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
>>>        at
>>> org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>>>        at
>>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
>>>        at
>>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
>>>        at
>>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
>>>        at
>>> org.eclipse

highlighting issue lst.name is a leaf node

2009-11-13 Thread Chuck Mysak
Hello list,

I'm new to Solr, but from what I've been experimenting with, it's awesome.
I have a small issue regarding the highlighting feature.

It finds stuff (as I see from the query analyzer), but the highlight list
looks something like this:

(empty <lst name="..."/> elements, one per matching document)

(the files were added using  ContentStreamUpdateRequest req = new
ContentStreamUpdateRequest("/update/extract"); and I set the "literal.id" to
the filename)

My solrconfig.xml requesthandler looks like:

  

 
   explicit
   
   true
   3
   30
   
   
   *
   true
   0.5
   [-\w ,/\n\"']{20,200}
   true
 
  

The schema.xml is untouched and downloaded yesterday from the latest stable
build.

At first, I thought it had something to do with the extraction of the pdf,
but I tried the demo xml docs also and got the same result.

I'm new to this, so please help.

Thank you,

Chuck


Re: Stop solr without losing documents

2009-11-13 Thread gwk

Michael wrote:

I've got a process external to Solr that is constantly feeding it new
documents, retrying if Solr is not responding.  What's the right way to
stop Solr (running in Tomcat) so no documents are lost?

Currently I'm committing all cores and then running catalina's stop
script, but between my commit and the stop, more documents can come in
that would need *another* commit...

Lots of people must have had this problem already, so I know the
answer is simple; I just can't find it!

Thanks.
Michael
  
I don't know if this is the best solution, or even if it's applicable to
your situation, but we do incremental updates from a database based on a
timestamp (from a simple separate SQL table filled by triggers, so that
deletes are captured correctly as well). We store this timestamp in Solr
too. Our index script first does a simple Solr request for the newest
timestamp, and then selects the documents to update with a "SELECT * FROM
document_updates WHERE timestamp >= X", where X is the timestamp returned
from Solr. (We use >= for the hopefully extremely rare case where two
updates happen at the same time and the index script runs at that same
moment, picking up only one of them. This can cause some documents to be
updated multiple times, but since document updates are idempotent, that is
no real problem.)
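
As a rough illustration of that flow (field, table and class names here are
hypothetical, and the SolrJ 1.4 API is assumed):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.util.Date;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrDocumentList;

public class IncrementalIndexer {
    public void update(SolrServer solr, Connection db) throws Exception {
        // 1. ask Solr for the newest timestamp already in the index
        SolrQuery q = new SolrQuery("*:*");
        q.setSortField("timestamp", SolrQuery.ORDER.desc);
        q.setRows(1);
        SolrDocumentList docs = solr.query(q).getResults();
        Date newest = docs.isEmpty()
                ? new Date(0)
                : (Date) docs.get(0).getFieldValue("timestamp");

        // 2. ">=" on purpose: re-indexing a row twice is harmless because
        //    document updates are idempotent; missing one is not
        PreparedStatement ps = db.prepareStatement(
                "SELECT * FROM document_updates WHERE timestamp >= ?");
        ps.setTimestamp(1, new Timestamp(newest.getTime()));
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            // build a SolrInputDocument from the row and add it to solr here
        }
    }
}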


Regards,

gwk


Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Jan-Eirik B . Nævdal
Some extras for the pros list:

- Full control over which content is searchable and which is not.
- The possibility to make pages searchable almost instantly after publication.
- Control over when the site is indexed.


Friendly

Jan-Eirik

On Fri, Nov 13, 2009 at 10:52 AM, Lukáš Vlček  wrote:

> Hi,
>
> I am looking for good arguments to justify implementing search for sites
> which are available on the public internet. There are many sites in the
> "powered by Solr" section which are indexed by Google and other search
> engines, but they still decided to invest resources into building and
> maintaining their own search functionality rather than going with a
> [user_query site:my_site.com] Google search. Why?
>
> By no means am I saying it makes no sense to implement Solr! But I want to
> put together a list of reasons, possibly with examples. Your help would be
> much appreciated!
>
> Let's narrow the scope of this discussion to the following:
> - the search should cover several community sites running open source CMSs,
> JIRAs, Bugzillas ... and the like
> - all documents use open formats (no need to parse Word or Excel)
> (maybe something close to what LucidImagination does for mailing lists of
> Lucene and Solr)
>
> My initial kick-off list would be:
>
> pros:
> - considering we understand the content (we understand the domain scope) we
> can fine tune the search engine to provide more accurate results
> - Solr can give us facets
> - we have user search logs (valuable for analysis)
> - implementing Solr is fun
>
> cons:
> - requires resources (but the cost is relatively low depending on the query
> traffic, index size and frequency of updates)
>
> Regards,
> Lukas
>
> http://blog.lukas-vlcek.com/
>



-- 
Jan Eirik B. Nævdal
Solutions Engineer | +47 982 65 347
Iterate AS | www.iterate.no
The Lean Software Development Consultancy


Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Markus Jelsma - Buyways B.V.
Next to the faceting engine:
- MoreLikeThis
- Highlighting
- Spellchecker

There is also more flexible querying using the DisMax handler, which is
clearly superior. Solr can also be used to store data that can be
retrieved in an instant! We have used this technique on a site, and it is
obviously much faster than multiple large and complex SQL statements.


On Fri, 2009-11-13 at 10:52 +0100, Lukáš Vlček wrote:

> pros:
> - considering we understand the content (we understand the domain scope) we
> can fine tune the search engine to provide more accurate results
> - Solr can give us facets
> - we have user search logs (valuable for analysis)
> - implementing Solr is fun
> 
> cons:
> - requires resources (but the cost is relatively low depending on the query
> traffic, index size and frequency of updates)
> 
> Regards,
> Lukas
> 
> http://blog.lukas-vlcek.com/


Re: highlighting issue lst.name is a leaf node

2009-11-13 Thread Chuck Mysak
I found the solution.
If somebody runs into the same problem, here is how I solved it.

- while uploading the document:

req.setParam("uprefix", "attr_");
req.setParam("fmap.content", "attr_content");
req.setParam("overwrite", "true");
req.setParam("commit", "true");

- in the query:
http://localhost:8983/solr/select?q=attr_content:%22Django%22&rows=4
- edit the request handler params in solrconfig.xml:

   <str name="fl">id,title</str>

so that you won't get the whole text content inside the response.
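
Putting it together, a minimal SolrJ sketch of the upload step (the server
URL, file name and CommonsHttpSolrServer setup are assumptions; the
parameters are the ones listed above):

import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractUpload {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        File file = new File("some-document.pdf");   // hypothetical input file

        ContentStreamUpdateRequest req =
                new ContentStreamUpdateRequest("/update/extract");
        req.addFile(file);
        req.setParam("literal.id", file.getName());  // use the filename as the id
        req.setParam("uprefix", "attr_");            // prefix for unknown fields
        req.setParam("fmap.content", "attr_content");
        req.setParam("overwrite", "true");
        req.setParam("commit", "true");

        server.request(req);
    }
}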

Regards,
Chuck

On Fri, Nov 13, 2009 at 11:21 AM, Chuck Mysak  wrote:

> Hello list,
>
> I'm new to Solr, but from what I've been experimenting with, it's awesome.
> I have a small issue regarding the highlighting feature.
>
> It finds stuff (as I see from the query analyzer), but the highlight list
> looks something like this:
>
> (empty <lst name="..."/> elements, one per matching document)
>
> (the files were added using  ContentStreamUpdateRequest req = new
> ContentStreamUpdateRequest("/update/extract"); and I set the "literal.id"
> to the filename)
>
> My solrconfig.xml requesthandler looks like:
>
>default="true">
> 
>  
>explicit
>
>true
>3
>30
>
>
>*
>true
>0.5
>[-\w ,/\n\"']{20,200}
>true
>  
>   
>
> The schema.xml is untouched and downloaded yesterday from the latest stable
> build.
>
> At first, I thought it had something to do with the extraction of the pdf,
> but I tried the demo xml docs also and got the same result.
>
> I'm new to this, so please help.
>
> Thank you,
>
> Chuck
>
>
>
>
>
>


Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Chantal Ackermann



Jan-Eirik B. Nævdal schrieb:

Some extras for the pros list:

- Full control over which content is searchable and which is not.
- The possibility to make pages searchable almost instantly after publication.
- Control over when the site is indexed.


+1, especially the last point.
You can also add a robots.txt and prohibit spidering of the site to
reduce traffic; Google won't index any highly dynamic content then.





Friendly

Jan-Eirik

On Fri, Nov 13, 2009 at 10:52 AM, Lukáš Vlček  wrote:


Hi,

I am looking for good arguments to justify implementing search for sites
which are available on the public internet. There are many sites in the
"powered by Solr" section which are indexed by Google and other search
engines, but they still decided to invest resources into building and
maintaining their own search functionality rather than going with a
[user_query site:my_site.com] Google search. Why?

By no means am I saying it makes no sense to implement Solr! But I want to
put together a list of reasons, possibly with examples. Your help would be
much appreciated!

Let's narrow the scope of this discussion to the following:
- the search should cover several community sites running open source CMSs,
JIRAs, Bugzillas ... and the like
- all documents use open formats (no need to parse Word or Excel)
(maybe something close to what LucidImagination does for mailing lists of
Lucene and Solr)

My initial kick-off list would be:

pros:
- considering we understand the content (we understand the domain scope) we
can fine tune the search engine to provide more accurate results
- Solr can give us facets
- we have user search logs (valuable for analysis)
- implementing Solr is fun

cons:
- requires resources (but the cost is relatively low depending on the query
traffic, index size and frequency of updates)

Regards,
Lukas

http://blog.lukas-vlcek.com/





--
Jan Eirik B. Nævdal
Solutions Engineer | +47 982 65 347
Iterate AS | www.iterate.no
The Lean Software Development Consultancy


Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Andrew Clegg


Lukáš Vlček wrote:
> 
> I am looking for good arguments to justify implementing search for sites
> which are available on the public internet. There are many sites in the
> "powered by Solr" section which are indexed by Google and other search
> engines, but they still decided to invest resources into building and
> maintaining their own search functionality rather than going with a
> [user_query site:my_site.com] Google search. Why?
> 

You're assuming that Solr is just used in these cases to index discrete web
pages which Google etc. would be able to access via following navigational
links.

I would imagine that in a lot of cases, Solr is used to index database
entities which are used to build [parts of] pages dynamically, and which
might be viewable in different forms in various different pages.

Plus, with stored fields, you have the option of actually driving a website
off Solr instead of directly off a database, which might make sense from a
speed perspective in some cases.

And further, going back to page-only indexing -- you have no guarantee when
Google will decide to recrawl your site, so there may be a delay before
changes show up in their index. With an in-house search engine you can
reindex as often as you like.

Andrew.

-- 
View this message in context: 
http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334734.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Lukáš Vlček
Hi,

thanks for the input so far... however, let's put it this way:

When you need to search for something Lucene or Solr related, which one do
you use:
- generic Google
- go to a particular mailing list web site and search from there (if there is
any search form at all)
- go to LucidImagination.com and use its search capability

Regards,
Lukas


On Fri, Nov 13, 2009 at 11:50 AM, Andrew Clegg wrote:

>
>
> Lukáš Vlček wrote:
> >
> > I am looking for good arguments to justify implementing search for sites
> > which are available on the public internet. There are many sites in the
> > "powered by Solr" section which are indexed by Google and other search
> > engines, but they still decided to invest resources into building and
> > maintaining their own search functionality rather than going with a
> > [user_query site:my_site.com] Google search. Why?
> >
>
> You're assuming that Solr is just used in these cases to index discrete web
> pages which Google etc. would be able to access via following navigational
> links.
>
> I would imagine that in a lot of cases, Solr is used to index database
> entities which are used to build [parts of] pages dynamically, and which
> might be viewable in different forms in various different pages.
>
> Plus, with stored fields, you have the option of actually driving a website
> off Solr instead of directly off a database, which might make sense from a
> speed perspective in some cases.
>
> And further, going back to page-only indexing -- you have no guarantee when
> Google will decide to recrawl your site, so there may be a delay before
> changes show up in their index. With an in-house search engine you can
> reindex as often as you like.
>
> Andrew.
>
> --
> View this message in context:
> http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334734.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Data import problem with child entity from different database

2009-11-13 Thread Andrew Clegg

Morning all,

I'm having problems with joining a child entity from one database to a
parent from another...

My entity definitions look like this (names changed for brevity):



  



c is getting indexed fine (it's stored, I can see field 'c' in the search
results) but child.d isn't. I know the child table has data for the
corresponding parent rows, and I've even watched the SQL queries against the
child table appearing in Oracle's SQL Developer as the DataImportHandler
runs. But no content for child.d gets into the index.

My schema contains a definition for a field called d like so:



(keywords_ids is a conservatively-analyzed text type which has worked fine
in other contexts.)

Two things occur to me.

1. db1 is PostgreSQL and db2 is Oracle, although the d field in both tables
is just a char(4), nothing fancy. Could something weird with character
encodings be happening?

2. d isn't a primary key in either the parent or the child, but this shouldn't
matter, should it?

Additional data points -- I also tried using the CachedSqlEntityProcessor to
do in-memory table caching of child, but it didn't work then either. I got a
lot of error messages like this:

No value available for the cache key : d in the entity : child

If anyone knows whether this is a known limitation (if so I can work round
it), or an unexpected case (if so I'll file a bug report), please shout. I'm
using 1.4.

Yet again, many thanks :-)

Andrew.

-- 
View this message in context: 
http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26334948.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Andrew Clegg


Lukáš Vlček wrote:
> 
> When you need to search for something Lucene or Solr related, which one do
> you use:
> - generic Google
> - go to a particular mailing list web site and search from there (if there is
> any search form at all)
> 

Both of these (Nabble in the second case) in case any recent posts have
appeared which Google hasn't picked up.

Andrew.

-- 
View this message in context: 
http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334980.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Jon Baer
For this list I usually end up @ http://solr.markmail.org (which I believe also 
uses Lucene under the hood)

Google is such a black box ... 

Pros:
+ 1 Open Source (enough said :-)

There also always seems to be the notion that "crawling" lends itself to
producing the best results, but that is rarely the case.  And unless you are a
"special" type of site, Google will not overlay your results with some type of
context in the search (i.e. news or sports, etc.).

What I think really needs to happen in Solr (and is a bit missing at the
moment) is a common interface for "reindexing" another index (if that makes
sense) ... something akin to OpenSearch
(http://www.opensearch.org/Community/OpenSearch_software)

For example, what I would like to do is have my site, have my search index, and
point Google at just my search index (and not crawl the site) ... the only
current option for something like that is sitemaps, which I think Solr
(templates) should have a contrib project for (but you would have to generate
these offline for sure).

- Jon  

On Nov 13, 2009, at 6:00 AM, Lukáš Vlček wrote:

> Hi,
> 
> thanks for inputs so far... however, let's put it this way:
> 
> When you need to search for something Lucene or Solr related, which one do
> you use:
> - generic Google
> - go to a particular mailing list web site and search from there (if there is
> any search form at all)
> - go to LucidImagination.com and use its search capability
> 
> Regards,
> Lukas
> 
> 
> On Fri, Nov 13, 2009 at 11:50 AM, Andrew Clegg wrote:
> 
>> 
>> 
>> Lukáš Vlček wrote:
>>> 
>>> I am looking for good arguments to justify implementing search for sites
>>> which are available on the public internet. There are many sites in the
>>> "powered by Solr" section which are indexed by Google and other search
>>> engines, but they still decided to invest resources into building and
>>> maintaining their own search functionality rather than going with a
>>> [user_query site:my_site.com] Google search. Why?
>>> 
>> 
>> You're assuming that Solr is just used in these cases to index discrete web
>> pages which Google etc. would be able to access via following navigational
>> links.
>> 
>> I would imagine that in a lot of cases, Solr is used to index database
>> entities which are used to build [parts of] pages dynamically, and which
>> might be viewable in different forms in various different pages.
>> 
>> Plus, with stored fields, you have the option of actually driving a website
>> off Solr instead of directly off a database, which might make sense from a
>> speed perspective in some cases.
>> 
>> And further, going back to page-only indexing -- you have no guarantee when
>> Google will decide to recrawl your site, so there may be a delay before
>> changes show up in their index. With an in-house search engine you can
>> reindex as often as you like.
>> 
>> Andrew.
>> 
>> --
>> View this message in context:
>> http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334734.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 
>> 



Re: Selection of terms for MoreLikeThis

2009-11-13 Thread Andrew Clegg

Any ideas on this? Is it worth sending a bug report?

Those links are live, by the way, in case anyone wants to verify that MLT is
returning suggestions with very low tf.idf.

Cheers,

Andrew.


Andrew Clegg wrote:
> 
> Hi,
> 
> If I run a MoreLikeThis query like the following:
> 
> http://www.cathdb.info/solr/mlt?q=id:3.40.50.720&rows=0&mlt.interestingTerms=list&mlt.match.include=false&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1
> 
> one of the hits in the results is "and" (I don't do any stopword removal
> on this field).
> 
> However if I look inside that document with the TermVectorComponent:
> 
> http://www.cathdb.info/solr/select/?q=id:3.40.50.720&tv=true&tv.all=true&tv.fl=keywords
> 
> I see that "and" has a measly tf.idf of 7.46E-4. But there are other terms
> with *much* higher tf.idf scores, e.g.:
> 
> 
> 1
> 10
> 0.1
> 
> 
> that *don't* appear in the MoreLikeThis list. (I tried adding
> &mlt.maxwl=999 to the end of the MLT query but it makes no difference.)
> 
> What's going on? Surely something with tf.idf = 0.1 is a far better
> candidate for a MoreLikeThis query than something with tf.idf = 7.46E-4?
> Or does MoreLikeThis do some other heuristic magic to select good
> candidates, and sometimes get it wrong?
> 
> BTW the keywords field is indexed, stored, multi-valued and term-vectored.
> 
> Thanks,
> 
> Andrew.
> 
> -- 
> :: http://biotext.org.uk/ ::
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26335061.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Data import problem with child entity from different database

2009-11-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
No obvious issues.
You may post your entire data-config.xml.

Do it without the CachedSqlEntityProcessor first and then apply that later.
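
For comparison, a sketch of the shape such a two-database config usually takes
(driver class names, URLs and credentials are hypothetical); the key points are
the name attribute on each dataSource and the matching dataSource attribute on
each entity:

<dataConfig>
  <dataSource name="db1" driver="org.postgresql.Driver"
              url="jdbc:postgresql://host1/parentdb" user="u" password="p"/>
  <dataSource name="db2" driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@host2:1521:childdb" user="u" password="p"/>
  <document>
    <entity name="parent" dataSource="db1" query="SELECT id, c, d FROM parent">
      <entity name="child" dataSource="db2"
              query="SELECT d FROM child WHERE d = '${parent.d}'"/>
    </entity>
  </document>
</dataConfig>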


On Fri, Nov 13, 2009 at 4:38 PM, Andrew Clegg  wrote:
>
> Morning all,
>
> I'm having problems with joining a child entity from one database to a
> parent from another...
>
> My entity definitions look like this (names changed for brevity):
>
> 
>
>  
>
> 
>
> c is getting indexed fine (it's stored, I can see field 'c' in the search
> results) but child.d isn't. I know the child table has data for the
> corresponding parent rows, and I've even watched the SQL queries against the
> child table appearing in Oracle's sqldeveloper as the DataImportHandler
> runs. But no content for child.d gets into the index.
>
> My schema contains a definition for a field called d like so:
>
>  multiValued="true" termVectors="true" />
>
> (keywords_ids is a conservatively-analyzed text type which has worked fine
> in other contexts.)
>
> Two things occur to me.
>
> 1. db1 is PostgreSQL and db2 is Oracle, although the d field in both tables
> is just a char(4), nothing fancy. Could something weird with character
> encodings be happening?
>
> 2. d isn't a primary key in either parent or child, but this shouldn't
> matter should it?
>
> Additional data points -- I also tried using the CachedSqlEntityProcessor to
> do in-memory table caching of child, but it didn't work then either. I got a
> lot of error messages like this:
>
> No value available for the cache key : d in the entity : child
>
> If anyone knows whether this is a known limitation (if so I can work round
> it), or an unexpected case (if so I'll file a bug report), please shout. I'm
> using 1.4.
>
> Yet again, many thanks :-)
>
> Andrew.
>
> --
> View this message in context: 
> http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26334948.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


exclude some fields from copying dynamic fields | schema.xml

2009-11-13 Thread Vicky_Dev

Hi, 
we are using the following entry in schema.xml to make a copy of one type of
dynamic field to another : 
 

Is it possible to exclude some fields from copying?

We are using Solr 1.3.

~Vikrant

-- 
View this message in context: 
http://old.nabble.com/exclude-some-fields-from-copying-dynamic-fields-%7C-schema.xml-tp26335109p26335109.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Data import problem with child entity from different database

2009-11-13 Thread Andrew Clegg



Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
> 
> no obvious issues.
> you may post your entire data-config.xml
> 

Here it is, exactly as in the last attempt but with usernames etc. removed.

Ignore the comments and the unused FileDataSource...

http://old.nabble.com/file/p26335171/dataimport.temp.xml dataimport.temp.xml 


Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
> 
> do w/o CachedSqlEntityProcessor first and then apply that later
> 

Yep, that was just a bit of a wild stab in the dark to see if it made any
difference.

Thanks,

Andrew.

-- 
View this message in context: 
http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26335171.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Michael McCandless
I think we sorely need a Directory impl that down-prioritizes IO
performed by merging.

It would be wonderful if from Java we could simply set a per-thread
"IO priority", but, it'll be a looong time until that's possible.

So I think for now we should make a Directory impl that emulates such
behavior, eg Lucene could state the "context" (merge, flush, search,
nrt-reopen, etc.) whenever it opens an IndexInput / IndexOutput, and
then the Directory could hack in pausing the merge IO whenever
search/nrt-reopen IO is active.

Mike

On Thu, Nov 12, 2009 at 7:18 PM, Mark Miller  wrote:
> Jerome L Quinn wrote:
>> Hi, everyone, this is a problem I've had for quite a while,
>> and have basically avoided optimizing because of it.  However,
>> eventually we will get to the point where we must delete as
>> well as add docs continuously.
>>
>> I have a Solr 1.3 index with ~4M docs at around 90G.  This is a single
>> instance running inside tomcat 6, so no replication.  Merge factor is the
>> default 10.  ramBufferSizeMB is 32.  maxWarmingSearchers=4.
>> autoCommit is set at 3 sec.
>>
>> We continually push new data into the index, at somewhere between 1-10 docs
>> every 10 sec or so.  Solr is running on a quad-core 3.0GHz server.
>> under IBM java 1.6.  The index is sitting on a local 15K scsi disk.
>> There's nothing
>> else of substance running on the box.
>>
>> Optimizing the index takes about 65 min.
>>
>> As long as I'm not optimizing, search and indexing times are satisfactory.
>>
>> When I start the optimize, I see massive problems with timeouts pushing new
>> docs
>> into the index, and search times balloon.  A typical search while
>> optimizing takes
>> about 1 min instead of a few seconds.
>>
>> Can anyone offer me help with fixing the problem?
>>
>> Thanks,
>> Jerry Quinn
>>
> Ah, the pains of optimization. It's kind of just how it is. One solution
> is to use two boxes and replication - optimize on the master, and then
> queries only hit the slave. Out of reach for some though, and adds many
> complications.
>
> Another kind of option is to use the partial optimize feature:
>
>  
>
> Using this, you can optimize down to n segments and take a shorter hit
> each time.
>
> Also, if optimizing is so painful, you might lower the merge factor to
> amortize that pain better. That's another way to slowly get there - if
> you lower the merge factor, as merging takes place, the new merge factor
> will be respected, and segments will merge down. A merge factor of 2
> (the lowest) will make it so you only ever have 2 segments. Sometimes
> that works reasonably well - you could try 3-6 or something as well.
> Then when you do your partial optimizes (and eventually a full optimize
> perhaps), you won't have so far to go.
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
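
(The partial optimize Mark mentions is, if I recall correctly, an optimize
update message with a maxSegments attribute posted to the update handler, e.g.

<optimize maxSegments="4"/>

where the segment count is illustrative.)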


Re: javabin in .NET?

2009-11-13 Thread Mauricio Scheffer
Nope. It has to be manually ported. Not so much because of the language
itself but because of differences in the libraries.


2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् 

> Is there any tool to directly port Java to .NET? Then we can extract
> out the client part of the javabin code and convert it.
>
> On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher 
> wrote:
> > Has anyone looked into using the javabin response format from .NET
> (instead
> > of SolrJ)?
> >
> > It's mainly a curiosity.
> >
> > How much better could performance/bandwidth/throughput be?  How difficult
> > would it be to implement some .NET code (C#, I'd guess being the best
> > choice) to handle this response format?
> >
> > Thanks,
> >Erik
> >
> >
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>


Re: Selection of terms for MoreLikeThis

2009-11-13 Thread Chantal Ackermann

Hi Andrew,

no idea, I'm afraid - but could you send the output of
interestingTerms=details?
This at least would show what MoreLikeThis uses, in comparison to the 
TermVectorComponent you've already pasted.


Chantal

Andrew Clegg schrieb:

Any ideas on this? Is it worth sending a bug report?

Those links are live, by the way, in case anyone wants to verify that MLT is
returning suggestions with very low tf.idf.

Cheers,

Andrew.


Andrew Clegg wrote:

Hi,

If I run a MoreLikeThis query like the following:

http://www.cathdb.info/solr/mlt?q=id:3.40.50.720&rows=0&mlt.interestingTerms=list&mlt.match.include=false&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1

one of the hits in the results is "and" (I don't do any stopword removal
on this field).

However if I look inside that document with the TermVectorComponent:

http://www.cathdb.info/solr/select/?q=id:3.40.50.720&tv=true&tv.all=true&tv.fl=keywords

I see that "and" has a measly tf.idf of 7.46E-4. But there are other terms
with *much* higher tf.idf scores, e.g.:


1
10
0.1


that *don't* appear in the MoreLikeThis list. (I tried adding
&mlt.maxwl=999 to the end of the MLT query but it makes no difference.)

What's going on? Surely something with tf.idf = 0.1 is a far better
candidate for a MoreLikeThis query than something with tf.idf = 7.46E-4?
Or does MoreLikeThis do some other heuristic magic to select good
candidates, and sometimes get it wrong?

BTW the keywords field is indexed, stored, multi-valued and term-vectored.

Thanks,

Andrew.

--
:: http://biotext.org.uk/ ::




--
View this message in context: 
http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26335061.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Michael McCandless
Another thing to try, is reducing the maxThreadCount for
ConcurrentMergeScheduler.

It defaults to 3, which I think is too high -- we should change this
default to 1 (I'll open a Lucene issue).

Mike
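
At the Lucene level, that suggestion looks roughly like this (a sketch
assuming direct access to the IndexWriter; as far as I know, Solr 1.3 does
not expose this setting in solrconfig.xml):

import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexWriter;

public class SingleMergeThread {
    public static void configure(IndexWriter writer) {
        ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
        cms.setMaxThreadCount(1); // one background merge thread instead of the default 3
        writer.setMergeScheduler(cms);
    }
}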

On Thu, Nov 12, 2009 at 6:30 PM, Jerome L Quinn  wrote:
>
> Hi, everyone, this is a problem I've had for quite a while,
> and have basically avoided optimizing because of it.  However,
> eventually we will get to the point where we must delete as
> well as add docs continuously.
>
> I have a Solr 1.3 index with ~4M docs at around 90G.  This is a single
> instance running inside tomcat 6, so no replication.  Merge factor is the
> default 10.  ramBufferSizeMB is 32.  maxWarmingSearchers=4.
> autoCommit is set at 3 sec.
>
> We continually push new data into the index, at somewhere between 1-10 docs
> every 10 sec or so.  Solr is running on a quad-core 3.0GHz server.
> under IBM java 1.6.  The index is sitting on a local 15K scsi disk.
> There's nothing
> else of substance running on the box.
>
> Optimizing the index takes about 65 min.
>
> As long as I'm not optimizing, search and indexing times are satisfactory.
>
> When I start the optimize, I see massive problems with timeouts pushing new
> docs
> into the index, and search times balloon.  A typical search while
> optimizing takes
> about 1 min instead of a few seconds.
>
> Can anyone offer me help with fixing the problem?
>
> Thanks,
> Jerry Quinn


Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Michael McCandless
On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless
 wrote:
> I think we sorely need a Directory impl that down-prioritizes IO
> performed by merging.

Presumably this "prioritizing Directory impl" could wrap/decorate any
existing Directory.

Mike


Re: javabin in .NET?

2009-11-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
The javabin format does not have many dependencies; it may have 3-4
classes and that is it.

On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer
 wrote:
> Nope. It has to be manually ported. Not so much because of the language
> itself but because of differences in the libraries.
>
>
> 2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् 
>
>> Is there any tool to directly port Java to .NET? Then we can extract
>> out the client part of the javabin code and convert it.
>>
>> On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher 
>> wrote:
>> > Has anyone looked into using the javabin response format from .NET
>> (instead
>> > of SolrJ)?
>> >
>> > It's mainly a curiosity.
>> >
>> > How much better could performance/bandwidth/throughput be?  How difficult
>> > would it be to implement some .NET code (C#, I'd guess being the best
>> > choice) to handle this response format?
>> >
>> > Thanks,
>> >        Erik
>> >
>> >
>>
>>
>>
>> --
>> -
>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Selection of terms for MoreLikeThis

2009-11-13 Thread Andrew Clegg


Chantal Ackermann wrote:
> 
> no idea, I'm afraid - but could you sent the output of 
> interestingTerms=details?
> This at least would show what MoreLikeThis uses, in comparison to the 
> TermVectorComponent you've already pasted.
> 

I can, but I'm afraid they're not very illuminating!

http://www.cathdb.info/solr/mlt?q=id:3.40.50.720&rows=0&mlt.interestingTerms=details&mlt.match.include=false&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1



 0
 59



 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0



Cheers,

Andrew.

-- 
View this message in context: 
http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26336558.html
Sent from the Solr - User mailing list archive at Nabble.com.



non english languages

2009-11-13 Thread Chuck Mysak
Hello all,

is there support for non-english language content indexing in Solr?
I'm interested in Bulgarian, Hungarian, Romanian and Russian.

Best regards,

Chuck


Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Yonik Seeley
On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless
 wrote:
> I think we sorely need a Directory impl that down-prioritizes IO
> performed by merging.

It's unclear if this case is caused by IO contention, or the OS cache
of the hot parts of the index being lost by that extra IO activity.
Of course the latter would lead to the former, but without that OS
disk cache, the searches may be too slow even w/o the extra IO.

-Yonik
http://www.lucidimagination.com


Re: non english languages

2009-11-13 Thread Robert Muir
The included Snowball filters support Hungarian, Romanian, and Russian.
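
For example, a schema.xml field type along these lines (a sketch using the
stock analysis factories; swap in language="Hungarian" or language="Romanian"
the same way):

<fieldType name="text_ru" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
  </analyzer>
</fieldType>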

On Fri, Nov 13, 2009 at 9:03 AM, Chuck Mysak  wrote:

> Hello all,
>
> is there support for non-english language content indexing in Solr?
>
> I'm interested in Bulgarian, Hungarian, Romanian and Russian.
>
> Best regards,
>
> Chuck
>



-- 
Robert Muir
rcm...@gmail.com


Re: Selection of terms for MoreLikeThis

2009-11-13 Thread Chantal Ackermann

Hi Andrew,

your URL does not include the parameter mlt.boost. Setting that to 
"true" made a noticeable difference for my queries.


If not, there is also the parameter
 mlt.minwl
"minimum word length below which words will be ignored."

All your other terms seem longer than 3, so it would help in this case?
But it seems a bit like a workaround.


Cheers,
Chantal

Andrew Clegg schrieb:


Chantal Ackermann wrote:

no idea, I'm afraid - but could you send the output of
interestingTerms=details?
This at least would show what MoreLikeThis uses, in comparison to the
TermVectorComponent you've already pasted.



I can, but I'm afraid they're not very illuminating!

http://www.cathdb.info/solr/mlt?q=id:3.40.50.720&rows=0&mlt.interestingTerms=details&mlt.match.include=false&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1



 0
 59



 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0



Cheers,

Andrew.

--
View this message in context: 
http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26336558.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: javabin in .NET?

2009-11-13 Thread Mauricio Scheffer
I meant the standard IO libraries. They are different enough that the code
has to be manually ported. There were some automated tools back when
Microsoft introduced .Net, but IIRC they never really worked.

Anyway it's not a big deal, it should be a straightforward job. Testing it
thoroughly cross-platform is another thing though.

2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् 

> The javabin format does not have many dependencies; it may have 3-4
> classes and that is it.
>
> On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer
>  wrote:
> > Nope. It has to be manually ported. Not so much because of the language
> > itself but because of differences in the libraries.
> >
> >
> > 2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् 
> >
> >> Is there any tool to directly port Java to .NET? Then we can extract
> >> out the client part of the javabin code and convert it.
> >>
> >> On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher 
> >> wrote:
> >> > Has anyone looked into using the javabin response format from .NET
> >> (instead
> >> > of SolrJ)?
> >> >
> >> > It's mainly a curiosity.
> >> >
> >> > How much better could performance/bandwidth/throughput be?  How
> difficult
> >> > would it be to implement some .NET code (C#, I'd guess being the best
> >> > choice) to handle this response format?
> >> >
> >> > Thanks,
> >> >Erik
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> -
> >> Noble Paul | Principal Engineer| AOL | http://aol.com
> >>
> >
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>


Resetting doc boosts

2009-11-13 Thread Jon Baer
Hi,

I'm trying to figure out if there is an easy way to basically "reset" any doc
boosts you have made (for analytical purposes) ... for example: run an index,
gather a report, boost docs based on the report, and reset the boosts at the
time of the next index ...

From knowing how Lucene works, it would seem that I really need to reindex,
since the boost is an attribute on the doc itself which would have to be
modified; but there is no easy way to query for docs which have been boosted
either.  Any insight?

Thanks.

- Jon

Re: Selection of terms for MoreLikeThis

2009-11-13 Thread Andrew Clegg


Chantal Ackermann wrote:
> 
> your URL does not include the parameter mlt.boost. Setting that to 
> "true" made a noticeable difference for my queries.
> 

Hmm, I'm really not sure if this is doing the right thing either. When I add
it I get:

 1.0
 0.60737264
 0.27599618
 0.2476748
 0.24487767
 0.23969446
 0.1990452
 0.18447271
 0.13297324
 0.1233415
 0.11993817
 0.11789705
 0.117194556
 0.11164951
 0.10744005
 0.09943076
 0.097062066
 0.09287166
 0.0877542
 0.0864609
 0.08362857
 0.07988805
 0.079598725
 0.07747293
 0.075560644

"and" scores far more highly than much more discriminative words like
"chloroplast" and "glyoxylate", both of which have *much* higher tf.idf
scores than "and" according to the TermVectorComponent:


8
1887
0.0042395336512983575



7

0.0063006300630063005



45
60316
7.460706943431262E-4


In fact an order of magnitude higher.


Chantal Ackermann wrote:
> 
> If not, there is also the parameter
>   mlt.minwl
> "minimum word length below which words will be ignored."
> 
> All your other terms seem longer than 3, so it would help in this case? 
> But it seems a bit like a workaround.
> 

Yeah, I could do that, or add a stopword list to that field. But there are
some other common terms in the list like "protein" or "enzyme" that are long
and not really stopwords, but have a similarly low tf.idf to "and":


43
189541
2.2686384476181933E-4



15
16712
8.975586404978459E-4


Plus, of course, I'm curious to know exactly how MLT is identifying those
terms as important, and if it's a bug or my fault...

Thanks for your help though! Do any of the Solr devs have an idea of the
mechanism at work here?

Andrew.

-- 
View this message in context: 
http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26337677.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Question about the message "Indexing failed. Rolled back all changes."

2009-11-13 Thread yountod

I'm getting the same thing.  The process runs, seemingly successfully, and I
can even go to other Solr pages pointing to the same server and pull queries
against the index with these just-added entries.  But the response to the
original import says "failed" and "rollback", both through the XML response
and also in the logs.

Why is the process reporting failure and saying it did not commit / rolled
back, when it actually succeeded in importing and indexing?  If it rolled
back, as the logs say, I would expect not to be able to pull those rows out
with new queries against the index.



Avlesh Singh wrote:
> 
>>
>> But  even after I successfully index data using
>> http://host:port/solr-example/dataimport?command=full-import&commit=true&clean=true,
>> do solr search which returns meaningful results
>>
> I am not sure what "meaningful" means. The full-import command starts an
> asynchronous process to start re-indexing. The response that you get in
> return to the above mentioned URL, (always) indicates that a full-import
> has
> been started. It does NOT know about anything that might go wrong with the
> process itself.
> 
> and then visit http://host:port/solr-example/dataimport?command=status, I
>> can see thefollowing result ...
>>
> The status URL is the one which tells you what is going on with the
> process.
> The message - "Indexing failed. Rolled back all changes" can come because
> of
> multiple reasons - missing database drivers, incorrect sql queries,
> runtime
> errors in custom transformers etc.
> 
> Start the full-import once more. Keep a watch on the Solr server log. If
> you
> can figure out what's going wrong, great; otherwise, copy-paste the
> exception stack-trace from the log file for specific answers.
> 
> Cheers
> Avlesh
> 
> On Tue, Nov 10, 2009 at 1:32 PM, Bertie Shen 
> wrote:
> 
>> No. I did not check the logs.
>>
>> But  even after I successfully index data using
>> http://host:port
>> /solr-example/dataimport?command=full-import&commit=true&clean=true,
>> do solr search which returns meaningful results, and then visit
>> http://host:port/solr-example/dataimport?command=status, I can see the
>> following result
>>
>> 
>> -
>> 
>> 0
>> 1
>> 
>> -
>> 
>> -
>> 
>> data-config.xml
>> 
>> 
>> status
>> idle
>> 
>> -
>> 
>> 0:2:11.426
>> 584
>> 1538
>> 0
>> 2009-11-09 23:54:41
>> *Indexing failed. Rolled back all changes.*
>> 2009-11-09 23:54:42
>> 2009-11-09 23:54:42
>> 2009-11-09 23:54:42
>> 
>> -
>> 
>> This response format is experimental.  It is likely to change in the
>> future.
>> 
>> 
>>
>> On Mon, Nov 9, 2009 at 7:39 AM, Shalin Shekhar Mangar <
>> shalinman...@gmail.com> wrote:
>>
>> > On Sat, Nov 7, 2009 at 1:10 PM, Bertie Shen 
>> wrote:
>> >
>> > >
>> > >  When I use
>> > > http://localhost:8180/solr/admin/dataimport.jsp?handler=/dataimport
>> to
>> > > debug
>> > > the indexing config file, I always see the status message on the
>> right
>> > part
>> > > Indexing failed. Rolled back all changes., even
>> the
>> > > indexing process looks to be successful. I am not sure whether you
>> guys
>> > > have
>> > > seen the same phenomenon or not.  BTW, I usually check the checkbox
>> Clean
>> > > and sometimes check Commit box, and then click Debug Now button.
>> > >
>> > >
>> > Do you see any exceptions in the logs?
>> >
>> > --
>> > Regards,
>> > Shalin Shekhar Mangar.
>> >
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Question-about-the-message-%22Indexing-failed.-Rolled-back-all--changes.%22-tp26242714p26338287.html
Sent from the Solr - User mailing list archive at Nabble.com.



scanning folders recursively / Tika

2009-11-13 Thread Peter Gabriel
Hello.

I am working with Tika 0.5 and want to scan a folder system of about 10GB.
Is there a comfortable way to scan folders recursively with an existing class,
or do I have to write it myself?

Any tips for best practice?

Greetings, Peter


Re: scanning folders recursively / Tika

2009-11-13 Thread Glen Newton
Have one thread recurse depth-first down the directories, adding to
a queue (fixed size).
Have many threads reading off of the queue and doing the work.
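
A minimal sketch of that producer/consumer layout (class names and the queue
size are arbitrary; the Tika hand-off is left as a stub):

import java.io.File;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class FolderScanner {
    private static final File POISON = new File("");  // end-of-work marker
    private final BlockingQueue<File> queue = new ArrayBlockingQueue<File>(100);

    private void walk(File dir) throws InterruptedException {
        File[] entries = dir.listFiles();
        if (entries == null) return;
        for (File f : entries) {
            if (f.isDirectory()) walk(f);  // depth-first recursion
            else queue.put(f);             // blocks while the queue is full
        }
    }

    public void run(File root, int nWorkers) throws InterruptedException {
        Thread[] workers = new Thread[nWorkers];
        for (int i = 0; i < nWorkers; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        for (File f = queue.take(); f != POISON; f = queue.take()) {
                            process(f);    // e.g. parse with Tika, send to Solr
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            workers[i].start();
        }
        walk(root);
        for (int i = 0; i < nWorkers; i++) queue.put(POISON);  // one marker per worker
        for (Thread w : workers) w.join();
    }

    private void process(File f) { /* stub: hand the file to Tika here */ }
}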

-glen
http://zzzoot.blogspot.com/

2009/11/13 Peter Gabriel :
> Hello.
>
> I am working with Tika 0.5 and want to scan a folder system of about 10GB.
> Is there a comfortable way to scan folders recursively with an existing class,
> or do I have to write it myself?
>
> Any tips for best practice?
>
> Greetings, Peter
>





Re: Stop solr without losing documents

2009-11-13 Thread Michael
On Fri, Nov 13, 2009 at 4:32 AM, gwk  wrote:
> I don't know if this is the best solution, or even if it's applicable to
> your situation but we do incremental updates from a database based on a
> timestamp, (from a simple seperate sql table filled by triggers so deletes

Thanks, gwk!  This doesn't exactly meet our needs, but helped us get
to a solution.  In short, we are manually committing in our outside
updater process (instead of letting Solr autocommit), and marking
which documents have been updated before a successful commit.  Now
stopping solr is as easy as kill -9.

Michael


how to search against multiple attributes in the index

2009-11-13 Thread javaxmlsoapdev

I want to build an AND search query against field1 AND field2, etc. Both
fields are stored in the index. I am migrating Lucene code to Solr. The
following is my existing Lucene code:

BooleanQuery currentSearchingQuery = new BooleanQuery();

currentSearchingQuery.add(titleDescQuery, Occur.MUST);
highlighter = new Highlighter(new QueryScorer(titleDescQuery));

TermQuery searchTechGroupQuery = new TermQuery(
        new Term("techGroup", searchForm.getTechGroup()));
currentSearchingQuery.add(searchTechGroupQuery, Occur.MUST);

TermQuery searchProgramQuery = new TermQuery(
        new Term("techProgram", searchForm.getTechProgram()));
currentSearchingQuery.add(searchProgramQuery, Occur.MUST);

What's the equivalent Solr code for the above Lucene code? Any samples would
be appreciated.

Thanks,
-- 
View this message in context: 
http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html
Sent from the Solr - User mailing list archive at Nabble.com.
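
For what it's worth, a minimal SolrJ sketch of one way to express the same
three MUST clauses (a sketch only: it assumes the standard query parser,
hypothetical title/description field names, and a configured SolrServer;
real user input would need escaping):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MultiFieldSearch {
    public QueryResponse search(SolrServer server, String text,
                                String techGroup, String techProgram) throws Exception {
        // the equivalent of the BooleanQuery with three Occur.MUST clauses
        SolrQuery query = new SolrQuery();
        query.setQuery("+(title:(" + text + ") description:(" + text + "))"
                + " +techGroup:\"" + techGroup + "\""
                + " +techProgram:\"" + techProgram + "\"");
        query.setHighlight(true);             // replaces the manual Highlighter
        query.addHighlightField("title");
        query.addHighlightField("description");
        return server.query(query);
    }
}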



The status of Local/Geo/Spatial/Distance Solr

2009-11-13 Thread Bertie Shen
Hey,

   I am interested in using LocalSolr to do Local/Geo/Spatial/Distance
search, but the LocalSolr wiki (http://wiki.apache.org/solr/LocalSolr)
points to pretty old documentation. Is there a better document I can refer
to for setting up LocalSolr, and some performance analysis?

   I just synced the Solr codebase and found that LocalSolr is still NOT in
the contrib package. Is there a plan to incorporate it? I downloaded a
LocalSolr lib, localsolr-1.5.jar, from
http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and noticed
that its namespace is com.pjaol.search. blah blah, while the LocalLucene
package is in the Lucene codebase and the package name is
org.apache.lucene.spatial blah blah.

   But localsolr-1.5.jar from
http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ does not
work with the lucene-spatial-3.0-dev.jar I built from the Lucene codebase
directly. After restarting Tomcat, I could not load the Solr admin page.
The error is as follows; it looks like Solr is still looking for the old
class names.

  Thanks.

HTTP Status 500 - Severe errors in solr configuration. Check your log files
for more detailed information on what may be wrong. If you want solr to
continue after configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError> in null
-
java.lang.NoClassDefFoundError:
com/pjaol/search/geo/utils/DistanceFilter at java.lang.Class.forName0(Native
Method) at java.lang.Class.forName(Class.java:247) at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413) at
org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435) at
org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498) at
org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492) at
org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525) at
org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:833) at
org.apache.solr.core.SolrCore.<init>(SolrCore.java:551) at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221)
at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302)
at
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:78)
at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222)
at
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760)
at org.apache.catalina.core.ContainerBase.access$0(ContainerBase.java:744)
at
org.apache.catalina.core.ContainerBase$PrivilegedAddChild.run(ContainerBase.java:144)
at java.security.AccessController.doPrivileged(Native Method) at
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:738) at
org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544) at
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:626)
at
org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488) at
org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138) at
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
at
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1022) at
org.apache.catalina.core.StandardHost.start(StandardHost.java:736) at
org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1014) at
org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443) at
org.apache.catalina.core.StandardService.start(StandardService.java:448) at
org.apache.catalina.core.StandardServer.start(StandardServer.java:700) at
org.apache.catalina.startup.Catalina.start(Catalina.java:552) at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597) at
org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:295) at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597) at
org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:177)
Caused by: java.lang.ClassNotFoundException:
com.pjaol.search.geo.utils.DistanceFilter at
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1362)
at
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappCl

Re: how to search against multiple attributes in the index

2009-11-13 Thread Avlesh Singh
Dive in - http://wiki.apache.org/solr/Solrj

Cheers
Avlesh



Re: The status of Local/Geo/Spatial/Distance Solr

2009-11-13 Thread Ryan McKinley

It looks like solr+spatial will get some attention in 1.5; check:
https://issues.apache.org/jira/browse/SOLR-1561

Depending on your needs, that may be enough. More robust/scalable
solutions will hopefully work their way into 1.5 (any help is always
appreciated!)




Re: The status of Local/Geo/Spatial/Distance Solr

2009-11-13 Thread Ryan McKinley

Also:
https://issues.apache.org/jira/browse/SOLR-1302



Re: how to search against multiple attributes in the index

2009-11-13 Thread javaxmlsoapdev

I already did dive in before. I am using the SolrJ API and the SolrQuery object
to build queries, but it's not clear how to build a boolean query ANDing a
bunch of different attributes in the index. Any samples, please?




Re: The status of Local/Geo/Spatial/Distance Solr

2009-11-13 Thread Ian Ibbotson
Heya.. could it be a problem with your solr config files? I seem to
recall a change from the docs as they were to get this working.. I
have...

<updateRequestProcessorChain>
  <processor class="com.pjaol.search.solr.update.LocalUpdateProcessorFactory">
    <str name="latField">lat</str>
    <str name="lngField">lng</str>
    <int name="startTier">4</int>
    <int name="endTier">25</int>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
</updateRequestProcessorChain>

<searchComponent name="localsolr"
    class="com.pjaol.search.solr.component.LocalSolrQueryComponent" />

<requestHandler name="geo"
    class="org.apache.solr.handler.component.SearchHandler">
  <arr name="components">
    <str>localsolr</str>
    <str>facet</str>
    <str>mlt</str>
    <str>highlight</str>
    <str>debug</str>
  </arr>
</requestHandler>
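With that in place, a query against the handler would look roughly like the
following (parameter names per the gissearch.com write-up; values are
placeholders):

http://localhost:8983/solr/select?qt=geo&q=*:*&lat=52.52&long=13.40&radius=10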

Does that tie up with your config? I'd basically interpreted the current
packaging as: what used to be locallucene has deffo merged into
lucene-spatial in this build, no more locallucene. However, you still
need to build localsolr for now...

My solr jars are:

commons-beanutils-1.8.0.jar
commons-codec-1.4.jar
commons-dbcp-1.2.2.jar
commons-fileupload-1.2.1.jar
commons-httpclient-3.1.jar
commons-io-1.3.2.jar
commons-logging-1.1.1.jar
commons-pool-1.5.3.jar
geoapi-nogenerics-2.1M2.jar
geronimo-stax-api_1.0_spec-1.0.1.jar
gt2-referencing-2.3.1.jar
jsr108-0.01.jar
localsolr-1.5.2-rc1.jar
log4j-1.2.13.jar
lucene-analyzers-2.9.1-ki-rc3.jar
lucene-core-2.9.1-ki-rc3.jar
lucene-highlighter-2.9.1-ki-rc3.jar
lucene-memory-2.9.1-ki-rc3.jar
lucene-misc-2.9.1-ki-rc3.jar
lucene-queries-2.9.1-ki-rc3.jar
lucene-snowball-2.9.1-ki-rc3.jar
lucene-spatial-2.9.1-ki-rc3.jar
lucene-spellchecker-2.9.1-ki-rc3.jar
org.codehaus.woodstox-wstx-asl-3.2.7.jar
serializer-2.7.1.jar
slf4j-api-1.5.5.jar
slf4j-log4j12-1.5.5.jar
solr-commons-csv-1.4.0-ki-rc1.jar
solr-core-1.4.0-ki-rc1.jar
solr-solrj-1.4.0-ki-rc1.jar
stax-1.2.0.jar
stax-api-1.0.jar
stax-utils-20040917.jar
woodstox-wstx-asl-3.2.7.jar
xalan-2.7.1.jar
xercesImpl-2.9.1.jar
xml-apis-1.3.04.jar
xpp3-1.1.3.4.O.jar

Sorry for dumping the info at you... hope it helps tho

Ian.


Obtaining list of dynamic fields being available in index

2009-11-13 Thread Eugene Dzhurinsky
Hi there!

How can we retrieve the complete list of dynamic fields which are currently
available in the index?

Thank you in advance!
-- 
Eugene N Dzhurinsky




Re: how to search against multiple attributes in the index

2009-11-13 Thread Avlesh Singh
For a starting point, this might be a good read -
http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query

Cheers
Avlesh

On Fri, Nov 13, 2009 at 10:02 PM, javaxmlsoapdev  wrote:

>
> I already did  dive in before. I am using solrj API and SolrQuery object to
> build query. but its not clear/written how to build booleanQuery ANDing
> bunch of different attributes in the index. Any samples please?
>
> Avlesh Singh wrote:
> >
> > Dive in - http://wiki.apache.org/solr/Solrj
> >
> > Cheers
> > Avlesh
> >
> > On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev 
> wrote:
> >
> >>
> >> I want to build AND search query against field1 AND field2 etc. Both
> >> these
> >> fields are stored in an index. I am migrating lucene code to Solr.
> >> Following
> >> is my existing lucene code
> >>
> >> BooleanQuery currentSearchingQuery = new BooleanQuery();
> >>
> >> currentSearchingQuery.add(titleDescQuery,Occur.MUST);
> >> highlighter = new Highlighter( new QueryScorer(titleDescQuery));
> >>
> >> TermQuery searchTechGroupQyery = new TermQuery(new Term
> >> ("techGroup",searchForm.getTechGroup()));
> >>currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST);
> >> TermQuery searchProgramQyery = new TermQuery(new
> >> Term("techProgram",searchForm.getTechProgram()));
> >>currentSearchingQuery.add(searchProgramQyery, Occur.MUST);
> >> }
> >>
> >> What's the equivalent Solr code for above Luce code. Any samples would
> be
> >> appreciated.
> >>
> >> Thanks,
> >> --
> >> View this message in context:
> >>
> http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339402.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Return doc if one or more query keywords occur multiple times

2009-11-13 Thread gistolero
Anyone?

 Original-Nachricht 
> Datum: Thu, 12 Nov 2009 13:29:20 +0100
> Von: gistol...@gmx.de
> An: solr-user@lucene.apache.org
> Betreff: Return doc if one or more query keywords occur multiple times

> Hello,
> 
> I am using the Dismax request handler for queries:
> 
> ...select?q=foo bar foo2 bar2&qt=dismax&mm=2...
> 
> With parameter "mm=2" I configure that at least 2 of the optional clauses
> must match, regardless of how many clauses there are.
> 
> But now I want to change this to the following:
> 
> List all documents that have at least 2 of the optional clauses OR that
> have at least one of the query terms (e.g. foo) more than once.
> 
> Is this possible?
> Thanks,
> Gisto
> 



Re: Obtaining list of dynamic fields being available in index

2009-11-13 Thread Avlesh Singh
Luke Handler? - http://wiki.apache.org/solr/LukeRequestHandler
/admin/luke?numTerms=0

Cheers
Avlesh
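If you are on SolrJ, the same information is reachable programmatically; a
sketch assuming SolrJ 1.4's LukeRequest/LukeResponse classes:

LukeRequest luke = new LukeRequest();   // hits /admin/luke
luke.setNumTerms(0);                    // skip per-term statistics
LukeResponse rsp = luke.process(server);
// the keys include the concrete instances of dynamic fields
for (String field : rsp.getFieldInfo().keySet()) {
    System.out.println(field);
}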



Re: how to search against multiple attributes in the index

2009-11-13 Thread javaxmlsoapdev

I think I found the answer; I needed to read more API documentation :-)

You can do it using
solrQuery.setFilterQueries() and build AND queries of multiple parameters.





Re: how to search against multiple attributes in the index

2009-11-13 Thread Avlesh Singh
>
> you can do it using
> solrQuery.setFilterQueries() and build AND queries of multiple parameters.
>
Nope. You would need to read more -
http://wiki.apache.org/solr/FilterQueryGuidance

For your impatience, here's a quick starter -

#and between two fields
solrQuery.setQuery("+field1:foo +field2:bar");

#or between two fields
solrQuery.setQuery("field1:foo field2:bar");
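In SolrJ, a minimal usage sketch of the first form (field names taken from
this thread; the URL is a placeholder):

// constructor throws MalformedURLException; handle or declare it
SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
SolrQuery query = new SolrQuery("+techGroup:foo +techProgram:bar");
SolrDocumentList hits = server.query(query).getResults();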

Cheers
Avlesh



Re: Resetting doc boosts

2009-11-13 Thread Avlesh Singh
AFAIK there is no way to "reset" the doc boost. You would need to re-index.
Moreover, there is no way to "search by boost".

Cheers
Avlesh

On Fri, Nov 13, 2009 at 8:17 PM, Jon Baer  wrote:

> Hi,
>
> I'm trying to figure out if there is an easy way to basically "reset" all of
> any doc boosts which you have made (for analytical purposes) ... for example
> if I run an index, gather a report, doc-boost based on the report, and reset
> the boosts at the time of the next index ...
>
> It would seem, from just knowing how Lucene works, that I would really
> need to reindex, since it's an attribute on the doc itself which would have to
> be modified, but there is no easy way to query for docs which have been boosted
> either.  Any insight?
>
> Thanks.
>
> - Jon


Re: The status of Local/Geo/Spatial/Distance Solr

2009-11-13 Thread Bertie Shen
Hi Ian and Ryan,

  Thanks for the reply.

   Ian, I checked your pasted config; I am using the same one except for the
values 4 and 25. Basically I use the setup specified at
http://www.gissearch.com/localsolr, but I still get the same error I pasted
in my previous email.

   Ryan, I just checked out the lucene-spatial-2.9.1.jar Grant checked in
today.  Previously I built lucene-spatial-3.0-dev.jar from the Lucene java code
base directly. There is still no luck after the lib replacement.  I do not
think the other libs matter in this case.






Re: Question about the message "Indexing failed. Rolled back all changes."

2009-11-13 Thread yountod

The process initially completes with:

  2009-11-13 09:40:46
  Indexing completed. Added/Updated: 20 documents. Deleted 0 documents.

...but then it fails with:

  2009-11-13 09:40:46
  Indexing failed. Rolled back all changes.



I think it may have something to do with this, which I found by using the
DataImport.jsp:

(Thread.java:636) Caused by: java.sql.SQLException: Illegal value for
setFetchSize(). at
com.mysql.jdbc.Statement.setFetchSize(Statement.java:1864) at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:242)
... 28 more
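If that is the cause, a commonly suggested workaround for MySQL (assuming the
standard JdbcDataSource) is batchSize="-1", which DIH maps to
setFetchSize(Integer.MIN_VALUE), i.e. the driver's row-streaming mode;
url/user/password below are placeholders:

<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
    url="jdbc:mysql://localhost/mydb" user="user" password="pass"
    batchSize="-1"/>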






Re: how to search against multiple attributes in the index

2009-11-13 Thread javaxmlsoapdev

Great, thanks. That was helpful.




Re: scanning folders recursively / Tika

2009-11-13 Thread Otis Gospodnetic
Peter - if you want, download the code from Lucene in Action 1 or 2; it has 
index traversal and indexing.  The 2nd edition uses Tika.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR






Re: Customizing Field Score (Multivalued Field)

2009-11-13 Thread Stephen Duncan Jr
On Thu, Nov 12, 2009 at 3:00 PM, Stephen Duncan Jr  wrote:

> On Thu, Nov 12, 2009 at 2:54 PM, Chris Hostetter  > wrote:
>
>>
>> oh man, so you were parsing the Stored field values of every matching doc
>> at query time? ouch.
>>
>> Assuming i'm understanding your goal, the conventional way to solve this
>> type of problem is "payloads" ... you'll find lots of discussion on it in
>> the various Lucene mailing lists, and if you look online Michael Busch has
>> various slides that talk about using them.  they let you say things
>> like "in this document, at this position of field 'x' the word 'microsoft'
>> is worth 37.4, but at this other position (or in this other document)
>> 'microsoft' is only worth 17.2"
>>
>> The simplest way to use them in Solr (as i understand it) is to use
>> something like the DelimitedPayloadTokenFilterFactory when indexing, and
>> then write yourself
>> a simple little custom QParser that generates a BoostingTermQuery on your
>> field.
>>
>> should be a lot simpler to implement then the Query you are describing,
>> and much faster.
>>
>>
>> -Hoss
>>
>>
> Thanks. I finally got around to looking at this again today and was looking
> at a similar path, so I appreciate the confirmation.
>
>
> --
> Stephen Duncan Jr
> www.stephenduncanjr.com
>

For posterity, here's the rest of what I discovered trying to implement
this:

You'll need to write a PayloadSimilarity as described here:
http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
(here's my updated version, due to deprecation of the method mentioned in
that article):

@Override
public float scorePayload(int docId, String fieldName, int start, int end,
                          byte[] payload, int offset, int length)
{
    // can ignore length here, because we know it is encoded as 4 bytes
    return PayloadHelper.decodeFloat(payload, offset);
}

You'll need to register that similarity in your Solr schema.xml (this was hard
to figure out, as I didn't realize that the similarity has to be applied
globally to the writer/searcher used generally, even though I only care about
payloads on one field, so I wasted time trying to figure out how to plug in
the similarity in my query parser).

You'll want to use the "payloads" type or something based on it that's in
the example schema.xml.

The latest and greatest query type to use is PayloadTermQuery.  I use it in
my custom query parser class, overriding getFieldQuery, checking for my
field name, and then:

 return new PayloadTermQuery(new Term(field, queryText),
new AveragePayloadFunction());
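In context, the override might look roughly like this (a sketch against the
Lucene 2.9 QueryParser; the field name is illustrative):

@Override
protected Query getFieldQuery(String field, String queryText)
        throws ParseException {
    if ("annotatedText".equals(field)) {   // hypothetical payload field
        return new PayloadTermQuery(new Term(field, queryText),
                                    new AveragePayloadFunction());
    }
    return super.getFieldQuery(field, queryText);
}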

Due to the global nature of the Similarity, I guess you'd have to modify it
to look at the field name and base behavior on that if you wanted different
kinds of payloads on different fields in one schema.

Also, whereas in my original implementation, I controlled the score
completely, and therefore if I set a score of 0.8, the doc came back as
score of 0.8, in this technique the payload is just used as a boost/addition
to the score, so my scores came out higher than before.  Since they're still
in the same relative order, that still satisfied my needs, but did require
updating my test cases.

-- 
Stephen Duncan Jr
www.stephenduncanjr.com


Making search results more stable as index is updated

2009-11-13 Thread Chris Harris
If documents are being added to and removed from an index (and commits
are being issued) while a user is searching, then the experience of
paging through search results using the obvious solr mechanism
(&start=100&rows=10) may be disorienting for the user. For one
example, by the time the user clicks "next page" for the first time, a
document that they saw on page 1 may have been pushed onto page 2.
(This may be especially pronounced if docs are being sorted by date.)

I'm wondering what are the best options available for presenting a
more stable set of search results to users in such cases. The obvious
candidates to me are:

#1: Cache results in the user session of the web tier. (In particular,
maybe just cache the uniqueKey of each matching document.)

  Pro: Simple
  Con: May require capping the # of search results in order to make
the initial query (which now has Solr numRows param >> web pageSize)
fast enough. For example, maybe it's only practical to cache the first
500 records.

#2: Create some kind of per-user results cache in Solr. (One simple
implementation idea: You could make your Solr search handler take a
userid parameter, and cache each user's last search in a special
per-user results cache. You then also provide an API that says, "give
me records n through m of userid #1334's last search". For your
subsequent queries, you consult the latter API rather than redoing
your search. Because Lucene docids are unstable across commits and
such, I think this means caching the uniqueKey of each matching
document. This in turn means looking up the uniqueKey of each matching
document at search time. It also means you can't use the existing Solr
caches, but need to make a new one.)

  Pro: Maybe faster than #1?? (Saves on data transfer between Solr and
web tier, at least during the initial query.)
  Con: More complicated than #1.

#3: Use filter queries to attempt to make your subsequent queries (for
page 2, page 3, etc.) return results consistent with your original
query. (One idea is to give each document a docAddedTimestamp field,
which would have precision down to the millisecond or something. On
your initial query, you could note the current time, T. Then for the
subsequent queries you add a filter query for docAddedTimestamp<=T.
Hopefully with a trie date field this would be fast. This should
hopefully keep any docs newly added after T from showing up in the
user's search results as they page through them. However, it won't
necessarily protect you from docs that were *reindexed* (i.e. re-add a
doc with the same uniqueKey as an existing doc) or docs that were
deleted.)

  Pro: Doesn't require a new cache, and no cap on # of search results
  Con: Maybe doesn't provide total stability.
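A sketch of option #3's follow-up request, assuming a trie-based date field
named docAddedTimestamp and a remembered initial query time T (URL encoding
omitted for readability):

/select?q=user+query&start=10&rows=10&fq=docAddedTimestamp:[* TO 2009-11-13T17:30:00Z]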

Any feedback on these options? Are there other ideas to consider?

Thanks,
Chris


Re: having solr generate and execute other related queries automatically

2009-11-13 Thread gdeconto


tpunder wrote:
> 
> Maybe I misunderstand what you are trying to do (or the facet.query
> feature).  If I did an initial query on my data-set that left me with the
> following questions:
> ...
> http://localhost:8983/solr/select/?q=*%3A*&start=0&rows=0&facet=on&facet.query=brand_id:1&facet.query=brand_id:2&facet.query=+%2Bbrand_id:5+%2Bcategory_id:4051
> ...
> 

Thanks for the reply Tim.

I can't provide you with an example as I don't have anything prototyped as
yet; I am still trying to work things through in my head.  The +20 queries
would allow us to suggest other possibilities to users in a facet-like way
(but not returning the exact same info as facets).

With the technique you mention I would have to specify the list of query
params for each facet.query.  That would work for relatively simple queries. 
Unfortunately, the queries I was looking at doing would be fairly long (say
hundreds of AND/OR statements).   That said, I don't think Solr would be able
to handle the query size I would end up with (at least not efficiently),
because the resulting query would consist of thousands of AND/OR statements
(isn't there a limit of sorts in Solr?)
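For reference, the limit alluded to above is Lucene's boolean-clause cap,
which Solr exposes in solrconfig.xml (1024 is the shipped default; queries
beyond it throw a TooManyClauses exception):

<maxBooleanClauses>1024</maxBooleanClauses>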

I think that my best bet would be to extend the SearchComponent and perform
the additional query generation and execution in the extension.  That
approach should also allow me to have access to the facet values that the
base query would generate (which would allow me to generate and execute the
other queries).

thx again.



RE: Multicore solr.xml schemaName parameter not being recognized

2009-11-13 Thread Chris Hostetter

: On the CoreAdmin wiki page.  thanks 

FWIW: The only time the string "schemaName" appears on the CoreAdmin wiki 
page is when it mentions that "solr.core.schemaName" is a property that is 
available to cores by default.

the documentation for <core> specifically says...

>> The <core> tag accepts the following attributes:
   ...
>>  * schema - The schema file name for a given core. The default is schema.xml.
   ...

So the documentation is correct.


-Hoss



Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Jerome L Quinn

Mark Miller  wrote on 11/12/2009 07:18:03 PM:
> Ah, the pains of optimization. Its kind of just how it is. One solution
> is to use two boxes and replication - optimize on the master, and then
> queries only hit the slave. Out of reach for some though, and adds many
> complications.

Yes, in my use case 2 boxes isn't a great option.


> Another kind of option is to use the partial optimize feature:
>
>   <optimize maxSegments="2"/>
>
> Using this, you can optimize down to n segments and take a shorter hit
> each time.

Is this a 1.4 feature?  I'm planning to migrate to 1.4, but it'll take a
while since
I have to port custom code forward, including a query parser.


> Also, if optimizing is so painful, you might lower the merge factor
> amortize that pain better. Thats another way to slowly get there - if
> you lower the merge factor, as merging takes place, the new merge factor
> will be respected, and semgents will merge down. A merge factor of 2
> (the lowest) will make it so you only ever have 2 segments. Sometimes
> that works reasonably well - you could try 3-6 or something as well.
> Then when you do your partial optimizes (and eventually a full optimize
> perhaps), you won't have so far to go.

So this will slow down indexing but speed up optimize somewhat?
Unfortunately,
right now I lose docs I'm indexing, as well as slowing searching to a crawl.
Ugh.

I've got plenty of CPU horsepower.  This is where having the ability to
optimize
on another filesystem would be useful.

Would it perhaps make sense to set up a master/slave on the same machine?
Then
I suppose I can have an index being optimized that might not clobber the
search.
Would new indexed items still be dropped on the floor?

Thanks,
Jerry

Re: Stop solr without losing documents

2009-11-13 Thread Chris Hostetter

: which documents have been updated before a successful commit.  Now
: stopping solr is as easy as kill -9.

please don't kill -9 ... it's grossly overkill, and doesn't give your 
servlet container a fair chance to clean things up.  A lot of work has been 
done to make Lucene indexes robust to hard terminations of the JVM (or 
physical machine) but there's no reason to go out of your way to try and 
stab it in the heart when you could just shut it down cleanly.

that's not to say your approach isn't a good one -- if you only have one 
client sending updates/commits then having it keep track of what was 
indexed prior to the last successful commit is a viable way to deal with 
what happens if solr stops responding (either because you shut it down, or 
because it crashed for some other reason).

Alternately, you could take advantage of the "enabled" feature from your 
client (just have it test the enabled url every N updates or so) and when 
it sees that you have disabled the port it can send one last commit and 
then stop sending updates until it sees the enabled URL work again -- as 
soon as you see the updates stop, you can safely shut down the port.


-Hoss



Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Jerome L Quinn
ysee...@gmail.com wrote on 11/13/2009 09:06:29 AM:
>
> On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless
>  wrote:
> > I think we sorely need a Directory impl that down-prioritizes IO
> > performed by merging.
>
> It's unclear if this case is caused by IO contention, or the OS cache
> of the hot parts of the index being lost by that extra IO activity.
> Of course the latter would lead to the former, but without that OS
> disk cache, the searches may be too slow even w/o the extra IO.

Is there a way to configure things so that search and new data indexing
get cached under the control of solr/lucene?  Then we'd be less reliant
on the OS behavior.

Alternatively if there are OS params I can tweak (RHEL/Centos 5)
to solve the problem, that's an option for me.

Would you know if 1.4 is better behaved than 1.3?

Thanks,
Jerry

Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Jerome L Quinn

On linux there's the ionice command to try to throttle processes.  Would it
be possible and make sense to have a separate process for optimizing that
had ionice set it to idle?  Can the index be shared this way?
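For example, a sketch of that idea applied to an existing optimizing process
(-c3 is the idle scheduling class; the pid is a placeholder):

ionice -c3 -p 12345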

Thanks,
Jerry

Re: NPE when trying to view a specific document via Luke

2009-11-13 Thread Chris Hostetter

: I'm seeing this stack trace when I try to view a specific document, e.g. 
: /admin/luke?id=1 but luke appears to be working correctly when I just 

FWIW: I was able to reproduce this using the example setup (i picked a 
doc id at random).  Suspecting it was a bug in docFreq when using multiple 
segments, i tried optimizing and still got an NPE, but then my entire 
computer crashed (unrelated) before i could look any deeper.

I have to go out now, but i'll try to dig into this more when i get back 
... given where it happens in the code, it seems like a potentially 
serious lucene bug (either that: or LukeRequestHandler is doing something 
it really shouldn't be, but i can't imagine how it could trigger an NPE 
that deep in the lucene code)



: view /admin/luke. Does this look familiar to anyone? Our sysadmin just 
: upgraded us to the 1.4 release, I'm not sure if this occurred before 
: that.
: 
: Thanks,
: Jake
: 
: 1. java.lang.NullPointerException
: 2. at org.apache.lucene.index.TermBuffer.set(TermBuffer.java:95)
: 3. at 
org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:158)
: 4. at 
org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:232)
: 5. at 
org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:179)
: 6. at 
org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:975)
: 7. at 
org.apache.lucene.index.DirectoryReader.docFreq(DirectoryReader.java:627)
: 8. at 
org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:308)
: 9. at 
org.apache.solr.handler.admin.LukeRequestHandler.getDocumentFieldsInfo(LukeRequestHandler.java:248)
: 10.at 
org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:124)
: 11.at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
: 12.at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
: 13.at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
: 14.at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
: 15.at 
com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:76)
: 16.at 
com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:158)
: 17.at 
com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:178)
: 18.at 
com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
: 19.at 
com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:435)
: 20.at com.caucho.server.port.TcpConnection.run(TcpConnection.java:586)
: 21.at com.caucho.util.ThreadPool$Item.runTasks(ThreadPool.java:690)
: 22.at com.caucho.util.ThreadPool$Item.run(ThreadPool.java:612)
: 23.at java.lang.Thread.run(Thread.java:619)
: 24.
: 25. Date: Fri, 13 Nov 2009 02:19:54 GMT
: 26. Server: Apache/2.2.3 (Red Hat)
: 27. Cache-Control: no-cache, no-store
: 28. Pragma: no-cache
: 29. Expires: Sat, 01 Jan 2000 01:00:00 GMT
: 30. Content-Type: text/html; charset=UTF-8
: 31. Vary: Accept-Encoding,User-Agent
: 32. Content-Encoding: gzip
: 33. Content-Length: 1066
: 34. Connection: close
: 35.
: 



-Hoss



Re: Request assistance with distributed search multi shard/core setup and configuration

2009-11-13 Thread Lance Norskog
Distributed search requires a list of shard names in the URL. That's all.
Note that a distributed search does not search the data of the Solr
instance you send the request to -- only the listed shards.

You can create an entry point for your distributed search by adding a
new <requestHandler> element in solrconfig.xml. You would add the
shard list parameter to the "defaults" list. Do not have it call the
same request handler path; you'll get an infinite loop.
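
A sketch of what that entry might look like in solrconfig.xml (the handler
name and shard hosts are invented for illustration, untested):

<requestHandler name="/distrib" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- every query to /distrib fans out to these shards, which answer
         through their ordinary /select handler, avoiding the loop -->
    <str name="shards">host1:8983/solr,host2:8983/solr</str>
  </lst>
</requestHandler>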

On Tue, Nov 10, 2009 at 6:44 PM, Otis Gospodnetic
 wrote:
> Hm, I don't follow.  You don't need to create a custom (request) handler to 
> make use of Solr's distributed search.
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> - Original Message 
>> From: "Turner, Robbin J" 
>> To: "solr-user@lucene.apache.org" 
>> Sent: Tue, November 10, 2009 6:41:32 PM
>> Subject: RE: Request assistance with distributed search multi shard/core   
>> setup and configuration
>>
>> Thanks, I had already read through this url.  I guess my request was: is
>> there a way to set up something that is already part of solr itself to
>> pass the URL[shard...], rather than having to create a custom handler.
>>
>> thanks
>>
>> -Original Message-
>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>> Sent: Tuesday, November 10, 2009 6:09 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Request assistance with distributed search multi shard/core 
>> setup
>> and configuration
>>
>> Right, that's http://wiki.apache.org/solr/DistributedSearch
>>
>> Otis
>> --
>> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
>> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>>
>>
>>
>> - Original Message 
>> > From: "Turner, Robbin J"
>> > To: "solr-user@lucene.apache.org"
>> > Sent: Tue, November 10, 2009 6:05:19 PM
>> > Subject: RE: Request assistance with distributed search multi
>> > shard/core  setup and configuration
>> >
>> > I've already done the single Solr, that's why my request.  I read on
>> > some site that there is a way to set up the configuration so I can send
>> > a query to one solr instance and have it pass it on or distribute it
>> > across all the instances?
>> >
>> > Btw, thanks for the quick reply.
>> > RJ
>> >
>> > -Original Message-
>> > From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>> > Sent: Tuesday, November 10, 2009 6:02 PM
>> > To: solr-user@lucene.apache.org
>> > Subject: Re: Request assistance with distributed search multi
>> > shard/core setup and configuration
>> >
>> > RJ,
>> >
>> > You may want to take a simpler step - single Solr core (no solr.xml
>> > needed) per machine.  Then distributed search really only requires
>> > that you specify shard URLs in the URL of the search requests.  In
>> > practice/production you rarely benefit from distributed search against
>> > multiple cores on the same server anyway.
>> >
>> > Otis
>> > --
>> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
>> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>> >
>> >
>> >
>> >
>> > 
>> > From: "Turner, Robbin J"
>> > To: "solr-user@lucene.apache.org"
>> > Sent: Tue, November 10, 2009 5:58:52 PM
>> > Subject: Request assistance with distributed search multi shard/core
>> > setup and configuration
>> >
>> > I've been looking through all the documentation.  I've set up a single
>> > solr instance, and one multicore instance.  If someone would be
>> > willing to share some configuration examples and/or advise for setting
>> > up solr for distributing the search, I would really appreciate it.
>> > I've read that there is a way to do it, but most of the current
>> > documentation doesn't provide enough example on what to do with
>> > solr.xml, and the solrconfig.xml.  Also, I'm using tomcat 6 for the servlet
>> container.  I deployed the solr 1.4.0 released yesterday.
>> >
>> > Thanks
>> > RJ
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: NPE when trying to view a specific document via Luke

2009-11-13 Thread Yonik Seeley
On Fri, Nov 13, 2009 at 5:41 PM, Chris Hostetter
 wrote:
> : I'm seeing this stack trace when I try to view a specific document, e.g.
> : /admin/luke?id=1 but luke appears to be working correctly when I just
>
> FWIW: I was able to reproduce this using the example setup (i picked a
> doc id at random).  Suspecting it was a bug in docFreq

Probably just a null being passed in the text part of the term.
I bet Luke expects all field values to be strings, but some are binary.

-Yonik
http://www.lucidimagination.com


Fwd: Lucene MMAP Usage with Solr

2009-11-13 Thread ST ST
Folks,

I am trying to get Lucene MMAP to work in solr.

I am assuming that when I configure MMAP the entire index will be loaded
into RAM. Is that the right assumption?

I have tried the following ways for using MMAP:

Option 1. Using the solr config below for MMAP configuration

-Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory

   With this config, when I start solr with a 30G index, I expected that the
RAM usage should go up, but it did not.

Option 2. By Code Change
I made the following code change:

   Changed org.apache.solr.core.StandardDirectoryFactory to use MMAP instead
of FSDirectory.
   Code snippet pasted below.


Could you help me to understand if these are the right ways to use MMAP?

Thanks much
/ST.

Code snippet for Option 2:

package org.apache.solr.core;
/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.File;
import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;

/**
 * Directory provider which mimics original Solr FSDirectory based behavior.
 *
 */
public class StandardDirectoryFactory extends DirectoryFactory {

  public Directory open(String path) throws IOException {
return MMapDirectory.open(new File(path));
  }
}


Re: any docs on solr.EdgeNGramFilterFactory?

2009-11-13 Thread Peter Wolanin
Thanks for the link - there doesn't seem to be a fix version specified,
so I guess this will not officially ship with lucene 2.9?

-Peter

On Wed, Nov 11, 2009 at 10:36 PM, Robert Muir  wrote:
> Peter, here is a project that does this:
> http://issues.apache.org/jira/browse/LUCENE-1488
>
>
>> That's kind of interesting - in general can I build a custom tokenizer
>> from existing tokenizers that treats different parts of the input
>> differently based on the utf-8 range of the characters?  E.g. use a
>> porter stemmer for stretches of Latin text and n-gram or something
>> else for CJK?
>>
>> -Peter
>>
>> On Tue, Nov 10, 2009 at 9:21 PM, Otis Gospodnetic
>>  wrote:
>> > Yes, that's the n-gram one.  I believe the existing CJK one in Lucene is
>> really just an n-gram tokenizer, so no different than the normal n-gram
>> tokenizer.
>> >
>> > Otis
>> > --
>> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
>> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>> >
>> >
>> >
>> > - Original Message 
>> >> From: Peter Wolanin 
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Tue, November 10, 2009 7:34:37 PM
>> >> Subject: Re: any docs on solr.EdgeNGramFilterFactory?
>> >>
>> >> So, this is the normal N-gram one?  NGramTokenizerFactory
>> >>
>> >> Digging deeper - there are actually CJK and Chinese tokenizers in the
>> >> Solr codebase:
>> >>
>> >>
>> http://lucene.apache.org/solr/api/org/apache/solr/analysis/CJKTokenizerFactory.html
>> >>
>> http://lucene.apache.org/solr/api/org/apache/solr/analysis/ChineseTokenizerFactory.html
>> >>
>> >> The CJK one uses the lucene CJKTokenizer
>> >>
>> http://lucene.apache.org/java/2_9_1/api/contrib-analyzers/org/apache/lucene/analysis/cjk/CJKTokenizer.html
>> >>
>> >> and there seems to be another one even that no one has wrapped into
>> Solr:
>> >>
>> http://lucene.apache.org/java/2_9_1/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/package-summary.html
>> >>
>> >> So it seems like the existing options are a little better than I thought,
>> >> though it would be nice to have some docs on properly configuring
>> >> these.
>> >>
>> >> -Peter
>> >>
>> >> On Tue, Nov 10, 2009 at 6:05 PM, Otis Gospodnetic
>> >> wrote:
>> >> > Peter,
>> >> >
>> >> > For CJK and n-grams, I think you don't want the *Edge* n-grams, but
>> just
>> >> n-grams.
>> >> > Before you take the n-gram route, you may want to look at the smart
>> Chinese
>> >> analyzer in Lucene contrib (I think it works only for Simplified
>> Chinese) and
>> >> Sen (on java.net).  I also spotted a Korean analyzer in the wild a few
>> months
>> >> back.
>> >> >
>> >> > Otis
>> >> > --
>> >> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
>> >> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>> >> >
>> >> >
>> >> >
>> >> > - Original Message 
>> >> >> From: Peter Wolanin
>> >> >> To: solr-user@lucene.apache.org
>> >> >> Sent: Tue, November 10, 2009 4:06:52 PM
>> >> >> Subject: any docs on solr.EdgeNGramFilterFactory?
>> >> >>
>> >> >> This fairly recent blog post:
>> >> >>
>> >> >>
>> >>
>> http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
>> >> >>
>> >> >> describes the use of the solr.EdgeNGramFilterFactory as the tokenizer
>> >> >> for the index.  I don't see any mention of that tokenizer on the Solr
>> >> >> wiki - is it just waiting to be added, or is there any other
>> >> >> documentation in addition to the blog post?  In particular, there was
>> >> >> a thread last year about using an N-gram tokenizer to enable
>> >> >> reasonable (if not ideal) searching of CJK text, so I'd be curious to
>> >> >> know how people are configuring their schema (with this tokenizer?)
>> >> >> for that use case.
>> >> >>
>> >> >> Thanks,
>> >> >>
>> >> >> Peter
>> >> >>
>> >> >> --
>> >> >> Peter M. Wolanin, Ph.D.
>> >> >> Momentum Specialist,  Acquia. Inc.
>> >> >> peter.wola...@acquia.com
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Peter M. Wolanin, Ph.D.
>> >> Momentum Specialist,  Acquia. Inc.
>> >> peter.wola...@acquia.com
>> >
>> >
>>
>>
>>
>> --
>> Peter M. Wolanin, Ph.D.
>> Momentum Specialist,  Acquia. Inc.
>> peter.wola...@acquia.com
>>
>
>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>



-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


Re: Reseting doc boosts

2009-11-13 Thread Koji Sekiguchi

I'm not sure this is what you are looking for,
but there is FieldNormModifier tool in Lucene.

Koji

--

http://www.rondhuit.com/en/


Avlesh Singh wrote:

AFAIK there is no way to "reset" the doc boost. You would need to re-index.
Moreover, there is no way to "search by boost".

Cheers
Avlesh

On Fri, Nov 13, 2009 at 8:17 PM, Jon Baer  wrote:


Hi,

I'm trying to figure out if there is an easy way to basically "reset" all of
any doc boosts which you have made (for analytical purposes) ... for example
if I run an index, gather report, doc boost on the report, and reset the
boosts @ time of next index ...

It would seem to be from just knowing how Lucene works that I would really
need to reindex since it's an attribute on the doc itself which would have to be
modified, but there is no easy way to query for docs which have been boosted
either.  Any insight?

Thanks.

- Jon







Re: any docs on solr.EdgeNGramFilterFactory?

2009-11-13 Thread Robert Muir
ah, thanks, i'll tentatively set one in the future, but definitely not 2.9.x

more just to show you the idea, you can do different things depending on
different runs of writing systems in text.
but it doesn't solve everything: you only know it's Latin script, not english,
so you can't safely automatically do anything like stemming.

say your content is only chinese and english:

the analyzer won't know your latin script text is english, versus say,
french from the unicode, so it won't stem it.
but that analyzer will lowercase it. it won't know if your ideographs are
chinese or japanese, but it will use n-gram tokenization, you get the drift.

in that impl, it puts the script code in the flags so downstream you could
do something like stemming if you happen to know more than is evident from
the unicode.

On Fri, Nov 13, 2009 at 6:23 PM, Peter Wolanin wrote:

> Thanks for the link - there doesn't seem to be a fix version specified,
> so I guess this will not officially ship with lucene 2.9?
>
> -Peter
>
> On Wed, Nov 11, 2009 at 10:36 PM, Robert Muir  wrote:
> > Peter, here is a project that does this:
> > http://issues.apache.org/jira/browse/LUCENE-1488
> >
> >
> >> That's kind of interesting - in general can I build a custom tokenizer
> >> from existing tokenizers that treats different parts of the input
> >> differently based on the utf-8 range of the characters?  E.g. use a
> >> porter stemmer for stretches of Latin text and n-gram or something
> >> else for CJK?
> >>
> >> -Peter
> >>
> >> On Tue, Nov 10, 2009 at 9:21 PM, Otis Gospodnetic
> >>  wrote:
> >> > Yes, that's the n-gram one.  I believe the existing CJK one in Lucene
> is
> >> really just an n-gram tokenizer, so no different than the normal n-gram
> >> tokenizer.
> >> >
> >> > Otis
> >> > --
> >> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> >> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> >> >
> >> >
> >> >
> >> > - Original Message 
> >> >> From: Peter Wolanin 
> >> >> To: solr-user@lucene.apache.org
> >> >> Sent: Tue, November 10, 2009 7:34:37 PM
> >> >> Subject: Re: any docs on solr.EdgeNGramFilterFactory?
> >> >>
> >> >> So, this is the normal N-gram one?  NGramTokenizerFactory
> >> >>
> >> >> Digging deeper - there are actually CJK and Chinese tokenizers in the
> >> >> Solr codebase:
> >> >>
> >> >>
> >>
> http://lucene.apache.org/solr/api/org/apache/solr/analysis/CJKTokenizerFactory.html
> >> >>
> >>
> http://lucene.apache.org/solr/api/org/apache/solr/analysis/ChineseTokenizerFactory.html
> >> >>
> >> >> The CJK one uses the lucene CJKTokenizer
> >> >>
> >>
> http://lucene.apache.org/java/2_9_1/api/contrib-analyzers/org/apache/lucene/analysis/cjk/CJKTokenizer.html
> >> >>
> >> >> and there seems to be another one even that no one has wrapped into
> >> Solr:
> >> >>
> >>
> http://lucene.apache.org/java/2_9_1/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/package-summary.html
> >> >>
> >> >> So it seems like the existing options are a little better than I
> thought,
> >> >> though it would be nice to have some docs on properly configuring
> >> >> these.
> >> >>
> >> >> -Peter
> >> >>
> >> >> On Tue, Nov 10, 2009 at 6:05 PM, Otis Gospodnetic
> >> >> wrote:
> >> >> > Peter,
> >> >> >
> >> >> > For CJK and n-grams, I think you don't want the *Edge* n-grams, but
> >> just
> >> >> n-grams.
> >> >> > Before you take the n-gram route, you may want to look at the smart
> >> Chinese
> >> >> analyzer in Lucene contrib (I think it works only for Simplified
> >> Chinese) and
> >> >> Sen (on java.net).  I also spotted a Korean analyzer in the wild a
> few
> >> months
> >> >> back.
> >> >> >
> >> >> > Otis
> >> >> > --
> >> >> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> >> >> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> >> >> >
> >> >> >
> >> >> >
> >> >> > - Original Message 
> >> >> >> From: Peter Wolanin
> >> >> >> To: solr-user@lucene.apache.org
> >> >> >> Sent: Tue, November 10, 2009 4:06:52 PM
> >> >> >> Subject: any docs on solr.EdgeNGramFilterFactory?
> >> >> >>
> >> >> >> This fairly recent blog post:
> >> >> >>
> >> >> >>
> >> >>
> >>
> http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
> >> >> >>
> >> >> >> describes the use of the solr.EdgeNGramFilterFactory as the
> tokenizer
> >> >> >> for the index.  I don't see any mention of that tokenizer on the
> Solr
> >> >> >> wiki - is it just waiting to be added, or is there any other
> >> >> >> documentation in addition to the blog post?  In particular, there
> was
> >> >> >> a thread last year about using an N-gram tokenizer to enable
> >> >> >> reasonable (if not ideal) searching of CJK text, so I'd be curious
> to
> >> >> >> know how people are configuring their schema (with this
> tokenizer?)
> >> >> >> for that use case.
> >> >> >>
> >> >> >> Thanks,
> >> >> >>
> >> >> >> Peter
> >> >> >>
> >> >> >> --
> >> >> >> Peter M. Wolanin, Ph.D.
> >> >> >>

Re: NPE when trying to view a specific document via Luke

2009-11-13 Thread Chris Hostetter

: > FWIW: I was able to reproduce this using the example setup (i picked a
: > doc id at random).  Suspecting it was a bug in docFreq
: 
: Probably just a null being passed in the text part of the term.
: I bet Luke expects all field values to be strings, but some are binary.

I'm not sure i follow you ... i think you're saying that naive assumptions in 
the LukeRequestHandler could result in it asking for the docFreq of a term 
that has a null string value because some field types are binary, except 
that...

 1) 1.3 didn't have this problem
 2) LukeRequestHandler.getDocumentFieldsInfo didn't change from 1.3 to 1.4

I tried to reproduce this in 1.4 using an index/configs created with 1.3, 
but i got a *different* NPE when loading this url...

   http://localhost:8983/solr/admin/luke?id=SP2514N

SEVERE: java.lang.NullPointerException
at 
org.apache.solr.util.NumberUtils.SortableStr2int(NumberUtils.java:127)
at 
org.apache.solr.util.NumberUtils.SortableStr2float(NumberUtils.java:83)
at 
org.apache.solr.util.NumberUtils.SortableStr2floatStr(NumberUtils.java:89)
at 
org.apache.solr.schema.SortableFloatField.indexedToReadable(SortableFloatField.java:62)
at 
org.apache.solr.schema.SortableFloatField.toExternal(SortableFloatField.java:53)
at 
org.apache.solr.handler.admin.LukeRequestHandler.getDocumentFieldsInfo(LukeRequestHandler.java:245)

...all three of these stack traces seem to suggest that some impl of 
Fieldable.stringValue in 2.9 is returning null in cases where it returned 
*something* else in the 2.4-dev jar used by Solr 1.3.

That seems like it could have other impacts besides LukeRequestHandler.


-Hoss

Re: NPE when trying to view a specific document via Luke

2009-11-13 Thread Chris Hostetter

: I tried to reproduce this in 1.4 using an index/configs created with 1.3, 
: but i got a *different* NPE when loading this url...

I should have tried a simpler test ... i get NPEs just trying to execute 
a simple search for *:* when i try to use the example index built 
in 1.3 (with the 1.3 configs) in 1.4.  same (apparent) cause: code is 
attempting to deref a string returned by Fieldable.stringValue() which is 
null... 

java.lang.NullPointerException
at 
org.apache.solr.schema.SortableIntField.write(SortableIntField.java:72)
at org.apache.solr.schema.SchemaField.write(SchemaField.java:108)
at org.apache.solr.request.XMLWriter.writeDoc(XMLWriter.java:311)
at org.apache.solr.request.XMLWriter$3.writeDocs(XMLWriter.java:483)
at org.apache.solr.request.XMLWriter.writeDocuments(XMLWriter.java:420)
at org.apache.solr.request.XMLWriter.writeDocList(XMLWriter.java:457)
at org.apache.solr.request.XMLWriter.writeVal(XMLWriter.java:520)
at org.apache.solr.request.XMLWriter.writeResponse(XMLWriter.java:130)
at 
org.apache.solr.request.XMLResponseWriter.write(XMLResponseWriter.java:34)
at 
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:325)

This really does smell like something in Lucene changed behavior 
drastically.  I've been looking at diffs from java/tr...@691741 and 
java/tags/lucene_2_9_1 but nothing jumps out at me that would explain 
this.

If nothing else, i'm opening a solr issue...



-Hoss



StreamingUpdateSolrServer commit?

2009-11-13 Thread erikea...@yahoo.com

When does StreamingUpdateSolrServer commit?

I know there's a threshold and thread pool as params but I don't see a commit 
timeout.  Do I have to manage this myself?


  

Re: exclude some fields from copying dynamic fields | schema.xml

2009-11-13 Thread Lance Norskog
There is no direct way.

Let's say you have a "nocopy_s" and you do not want a copy
"nocopy_str_s". This might work: declare "nocopy_str_s" as a field and
make it not indexed and not stored. I don't know if this will work.

It requires two overrides to work: 1) that declaring a field name that
matches a wildcard will override the default wildcard rule, and 2)
that "stored=false indexed=false" works.
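
A sketch of that workaround in schema.xml, using the hypothetical field
names above (untested, and subject to both caveats):

<!-- hypothetical wildcard copy rule you want one field to opt out of -->
<copyField source="*_s" dest="*_str_s"/>

<!-- explicit declaration intended to shadow the *_str_s wildcard match;
     indexed=false stored=false should then make the copy a no-op -->
<field name="nocopy_str_s" type="string" indexed="false" stored="false"/>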

On Fri, Nov 13, 2009 at 3:23 AM, Vicky_Dev
 wrote:
>
> Hi,
> we are using the following entry in schema.xml to make a copy of one type of
> dynamic field to another:
> 
>
> Is it possible to exclude some fields from copying.
>
> We are using Solr1.3
>
> ~Vikrant
>
> --
> View this message in context: 
> http://old.nabble.com/exclude-some-fields-from-copying-dynamic-fields-%7C-schema.xml-tp26335109p26335109.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Reseting doc boosts

2009-11-13 Thread Jon Baer
This looks exactly like what I needed ... it would be a great tool /
addition to the Solr web interface, but it looks like it only takes
(Directory d, Similarity s) (vs. a subset collection of documents) ...

Either way great find, thanks for your help ...

- Jon

On Nov 13, 2009, at 6:40 PM, Koji Sekiguchi wrote:

> I'm not sure this is what you are looking for,
> but there is FieldNormModifier tool in Lucene.
> 
> Koji
> 
> -- 
> 
> http://www.rondhuit.com/en/
> 
> 
> Avlesh Singh wrote:
>> AFAIK there is no way to "reset" the doc boost. You would need to re-index.
>> Moreover, there is no way to "search by boost".
>> 
>> Cheers
>> Avlesh
>> 
>> On Fri, Nov 13, 2009 at 8:17 PM, Jon Baer  wrote:
>> 
>>  
>>> Hi,
>>> 
>>> I'm trying to figure out if there is an easy way to basically "reset" all of
>>> any doc boosts which you have made (for analytical purposes) ... for example
>>> if I run an index, gather report, doc boost on the report, and reset the
>>> boosts @ time of next index ...
>>> 
>>> It would seem to be from just knowing how Lucene works that I would really
>>> need to reindex since it's an attribute on the doc itself which would have to be
>>> modified, but there is no easy way to query for docs which have been boosted
>>> either.  Any insight?
>>> 
>>> Thanks.
>>> 
>>> - Jon
>>>
>> 
>>  
> 



Re: Making search results more stable as index is updated

2009-11-13 Thread Lance Norskog
This is one case where permanent caches are interesting. Another case
is highlighting: in some cases highlighting takes a lot of work, and
this work is not cached.

It might be a cleaner architecture to have session-maintaining code in
a separate front-end app, and leave Solr session-free.
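
For what it's worth, option #3 below could be sketched in SolrJ along these
lines (the field name and timestamp come straight from the proposal;
everything else is an untested assumption):

import org.apache.solr.client.solrj.SolrQuery;

public class StablePaging {
    // Build one page of a session that began at time t, where t was
    // recorded (as an ISO-8601 UTC string) at the user's first search.
    static SolrQuery page(String userQuery, String t, int start, int rows) {
        SolrQuery q = new SolrQuery(userQuery);
        // hide anything added to the index after the session started
        q.addFilterQuery("docAddedTimestamp:[* TO " + t + "]");
        q.setStart(start); // page offset
        q.setRows(rows);   // page size
        return q;
    }
}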

On Fri, Nov 13, 2009 at 12:48 PM, Chris Harris  wrote:
> If documents are being added to and removed from an index (and commits
> are being issued) while a user is searching, then the experience of
> paging through search results using the obvious solr mechanism
> (&start=100&Rows=10) may be disorienting for the user. For one
> example, by the time the user clicks "next page" for the first time, a
> document that they saw on page 1 may have been pushed onto page 2.
> (This may be especially pronounced if docs are being sorted by date.)
>
> I'm wondering what are the best options available for presenting a
> more stable set of search results to users in such cases. The obvious
> candidates to me are:
>
> #1: Cache results in the user session of the web tier. (In particular,
> maybe just cache the uniqueKey of each matching document.)
>
>  Pro: Simple
>  Con: May require capping the # of search results in order to make
> the initial query (which now has Solr numRows param >> web pageSize)
> fast enough. For example, maybe it's only practical to cache the first
> 500 records.
>
> #2: Create some kind of per-user results cache in Solr. (One simple
> implementation idea: You could make your Solr search handler take a
> userid parameter, and cache each user's last search in a special
> per-user results cache. You then also provide an API that says, "give
> me records n through m of userid #1334's last search". For your
> subsequent queries, you consult the latter API rather than redoing
> your search. Because Lucene docids are unstable across commits and
> such, I think this means caching the uniqueKey of each matching
> document. This in turn means looking up the uniqueKey of each matching
> document at search time. It also means you can't use the existing Solr
> caches, but need to make a new one.)
>
>  Pro: Maybe faster than #1?? (Saves on data transfer between Solr and
> web tier, at least during the initial query.)
>  Con: More complicated than #1.
>
> #3: Use filter queries to attempt to make your subsequent queries (for
> page 2, page 3, etc.) return results consistent with your original
> query. (One idea is to give each document a docAddedTimestamp field,
> which would have precision down to the millisecond or something. On
> your initial query, you could note the current time, T. Then for the
> subsequent queries you add a filter query for docAddedTimestamp<=T.
> Hopefully with a trie date field this would be fast. This should
> hopefully keep any docs newly added after T from showing up in the
> user's search results as they page through them. However, it won't
> necessarily protect you from docs that were *reindexed* (i.e. re-add a
> doc with the same uniqueKey as an existing doc) or docs that were
> deleted.)
>
>  Pro: Doesn't require a new cache, and no cap on # of search results
>  Con: Maybe doesn't provide total stability.
>
> Any feedback on these options? Are there other ideas to consider?
>
> Thanks,
> Chris
>



-- 
Lance Norskog
goks...@gmail.com


Re: StreamingUpdateSolrServer commit?

2009-11-13 Thread Otis Gospodnetic
Unless I slept through it, you still need to explicitly commit, even with SUSS.
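
A minimal sketch, assuming the example server URL (queue size and thread
count here are arbitrary):

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SussCommit {
    public static void main(String[] args) throws Exception {
        StreamingUpdateSolrServer server =
            new StreamingUpdateSolrServer("http://localhost:8983/solr", 100, 4);

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        server.add(doc);  // queued and streamed by the background threads
        server.commit();  // still required -- SUSS never commits on its own
    }
}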

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: "erikea...@yahoo.com" 
> To: "solr-user@lucene.apache.org" 
> Sent: Fri, November 13, 2009 9:43:53 PM
> Subject: StreamingUpdateSolrServer commit?
> 
> 
> When does  StreamingUpdateSolrServer commit?
> 
> I know there's a threshold and thread pool as params but I don't see a 
> commit 
> timeout.   Do I have to manage this myself?



Re: Fwd: Lucene MMAP Usage with Solr

2009-11-13 Thread Otis Gospodnetic
I thought that was the way to use it (but I've never had to use it myself) and 
that it means memory through the roof, yes.
If you look at the Solr Admin statistics page, does it show you which Directory 
you are using?

For example, on 1 Solr instance I'm looking at I see:

readerDir :  org.apache.lucene.store.NIOFSDirectory@/mnt/


Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: ST ST 
> To: solr-user@lucene.apache.org
> Sent: Fri, November 13, 2009 6:03:57 PM
> Subject: Fwd: Lucene MMAP Usage with Solr
> 
> Folks,
> 
> I am trying to get Lucene MMAP to work in solr.
> 
> I am assuming that when I configure MMAP the entire index will be loaded
> into RAM.
> Is that the right assumption ?
> 
> I have tried the following ways for using MMAP:
> 
> Option 1. Using the solr config below for MMAP configuration
> 
> -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory
> 
>With this config, when I start solr with a 30G index, I expected that the
> RAM usage should go up, but it did not.
> 
> Option 2. By Code Change
> I made the following code change :
> 
>Changed org.apache.solr.core.StandardDirectoryFactory to use MMAP instead
> of FSDirectory.
>Code snippet pasted below.
> 
> 
> Could you help me to understand if these are the right way to use MMAP?
> 
> Thanks much
> /ST.
> 
> Code SNippet for Option 2:
> 
> package org.apache.solr.core;
> /**
> * Licensed to the Apache Software Foundation (ASF) under one or more
> * contributor license agreements.  See the NOTICE file distributed with
> * this work for additional information regarding copyright ownership.
> * The ASF licenses this file to You under the Apache License, Version 2.0
> * (the "License"); you may not use this file except in compliance with
> * the License.  You may obtain a copy of the License at
> *
> *http://www.apache.org/licenses/LICENSE-2.0
> *
> * Unless required by applicable law or agreed to in writing, software
> * distributed under the License is distributed on an "AS IS" BASIS,
> * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> * See the License for the specific language governing permissions and
> * limitations under the License.
> */
> 
> import java.io.File;
> import java.io.IOException;
> 
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.MMapDirectory;
> 
> /**
> * Directory provider which mimics original Solr FSDirectory based behavior.
> *
> */
> public class StandardDirectoryFactory extends DirectoryFactory {
> 
>   public Directory open(String path) throws IOException {
> return MMapDirectory.open(new File(path));
>   }
> }



Re: Stop solr without losing documents

2009-11-13 Thread Otis Gospodnetic
So I think the question is really:
"If I stop the servlet container, does Solr issue a commit in the shutdown hook 
in order to ensure all buffered docs are persisted to disk before the JVM 
exits".

I don't have the Solr source handy, but if I did, I'd look for "Shutdown", 
"Hook" and "finalize" in the code.

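Client-side, one hedge (independent of whatever Solr does in its own
shutdown path) is a JVM shutdown hook that sends a final commit. A sketch
with SolrJ, untested:

import org.apache.solr.client.solrj.SolrServer;

class CommitOnExit {
    // Belt-and-braces: whatever Solr does server-side, the indexing client
    // can register its own shutdown hook that flushes one last commit.
    static void install(final SolrServer server) {
        Runtime.getRuntime().addShutdownHook(new Thread() {
            public void run() {
                try {
                    server.commit();
                } catch (Exception e) {
                    e.printStackTrace(); // too late to retry; log and move on
                }
            }
        });
    }
}
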
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Chris Hostetter 
> To: solr-user@lucene.apache.org
> Sent: Fri, November 13, 2009 4:09:00 PM
> Subject: Re: Stop solr without losing documents
> 
> 
> : which documents have been updated before a successful commit.  Now
> : stopping solr is as easy as kill -9.
> 
> please don't kill -9 ... it's grossly overkill, and doesn't give your 
> servlet container a fair chance to clean things up.  A lot of work has been 
> done to make Lucene indexes robust to hard terminations of the JVM (or 
> physical machine) but there's no reason to go out of your way to try and 
> stab it in the heart when you could just shut it down cleanly.
> 
> that's not to say your approach isn't a good one -- if you only have one 
> client sending updates/commits then having it keep track of what was 
> indexed prior to the last successful commit is a viable way to deal with 
> what happens if solr stops responding (either because you shut it down, or 
> because it crashed for some other reason).
> 
> Alternately, you could take advantage of the "enabled" feature from your 
> client (just have it test the enabled URL every N updates or so) and when 
> it sees that you have disabled the port it can send one last commit and 
> then stop sending updates until it sees the enabled URL work again -- as 
> soon as you see the updates stop, you can safely shut down the port.
> 
> 
> -Hoss



changes to highlighting config or syntax in 1.4?

2009-11-13 Thread Peter Wolanin
I'm testing out the final release of Solr 1.4 as compared to the build
I have been using from around June.

I'm using the dismax handler for searches.  I'm finding that
highlighting is completely broken as compared to previously.  Much
more text is returned than it should for each string in <lst
name="highlighting">, but the search words are never highlighted in
that response.  Setting usePhraseHighlighter=false makes no
difference.

Any pointers appreciated.

-Peter

-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Otis Gospodnetic
Let's take a step back.  Why do you need to optimize?  You said: "As long as 
I'm not optimizing, search and indexing times are satisfactory." :)

You don't need to optimize just because you are continuously adding and 
deleting documents.  On the contrary!

 Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Jerome L Quinn 
> To: solr-user@lucene.apache.org
> Sent: Thu, November 12, 2009 6:30:42 PM
> Subject: Solr 1.3 query and index perf tank during optimize
> 
> 
> Hi, everyone, this is a problem I've had for quite a while,
> and have basically avoided optimizing because of it.  However,
> eventually we will get to the point where we must delete as
> well as add docs continuously.
> 
> I have a Solr 1.3 index with ~4M docs at around 90G.  This is a single
> instance running inside tomcat 6, so no replication.  Merge factor is the
> default 10.  ramBufferSizeMB is 32.  maxWarmingSearchers=4.
> autoCommit is set at 3 sec.
> 
> We continually push new data into the index, at somewhere between 1-10 docs
> every 10 sec or so.  Solr is running on a quad-core 3.0GHz server.
> under IBM java 1.6.  The index is sitting on a local 15K scsi disk.
> There's nothing
> else of substance running on the box.
> 
> Optimizing the index takes about 65 min.
> 
> As long as I'm not optimizing, search and indexing times are satisfactory.
> 
> When I start the optimize, I see massive problems with timeouts pushing new
> docs
> into the index, and search times balloon.  A typical search while
> optimizing takes
> about 1 min instead of a few seconds.
> 
> Can anyone offer me help with fixing the problem?
> 
> Thanks,
> Jerry Quinn



Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Lance Norskog
The 'maxSegments' feature is new with 1.4.  I'm not sure that it will
cause any less disk I/O during optimize.

The 'mergeFactor=2' idea is not what you think: in this case the index
is always "mostly optimized", so you never need to run optimize.
Indexing is always slower, because you amortize the optimize time into
little continuous chunks during indexing. You never stop indexing. You
should not lose documents.
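
For reference, the 1.4 partial optimize can be sent as a plain update
message POSTed to /update (the segment count here is arbitrary):

<optimize maxSegments="10" waitFlush="true" waitSearcher="true"/>

That merges down to at most 10 segments instead of the usual single
segment, so the hit is taken in smaller steps.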

On Fri, Nov 13, 2009 at 1:07 PM, Jerome L Quinn  wrote:
>
> Mark Miller  wrote on 11/12/2009 07:18:03 PM:
>> Ah, the pains of optimization. Its kind of just how it is. One solution
>> is to use two boxes and replication - optimize on the master, and then
>> queries only hit the slave. Out of reach for some though, and adds many
>> complications.
>
> Yes, in my use case 2 boxes isn't a great option.
>
>
>> Another kind of option is to use the partial optimize feature:
>>
>>  
>>
>> Using this, you can optimize down to n segments and take a shorter hit
>> each time.
>
> Is this a 1.4 feature?  I'm planning to migrate to 1.4, but it'll take a
> while since
> I have to port custom code forward, including a query parser.
>
>
>> Also, if optimizing is so painful, you might lower the merge factor
>> amortize that pain better. Thats another way to slowly get there - if
>> you lower the merge factor, as merging takes place, the new merge factor
>> will be respected, and semgents will merge down. A merge factor of 2
>> (the lowest) will make it so you only ever have 2 segments. Sometimes
>> that works reasonably well - you could try 3-6 or something as well.
>> Then when you do your partial optimizes (and eventually a full optimize
>> perhaps), you won't have so far to go.
>
> So this will slow down indexing but speed up optimize somewhat?
> Unfortunately
> right now I lose docs I'm indexing, as well as slowing searching to a crawl.
> Ugh.
>
> I've got plenty of CPU horsepower.  This is where having the ability to
> optimize
> on another filesystem would be useful.
>
> Would it perhaps make sense to set up a master/slave on the same machine?
> Then
> I suppose I can have an index being optimized that might not clobber the
> search.
> Would new indexed items still be dropped on the floor?
>
> Thanks,
> Jerry



-- 
Lance Norskog
goks...@gmail.com


Re: Fwd: Lucene MMAP Usage with Solr

2009-11-13 Thread Lance Norskog
http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=/apis/mmap.htm

Normally file I/O in a program means that the data is copied between
the system I/O disk cache and the program's memory. Memory-mapping
means that the program address space points to the disk I/O cache
directly, there is no copying. In other words the program and the OS
share the same memory.

The OS tends to stream the entire file in but this is not required.
Memory-mapping may be faster and may not, depending on your index
sizes, memory access patterns, etc.

On Fri, Nov 13, 2009 at 7:49 PM, Otis Gospodnetic
 wrote:
> I thought that was the way to use it (but I've never had to use it myself) 
> and that it means memory through the roof, yes.
> If you look at the Solr Admin statistics page, does it show you which 
> Directory you are using?
>
> For example, on 1 Solr instance I'm looking at I see:
>
> readerDir :  org.apache.lucene.store.NIOFSDirectory@/mnt/
>
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> - Original Message 
>> From: ST ST 
>> To: solr-user@lucene.apache.org
>> Sent: Fri, November 13, 2009 6:03:57 PM
>> Subject: Fwd: Lucene MMAP Usage with Solr
>>
>> Folks,
>>
>> I am trying to get Lucene MMAP to work in solr.
>>
>> I am assuming that when I configure MMAP the entire index will be loaded
>> into RAM.
>> Is that the right assumption ?
>>
>> I have tried the following ways for using MMAP:
>>
>> Option 1. Using the solr config below for MMAP configuration
>>
>> -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory
>>
>>    With this config, when I start solr with a 30G index, I expected that the
>> RAM usage should go up, but it did not.
>>
>> Option 2. By Code Change
>>     I made the following code change :
>>
>>    Changed org.apache.solr.core.StandardDirectoryFactory to use MMAP instead
>> of FSDirectory.
>>    Code snippet pasted below.
>>
>>
>> Could you help me to understand if these are the right way to use MMAP?
>>
>> Thanks much
>> /ST.
>>
>> Code SNippet for Option 2:
>>
>> package org.apache.solr.core;
>> /**
>> * Licensed to the Apache Software Foundation (ASF) under one or more
>> * contributor license agreements.  See the NOTICE file distributed with
>> * this work for additional information regarding copyright ownership.
>> * The ASF licenses this file to You under the Apache License, Version 2.0
>> * (the "License"); you may not use this file except in compliance with
>> * the License.  You may obtain a copy of the License at
>> *
>> *    http://www.apache.org/licenses/LICENSE-2.0
>> *
>> * Unless required by applicable law or agreed to in writing, software
>> * distributed under the License is distributed on an "AS IS" BASIS,
>> * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> * See the License for the specific language governing permissions and
>> * limitations under the License.
>> */
>>
>> import java.io.File;
>> import java.io.IOException;
>>
>> import org.apache.lucene.store.Directory;
>> import org.apache.lucene.store.MMapDirectory;
>>
>> /**
>> * Directory provider which mimics original Solr FSDirectory based behavior.
>> *
>> */
>> public class StandardDirectoryFactory extends DirectoryFactory {
>>
>>   public Directory open(String path) throws IOException {
>>     return MMapDirectory.open(new File(path));
>>   }
>> }
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: changes to highlighting config or syntax in 1.4?

2009-11-13 Thread Peter Wolanin
Apparently one of my conf files was broken - odd that I didn't see any
exceptions.  Anyhow - excuse my haste, I don't see the problem now.

-Peter

On Fri, Nov 13, 2009 at 11:06 PM, Peter Wolanin
 wrote:
> I'm testing out the final release of Solr 1.4 as compared to the build
> I have been using from around June.
>
> I'm using the dismax handler for searches.  I'm finding that
> highlighting is completely broken as compared to previously.  Much
> more text is returned than it should for each string in <lst
> name="highlighting">, but the search words are never highlighted in
> that response.  Setting usePhraseHighlighter=false makes no
> difference.
>
> Any pointers appreciated.
>
> -Peter
>
> --
> Peter M. Wolanin, Ph.D.
> Momentum Specialist,  Acquia. Inc.
> peter.wola...@acquia.com
>



-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


Re: Reseting doc boosts

2009-11-13 Thread Jon Baer
Yeah, I ended up creating a "boosted" field at least for debugging, but might 
patch / extend / create my own FieldNormModifier using just that criterion + 
doing the reset.

- Jon

On Nov 13, 2009, at 12:21 PM, Avlesh Singh wrote:

> AFAIK there is no way to "reset" the doc boost. You would need to re-index.
> Moreover, there is no way to "search by boost".
> 
> Cheers
> Avlesh
> 
> On Fri, Nov 13, 2009 at 8:17 PM, Jon Baer  wrote:
> 
>> Hi,
>> 
>> Im trying to figure out if there is an easy way to basically "reset" all of
>> any doc boosts which you have made (for analytical purposes) ... for example
>> if I run an index, gather report, doc boost on the report, and reset the
>> boosts @ time of next index ...
>> 
>> It would seem to be from just knowing how Lucene works that I would really
>> need to reindex since its a attrib on the doc itself which would have to be
>> modified, but there is no easy way to query for docs which have been boosted
>> either.  Any insight?
>> 
>> Thanks.
>> 
>> - Jon



Re: Data import problem with child entity from different database

2009-11-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
I am unable to get the file
http://old.nabble.com/file/p26335171/dataimport.temp.xml

On Fri, Nov 13, 2009 at 4:57 PM, Andrew Clegg  wrote:
>
>
>
> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>
>> no obvious issues.
>> you may post your entire data-config.xml
>>
>
> Here it is, exactly as last attempt but with usernames etc. removed.
>
> Ignore the comments and the unused FileDataSource...
>
> http://old.nabble.com/file/p26335171/dataimport.temp.xml dataimport.temp.xml
>
>
> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>
>> do w/o CachedSqlEntityProcessor first and then apply that later
>>
>
> Yep, that was just a bit of a wild stab in the dark to see if it made any
> difference.
>
> Thanks,
>
> Andrew.
>
> --
> View this message in context: 
> http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26335171.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Stop solr without losing documents

2009-11-13 Thread Lance Norskog
I would go with polling Solr to find what is not yet there. In
production, it is better to assume that things will break, and have
backstop janitors that fix them. And then test those janitors
regularly.

On Fri, Nov 13, 2009 at 8:02 PM, Otis Gospodnetic
 wrote:
> So I think the question is really:
> "If I stop the servlet container, does Solr issue a commit in the shutdown 
> hook in order to ensure all buffered docs are persisted to disk before the 
> JVM exits".
>
> I don't have the Solr source handy, but if I did, I'd look for "Shutdown", 
> "Hook" and "finalize" in the code.
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> - Original Message 
>> From: Chris Hostetter 
>> To: solr-user@lucene.apache.org
>> Sent: Fri, November 13, 2009 4:09:00 PM
>> Subject: Re: Stop solr without losing documents
>>
>>
>> : which documents have been updated before a successful commit.  Now
>> : stopping solr is as easy as kill -9.
>>
>> please don't kill -9 ... it's grossly overkill, and doesn't give your
>> servlet container a fair chance to clean things up.  A lot of work has been
>> done to make Lucene indexes robust to hard terminations of the JVM (or
>> physical machine) but there's no reason to go out of your way to try and
>> stab it in the heart when you could just shut it down cleanly.
>>
>> that's not to say your approach isn't a good one -- if you only have one
>> client sending updates/commits then having it keep track of what was
>> indexed prior to the last successful commit is a viable way to deal with
>> what happens if solr stops responding (either because you shut it down, or
>> because it crashed for some other reason).
>>
>> Alternately, you could take advantage of the "enabled" feature from your
>> client (just have it test the enabled URL every N updates or so) and when
>> it sees that you have disabled the port it can send one last commit and
>> then stop sending updates until it sees the enabled URL work again -- as
>> soon as you see the updates stop, you can safely shut down the port.
>>
>>
>> -Hoss
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: javabin in .NET?

2009-11-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
OK. Is there anyone trying it out? Where is this code? I can try to help ..

On Fri, Nov 13, 2009 at 8:10 PM, Mauricio Scheffer
 wrote:
> I meant the standard IO libraries. They are different enough that the code
> has to be manually ported. There were some automated tools back when
> Microsoft introduced .Net, but IIRC they never really worked.
>
> Anyway it's not a big deal, it should be a straightforward job. Testing it
> thoroughly cross-platform is another thing though.
>
> 2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् 
>
>> The javabin format does not have many dependencies. It may have 3-4
>> classes and that is it.
>>
>> On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer
>>  wrote:
>> > Nope. It has to be manually ported. Not so much because of the language
>> > itself but because of differences in the libraries.
>> >
>> >
>> > 2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् 
>> >
>> >> Is there any tool to directly port java to .Net? Then we can extract
>> >> out the client part of the javabin code and convert it.
>> >>
>> >> On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher 
>> >> wrote:
>> >> > Has anyone looked into using the javabin response format from .NET
>> >> (instead
>> >> > of SolrJ)?
>> >> >
>> >> > It's mainly a curiosity.
>> >> >
>> >> > How much better could performance/bandwidth/throughput be?  How
>> difficult
>> >> > would it be to implement some .NET code (C#, I'd guess being the best
>> >> > choice) to handle this response format?
>> >> >
>> >> > Thanks,
>> >> >        Erik
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> -
>> >> Noble Paul | Principal Engineer| AOL | http://aol.com
>> >>
>> >
>>
>>
>>
>> --
>> -
>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Data import problem with child entity from different database

2009-11-13 Thread Lance Norskog

2009/11/13 Noble Paul നോബിള്‍  नोब्ळ् :
> am unable to get the file
> http://old.nabble.com/file/p26335171/dataimport.temp.xml
>
> On Fri, Nov 13, 2009 at 4:57 PM, Andrew Clegg  wrote:
>>
>>
>>
>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>>
>>> no obvious issues.
>>> you may post your entire data-config.xml
>>>
>>
>> Here it is, exactly as last attempt but with usernames etc. removed.
>>
>> Ignore the comments and the unused FileDataSource...
>>
>> http://old.nabble.com/file/p26335171/dataimport.temp.xml dataimport.temp.xml
>>
>>
>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>>
>>> do w/o CachedSqlEntityProcessor first and then apply that later
>>>
>>
>> Yep, that was just a bit of a wild stab in the dark to see if it made any
>> difference.
>>
>> Thanks,
>>
>> Andrew.
>>
>> --
>> View this message in context: 
>> http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26335171.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>



-- 
Lance Norskog
goks...@gmail.com