Re: delta-import not giving updated records

2009-02-23 Thread con

Hi,

I made that change of quotes and case sensitivity. But now I am getting the
below exception while running delta-import:

Document # 1
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:186)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:143)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:43)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:74)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
        at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:211)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:133)
        at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:359)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:388)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
Caused by: java.sql.SQLSyntaxErrorException: ORA-00918: column ambiguously defined
        at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:91)
        at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:112)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:173)
        at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:455)
        at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:413)
        at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:1030)
        at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:183)
        at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:774)
        at oracle.jdbc.driver.T4CStatement.executeMaybeDescribe(T4CStatement.java:849)
        at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1186)
        at oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1770)
        at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1739)
        at oracle.jdbc.driver.OracleStatementWrapper.execute(OracleStatementWrapper.java:298)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:179)
        ... 10 more
13:48:22,538 ERROR [STDERR] 23 Feb, 2009 1:48:22 PM
org.apache.solr.handler.dataimport.DataImporter doDeltaImport
SEVERE: Delta Import Failed


I also tried replacing deltaQuery with deltaImportQuery. That does not throw
any exception, but it does not update the index either. I have used the same
query for the delta import that is used in the full import.

Thanks
con




Shalin Shekhar Mangar wrote:
> 
> 1. There is no closing quote in transformer="TemplateTransformer
> 2. Attribute names are case-sensitive so it should be deltaQuery instead
> of
> deltaquery
> 
> On Fri, Feb 20, 2009 at 6:48 PM, con  wrote:
> 
>>
>> Hi alll
>>
>> I am trying to run delta-import. For this I am having the below
>> data-config.xml
>>
>> 
>>> driver="oracle.jdbc.driver.OracleDriver"
>> url="***" user="" password="*"/>
>>
>>> transformer="TemplateTransformer pk="USER_ID"
>>query="select USERS.USER_ID, USERS.USER_NAME,
>> USERS.CREATED_TIMESTAMP
>> FROM USERS, CUSTOMERS where USERS.USER_ID = CUSTOMERS.USER_ID"
>>
>>deltaquery="select USERS.USER_ID, USERS.USER_NAME,
>> USERS.CREATED_TIMESTAMP FROM USERS, CUSTOMERS where USERS.USER_ID =
>> CUSTOMERS.USER_ID" >
>>> />
>>
>>
>> 
>>
>> But nothing is happening when i call
>> http://localhost:8080/solr/users/dataimport?command=delta-import. Whereas
>> the dataimport.properties is getting updated with the time at which
>> delta-import is run.
>>
>> Where as
>> http://localhost:8080/solr/users/dataimport?command=full-importis
>> properly inserting data.
>>
>> Can anybody suggest what is wrong with this configuration.
>>
>> Thanks
>> con
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/delta-import-not-giving-updated-records-tp22120184p22120184.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 




Re: Persistent, seemingly unfixable corrupt indices

2009-02-23 Thread Michael McCandless


This is spooky!

First off, why are you hitting so much index corruption?  Many classes  
of failure (unhandled exception exits JVM, JVM killed or SEGVs, OS  
crashes, power cord is pulled, etc.) should never result in index  
corruption.  Other failures (bad RAM, bad hard drives) can easily  
cause corruption.  So I'd really like to understand what kind of  
corruption you're seeing and how/why.  Why does Solr need to be  
killed, and how do you kill it?  When CheckIndex does catch the  
failure, what failures is it seeing?  Is there any pattern to which  
indexes become corrupt?


Hmm -- you seem to be using Lucene 2.3.1, so in fact OS crashes and  
power cord pulling could lead to corruption.  But JVM crashing or  
being killed should not.  Upgrading to Solr 1.3 (Lucene 2.4) would be  
a good idea, though I'd still like to understand what's causing your  
corruption.


Second off, you're right: CheckIndex fails to detect the docs-out-of-order
form of corruption.  I will open a Jira issue & fix it.


Mike

James Brady wrote:


Hi, my indices sometimes become corrupted - normally when Solr has to be
KILLed. These are not normally too much of a problem, as Lucene's CheckIndex
tool can normally detect missing / broken segments and fix them.

However, I now have a few indices throwing errors like this:

INFO: [core4] webapp=/solr path=/update params={} status=0 QTime=2
Exception in thread "Thread-75" org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: docs out of order (1124 <= 1138 )
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:271)
Caused by: org.apache.lucene.index.CorruptIndexException: docs out of order (1124 <= 1138 )
        at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:502)
        at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:456)
        at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:425)
        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:389)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:134)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3109)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2834)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)


and

INFO: [core7] webapp=/solr path=/update params={} status=500 QTime=5457
Feb 22, 2009 12:14:07 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.index.CorruptIndexException: docs out of order (242 <= 248 )
        at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:502)
        at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:456)
        at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:425)
        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:389)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:134)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3109)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2834)
        at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:193)
        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1800)
        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1795)
        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1791)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2398)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1465)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1424)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:278)



CheckIndex reports these cores as being completely healthy, and yet I can't
commit new documents into them.

Rebuilding indices isn't an option for me: is there any other way to fix
this? If not, any ideas on what I can do to prevent it in the future?

Many thanks,
James
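
[Editor's note: for reference, CheckIndex is run from the command line against
the index directory. The jar name and index path below are assumptions; the
-fix option drops any segments it cannot read (losing their documents), so run
it only on a copy of the index.]

  java -cp lucene-core-2.4.0.jar org.apache.lucene.index.CheckIndex /var/solr/data/index
  java -cp lucene-core-2.4.0.jar org.apache.lucene.index.CheckIndex /var/solr/data/index -fix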




Re: Persistent, seemingly unfixable corrupt indices

2009-02-23 Thread Michael McCandless


Actually, even in 2.3.1, CheckIndex checks for docs-out-of-order both
within and across segments, so now I'm at a loss as to why it's not
catching your case. Are any of these indexes small enough to post
somewhere I could access?


Mike

James Brady wrote:


Hi, my indices sometimes become corrupted - normally when Solr has to be
KILLed. These are not normally too much of a problem, as Lucene's CheckIndex
tool can normally detect missing / broken segments and fix them.

However, I now have a few indices throwing errors like this:

INFO: [core4] webapp=/solr path=/update params={} status=0 QTime=2
Exception in thread "Thread-75" org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: docs out of order (1124 <= 1138 )
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:271)
Caused by: org.apache.lucene.index.CorruptIndexException: docs out of order (1124 <= 1138 )
        at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:502)
        at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:456)
        at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:425)
        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:389)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:134)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3109)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2834)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)


and

INFO: [core7] webapp=/solr path=/update params={} status=500 QTime=5457
Feb 22, 2009 12:14:07 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.index.CorruptIndexException: docs out of order (242 <= 248 )
        at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:502)
        at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:456)
        at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:425)
        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:389)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:134)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3109)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2834)
        at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:193)
        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1800)
        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1795)
        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1791)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2398)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1465)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1424)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:278)



CheckIndex reports these cores as being completely healthy, and yet I can't
commit new documents into them.

Rebuilding indices isn't an option for me: is there any other way to fix
this? If not, any ideas on what I can do to prevent it in the future?

Many thanks,
James




Re: delta-import not giving updated records

2009-02-23 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Mon, Feb 23, 2009 at 2:09 PM, con  wrote:
>
> HI
>
> I made that change of quotes and case sensitivity. But now i am getting the
> below exception while running delta-import:
>
> Document # 1
>at
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:186)
>at
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:143)
>at
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:43)
>at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
>at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:74)
>at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
>at
> org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:211)
>at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:133)
>at
> org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:359)
>at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:388)
>at
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
> Caused by: java.sql.SQLSyntaxErrorException: ORA-00918: column ambiguously
> defined
>
>at
> oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:91)
>at 
> oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:112)
>at
> oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:173)
>at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:455)
>at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:413)
>at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:1030)
>at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:183)
>at
> oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:774)
>at
> oracle.jdbc.driver.T4CStatement.executeMaybeDescribe(T4CStatement.java:849)
>at
> oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1186)
>at
> oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1770)
>at 
> oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1739)
>at
> oracle.jdbc.driver.OracleStatementWrapper.execute(OracleStatementWrapper.java:298)
>at
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:179)
>... 10 more
> 13:48:22,538 ERROR [STDERR] 23 Feb, 2009 1:48:22 PM
> org.apache.solr.handler.dataimport.DataImporter doDeltaImport
> SEVERE: Delta Import Failed
>
>
> Also i tried replacing deltaQuery with deltaImportQuery. This is not
> throwing any exception but not updating the index also. I have put the same
> query that is used in full import to do the delta import.

deltaImportQuery is not a replacement for deltaQuery;
both have to be present. 'deltaQuery' identifies the changed rows, and
'deltaImportQuery' uses those values to import the data.
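
[Editor's note: a minimal sketch of an entity declaring both attributes, based
on the tables in the configuration quoted below; the exact SQL and the use of
last_index_time are assumptions, not the poster's actual config. Qualifying
(or aliasing) every selected column also avoids the ORA-00918 "column
ambiguously defined" error when the same column name exists in both joined
tables.]

  <entity name="users" pk="USER_ID" transformer="TemplateTransformer"
          query="select USERS.USER_ID, USERS.USER_NAME, USERS.CREATED_TIMESTAMP
                 from USERS, CUSTOMERS where USERS.USER_ID = CUSTOMERS.USER_ID"
          deltaQuery="select USERS.USER_ID from USERS
                      where USERS.CREATED_TIMESTAMP > '${dataimporter.last_index_time}'"
          deltaImportQuery="select USERS.USER_ID, USERS.USER_NAME, USERS.CREATED_TIMESTAMP
                            from USERS, CUSTOMERS
                            where USERS.USER_ID = CUSTOMERS.USER_ID
                            and USERS.USER_ID = '${dataimporter.delta.USER_ID}'">
    ...
  </entity>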

>
> Thanks
> con
>
>
>
>
> Shalin Shekhar Mangar wrote:
>>
>> 1. There is no closing quote in transformer="TemplateTransformer
>> 2. Attribute names are case-sensitive so it should be deltaQuery instead
>> of
>> deltaquery
>>
>> On Fri, Feb 20, 2009 at 6:48 PM, con  wrote:
>>
>>>
>>> Hi alll
>>>
>>> I am trying to run delta-import. For this I am having the below
>>> data-config.xml
>>>
>>> 
>>>>> driver="oracle.jdbc.driver.OracleDriver"
>>> url="***" user="" password="*"/>
>>>
>>>>> transformer="TemplateTransformer pk="USER_ID"
>>>query="select USERS.USER_ID, USERS.USER_NAME,
>>> USERS.CREATED_TIMESTAMP
>>> FROM USERS, CUSTOMERS where USERS.USER_ID = CUSTOMERS.USER_ID"
>>>
>>>deltaquery="select USERS.USER_ID, USERS.USER_NAME,
>>> USERS.CREATED_TIMESTAMP FROM USERS, CUSTOMERS where USERS.USER_ID =
>>> CUSTOMERS.USER_ID" >
>>>>> />
>>>
>>>
>>> 
>>>
>>> But nothing is happening when i call
>>> http://localhost:8080/solr/users/dataimport?command=delta-import. Whereas
>>> the dataimport.properties is getting updated with the time at which
>>> delta-import is run.
>>>
>>> Where as
>>> http://localhost:8080/solr/users/dataimport?command=full-importis
>>> properly inserting data.
>>>
>>> Can anybody suggest what is wrong with this configuration.
>>>
>>> Thanks
>>> con
>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/delta-import-not-giving-updated-records-tp22120184p22120184.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>>
>

Re: Error with highlighter and UTF-8 chars?

2009-02-23 Thread Koji Sekiguchi

Jacob,

What Solr version are you using? There is a bug in SolrHighlighter of Solr 1.3;
you may want to look at:

https://issues.apache.org/jira/browse/SOLR-925
https://issues.apache.org/jira/browse/LUCENE-1500

regards,

Koji


Jacob Singh wrote:

Hi,

We ran into a weird one today.  We have a document which is written in
German and every time we make a query which matches it, we get the
following:

java.lang.StringIndexOutOfBoundsException: String index out of range: 2822
at java.lang.String.substring(String.java:1935)
at 
org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:274)


From source diving it looks like Lucene's highlighter is trying to
subStr against an offset that is outside the bounds of the body field
which it is highlighting against.  Running a fq against the ID of the
document returns it fine (because no highlighting is done) and I took
the body and tried to cut the first 2822 chars and while it is near
the end of the body, it is still in range.

Here is the related code:

startOffset = tokenGroup.matchStartOffset;
endOffset = tokenGroup.matchEndOffset;
tokenText = text.substring(startOffset, endOffset);


This leads me to believe there is some problem with mb string encoding
and Lucene's counting.

Any ideas here?  Tomcat is configured with UTF-8 btw.

Best,
Jacob


  




Re: Error with highlighter and UTF-8 chars?

2009-02-23 Thread Peter Wolanin
We are using Solr trunk (1.4)  - currently " nightly exported - yonik
- 2009-02-05 08:06:00"

-Peter

On Mon, Feb 23, 2009 at 8:07 AM, Koji Sekiguchi  wrote:
> Jacob,
>
> What Solr version are you using? There is a bug in SolrHighlighter of Solr
> 1.3,
> you may want to look at:
>
> https://issues.apache.org/jira/browse/SOLR-925
> https://issues.apache.org/jira/browse/LUCENE-1500
>
> regards,
>
> Koji
>
>
> Jacob Singh wrote:
>>
>> Hi,
>>
>> We ran into a weird one today.  We have a document which is written in
>> German and everytime we make a query which matches it, we get the
>> following:
>>
>> java.lang.StringIndexOutOfBoundsException: String index out of range: 2822
>>at java.lang.String.substring(String.java:1935)
>>at
>> org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:274)
>>
>>
>> >From source diving it looks like Lucene's highlighter is trying to
>> subStr against an offset that is outside the bounds of the body field
>> which it is highlighting against.  Running a fq against the ID of the
>> doucment returns it fine (because no highlighting is done) and I took
>> the body and tried to cut the first 2822 chars and while it is near
>> the end of the body, it is still in range.
>>
>> Here is the related code:
>>
>> startOffset = tokenGroup.matchStartOffset;
>> endOffset = tokenGroup.matchEndOffset;
>> tokenText = text.substring(startOffset, endOffset);
>>
>>
>> This leads me to believe there is some problem with mb string encoding
>> and Lucene's counting.
>>
>> Any ideas here?  Tomcat is configured with UTF-8 btw.
>>
>> Best,
>> Jacob
>>
>>
>>
>
>



-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


Distributed Search

2009-02-23 Thread gwk

Hello,

The wiki states 'When duplicate doc IDs are received, Solr chooses the 
first doc and discards subsequent ones', I was wondering whether "the 
first doc" is the doc of the shard which responds first or the doc in 
the first shard in the shards GET parameter?


Regards,

gwk


faceted query - min and max numeric values

2009-02-23 Thread Yves Hougardy
Hi, I'm trying to set up a request with two "query.facet" which would give me
the min and max prices over all the query results.
What I'm interested in is not really the facet counts but those two
prices...
Faceting on the price field would enumerate all the different prices w/
their facet count. On the other hand a query facet would just give me the
facet count, right?

So, what's the best way to get those min and max values? (I'm using
Solr 1.3.) Is there any specific component for this, or
should I extend the SimpleFacets class (if so, where do I configure/declare
my new component?)

Thanks in advance

-- 
Yves Hougardy
http://www.clever-age.com
Clever Age - conseil en architecture technique
Tél: +33 1 53 34 66 10


Re: faceted query - min and max numeric values

2009-02-23 Thread Erik Hatcher
Have a look at the StatsComponent, added after Solr 1.3 release  
though.  You can grab a nightly build to have it built-in.


More info available here: 


Erik
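
[Editor's note: the StatsComponent page on the Solr wiki documents the
component; with it enabled, a request along these lines returns min and max
for a numeric field. The field name "price" and the default handler path are
assumptions.]

  http://localhost:8983/solr/select?q=*:*&rows=0&stats=true&stats.field=price

The response then contains a <lst name="stats"> section with min, max, sum,
count, and other statistics for the price field.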


On Feb 23, 2009, at 9:10 AM, Yves Hougardy wrote:

Hi,I'm trying to set up a request with two "query.facet" which would  
give me

the Min a Max prices over all the query results.
What i'm interested in is not really the facet counts but those two
prices...
Faceting on the price field would enumerate all the different prices  
w/
their facet count. On the other hand a query facet would just give  
me the

facet count, right?

So, what's the best way to get those min and max values ? (I'm using
solr1.3), is their any specific component for this or
should I extend the SimpleFacets class (if so,, where do i configure/ 
declare

my new component ?)

Thanks in advance

--
Yves Hougardyhttp://www.clever-age.com
Clever Age - conseil en architecture technique
Tél: +33 1 53 34 66 10




Re: Distributed Search

2009-02-23 Thread Koji Sekiguchi

gwk wrote:

Hello,

The wiki states 'When duplicate doc IDs are received, Solr chooses the 
first doc and discards subsequent ones', I was wondering whether "the 
first doc" is the doc of the shard which responds first or the doc in 
the first shard in the shards GET parameter?


Regards,

gwk



It is the doc of the shard which responds first, if my memory is correct...

Koji




Re: Boosting Code

2009-02-23 Thread Marc Sturlese

If you mean at indexing time, you set the field boost via data-config.xml. That
boost is parsed from there and set on the Lucene document going through
DocBuilder.java, SolrInputDocument.java and DocumentBuilder.java.
In case you want to set a full-document boost (not just a field boost), you can
do it by setting a value for the key $docBoost via a transformer. That value is
set using the same classes (DocBuilder.java, SolrInputDocument.java and
DocumentBuilder.java).
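
[Editor's note: a minimal sketch of setting $docBoost from a DIH script
transformer; the entity, column name, and boost rule are assumptions, not a
recommended scheme.]

  <dataConfig>
    <script><![CDATA[
      function setBoost(row) {
        /* hypothetical rule: boost rows flagged as featured */
        if (row.get('FEATURED') == 'Y') {
          row.put('$docBoost', 2.0);
        }
        return row;
      }
    ]]></script>
    <document>
      <entity name="item" transformer="script:setBoost" query="select * from item">
        ...
      </entity>
    </document>
  </dataConfig>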



dabboo wrote:
> 
> Hi,
> 
> Can anyone please tell me where I can find the actual logic/implementation
> of field boosting in Solr. I am looking for classes.
> 
> Thanks,
> Amit Garg
> 




Re: Defining shards in solrconfig with multiple cores

2009-02-23 Thread jdleider

Excellent, I have created a new request handler that has all my shards,
boosts and other defaults as /business. /select is set to be the default
request handler. The queries seem to run fine now and after a bit of spot
checking they seem to be getting the same results. I have one question
though.

How can I be sure the shards are applying the same boosts as the one
receiving the query? I access the URL
solr1:8080/solr/core0/business?q=pizza. Are the shards applying the boost on
their index with the /business request handler, or is it up to the
originating shard to apply the boosts?

Thanks for your help!

- Justin



Yonik Seeley-2 wrote:
> 
> On Fri, Feb 20, 2009 at 10:32 AM, jdleider 
> wrote:
>> However when i try to /select using
>> this shards param in the solrconfig.xml the query just hangs.
> 
> The basic /select url should normally not have shards set as a
> default... this will cause infinite recursion when the top level
> searcher sends requests to the sub-searchers until you exhaust all
> threads and run into a distributed deadlock.  Set up another handler
> with the default shards param instead.
> 
> -Yonik
> Lucene/Solr? http://www.lucidimagination.com
> 
> 




Re: Defining shards in solrconfig with multiple cores

2009-02-23 Thread Yonik Seeley
On Mon, Feb 23, 2009 at 10:12 AM, jdleider  wrote:
> How can I be sure the shards are applying the same boosts as the one
> receiving the query?

Defaults always apply at any search handler... so if you put default
params on /business, they will become part of the shard sub-requests.
You can verify by looking at the logs and the shard search requests
received by each shard.

-Yonik
Lucene/Solr? http://www.lucidimagination.com
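
[Editor's note: a minimal sketch of such a handler in solrconfig.xml; the
shard hosts and boost parameters are placeholders, not the poster's actual
values. Because the boosts sit in the defaults section of the handler that
receives the query, they become part of the request and are passed along in
the shard sub-requests, per Yonik's explanation above.]

  <requestHandler name="/business" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="qf">name^2.0 description^0.5</str>
      <str name="shards">solr1:8080/solr/core0,solr2:8080/solr/core0</str>
    </lst>
  </requestHandler>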


Re: fastest way to index/reindex

2009-02-23 Thread Josiane Gamgo
How fast is the search if the mergeFactor of the Lucene index is set to 20 or
more? Did somebody use Luke to optimize the indexing process? I would like
to know how fast Luke is.
Thanks


On Tue, Jan 27, 2009 at 3:52 PM, Ian Connor  wrote:

> When you query by *:*, what order does it use. Is there a chance they will
> come in a different order as you page through the results (and
> miss/dupicate
> some). Is it best to put the order explicitly by 'id' or is that implied
> already?
>
> On Mon, Jan 26, 2009 at 12:00 PM, Ian Connor  wrote:
>
> > *:* took it up to 45/sec from 28/sec so a nice 60% bump in performance -
> > thanks!
> >
> >
> > On Sun, Jan 25, 2009 at 5:46 PM, Ryan McKinley 
> wrote:
> >
> >> I don't know of any standard export/import tool -- i think luke has
> >> something, but it will be faster if you write your own.
> >>
> >> Rather then id:[* TO *], just try *:*  -- this should match all
> documents
> >> without using a range query.
> >>
> >>
> >>
> >> On Jan 25, 2009, at 3:16 PM, Ian Connor wrote:
> >>
> >>  Hi,
> >>>
> >>> Given the only real way to reindex is to save the document again, what
> is
> >>> the fastest way to extract all the documents from a solr index to
> resave
> >>> them.
> >>>
> >>> I have tried the id:[* TO *] trick however, it takes a while once you
> get
> >>> a
> >>> few thousand into the index. Are there any tools that will quickly
> export
> >>> the index to a text file or making queries 1000 at a time is the best
> >>> option
> >>> and dealing with the time it takes to query once you are deep into the
> >>> index?
> >>>
> >>> --
> >>> Regards,
> >>>
> >>> Ian Connor
> >>>
> >>
> >>
> >
> >
> > --
> > Regards,
> >
> > Ian Connor
> >
>
>
>
> --
> Regards,
>
> Ian Connor
> 1 Leighton St #723
> Cambridge, MA 02141
> Call Center Phone: +1 (714) 239 3875 (24 hrs)
> Fax: +1(770) 818 5697
> Skype: ian.connor
>
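
[Editor's note: a sketch of paging through the whole index with an explicit
sort so that pages stay stable while reindexing, as discussed above; the
uniqueKey field name "id" and the page size are assumptions.]

  http://localhost:8983/solr/select?q=*:*&sort=id+asc&fl=*&start=0&rows=1000
  http://localhost:8983/solr/select?q=*:*&sort=id+asc&fl=*&start=1000&rows=1000
  ...

Each page of stored fields is then re-posted to /update; sorting on the
uniqueKey requires it to be indexed.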


Re: fastest way to index/reindex

2009-02-23 Thread Erick Erickson
please don't hijack topic threads, start a new one

http://en.wikipedia.org/wiki/Thread_hijacking

Best
Erick

MergeFactor isn't very related to searching, Luke isn't used in
the indexing process and why do you care how fast Luke is?

When you start a new post on this topic, please give an idea of
the problem you're trying to solve or that you are having, it'll lead
to much better answers.



On Mon, Feb 23, 2009 at 11:07 AM, Josiane Gamgo wrote:

> How fast is the search if the MergeFactor of Lucene Index is set to 20 or
> more?did somebody uses Luke to optimize the indexing process? I would like
> to know how fast is Luke.
> Thanks
>
>
> On Tue, Jan 27, 2009 at 3:52 PM, Ian Connor  wrote:
>
> > When you query by *:*, what order does it use. Is there a chance they
> will
> > come in a different order as you page through the results (and
> > miss/dupicate
> > some). Is it best to put the order explicitly by 'id' or is that implied
> > already?
> >
> > On Mon, Jan 26, 2009 at 12:00 PM, Ian Connor 
> wrote:
> >
> > > *:* took it up to 45/sec from 28/sec so a nice 60% bump in performance
> -
> > > thanks!
> > >
> > >
> > > On Sun, Jan 25, 2009 at 5:46 PM, Ryan McKinley 
> > wrote:
> > >
> > >> I don't know of any standard export/import tool -- i think luke has
> > >> something, but it will be faster if you write your own.
> > >>
> > >> Rather then id:[* TO *], just try *:*  -- this should match all
> > documents
> > >> without using a range query.
> > >>
> > >>
> > >>
> > >> On Jan 25, 2009, at 3:16 PM, Ian Connor wrote:
> > >>
> > >>  Hi,
> > >>>
> > >>> Given the only real way to reindex is to save the document again,
> what
> > is
> > >>> the fastest way to extract all the documents from a solr index to
> > resave
> > >>> them.
> > >>>
> > >>> I have tried the id:[* TO *] trick however, it takes a while once you
> > get
> > >>> a
> > >>> few thousand into the index. Are there any tools that will quickly
> > export
> > >>> the index to a text file or making queries 1000 at a time is the best
> > >>> option
> > >>> and dealing with the time it takes to query once you are deep into
> the
> > >>> index?
> > >>>
> > >>> --
> > >>> Regards,
> > >>>
> > >>> Ian Connor
> > >>>
> > >>
> > >>
> > >
> > >
> > > --
> > > Regards,
> > >
> > > Ian Connor
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > Ian Connor
> > 1 Leighton St #723
> > Cambridge, MA 02141
> > Call Center Phone: +1 (714) 239 3875 (24 hrs)
> > Fax: +1(770) 818 5697
> > Skype: ian.connor
> >
>


Re: Question about etag

2009-02-23 Thread Pascal Dimassimo

I finally found the reason for this behavior. I realized that if I waited a
couple of minutes, Firefox would send the "If-None-Match" header, to which
Solr responded with the 304 code.

What happens is that Firefox keeps a disk cache. If a response contains the
header "Last-Modified", even if there is an ETag header, Firefox computes an
expiration date, which was about 5 minutes for my request. And during that
period, the request was served from the cache.

You can see the expiration date by looking at about:cache in Firefox. The
rules to compute the expiration time depending on the headers are described
here: https://developer.mozilla.org/En/HTTP_Caching_FAQ

I realize that this was a Firefox issue. Sorry to have disrupted this list.
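
[Editor's note: for context, the Last-Modified and ETag headers discussed here
come from the <httpCaching> section of solrconfig.xml; a typical configuration
looks roughly like the following, where the max-age value is an assumption
rather than the poster's actual setting.]

  <httpCaching lastModifiedFrom="openTime" etagSeed="Solr">
    <cacheControl>max-age=30, public</cacheControl>
  </httpCaching>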


Pascal Dimassimo wrote:
> 
> Sorry, the xml of the solrconfig.xml was lost. It is
> 
> 
> 
> 
> Hi guys,
>  
> I'm having trouble understanding the behavior of firefox and the etag.
>  
> After cleaning the cache, I send this request from firefox:
>  
> GET /solr/select/?q=television HTTP/1.1
> Host: localhost:8088
> User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.6)
> Gecko/2009011913 Firefox/3.0.6 (.NET CLR 3.5.30729)
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Accept-Language: en-us,en;q=0.5
> Accept-Encoding: gzip,deflate
> Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
> Keep-Alive: 300
> Connection: keep-alive
> Cookie: JSESSIONID=AA71D602A701BB6287C60083DD6879CD
>  
> Which solr responds with:
>  
> HTTP/1.1 200 OK
> Last-Modified: Thu, 19 Feb 2009 19:57:14 GMT
> ETag: "NmViOTJkMjc1ODgwMDAwMFNvbHI="
> Content-Type: text/xml; charset=utf-8
> Transfer-Encoding: chunked
> Server: Jetty(6.1.3)
> (#data following#)
>  
> So far so good. But then, I press F5 to refresh the page. Now if I
> understand correctly the way the etag works, firefox should send the
> request with a "if-none-match" along with the etag and then the server
> should return a 304 "not modified" code.
>  
> But what happens is that firefox just don't send anything. In the firebug
> window, I only see "0 requests". Just to make sure I test with tcpmon and
> nothing is sent by firefox.
>  
> Is this making sense? Am I missing something?
>  
> My solrconfig.xml has this config:
> 
> 
>  
>  
> Thanks!
> 
> 




Re: LocalSolr distributed search

2009-02-23 Thread Rajiv2

Great, this works.

BTW, there seems to be a problem w/ LocalSolrQueryComponent. It doesn't seem
to be respecting any filtering parameters in the solr request (fq). 

thanks,
Rajiv


pjaol wrote:
> 
> Hi
> 
> Most of the localsolr / locallucene doc's are a little out of date I'll
> get to updating them soon
> the most relevant ones are on http://www.gissearch.com/
> 
> To use it in a distributed form, it should already be built into the trunk
> version 
> Use the standard query component as your primary entry point and add the
> following parameters
> shards=host:port/solr_path,host2:port/solr_path,host3...
> shards.qt=geo //or whatever you've called it.
> 
> The standard query component will then perform the distributed search, and
> aggregate the results.
> 
> HTH
> Patrick
> 
> 
> 
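
[Editor's note: a sketch of the resulting distributed request, following
Patrick's instructions above; the host names, the "geo" handler name, and the
lat/long/radius parameters are assumptions.]

  http://solr1:8080/solr/select?q=pizza
      &shards=solr1:8080/solr,solr2:8080/solr
      &shards.qt=geo
      &lat=37.77&long=-122.42&radius=10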




Re: exceeded limit of maxWarmingSearchers

2009-02-23 Thread mahendra mahendra
Hi Shalin,
 
Which auto-warming do I have to remove to make commits faster? I have the below
configuration in solrconfig.xml. Also, what is the maxWarmingSearchers value?
 
 
   
    
 
  
 
Thanks for your help!!

Thanks & Regards,
Mahendra

--- On Mon, 2/23/09, Shalin Shekhar Mangar  wrote:

From: Shalin Shekhar Mangar 
Subject: Re: exceeded limit of maxWarmingSearchers
To: solr-user@lucene.apache.org
Date: Monday, February 23, 2009, 10:35 AM

On Mon, Feb 23, 2009 at 10:23 AM, mahendra mahendra <
mahendra_featu...@yahoo.com> wrote:

> Hi,
>
> I have scheduled Incremental indexing to run for every 2 min. Some times
> due to more number of records the first instance of the incremental
couldn't
> complete before second instance start. This is causing the below error.
>
> org.apache.solr.common.SolrException: Error opening new searcher. exceeded
> limit of maxWarmingSearchers=2, try again later.
> Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try
> again later.
>
> Is there any configuration parameter to increase maxWarmingSearchers.
> Any help would appriciate !!


Increasing maxWarmingSearchers will make things even slower. The right
problem to solve is to increase the time between commits or to remove
auto-warming to make commits faster.
-- 
Regards,
Shalin Shekhar Mangar.
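
[Editor's note: auto-warming is controlled by the autowarmCount attribute on
the caches (and by any newSearcher listener) in solrconfig.xml, and
maxWarmingSearchers is a top-level element in the same file. A sketch with
warming disabled, using assumed cache sizes:]

  <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

  <maxWarmingSearchers>2</maxWarmingSearchers>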



  

Re: embedded wildcard search not working?

2009-02-23 Thread Jim Adams
Some of the wildcards work, but not all of them.  Unsurprisingly, the ones
that seem to work are ones that are wildcards in the 'base' of the word.

Thanks for the tip on the lowercase before stop words.

On Wed, Feb 18, 2009 at 12:35 AM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> Jim,
>
> Does app*l or even a*p* work?  Perhaps "apple" gets stemmed to something
> that doesn't end in "e", such as "appl"?
> Regarding your config, you probably want to lowercase before removing stop
> words, so you'll want to change the order of those filters a bit.  That's
> not related to your wildcard question.
>
> Otis --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
>
> 
> From: Jim Adams 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, February 18, 2009 6:30:22 AM
> Subject: embedded wildcard search not working?
>
> This is a straightforward question, but I haven't been able to figure out
> what is up with my application.
>
> I seem to be able to search on trailing wildcards just find.  For example,
> fieldName:a* will return documents with apple, ardvaark, etc. in them.  But
> if I was to try and search on a field containing 'apple' with 'a*e' I would
> return nothing.
>
> My gut is telling me that I should be using a different data type or a
> different filter option.  Here is how my text type is defined:
>
> 
>   
> 
>  words="stopwords.txt"/>
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" />
> 
>  protected="protwords.txt"/>
> 
>   
>   
> 
>  synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>  words="stopwords.txt"/>
>  generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0"/>
> 
>  protected="protwords.txt"/>
> 
>   
>
> Thanks for your help.
> 
>
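
[Editor's note: a minimal sketch of an index analyzer with the lowercase
filter placed before the stop filter, as Otis suggests; the tokenizer and the
rest of the filter chain are assumptions, not the poster's full schema. Note
that wildcard terms are not analyzed, so a stemmed field will often fail to
match a pattern like a*e against the original word "apple".]

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    </analyzer>
  </fieldType>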


general survey of master/replica setups

2009-02-23 Thread Brian Whitman
Say you have a bunch of solr servers that index new data, and then some
replica/"slave" setup that snappulls from the master on a cron or some
schedule. Live internet facing queries hit the replica, not the master, as
indexes/commits on the master slow down queries.
But even the query-only solr installs need to "snap-install" every so often,
triggering a commit, and there is a slowdown in queries when this happens.
Measured avg QTimes during normal times are 400ms, during commit/snapinstall
times they dip into the seconds. Say in the 5m between snappulls 1000
documents have been updated/deleted/added.

How do people mitigate the effect of the commit on replica query instances?


Where are the facets configured?

2009-02-23 Thread Villemos, Gert
New to Lucene / Solr, but reading and learning fast.
 
I'm missing some fundamental mental link to understand how the faceted 
browsing works in Solr. I have read the Wiki pages on facets and on 
configuration files. I have looked in the examples provided with Solr. And 
searched Google / the mailing list. And yet something refuses to click in 
my mind... (mental deficit?)
 
My problem is: where are the facets defined? I expect to see them in the 
configuration, yet I don't.
 
I see that document 'fields' are defined in the schema.xml / solrconfig.xml. 
And I see that 'facets' are used in the queries. 
 
1. Are the available facets based on the configured fields? A one-to-one mapping?
2. How can I then define sub-facets (i.e. 'Author' -> 'Ray Bradbury')?
3. Some places 'dynamic' facets are mentioned. What is that?
 
Sorry for this question which I fear is very basic. Thanks for all the good 
documentation.
 
Villemos.





Re: Where are the facets configured?

2009-02-23 Thread Yonik Seeley
On Mon, Feb 23, 2009 at 6:27 PM, Villemos, Gert
 wrote:
> My problem is; where are the facets defined? I expect to se them in the 
> configuration, yet I dont.

Solr can facet "on the fly".  The first time you request faceting on a
field, it may take longer though as internal data structures are built
and cached.  The only current requirement is that a field be indexed.

Try it and see... simply add facet=on&facet.field=cat to a query on
the example index.

> 2. How can I then define sub-facets (i.e. 'Author' -> 'Ray Bradbury')?

Facet values (facet constraints like 'Ray Bradbury') are unique
indexed terms for the field.

> 3. Some places 'dynamic' facets are mentioned. What is that?

Probably facet queries, where you can get the count for arbitrary
queries.  See facet.query params.


-Yonik
Lucene/Solr? http://www.lucidimagination.com
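
[Editor's note: a sketch of a facet.query request for arbitrary counts; the
field name and ranges are assumptions.]

  http://localhost:8983/solr/select?q=*:*&facet=on
      &facet.query=price:[0+TO+100]
      &facet.query=price:[100+TO+*]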


AW: Where are the facets configured?

2009-02-23 Thread Villemos, Gert
>> My problem is; where are the facets defined? I expect to se them in the 
>> configuration, yet I dont.

>Solr can facet "on the fly".  The first time you request faceting on a
> field, it may take longer though as internal data structures are built
> and cached.  The only current requirement is that a field be indexed.

> Try it and see... simply add facet=on&facet.field=cat to a query on
> the example index.
 
 

The result is nicely ordered into facets, with elements and counts. Have I 
understood your answer correctly that the returned elements for the 'cat' facet 
search, i.e. 'search', 'memory', 'graphics', etc., are all specific values that 
the indexed field 'cat' of the added documents has had?

I.e. there will be one document with something like

   <field name="cat">music</field>
   ...

and three documents with

   <field name="cat">electronic</field>





Direct control over document position in search results

2009-02-23 Thread Ercan, Tolga
Hello,

I was wondering if there was any facility to directly manipulate search results 
based on business criteria to place documents at a fixed position in those 
results. For example, when I issue a query, the first four results would be 
based on natural search relevancy, then the fifth result would be based on the 
most relevant document when doctype:video (if I had a doctype field of course), 
then results 6...* would resume natural search relevancy?

Or perhaps a variation on this: the document where doctype:video would 
appear at a fixed position or better... For example, if somebody searched for 
"my widget video", there would be a relevant document at a higher position than 
#5...

Thanks!
~t


Re: general survey of master/replica setups

2009-02-23 Thread Otis Gospodnetic

Hi Brian,

If you have enough servers - take that machine that's doing snapinstall out of 
the pool for a bit, do snapinstall, warm it up well, put it back in the pool.  
You'd really need to have enough servers, so that when you do this you can 
avoid having multiple live query slaves with different indices (and different 
results).

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Brian Whitman 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, February 24, 2009 7:16:20 AM
> Subject: general survey of master/replica setups
> 
> Say you have a bunch of solr servers that index new data, and then some
> replica/"slave" setup that snappulls from the master on a cron or some
> schedule. Live internet facing queries hit the replica, not the master, as
> indexes/commits on the master slow down queries.
> But even the query-only solr installs need to "snap-install" every so often,
> triggering a commit, and there is a slowdown in queries when this happens.
> Measured avg QTimes during normal times are 400ms, during commit/snapinstall
> times they dip into the seconds. Say in the 5m between snappulls 1000
> documents have been updated/deleted/added.
> 
> How do people mitigate the effect of the commit on replica query instances?
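
[Editor's note: one way to soften the post-commit slowdown on a query slave is
a newSearcher warming listener in solrconfig.xml, so the new searcher is warmed
with representative queries before it serves traffic; the queries and field
names below are placeholders.]

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">common query terms</str><str name="rows">10</str></lst>
      <lst><str name="q">*:*</str><str name="sort">date desc</str></lst>
    </arr>
  </listener>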



arcane queryParser parseException

2009-02-23 Thread Brian Whitman
server:/solr/select?q=field:"''anything can go here;" --> Lexical error,
encountered <EOF> after : "\"\'\'anything can go here"
server:/solr/select?q=field:"'anything' anything can go here;" --> Same
problem

server:/solr/select?q=field:"'anything' anything can go here\;" --> No
problem (but ClientUtils's escape does not escape semicolons.)

server:/solr/select?q=field:"anything can go here;" --> no problem

server:/solr/select?q=field:"''anything can go here" --> no problem

As far as I can tell, two apostrophes, then a semicolon causes the lexical
error. There can be text within the apostrophes. If you leave out the
semicolon it's ok. But you can keep the semicolon if you remove the two
apostrophes.

This is on trunk solr.


sint vs integer

2009-02-23 Thread Jonathan Haddad
What are the differences between using an sint and an integer, aside
from the range queries on sint?  If I've indexed a field as an
integer, and I try to sort on it, will there be a performance problem?

About 1.5 million documents in index.
-- 
Jonathan Haddad
http://www.rustyrazorblade.com


Re: arcane queryParser parseException

2009-02-23 Thread Chris Hostetter

: server:/solr/select?q=field:"'anything' anything can go here\;" --> No
: problem (but ClientUtils's escape does not escape semicolons.)

ClientUtils doesn't escape it because it's not a special character in the 
SolrQueryParser.

it *is* a special character to the OldLuceneQParserPlugin if (and only if) 
there is no "sort" param specified (this is all due to legacy behavior)

Try either of these...

  ?defType=lucene&q=%22%27%27anything+can+go+here;%22
  ?sort=score+desc&q=%22%27%27anything+can+go+here;%22

(we should probably change the default QParser to "lucene" and document in 
the upgrade notes how legacy users of the ";" sort sequence can get the
old behavior by setting defType=lucenePlusSort as an invariant param. ... 
anyone want to submit a patch?)

-Hoss
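
[Editor's note: a sketch of what such an invariant would look like in
solrconfig.xml for users who want to keep the legacy ";" sort behavior; the
handler name is whatever handler the application already queries.]

  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="invariants">
      <str name="defType">lucenePlusSort</str>
    </lst>
  </requestHandler>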



Re: arcane queryParser parseException

2009-02-23 Thread Ryan McKinley


On Feb 23, 2009, at 9:13 PM, Chris Hostetter wrote:



: server:/solr/select?q=field:"'anything' anything can go here\;" --> No
: problem (but ClientUtils's escape does not escape semicolons.)

ClientUtils doesn't escape it because it's not a special character in the
SolrQueryParser.



I went ahead and added it since it does not hurt anything to escape  
more things -- it just makes the final string ugly.


In 1.3 the escape method covered everything:

http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.3.0/client/java/solrj/src/org/apache/solr/client/solrj/util/ClientUtils.java

  public static String escapeQueryChars( String input )
  {
    // escapePattern matches the Lucene query syntax special characters
    Matcher matcher = escapePattern.matcher( input );
    return matcher.replaceAll( "\\\\$1" );
  }




Re: arcane queryParser parseException

2009-02-23 Thread Chris Hostetter

: I went ahead and added it since it does not hurt anything to escape more
: things -- it just makes the final string ugly.

: In 1.3 the escape method covered everything:

Hmmm, good call, I didn't realize the escape method had been so 
blanket in 1.3.  This way we protect people who were using it in 1.3 and 
relied on it to protect them from the legacy ";" behavior.

-Hoss