why is query so slow

2009-03-17 Thread pcurila

hello,

I created an index with 1.5M docs. When I post a query without facets it
returns in a moment.
When I post a query with one facet it takes 14s.


[response XML stripped by the archive: status 0, QTime 14263; params included
q=zamok, facet=true, indent=on, facet.field=wasCreatedBy_fct, facet.limit=-1,
rows=10, version=2.2]

when I add a filter that returns only one doc, it takes the same time



[response XML stripped by the archive: status 0, QTime 13249; params as above
plus fq=wasCreatedBy_fct:"Martin Benka"]


Why?
Can anybody explain what I am doing wrong and how to speed up the response
time?

Peter
-- 
View this message in context: 
http://www.nabble.com/why-is-query-so-slow-tp22554340p22554340.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: why is query so slow

2009-03-17 Thread Erik Hatcher
How many terms are in the wasCreatedBy_fct field?   How is that field  
and its type configured?


Solr 1.3?  Or trunk?  Trunk contains massive faceting speed  
improvements.


Erik
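
A note for readers hitting this on 1.3: faceting a multiValued field there is
typically answered term-by-term against the filterCache, so a cache smaller
than the number of unique facet terms forces the work to be redone on every
request. A solrconfig.xml sketch (the sizes are illustrative; size should be
at least the unique-term count of the faceted field):

```xml
<!-- solrconfig.xml: filterCache sized for term-per-entry faceting (Solr 1.3) -->
<filterCache
  class="solr.LRUCache"
  size="200000"
  initialSize="50000"
  autowarmCount="0"/>
```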



On Mar 17, 2009, at 4:21 AM, pcurila wrote:



hello,

I created an index with 1.5M docs. When I post a query without facets it
returns in a moment.
When I post a query with one facet it takes 14s.


[query response XML snipped]

when I add a filter that returns only one doc, it takes the same time



[query response XML snipped]


Why?
Can anybody explain what I am doing wrong and how to speed up the response
time?

Peter
--
View this message in context: 
http://www.nabble.com/why-is-query-so-slow-tp22554340p22554340.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: why is query so slow

2009-03-17 Thread Toby Cole

Peter,
	If possible try running a 1.4-snapshot of Solr, the faceting  
improvements are quite remarkable.
However, if you can't run unreleased code, it might be an idea to try  
reducing the number of unique terms (try indexing surnames only?).

Toby.

On 17 Mar 2009, at 10:01, pcurila wrote:



I am using 1.3


How many terms are in the wasCreatedBy_fct field?   How is that field
and its type configured?

field contains author names and there are lots of them.

here is the type configuration:

[field type and field definitions stripped by the archive; visible attributes:
positionIncrementGap="100", stored="true", multiValued="true"]

--
View this message in context: 
http://www.nabble.com/why-is-query-so-slow-tp22554340p22555842.html
Sent from the Solr - User mailing list archive at Nabble.com.



Toby Cole
Software Engineer

Semantico
E: toby.c...@semantico.com
W: www.semantico.com



Re: why is query so slow

2009-03-17 Thread pcurila

I am using 1.3

> How many terms are in the wasCreatedBy_fct field?   How is that field  
> and its type configured? 
field contains author names and there are lots of them. 

here is the type configuration:

[field type and field definitions stripped by the archive]
-- 
View this message in context: 
http://www.nabble.com/why-is-query-so-slow-tp22554340p22555842.html
Sent from the Solr - User mailing list archive at Nabble.com.



Lemmatisation search in Solr

2009-03-17 Thread dabboo

Hi,

I am implementing lemmatisation in Solr, which means that if a user looks for
"Mouse" it should display results for both Mouse and Mice. I understand that
this is a kind of context-aware search. I thought of using synonyms for this,
but then synonyms.txt would hold a huge number of records and would keep
growing.

Please suggest how I can implement it in some other way.

Thanks,
Amit Garg
-- 
View this message in context: 
http://www.nabble.com/Lemmatisation-search-in-Solr-tp22555854p22555854.html
Sent from the Solr - User mailing list archive at Nabble.com.



Special Characters search in solr

2009-03-17 Thread dabboo

Hi,

I am searching with a query string which contains special characters, like è.
For example, if I search for tèst then it should return all the results which
contain tèst, test, etc. There are other special characters as well.

I have updated the server.xml file of my Tomcat server and included UTF-8 as
the encoding type in the server entry, but it is still not working.

Please suggest.

Thanks,
Amit Garg
-- 
View this message in context: 
http://www.nabble.com/Special-Characters-search-in-solr-tp22557230p22557230.html
Sent from the Solr - User mailing list archive at Nabble.com.



stop word search

2009-03-17 Thread revas
Hi,

I have a query like this:

content:the AND user_id:5

which means: return all docs of user id 5 which have the word "the" in
content. Since 'the' is a stop word, this query executes as just user_id:5
in spite of the "AND" clause, whereas the expected result is that since
there are no results for "the", no results should be returned.

Am I missing anything here?

Regards


Re: Indexing the directory

2009-03-17 Thread Eric Pugh

Victor,

I'd recommend looking at the tutorial at
http://lucene.apache.org/solr/tutorial.html and using the list for more
specific questions. Also, there is a list of companies (as well as mine!)
that provide Solr support at http://wiki.apache.org/solr/Support that
eTrade can contract with for in-depth support.


Eric Pugh

On Mar 16, 2009, at 6:25 PM, Huang, Zijian(Victor) wrote:




Hi, all:
   I am new to Solr. Can anyone please tell me how to index some text
files in a local directory?

Thanks

Victor




-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal






Re: Lemmatisation search in Solr

2009-03-17 Thread Grant Ingersoll
Have you looked for any open source lemmatizers?  I didn't find any in  
a quick search, but there probably are some out there.


Also, is there a particular reason you are after lemmatization instead  
of stemming?  Maybe a "light" stemmer plus synonyms might suffice?
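
One way to realize the "stemmer plus synonyms" idea is an index-time synonym
filter fed with just the irregular forms. A schema.xml sketch (the type name
and the file name lemmas.txt are illustrative); a lemmas.txt line such as
"mouse, mice" makes the two terms interchangeable:

```xml
<!-- schema.xml sketch: hand-maintained list of irregular forms as synonyms -->
<fieldType name="text_lemma" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- lemmas.txt holds lines such as: mouse, mice -->
    <filter class="solr.SynonymFilterFactory" synonyms="lemmas.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

With expand="true" both surface forms are indexed, so queries for either one
match documents containing the other.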


On Mar 17, 2009, at 6:02 AM, dabboo wrote:



Hi,

I am implementing Lemmatisation in Solr, which means if user looks for
"Mouse" then it should display results of Mouse and Mice both. I  
understand
that this is something context search. I think of using synonym for  
this but

then synonyms.txt will be having so many records and this will keep on
adding.

Please suggest how I can implement it in some other way.

Thanks,
Amit Garg
--
View this message in context: 
http://www.nabble.com/Lemmatisation-search-in-Solr-tp22555854p22555854.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Special Characters search in solr

2009-03-17 Thread Grant Ingersoll
You will need to create a field that handles the accents in order to  
do this.  Start by looking at the ISOLatin1AccentFilter.
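
A minimal field type along those lines might look like the following (the type
name is illustrative; ISOLatin1AccentFilterFactory maps è to e, so tèst and
test index identically). Remember to reindex after changing the analysis
chain:

```xml
<!-- schema.xml sketch: fold Latin-1 accents before lowercasing -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```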


-Grant

On Mar 17, 2009, at 7:31 AM, dabboo wrote:



Hi,

I am searching with any query string, which contains special  
characters like
è in it. for e.g. If I search for tèst then it shud return all the  
results
which contains tèst and test etc. There are other special characters  
also.


I have updated my server.xml file of tomcat server and included  
UTF-8 as

encoding type in the server entry but still it is not working.

Please suggest.

Thanks,
Amit Garg
--
View this message in context: 
http://www.nabble.com/Special-Characters-search-in-solr-tp22557230p22557230.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Lemmatisation search in Solr

2009-03-17 Thread dabboo

Stemming and synonyms are working fine in the application, but they are
working individually. I guess I will need to add the values to synonyms.txt
to achieve it. Am I right?

Actually it is a project requirement to implement lemmatisation. I also
looked for open source lemmatisers but couldn't find any.

Thanks,
Amit



Grant Ingersoll-6 wrote:
> 
> Have you looked for any open source lemmatizers?  I didn't find any in  
> a quick search, but there probably are some out there.
> 
> Also, is there a particular reason you are after lemmatization instead  
> of stemming?  Maybe a "light" stemmer plus synonyms might suffice?
> 
> On Mar 17, 2009, at 6:02 AM, dabboo wrote:
> 
>>
>> Hi,
>>
>> I am implementing Lemmatisation in Solr, which means if user looks for
>> "Mouse" then it should display results of Mouse and Mice both. I  
>> understand
>> that this is something context search. I think of using synonym for  
>> this but
>> then synonyms.txt will be having so many records and this will keep on
>> adding.
>>
>> Please suggest how I can implement it in some other way.
>>
>> Thanks,
>> Amit Garg
>> -- 
>> View this message in context:
>> http://www.nabble.com/Lemmatisation-search-in-Solr-tp22555854p22555854.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Lemmatisation-search-in-Solr-tp22555854p22558113.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Special Characters search in solr

2009-03-17 Thread dabboo

I have added this filter factory in my schema.xml as well, but it is still
not working. I am sorry, but I didn't understand how to create the field to
handle the accents.

Please help.


Grant Ingersoll-6 wrote:
> 
> You will need to create a field that handles the accents in order to  
> do this.  Start by looking at the ISOLatin1AccentFilter.
> 
> -Grant
> 
> On Mar 17, 2009, at 7:31 AM, dabboo wrote:
> 
>>
>> Hi,
>>
>> I am searching with any query string, which contains special  
>> characters like
>> è in it. for e.g. If I search for tèst then it shud return all the  
>> results
>> which contains tèst and test etc. There are other special characters  
>> also.
>>
>> I have updated my server.xml file of tomcat server and included  
>> UTF-8 as
>> encoding type in the server entry but still it is not working.
>>
>> Please suggest.
>>
>> Thanks,
>> Amit Garg
>> -- 
>> View this message in context:
>> http://www.nabble.com/Special-Characters-search-in-solr-tp22557230p22557230.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Special-Characters-search-in-solr-tp22557230p22558192.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Special Characters search in solr

2009-03-17 Thread dabboo

This is the entry in schema.xml:

[field type definition stripped by the archive]

dabboo wrote:
> 
> I have added this filter factory in my schema.xml also but still that is
> not working. I am sorry but I didnt get as how to create the field to
> handle the accents.
> 
> Please help.
> 
> 
> Grant Ingersoll-6 wrote:
>> 
>> You will need to create a field that handles the accents in order to  
>> do this.  Start by looking at the ISOLatin1AccentFilter.
>> 
>> -Grant
>> 
>> On Mar 17, 2009, at 7:31 AM, dabboo wrote:
>> 
>>>
>>> Hi,
>>>
>>> I am searching with any query string, which contains special  
>>> characters like
>>> è in it. for e.g. If I search for tèst then it shud return all the  
>>> results
>>> which contains tèst and test etc. There are other special characters  
>>> also.
>>>
>>> I have updated my server.xml file of tomcat server and included  
>>> UTF-8 as
>>> encoding type in the server entry but still it is not working.
>>>
>>> Please suggest.
>>>
>>> Thanks,
>>> Amit Garg
>>> -- 
>>> View this message in context:
>>> http://www.nabble.com/Special-Characters-search-in-solr-tp22557230p22557230.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>> 
>> 
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Special-Characters-search-in-solr-tp22557230p22558353.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: stop word search

2009-03-17 Thread Erick Erickson
Well, by definition, using an analyzer that removes stopwords
*should* do this at query time. This assumes that you used
an analyzer that removed stopwords at index and query time.
The stopwords are not in the index.

You can get the behavior you expect by using an analyzer at
query time that does NOT remove stopwords, and one at
indexing time that *does* remove stopwords. But I'm having a
hard time imagining that this would result in a good user experience.
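
The asymmetric setup described above would look roughly like this in
schema.xml (the type name is illustrative): stopwords are dropped at index
time only, so a query term like "the" survives analysis but matches no
indexed token.

```xml
<!-- schema.xml sketch: remove stopwords only at index time -->
<fieldType name="text_stopidx" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```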

I mean, any time you had a required stopword in the query,
no results would be returned, which would be hard to explain
to a user...

What is it you're trying to accomplish?

Best
Erick



On Tue, Mar 17, 2009 at 7:40 AM, revas  wrote:

> Hi,
>
> I have a query like this
>
> content:the AND iuser_id:5
>
> which means return all docs of user id 5 which have the word "the" in
> content .Since 'the' is a stop word ,this query executes as just user_id :5
> inspite of the "AND" clause ,Whereas the expected result here is since
> there
> is no result for  "the " ,no results shloud be returned.
>
> Am i missing anythin here?
>
> Regards
>


Re: Special Characters search in solr

2009-03-17 Thread Erick Erickson
Did you reindex after you incorporated the ISOLatin... filter?

On Tue, Mar 17, 2009 at 8:40 AM, dabboo  wrote:

>
> This is the entry in schema.xml
>
> omitNorms="true">
>  
>
>
>
>
>ignoreCase="true"
>words="stopwords.txt"
>enablePositionIncrements="true"
>/>
>
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>
> protected="protwords.txt"/>
>
>
>
>  
>  
>
> 
> ignoreCase="true" expand="true"/>
> words="stopwords.txt"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>
> protected="protwords.txt"/>
>
>
>  outputUnigramIfNoNgram="true" maxShingleSize="99"/>
>
>
>  
>
>
>
>
> dabboo wrote:
> >
> > I have added this filter factory in my schema.xml also but still that is
> > not working. I am sorry but I didnt get as how to create the field to
> > handle the accents.
> >
> > Please help.
> >
> >
> > Grant Ingersoll-6 wrote:
> >>
> >> You will need to create a field that handles the accents in order to
> >> do this.  Start by looking at the ISOLatin1AccentFilter.
> >>
> >> -Grant
> >>
> >> On Mar 17, 2009, at 7:31 AM, dabboo wrote:
> >>
> >>>
> >>> Hi,
> >>>
> >>> I am searching with any query string, which contains special
> >>> characters like
> >>> è in it. for e.g. If I search for tèst then it shud return all the
> >>> results
> >>> which contains tèst and test etc. There are other special characters
> >>> also.
> >>>
> >>> I have updated my server.xml file of tomcat server and included
> >>> UTF-8 as
> >>> encoding type in the server entry but still it is not working.
> >>>
> >>> Please suggest.
> >>>
> >>> Thanks,
> >>> Amit Garg
> >>> --
> >>> View this message in context:
> >>>
> http://www.nabble.com/Special-Characters-search-in-solr-tp22557230p22557230.html
> >>> Sent from the Solr - User mailing list archive at Nabble.com.
> >>>
> >>
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Special-Characters-search-in-solr-tp22557230p22558353.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Solr search with Auto Spellchecker

2009-03-17 Thread Shyamsunder Reddy
I have the same question in mind. How can I configure the same standard
request handler to handle spell check for a given query?
I mean, instead of calling
http://localhost:8983/solr/spellCheckCompRH?q=*:*&spellcheck.q=globl for
spell checking, the following query request should take care of both
querying and spell checking:
http://localhost:8983/solr/select?q=globl


--- On Wed, 3/11/09, Shalin Shekhar Mangar  wrote:

From: Shalin Shekhar Mangar 
Subject: Re: Solr search with Auto Spellchecker
To: solr-user@lucene.apache.org
Date: Wednesday, March 11, 2009, 9:33 AM

On Wed, Mar 11, 2009 at 7:00 PM, Narayanan, Karthikeyan <
karthikeyan.naraya...@gs.com> wrote:

> Is it possible get the search results from the spell corrected word in a
> single solr search query?.  Like I search for the word "globl" and the
> correct spelling is "global".. The query should return results matching
> with the word "global".  Would appreciate any ideas..
>
>
No, you'll need to make two queries.

-- 
Regards,
Shalin Shekhar Mangar.



  

Re: How to use spell checker

2009-03-17 Thread Shyamsunder Reddy
How can I configure the same standard request handler to handle spell check
for a given query? I mean, instead of calling
http://localhost:8983/solr/spellCheckCompRH?q=*:*&spellcheck.q=elepents for
spell checking, the following query request should take care of both
querying and spell checking:
http://localhost:8983/solr/select?q=elepents

Thanks


--- On Tue, 3/3/09, Grant Ingersoll  wrote:

From: Grant Ingersoll 
Subject: Re: How to use spell checker
To: solr-user@lucene.apache.org
Date: Tuesday, March 3, 2009, 2:03 PM

See http://wiki.apache.org/solr/SpellCheckComponent


On Mar 3, 2009, at 1:23 AM, dabboo wrote:

> 
> Hi,
> 
> I am trying to implement the spell check feature in solr with lucene. for
> e.g. if any record contains "elephants" and user enters "elepents", even
> then also, it should return the results with the correct spelling i.e.
> "elephants".
> 
> Please suggest.
> 
> Thanks,
> Amit Garg
> --View this message in context: 
> http://www.nabble.com/How-to-use-spell-checker-tp22303127p22303127.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
Solr/Lucene:
http://www.lucidimagination.com/search




  

spellchecker: returning results even with misspelt words

2009-03-17 Thread Ingo Renner

Hi all,

I'd like to achieve the following:

When searching for e.g. two words, one of them spelt correctly and the
other one misspelt, I'd like to receive results for the correct word but
would still like to get spelling suggestions for the wrong word.


Currently when I search for misspelt words I get suggestions, but no  
results at all although there would be results when searching for the  
correct word only.


Hope you understand what I want to achieve as it's a little hard to  
explain.



all the best
Ingo

--
Ingo Renner
TYPO3 Core Developer, Release Manager TYPO3 4.2





Re: Solr search with Auto Spellchecker

2009-03-17 Thread Ingo Renner


Am 17.03.2009 um 14:39 schrieb Shyamsunder Reddy:

I have the same question in mind. How can I configure the same  
standard request handler to handle the spell check for given query?
I mean instead of calling http://localhost:8983/solr/spellCheckCompRH?q=*:*&spellcheck.q=globl 
 for spelling checking the following query request

should take care of both querying and spell checking:
http://localhost:8983/solr/select?q=globl


in your solrconfig.xml add the spellcheck component to your standard
request handler (XML reconstructed; the archive stripped the tags):

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>


best
Ingo

--
Ingo Renner
TYPO3 Core Developer, Release Manager TYPO3 4.2





Re: Special Characters search in solr

2009-03-17 Thread dabboo

Yes, I did and below is my debugQuery result.

 
[debug response stripped by the archive: status 0, QTime 47; params: rows=10,
start=0, indent=on, q=Colo�, qt=dismaxrequest, debugQuery=true, version=2.2;
rawquerystring/querystring: Colo�; parsedquery:]
  +DisjunctionMaxQuery((programJacketImage_program_s:colo |
courseCodeSeq_course_s:colo | authorLastName_product_s:colo |
era_product_s:colo | Index_Type_s:colo | prdMainTitle_s:colo |
discCode_course_s:colo | sourceGroupName_course_s:colo |
indexType_course_s:colo | prdMainTitle_product_s:colo |
isbn10_product_s:colo | displayName_course_s:colo | groupNm_program_s:colo |
discipline_product_s:colo | courseJacketImage_course_s:colo |
imprint_product_s:colo | introText_program_s:colo |
productType_product_s:colo | isbn13_product_s:colo |
copyrightYear_product_s:colo | prdPubDate_product_s:colo |
programType_program_s:colo | editor_product_s:colo |
courseType_course_s:colo | courseId_course_s:colo |
categoryIds_product_s:colo | contentType_product_s:colo |
indexType_program_s:colo | strapline_product_s:colo |
subCompany_course_s:colo | aluminator_product_s:colo | readBy_product_s:colo
| subject_product_s:colo | edition_product_s:colo | IndexId_s:colo |
programId_program_s:colo)~0.01) () all:english^90.0 all:hindi^123.0
all:glorious^2000.0 all:highlight^1.0E7 all:math^100.0 all:ab^12.0
all:erer^4545.0 
  +(programJacketImage_program_s:colo |
courseCodeSeq_course_s:colo | authorLastName_product_s:colo |
era_product_s:colo | Index_Type_s:colo | prdMainTitle_s:colo |
discCode_course_s:colo | sourceGroupName_course_s:colo |
indexType_course_s:colo | prdMainTitle_product_s:colo |
isbn10_product_s:colo | displayName_course_s:colo | groupNm_program_s:colo |
discipline_product_s:colo | courseJacketImage_course_s:colo |
imprint_product_s:colo | introText_program_s:colo |
productType_product_s:colo | isbn13_product_s:colo |
copyrightYear_product_s:colo | prdPubDate_product_s:colo |
programType_program_s:colo | editor_product_s:colo |
courseType_course_s:colo | courseId_course_s:colo |
categoryIds_product_s:colo | contentType_product_s:colo |
indexType_program_s:colo | strapline_product_s:colo |
subCompany_course_s:colo | aluminator_product_s:colo | readBy_product_s:colo
| subject_product_s:colo | edition_product_s:colo | IndexId_s:colo |
programId_program_s:colo)~0.01 () all:english^90.0 all:hindi^123.0
all:glorious^2000.0 all:highlight^1.0E7 all:math^100.0 all:ab^12.0
all:erer^4545.0 
   
  DismaxQParser 


It is actually converting "Coloèr" to "Colo�" and hence not searching. It is
behaving the same even before adding the ISOLatin1AccentFilter.

Please suggest.

Thanks,
Amit Garg

Erick Erickson wrote:
> 
> Did you reindex after you incorporated the ISOLatin... filter?
> 
> On Tue, Mar 17, 2009 at 8:40 AM, dabboo  wrote:
> 
>>
>> This is the entry in schema.xml
>>
>>> positionIncrementGap="100"
>> omitNorms="true">
>>  
>>
>>
>>
>>
>>>ignoreCase="true"
>>words="stopwords.txt"
>>enablePositionIncrements="true"
>>/>
>>
>>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>
>>> protected="protwords.txt"/>
>>
>>
>>
>>  
>>  
>>
>> 
>>> ignoreCase="true" expand="true"/>
>>> words="stopwords.txt"/>
>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>
>>> protected="protwords.txt"/>
>>
>>
>> > outputUnigramIfNoNgram="true" maxShingleSize="99"/>
>>
>>
>>  
>>
>>
>>
>>
>> dabboo wrote:
>> >
>> > I have added this filter factory in my schema.xml also but still that
>> is
>> > not working. I am sorry but I didnt get as how to create the field to
>> > handle the accents.
>> >
>> > Please help.
>> >
>> >
>> > Grant Ingersoll-6 wrote:
>> >>
>> >> You will need to create a field that handles the accents in order to
>> >> do this.  Start by looking at the ISOLatin1AccentFilter.
>> >>
>> >> -Grant
>> >>
>> >> On Mar 17, 2009, at 7:31 AM, dabboo wrote:
>> >>
>> >>>
>> >>> Hi,
>> >>>
>> >>> I am searching with any query string, which contains special
>> >>> characters like
>> >>> è in it. for e.g. If I search for tèst then it shud return all the
>> >>> results
>> >>> which contains tèst and test etc. There are other special characters
>> >>> also.
>> >>>
>> >>> I have updated my server.xml file of tomcat server and included
>> >>> UTF-8 as
>> >>> encoding type in the server entry but still it is not working.
>> >>>
>> >>> Please suggest.
>> >>>
>> >>> Thanks,
>> >>> Amit Garg
>> >>> --
>> >>> View this message in context:
>> >>>
>> http://www.nabble.com/Special-Characters-search-in-solr-tp22557230p22557230.html
>> >>> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>>
>> >>
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://

Solr: delta-import, help needed

2009-03-17 Thread Giovanni De Stefano
Hello all,

I have a table TEST in an Oracle DB with the following columns: URI
(varchar), CONTENT (varchar), CREATION_TIME (date).

The primary key both in the DB and Solr is URI.

Here is my data-config.xml:

[data-config.xml stripped by the archive: a dataSource definition plus an
entity selecting URI, CONTENT and CREATION_TIME from TEST]

The problem is that anytime I perform a delta-import, the index keeps being
populated as if new documents were added. In other words, I am not able to
UPDATE an existing document or REMOVE a document that is not anymore in the
DB.

What am I missing? How should I specify my deltaQuery?
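
A hedged sketch of what the delta side of data-config.xml could look like for
the table and columns described above. The connection details are
placeholders; ${dataimporter.last_index_time} is supplied by DIH as a
'yyyy-MM-dd HH:mm:ss' string, hence the to_date() wrapper for Oracle;
deltaImportQuery and deletedPkQuery may require a 1.4-era DIH build; and
deletes additionally need a deletion log table (here called TEST_DELETED,
purely hypothetical), since a plain deltaQuery cannot see rows that no
longer exist:

```xml
<dataConfig>
  <dataSource driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@host:1521:SID"
              user="user" password="password"/>
  <document>
    <entity name="test" pk="URI"
            query="select URI, CONTENT, CREATION_TIME from TEST"
            deltaQuery="select URI from TEST
                        where CREATION_TIME &gt;
                        to_date('${dataimporter.last_index_time}','YYYY-MM-DD HH24:MI:SS')"
            deltaImportQuery="select URI, CONTENT, CREATION_TIME from TEST
                              where URI = '${dataimporter.delta.URI}'"
            deletedPkQuery="select URI from TEST_DELETED
                            where DELETED_AT &gt;
                            to_date('${dataimporter.last_index_time}','YYYY-MM-DD HH24:MI:SS')"/>
  </document>
</dataConfig>
```

Deltas are then triggered with /dataimport?command=delta-import rather than
a full-import.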

Thanks a lot in advance!

Giovanni


Re: Is optimize always a commit?

2009-03-17 Thread sunnyfr

Hi,

I want to commit without optimizing.
In my logs I see: start
commit(optimize=true,waitFlush=false,waitSearcher=true)
but I don't want to optimize, otherwise my replication will copy the full
index folder every time.

Thanks a lot guys for your help,
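
For reference, a plain commit can be triggered by POSTing an update message
that is not an optimize (the attribute values shown mirror the log line
above):

```xml
<!-- POST to /solr/update: commits new segments without merging them into one -->
<commit waitFlush="false" waitSearcher="true"/>
```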



ryantxu wrote:
> 
> yes.  optimize also commits
> 
> Maximilian Hütter wrote:
>> Hi,
>> 
>> maybe this is a stupid question, but is a optimize always a commit?
>> In the log it looks like it:
>> 
>> start commit(optimize=true,waitFlush=false,waitSearcher=true)
>> 
>> I just wanted to be sure.
>> 
>> Best regards,
>> 
>> Max
>> 
>> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Is-optimize-always-a-commit--tp15498266p22562206.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: spellchecker: returning results even with misspelt words

2009-03-17 Thread Shyamsunder Reddy
I think if you use spellcheck.collate=true, you will still receive the
results for the correct word and a suggestion for the wrong word.

I have a name field (first name + last name) configured for spell check.
I have a name entry GUY SHUMAKER. I am trying to find person names where
either 'GUY' or 'SHUMAKER' or both are spelled wrong.

1. Last Name spelled wrong as 'SHAMAKER'

http://localhost:8090/solr/select?q=NAME:GUY%20SHAMAKER&fq=TYPE:PERSON&spellcheck=true&spellcheck.collate=true

It returns all results that match 'GUY' and the spelling suggestion
'SHUMAKER' for 'SHAMAKER'.

2. First Name spelled wrong as 'GYY'

http://localhost:8090/solr/select?q=NAME:GYY
SHUMAKER&fq=TYPE:PERSON&spellcheck=true&spellcheck.collate=true

It returns no results, the spelling suggestion 'GUY' for 'GYY', and the
collation NAME:guy SHUMAKER.

Note: but here I expected results that match SHUMAKER.

3. Both first name and last name spelled wrong as: GYY SHAMAKER
http://localhost:8090/solr/select?q=NAME:GYY%20SHAMAKER&fq=TYPE:PERSON&spellcheck=true&spellcheck.collate=true

Here there are no results, but I received suggestions for both words and a
collation.

Is this similar to your scenario?

Also, why are NO results returned for case 2?

--- On Tue, 3/17/09, Ingo Renner  wrote:

From: Ingo Renner 
Subject: spellchecker: returning results even with misspelt words
To: solr-user@lucene.apache.org
Date: Tuesday, March 17, 2009, 9:52 AM

Hi all,

I'd like to achieve the following:

When searching for e.g. two words, one of them being spelt correctly the other 
one misspelt I'd like to receive results for the correct word but would still 
like to get spelling suggestions for the wrong word.

Currently when I search for misspelt words I get suggestions, but no results at 
all although there would be results when searching for the correct word only.

Hope you understand what I want to achieve as it's a little hard to explain.


all the best
Ingo

--Ingo Renner
TYPO3 Core Developer, Release Manager TYPO3 4.2






  

Re: stemming (maybe?) question

2009-03-17 Thread Jon Drukman

Yonik Seeley wrote:

Not sure... I just took the stock solr example, and it worked fine.

I inserted "o'meara" into example/exampledocs/solr.xml
 Advanced o'meara Full-Text Search
Capabilities using Lucene

then indexed everything:  ./post.sh *.xml

Then queried in various ways:
q=o'meara
q=omeara
q=o%20meara

All of the queries found the solr doc.


i grabbed the original example schema.xml and made my username field use 
the following definition:


[field type definition stripped by the archive; the visible attributes show
a WordDelimiterFilter (catenateWords="1" at index time, "0" at query time)
and a query-time SynonymFilter on synonyms.txt]


I removed the stopwords and Porter stuff because I don't want that for
proper names.


seems to work fine now, thanks!
-jsd-



Re: optimize an index as fast as possible

2009-03-17 Thread Marc Sturlese

Thanks Mark, that really did the job! The speed loss in update time is more
than compensated at optimizing time!

Now I am trying another test... but I am not sure whether Lucene has this
option; I am using Lucene 2.9-dev.

As I am working with a 3G index and always have to optimize (as I said
before, I tried not optimizing so I could send my index via rsync faster,
but the speed loss serving requests on the slaves was huge), I wonder if it
is possible to do "block optimizing" (I have just invented the term). For
example: I have an optimized 3G index and start executing updates to it.
Would it be possible to keep optimizing just the newly created segments? I
would still have the 3G index and would be building another big index from
the segments created by the updates. This way I would only have to send the
new "block" via rsync to the slaves (supposing the slaves already had the
3G index because I would have sent it before).
Is there any way to do something similar to that?
This has come to mind because I have to ship the index to the slaves as
often as possible, and optimizing the index into just one "block" makes the
rsync job take a long time.

Thanks in advance
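
For reference, Mark's low-merge-factor suggestion from this thread maps to a
single solrconfig.xml setting:

```xml
<!-- solrconfig.xml sketch: a merge factor of 2 keeps the index close to
     optimized (at most one large and one small segment), trading some
     update speed for cheap or unnecessary optimizes -->
<mainIndex>
  <mergeFactor>2</mergeFactor>
</mainIndex>
```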


markrmiller wrote:
> 
> Marc Sturlese wrote:
>> Hey there,
>> I am creating an index of 3G... it's fast indexing but optimization takes
>> about 10 min. I need to optimize it every time I update as if I don't do
>> that, search requests will be much slower.
>> Wich parameter configuration would be the best to make optimize as fast
>> as
>> possible (I don't mind to use a lot of memory, at least for testing, if I
>> can speed up the process).
>> Actually I am using for the IndexWriter:
>>
>> 1024
>> 2147483647
>> 1
>> 1000
>> 1
>> 10
>> Am I missing any important parameter to do that job?
>> Thanks in advance
>>
>>   
> How about using a merge factor of 2? This way you are pretty much always 
> optimized (old large segment, new small segment at most) - you pay a bit 
> in update speed, but I've found it to be very reasonable for many 
> applications.
> 
> -- 
> - Mark
> 
> http://www.lucidimagination.com
> 
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/optimize-an-index-as-fast-as-possible-tp22543267p22565158.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Shard Query Problem

2009-03-17 Thread Chris Hostetter

: here is the whole file, if it helps

as i said before, i don't know much about the inner workings of 
distributed search, but nothing about your config seems odd to me.  it 
seems like it should work fine.

a wild shot in the dark: instead of using a requestHandler named
"standard" and URLs that start with
"http://lca2-s5-pc04:8080/solr/select?", try registering a handler whose
name starts with a slash (ie: "http://lca2-s5-pc04:8080/solr/foo?shards=...")

This suggestion is based on the supposition that *maybe* the legacy support
for /select and the qt param doesn't play nicely with distributed
searching ... but as I said, this is really just a wild guess.
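
Concretely, that suggestion amounts to registering something like the
following (the handler name /foo is only an example) and querying
http://lca2-s5-pc04:8080/solr/foo?shards=... directly, with no qt parameter:

```xml
<!-- solrconfig.xml sketch: a path-style search handler -->
<requestHandler name="/foo" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
</requestHandler>
```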


: [the full solrconfig.xml was quoted here, but the archive stripped the XML
: tags. Recoverable settings include: abortOnConfigurationError
: ${solr.abortOnConfigurationError:true}; dataDir ${solr.data.dir:./solr/data};
: index defaults (mergeFactor 10, maxBufferedDocs 1000, ramBufferSizeMB 32,
: maxMergeDocs 2147483647, lockType single); caches with newSearcher and
: firstSearcher warming queries; a "standard" request handler (echoParams
: explicit, rows 10, fl *, version 2.1); a dismax handler (qf "text^0.5
: features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4", pf "text^0.2
: features^1.1 name^1.5 manu^1.4 manu_exact^1.9", bf "ord(popularity)^0.5
: recip(rord(price),1,1000,1000)^0.3", mm "2<-1 5<-2 6<90%", bq
: "incubationdate_dt:[* TO NOW/DAY-1MONTH]^2.2", facet defaults on cat,
: manu_exact and price ranges); a spellcheck component on textSpell with
: "default", "jarowinkler" (JaroWinklerDistance) and file-based
: (spellings.txt) spellcheckers; the query elevation component (elevate.xml);
: and ping, highlighting and admin sections.]
: 
: Thanks,
: Cheers,
: Anshul
: 
: On Fri, Mar 6, 2009 at 7:53 PM, Anshul jain  wrote:
: 
: > Hi Chris,
: >
: > Thanks for the reply. here are the requesthandlers from solrconfig.xml:
: >
: >
: > 
: > 
: >  
: >explicit
: >
: >10
: >*
: >2.1
: >
: >  
: >   
: >
: >
: > 
: > 
: >  dismax
: >  explicit
: >  0.01
: >  
: > text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
: >  
: >  
: > text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9
: >  
: >  
: > ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3
: >  
: >  
: > id,name,price,score
: >  
: >  
: > 2<-1 5<-2 6<90%
: >  
: >  100
: >  *:*
: >  
: >
: >  text features name
: >  
: >  0
: >  
: >  name
: >  regex 
: > 
: >   
: >
: > Thanks,
: > Anshul
: >
: >
: >
: > On Wed, Mar 4, 2009 at 9:50 AM, Chris Hostetter 
wrote:
: >
: >>
: >

More replication questions

2009-03-17 Thread Vauthrin, Laurent
Hello,

 

I have a couple of questions relating to replication in Solr.  As far as
I understand it, the replication approach for both 1.3 and 1.4 involves
having the slaves poll the master for updates to the index.  We're
curious to know if it's possible to have a more dynamic/quicker way to
propagate updates.

 

1.   Is there a built-in mechanism for pushing out
updates(/inserts/deletes) received by the master to the slaves?

2.   Is it discouraged to post updates to multiple Solr instances?
(all instances can receive updates and fulfill query requests)

3.   If that sort of capability is not supported, why was it not
implemented this way?  (So that we don't repeat any mistakes)

4.   Has anyone else on the list attempted to do this?  The intent
here is to achieve optimal performance while having the freshest data
possible.

 

Thanks,
Laurent



Solr SpellCheker configuration for multiple fields same time

2009-03-17 Thread Shyamsunder Reddy
My advanced search option allows users to search three different fields at the 
same time.
The fields are first name, last name and org name. Now I have to add a spell 
checking feature for these fields.

When a wrong spelling is entered for each of these fields, like first name: jahn, 
last name: smath, and org name: bpple,

the search result should return a suggestion like (collation) firstname:john 
AND lastname:smith AND orgname: apple


What is the best approach to implement spell checking for these three different 
fields:

1. Build a single directory for all fields by copying them into a 'spell' field 
as:
    schema.xml configuration
    
    
  
    
    
    
  
    
    
    
 
    
    
    
    
    
  
    solrconfig.xml configuration
        
    textSpell    
    
  default
  spell
  ./spellchecker
    
    

Now the queries:
1a. 
/select?q=FIRST_NAME:jahn&LAST_NAME:smath&ORG_NAME:bpple&spellcheck=true

The spell checker searches against the './spellchecker' dictionary and returns the 
suggestions as
FIRST_NAME:john, LAST_NAME:smith and ORG_NAME:apple. Works as expected.

1b. /select?q=LAST_NAME:jahn&spellcheck=true
The spell checker searches against the './spellchecker' dictionary and returns the 
suggestion for LAST_NAME as 'john'.
But there is no last name 'john' in the field LAST_NAME. So the subsequent 
search returns NO results, which is not acceptable.

So this approach seems to be wrong for me.

2. Build a separate directory for each field. 
    schema.xml configuration
    
    
  
    
    
    
  
    
    
    
 
    
    
    
    
    
    
    
  
    solrconfig.xml configuration
        
    textSpell    
    
      firstname
      spell_fname
      ./fname_spellchecker
      
    
      lastname
      spell_lname
      ./lname_spellchecker
      
    
      oname
      spell_org_name
      ./orgname_spellchecker
    
    
    
Now the queries:
2a. 
/select?q=FIRST_NAME:jahn&LAST_NAME:smath&ORG_NAME:bpple&spellcheck=true

How can I mention in the query to search against different dictionaries for 
different fields like
FIRST_NAME in fname_spellchecker, LAST_NAME in lname_spellchecker and ORG_NAME 
in orgname_spellchecker?

Or can I make the spell checker store the field names and their values?

Please discuss my approaches and suggest a solution?



  

NPE creating EmbeddedSolrServer

2009-03-17 Thread Alexandre Rafalovitch
Hello,

I am trying to create a basic single-core embedded Solr instance. I
figured out how to set up a single core instance and got (I believe)
all files in the right places. However, I am unable to run this trivial code
without an exception:

SolrServer solr = new EmbeddedSolrServer(
    new CoreContainer(
        "D:\\Projects\\FutureTerm\\apache-solr-1.3.0\\futureterm",
        new File("D:\\Projects\\FutureTerm\\apache-solr-1.3.0\\futureterm\\solr.xml")),
    "core");

The exception (with context) is:

WARNING: No queryConverter defined, using default converter
Mar 17, 2009 6:15:01 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to searc...@b02928 main
Mar 17, 2009 6:15:01 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:147)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1228)
        at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:50)
        at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1034)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

Mar 17, 2009 6:15:02 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=null path=null
params={start=0&q=fast_warm&rows=10} status=500 QTime=47

I am not sure where to look further. Source code (at my level of
knowledge) is not very helpful.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
Research group: http://www.clt.mq.edu.au/Research/


Re: Indexing issue

2009-03-17 Thread Chris Hostetter

: I have two cores in different machines which are referring to the same data 
directory.

this isn't really considered a supported configuration ... both solr 
instances are going to try and "own" the directory for updating, and 
unless you do something special to ensure only one has control, you are
going to have problems...

: below error.   HTTP Status 500 - java.io.FileNotFoundException: 
: \\SolrShare\CollectionSet2\English\Auction\Auction0\index\_c.fdt (The 
: system cannot find the file specified) java.lang.RuntimeException: 
: java.io.FileNotFoundException: 
: \\SolrShare\CollectionSet2\English\Auction\Auction0\index\_c.fdt (The 

...like this.  one core is mucking with the files in a way the other core 
doesn't know about.

: I have changed lockType to simple and none, but still no luck…
: Could you please correct me if I am doing wrong?

"none" isn't going to help you -- it's just going to make the problem 
worse (two misconfigured instances of Solr in the same JVM could corrupt 
each other with lockType=none).

"simple" is only going to help you on some filesystems -- since you said 
these two solr instances are running on different machines, that implies 
NFS (or something like it) and SimpleFSLockFactory doesn't work reliably 
in those cases.

If you want to get something like this working, you'll probably need 
to setup your own network based lockType (instead of relying on the 
filesystem)


-Hoss


Re: Relevancy and date sorting in same search?

2009-03-17 Thread Chris Hostetter

: I'm trying to think of a way to use both relevancy and date sorting in
: the same search. If documents are recent (say published within the last
: 2 years), I want to use all of the boost functions, BQ parameters, and
: normal Lucene scoring functions, but for documents older than two years,
: I don't want any of those scores to apply - only a sort by date.

Yonik recently commited some code that makes it possible to express a 
function query string that refers to an arbitrary param name which is 
evaluated as a query and the scores for each document are used 
as a ValueSource input for the function.  I imagine you could combine 
this with a custom function that returns the value from one ValueSource if 
it's low enough (the date field) otherwise it returns the value from an 
alternate ValueSource (the query)


-Hoss



Re: muticore setup with tomcat

2009-03-17 Thread Chris Hostetter

You haven't really given us a lot of information to work with...

what shows up in your logs?
what did you name the context fragment file?
where did you put the context fragment file?
where did you put the multicore directory?

sharing *exact* directory listings and the *exact* commands you've 
executed is much more likely to help people understand what you're seeing.

For example: the SolrTomcat wiki page shows an exact set of shell commands 
to install solr and tomcat on linux or cygwin and get it running against a 
simple example ... if you can provide a similar set of commands showing 
*exactly* what you've done, people might be able to spot the problem (or 
try the steps themselves and reproduce the problem)

http://wiki.apache.org/solr/SolrTomcat

: Date: Mon, 9 Mar 2009 14:55:47 +0530

: Hi,
: 
: I am trying to do a multicore setup.
: 
: I added the following from the 1.3 solr download to new dir called multicore
: 
: core0 ,core1,solr.xml and solr.war
: 
: in the tomcat context fragment i have defined as
: 
: 
:
: 
: http://localhost:8080/multicore/admin
: http://localhost:8080/multicore/admin/core0
: 
: The above 2 URLs give me a resource-not-found error
: 
: the solr.xml is the default one from the download.
: 
: Please tell me what needs to be changed to make this work in tomcat
: 
: Regards
: Sujatha
: 



-Hoss



Re: multicore file path

2009-03-17 Thread Chris Hostetter

This is a "feature" of the ShowFileRequestHandler -- it doesn't let people 
browse files outside of the conf directory.

I suppose this behavior could be made configurable (right now the only 
config option is "hidden" for excluding specific files ... we could have 
an option to "allow" files that would normally be hidden)
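The "hidden" option mentioned above is set as an invariant on the handler; a sketch of that existing knob (the file names are illustrative):

```xml
<requestHandler name="/admin/file"
                class="org.apache.solr.handler.admin.ShowFileRequestHandler">
  <lst name="invariants">
    <!-- files under conf/ that the handler must never serve -->
    <str name="hidden">synonyms.txt</str>
    <str name="hidden">stopwords.txt</str>
  </lst>
</requestHandler>
```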


: Date: Mon, 9 Mar 2009 07:33:48 -0400
: From: "Gargate, Siddharth" 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: multicore file path
: 
: I am trying out multicore environment with single schema and solrconfig
: file. Below is the folder structure 
:  
: Solr/
:conf/
:  schema.xml
:  solrconfig.xml
:core0/
:data/
:core1/
:data/
:tomcat/
: 
: The solrhome property is set in tomcat as -Dsolr.solr.home=../..
: 
: And the solr.xml file is
: 
: 
: 
:   
:   
: 
: 
: 
: Everything is working fine, but when I try to access schema file from
: admin UI I am getting following error
: http://localhost:8080/solr/core0/admin/file/?file=../../conf/schema.xml
:  HTTP Status 403 - Invalid path: ../../conf/schema.xml
:  description Access to the specified resource (Invalid path:
: ../../conf/schema.xml) has been forbidden.
: 
: 



-Hoss



NPE in MultiSegmentReader$MultiTermDocs.doc

2009-03-17 Thread Comron Sattari
I've recently upgraded to Solr 1.3 using Lucene 2.4. One of the reasons I
upgraded was because of the nicer SearchComponent architecture that let me
add a needed feature to the default request handler. Simply put, I needed to
filter a query based on some additional parameters. So I subclassed
QueryComponent and replaced the line

   rb.setQuery( parser.getQuery() );

with one that wrapped the parsed query in a FilteredQuery

  Query query = parser.getQuery();
  String arguments = params.get("param-name");
  if( arguments != null) {
query = new FilteredQuery(query, new MyCustomFilter(arguments));
  }
  rb.setQuery(query);

The filter class I used can be seen here: http://privatepaste.com/021ZH27tKG.
It is nearly verbatim from the Lucene in Action book's description of a way
to do security filtering.

This seems to work fine, although I'm getting some strange behavior when
exercising this code through some unit tests from my Rails app. Sometimes I
get an NPE when doing the filtering.

  at org.apache.lucene.index.MultiSegmentReader$MultiTermDocs.doc(MultiSegmentReader.java:533)
  at $MyCustomFilter.bits(Unknown Source)
  at org.apache.lucene.search.Filter.getDocIdSet(Filter.java:49)
  at org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:105)
  at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
  at org.apache.lucene.search.Searcher.search(Searcher.java:126)

After some detective work I decided the problem had to do with an empty
index, where the termDocs iterator had 0 elements to iterate over and was
throwing this error. Since there is no size() method or analogous method on
the TermDocs iterator, I decided to use docFreq(term) as you can see in the
source code. But this didn't solve my problem either. This error is being
thrown even when docFreq(term) returns more than 0 documents. I can't for the
life of me figure out why this iterator's doc() method is throwing an NPE.
(Well, I can deduce that the current member is null, but I don't know why.)
Is the index corrupted? I can see records in the index that should match my
Term through the solr /admin/ interface, and docFreq(term) returns a number
> 0. Yet this NPE keeps showing up.

Any help or guidance would be appreciated. If you need to see more source
I'd be happy to provide it, but I'm sure that's all the relevant stuff.

(I cross posted this to both Solr and Lucene lists.)

Comron


Re: muticore setup with tomcat

2009-03-17 Thread Cheng Zhang

below is my setup, 


   


then under /home/zhangyongjiang/applications/solr, I have solr.xml as below,


 
  
  
  
  
 


under /home/zhangyongjiang/applications/solr, I created core1/, core2/, core3/, 
core4 subdirectories.


hope it helps.





- Original Message 
From: Chris Hostetter 
To: solr-user@lucene.apache.org
Sent: Tuesday, March 17, 2009 3:46:11 PM
Subject: Re: muticore setup with tomcat


You haven't really given us a lot of information to work with...

what shows up in your logs?
what did you name the context fragment file?
where did you put the context fragment file?
where did you put the multicore directory?

sharing *exact* directory listings and the *exact* commands you've 
executed is much more likely to help people understand what you're seeing.

For example: the SolrTomcat wiki page shows an exact set of shell commands 
to install solr and tomcat on linux or cygwin and get it running against a 
simple example ... if you can provide a similar set of commands showing 
*exactly* what you've done, people might be able to spot the problem (or 
try the steps themselves and reproduce the problem)

http://wiki.apache.org/solr/SolrTomcat

: Date: Mon, 9 Mar 2009 14:55:47 +0530

: Hi,
: 
: I am trying to do a multicore setup.
: 
: I added the following from the 1.3 solr download to new dir called multicore
: 
: core0 ,core1,solr.xml and solr.war
: 
: in the tomcat context fragment i have defined as
: 
: 
:
: 
: http://localhost:8080/multicore/admin
: http://localhost:8080/multicore/admin/core0
: 
: The above 2 URLs give me a resource-not-found error
: 
: the solr.xml is the default one from the download.
: 
: Please tell me what needs to be changed to make this work in tomcat
: 
: Regards
: Sujatha
: 



-Hoss


Re: custom hitcollector example

2009-03-17 Thread Chris Hostetter

: Can someone point to or provide an example of how to incorporate a 
: custom hitcollector when using Solr?

this is somewhat hard to do in non-trivial ways because you would need to 
bypass a lot of the Solr code that builds up DocLists and DocSets ... 

if however you don't need either of those, or things that depend on them,
you can write a custom RequestHandler or SearchComponent and
call any method you want on the (Solr)IndexSearcher and pass in your 
HitCollector.

depending on what your HitCollector does you could also use it to build up 
a DocSet that you then pass as a filter to the existing Solr methods 
... assuming the reason you want to use it is to filter results, if you 
want to use it to "visit" every match, you could let solr do the 
search, get the resulting DocSet and then iterate over the each match 
calling your HitCollector. 



-Hoss



Re: Re[2]: the time factor

2009-03-17 Thread Chris Hostetter

: How come if  bq doesn't influence what matches -- that's q -- bq only
: influence
: the scores of existing matches if they also match the bq 

because that's the way it was designed ... "bq" is "boost query" it's 
designed to boost the scores of documents that match the "q" param.

: when I put :
: as bq=(country:FR)^2 (status_official:1 status_new:1)^2.5 
: Ive no result
: 
: if I put just bq=(country:FR)^2  Or bq=(status_official:1 status_new:1)^2.5 
: or even bq=(country:FR)^2 OR (status_official:1 status_new:1)^2.5 
: I will have one result.

i can't explain that ... you'd need to post all of the things people 
usually ask about to troubleshoot what might be happening (configs for 
request handler, full query string, debugQuery="true" output, etc...)



-Hoss



Re: How to correctly boost results in Solr Dismax query

2009-03-17 Thread Chris Hostetter

: bq works only with q.alt query and not with q queries. So, in your case you
: would be using qf parameter for field boosting, you will have to give both
: the fields in qf parameter i.e. both title and media.

FWIW: that statement is false.  the "boost query" (bq) is added to the 
query regardless of whether "q" or "q.alt" is ultimately used.

if you turn on debugQuery=true and look at your resulting query string, 
you can see exactly what the resulting query is (parsedQuery)

Using the example setup, compare the output from these examples...

http://localhost:8983/solr/select/?q.alt=baz&q=solr&defType=dismax&qf=name+cat&bq=foo&debugQuery=true
http://localhost:8983/solr/select/?q.alt=solr&q=&defType=dismax&qf=name+cat&bq=foo&debugQuery=true


-Hoss



Re: How to correctly boost results in Solr Dismax query

2009-03-17 Thread Chris Hostetter

: Is not particularly helpful. I tried adding adding a bq argument to my
: search: 
: 
: &bq=media:DVD^2
: 
: (yes, this is an index of films!) but I find when I start adding more
: and more:
: 
: &bq=media:DVD^2&bq=media:BLU-RAY^1.5
: 
: I find the negative results - e.g. films that are DVD but are not
: BLU-RAY get negatively affected in their score. In the end it all seems

that shouldn't be happening ... the outermost BooleanQuery (that the 
main "q" and all of the "bq" queries are added to) has its 
"coordFactor" disabled, so documents aren't penalized for not matching bq 
clauses.

What you may be seeing is that the raw numeric score values you see 
getting returned by Solr are lower for documents that match "DVD" when you add 
the "BLU-RAY" bq ... that's totally possible because *absolute* scores from 
one query can't be compared to scores from another query -- what's important is 
that the *relative* order of scores from doc1 and doc2 should be consistent 
(ie: the score for a doc matching DVD might go down when you add the 
BLU-RAY bq, but the scores for *all* documents not matching BLU-RAY should 
go down some)

The important thing to look for is:
  1) are DVD docs sorting higher than they would without the DVD bq?
  2) are BLU-RAY docs sorting higher than they would without the BLU-RAY bq?
  3) are two docs that are equivalent except for a DVD/BLU-RAY distinction 
 sorting such that the BLU-RAY doc comes first?


...the answers to all of those should be yes.  if you're seeing otherwise, 
please post the query toStrings for both queries, and the score 
explanations for the docs in question against both queries.
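The absolute-vs-relative distinction can be seen with a toy sketch (the scores below are invented for illustration, not real Solr output):

```python
def rank(scores):
    """Return doc ids ordered best-first by score."""
    return sorted(scores, key=scores.get, reverse=True)

# hypothetical raw scores before adding the BLU-RAY bq (made-up numbers)
before = {"dvd_doc": 1.80, "bluray_doc": 1.75, "vhs_doc": 0.90}

# after adding the bq, every absolute score changes -- but docs that
# match BLU-RAY drop less than docs that don't, so the ordering shifts
after = {"dvd_doc": 1.10, "bluray_doc": 1.40, "vhs_doc": 0.50}

print(rank(before))  # ['dvd_doc', 'bluray_doc', 'vhs_doc']
print(rank(after))   # ['bluray_doc', 'dvd_doc', 'vhs_doc']
```

Comparing `before["dvd_doc"]` to `after["dvd_doc"]` directly is meaningless; only the orderings are comparable.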




-Hoss



Re: fl wildcards

2009-03-17 Thread Chris Hostetter

FWIW: there has been a lot of discussion around how wildcards should work 
in various params that involve field names in the past: search the 
archives for "glob" or "globbing" and you'll find several.

: That makes sense, since hl.fl probably can get away with calculating in the
: writer, and not as part of the core. However, I really need wildcard (or
: globbing) support for field lists as part of the common query parameter "fl".
: Again, if someone can just point me to where the Solr core is using the
: contents of the fl param, I am happy to implement this, if only locally for my
: purposes.

It's complicated... the SolrQueryResponse has a setReturnFields method 
where the RequestHandler can specify which fields should be returned and 
the ResponseWriters use that when outputting a DocList (the writer fetches the 
Document by internal id) ... but with the addition of distributed 
searching now there are also SolrDocumentList objects and whoever puts the 
SolrDocumentList in the response decides which fields to populate.

if you grep the code base for CommonParams.FL and setReturnFields you 
should find all of the various touch points.

if you're really interested in pursuing this, some brainstorming on
how to deal with field globs in a very general and robust way were 
discussed about a year ago, and i posted notes on the wiki...

   http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams

...but no one has actively pursued it enough to figure out what the real 
ramifications/benefits could be.



-Hoss



Re: Compound word search (maybe DisMaxQueryPaser problem)

2009-03-17 Thread Chris Hostetter

: My original assumption for the DisMax Handler was, that it will just take the
: original query string and pass it to every field in its fieldlist using the
: fields configured analyzer stack. Maybe in the end add some stuff for the
: special options and so ... and then send the query to lucene. Can you explain
: why this approach was not choosen?

because then it wouldn't be the DisMaxRequestHandler.

seriously: the point of dismax is to build up a DisjunctionMaxQuery for 
each "chunk" in the query string and populate those DisjunctionMaxQueries 
with the Queries produced by analyzing that "chunk" against each field in 
the qf -- then all of the DisjunctionMaxQueries are grouped into a 
BooleanQuery with a minNrSHouldMatch.

if you look at the query toString from debugQuery (using a non-trivial qf 
param and a q string containing more than one "chunk") you can see what i 
mean.  your example shows it pretty well actually...

: > : > : > ((category:blue | name:blue)~0.1 (category:tooth | name:tooth)~0.1)

the point is to build those DisjunctionMaxQueries -- so that each "chunk" 
only contributes significantly based on the highest scoring field that 
chunk appears in ... in your example someone typing "blue tooth" can get a 
match when a doc matches blue in one field and tooth in another -- that 
wouldn't be possible with the approach you describe.  the Query structure 
also means that a doc where "tooth" appears in both the category and name 
fields but "blue" doesn't appear at all won't score as high as a doc that 
matches "blue" in category and "tooth" in name (although you have to look 
at the score explanations to really see what i mean by that)


There are certainly a lot of improvements that could be made to dismax ... 
more customization in terms of how the query string is parsed before 
building up the DisjunctionMaxQueries and calling the individual field 
analyzers would certainly be one way it could improve ... but so far no 
one has attempted anything like that.




-Hoss



Re: muticore setup with tomcat

2009-03-17 Thread Chris Hostetter


: below is my setup, 
: 
: 
:
: 

you provided that information before, but you still haven't answered the 
most of the questions i asked you...

: You haven't really given us a lot of information to work with...
: 
: what shows up in your logs?
: what did you name the context fragment file?
: where did you put the context fragment file?
: where did you put the multicore directory?

...the answer to that last question is the only new piece of information 
you provided.

My other comments also still hold true...

: sharing *exact* directory listings and the *exact* commands you've 
: executed is much more likely to help people understand what you're seeing.

...please cut and paste directory listings of the directories in question, 
cut/paste how you are running tomcat, which directory you are running 
tomcat in, what log messages you are getting, etc...

: For example: the SolrTomcat wiki page shows an exact set of shell commands 
: to install solr and tomcat on linux or cygwin and get it running against a 
: simple example ... if you can provide a similar set of commands showing 
: *exactly* what you've done, people might be able to spot the problem (or 
: try the steps themselves and reproduce the problem)
: 
: http://wiki.apache.org/solr/SolrTomcat



-Hoss



Re: Solr 1.4: filter documens using fields

2009-03-17 Thread Chris Hostetter

: I'm using StandardRequestHandler and I wanted to filter results by two fields
: in order to avoid duplicate results (in this case the documents are very
: similar, with differences in fields that are not returned in a query
: response).
...
: I'm manage to do the filtering in the client, but then the paging doesn't work
: as it should (some pages may contain more duplicated results than others).
: Is there a way (query or other RequestHandler) to do this?

not at the moment, but some people have been working on trying to solve 
the broader problem in an efficient way...

https://issues.apache.org/jira/browse/SOLR-236



-Hoss



Re: Operators and Minimum Match with Dismax handler

2009-03-17 Thread Chris Hostetter
: I have an index which we are setting the default operator to AND.
: Am I right in saying that using the dismax handler, the default operator in
: the schema file is effectively ignored? (This is the conclusion I've made
: from testing myself)

correct.

: The issue I have with this, is that if I want to include an OR in my phrase,
: these are effectively getting ignored. The parser is still trying to match
: 100% of the search terms
: 
: e.g. 'lucene OR query' still only finds matches for 'lucene AND query'
: the parsed query is: +(((drug_name:lucen) (drug_name:queri))~2) ()

correct.  dismax isn't designed to be used that way (it's a fluke of 
the implementation that using " AND " works at all)

: Does anyone have any advice as to how I could deal with this kind of
: problem?

i would set your mm to something smaller and let your users use "+" when 
they want to make something required.  if you really want to support the 
AND/OR/NOT type sequence  ... don't use dismax: that type of syntax is 
what the standard parser is for.


-Hoss



Re: Custom handler that forwards a request to another core

2009-03-17 Thread Chris Hostetter

: My problem was that the XMLResponseWriter is using the searcher of the
: original request to get the matching documents (in the method writeDocList
: of the class XMLWriter). Since the DocList contains id from the index of the
: second core, there were not valid in the index of the core receiving the
: request.

correct.  to deal with this type of problem in distributed search, the 
SolrDocumentList class was introduced -- if you call getSearcher() on your 
LocalSolrQueryRequest you can use that to build up a SolrDocumentList from 
the DocList, and then add that to your response.

BTW...

: > public void handleRequestBody(SolrQueryRequest request, SolrQueryResponse
: > response)
...
: > request = new LocalSolrQueryRequest(coreToRequest, params);
: > 
: > SolrRequestHandler mlt = coreToRequest.getRequestHandler("/mlt");
: > coreToRequest.execute(mlt, request, response);

this doesn't look safe ... SolrQueryRequest objects need to be closed 
when they're finished, and you aren't doing that here.  as a result the 
searcher ref obtained for the life of the request won't be closed.


-Hoss



Re: com.ctc.wstx.exc.WstxLazyException exception while passing the text content of a word doc to SOLR

2009-03-17 Thread Chris Hostetter

: I am using Apache POI parser to parse a Word Doc and extract the text
: content. Then i am passing the text content to SOLR. The Word document has
: many pictures, graphs and tables. But when i am passing the content to SOLR,
: it fails. Here is the exception trace.
: 
: 09:31:04,516 ERROR [STDERR] Mar 14, 2009 9:31:04 AM
: org.apache.solr.common.SolrException log
: SEVERE: [com.ctc.wstx.exc.WstxLazyException]
: com.ctc.wstx.exc.WstxParsingException: Illegal charact
: er entity: expansion character (code 0x7) not a valid XML character
:  at [row,col {unknown-source}]: [40,18]

the error string is fairly self-explanatory: on line 40, column 18 you 
have a character that isn't legal in XML (0x7)

(not all UTF-8 characters are legal in XML)

If you search the solr archives for "Illegal character" you'll find lots of 
discussion about this and how to deal with this in general.
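One client-side fix is to scrub the extracted text before posting it; a minimal sketch in Python (the `scrub` helper is invented for illustration, but the same XML 1.0 character ranges apply whatever language the client is written in):

```python
import re

# XML 1.0 legal characters: #x9 | #xA | #xD | [#x20-#xD7FF]
# | [#xE000-#xFFFD] | [#x10000-#x10FFFF]; everything else must be removed
_ILLEGAL_XML = re.compile(
    "[^\x09\x0a\x0d\x20-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)

def scrub(text):
    """Drop characters that may not appear in an XML 1.0 document."""
    return _ILLEGAL_XML.sub("", text)

print(scrub("report\x07 body"))  # the 0x7 bell character is removed
```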

You might also want to check out this comment pointing out some advantages 
in using Tika instead of using POI directly...

https://issues.apache.org/jira/browse/LUCENE-1559?#action_12681347

...lastly you might want to check out this plugin and do all the hard work 
server-side...

http://wiki.apache.org/solr/ExtractingRequestHandler




-Hoss



RE: Operators and Minimum Match with Dismax handler

2009-03-17 Thread Dean Missikowski (Consultant), CLSA
I'm using dismax with the default operator set to AND, and don't use
Minimum Match (commented out in solrconfig.xml), meaning 100% of the
terms must match.  Then in my application logic I use a regex that
checks if the query contains " OR ", and if it does I add &mm=1 to the
solr request to effectively turn the query into an OR. This trick
doesn't work for complex boolean queries but works for simple xxx OR
yyy. 
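Sketched out, the trick looks roughly like this (hedged: `solr_params` and its dict-based parameter plumbing are invented for illustration, not a real Solr client API):

```python
import re

def solr_params(user_query):
    """Build dismax params: all terms must match by default, but relax
    mm when the user wrote a simple 'xxx OR yyy' query."""
    params = {"q": user_query, "defType": "dismax"}
    if re.search(r" OR ", user_query):
        params["mm"] = "1"  # any single clause is enough to match
    return params

print(solr_params("lucene query"))     # no mm override: 100% must match
print(solr_params("lucene OR query"))  # mm=1 turns the query into an OR
```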

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: 18/03/2009 10:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Operators and Minimum Match with Dismax handler

: I have an index which we are setting the default operator to AND.
: Am I right in saying that using the dismax handler, the default
operator in
: the schema file is effectively ignored? (This is the conclusion I've
made
: from testing myself)

correct.

: The issue I have with this, is that if I want to include an OR in my
phrase,
: these are effectively getting ignored. The parser is still trying to
match
: 100% of the search terms
: 
: e.g. 'lucene OR query' still only finds matches for 'lucene AND query'
: the parsed query is: +(((drug_name:lucen) (drug_name:queri))~2) ()

correct.  dismax isn't designed to be used that way (it's a fluke of 
the implementation that using " AND " works at all)

: Does anyone have any advice as to how I could deal with this kind of
: problem?

I would set your mm to something smaller and let your users use "+" when 
they want to make something required. If you really want to support the 
AND/OR/NOT type syntax ... don't use dismax: that type of syntax is 
what the standard parser is for.


-Hoss






Re: Phrase slop / Proximity search

2009-03-17 Thread Chris Hostetter

: Can I set the phrase slop value to standard request handler? I want it
: to be configurable in solrconfig.xml file.

if you mean when a user enters a query like...

+fieldA:"some phrase" +(fieldB:true fieldC:1234)

...you want to be able to control what slop value gets used for "some 
phrase", then at the moment the only way to configure that is to put it in 
the query string...

+fieldA:"some phrase"~3 +(fieldB:true fieldC:1234)

...it's the kind of thing that could be set as a property on the query 
parser, but no one has implemented that.  (patches welcome!)




-Hoss



Re: More replication questions

2009-03-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Mar 18, 2009 at 12:34 AM, Vauthrin, Laurent
 wrote:
> Hello,
>
>
>
> I have a couple of questions relating to replication in Solr.  As far as
> I understand it, the replication approach for both 1.3 and 1.4 involves
> having the slaves poll the master for updates to the index.  We're
> curious to know if it's possible to have a more dynamic/quicker way to
> propagate updates.
>
>
>
> 1.       Is there a built-in mechanism for pushing out
> updates(/inserts/deletes) received by the master to the slaves?
The pull mechanism in 1.4 should be good enough. The 'pollInterval' can
be as small as 1 sec, so you will get the updates within a second.
Isn't that good enough?
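For reference, the slave side of the 1.4 ReplicationHandler is configured in solrconfig.xml roughly like the sketch below (the master host and port are placeholders):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <!-- poll the master every second (HH:mm:ss) -->
    <str name="pollInterval">00:00:01</str>
  </lst>
</requestHandler>
```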
>
> 2.       Is it discouraged to post updates to multiple Solr instances?
> (all instances can receive updates and fulfill query requests)
This is prone to serious errors; the Solr instances may not stay in sync.
>
> 3.       If that sort of capability is not supported, why was it not
> implemented this way?  (So that we don't repeat any mistakes)
A push-based replication is in the cards; the implementation is not
trivial. In Solr, commits are already expensive, so a second's delay
should be acceptable.
>
> 4.       Has anyone else on the list attempted to do this?  The intent
> here is to achieve optimal performance while have the freshest data
> possible if that's possible.
>
>
>
> Thanks,
> Laurent
>
>



-- 
--Noble Paul


Re: CJKAnalyzer and Chinese Text sort

2009-03-17 Thread Sachin
Created SOLR-1073 in JIRA with the class file:
https://issues.apache.org/jira/browse/SOLR-1073

-- Original Message --
From: Chris Hostetter 
To: solr-user@lucene.apache.org
Subject: Re: CJKAnalyzer and Chinese Text sort
Date: Mon, 16 Mar 2009 21:34:09 -0700 (PDT)


: Thanks Hoss for your comments! I don't mind submitting it as a patch; 
: shall I create an issue in Jira and submit the patch with that? Also, I 

yep, just attach the patch file.

: didn't modify the core Solr for locale-based sorting; I just created a 
: jar file with the class file & copied it over to the lib folder. As part 
: of the patch, shall I add it to the core Solr code-base (users who want 
: to use this don't need to do anything extra) or add it as a contrib 
: module (they need to compile it as a jar and copy it over to the lib 
: folder)?

go ahead and attach what you've got (Yonik's Law of Patches), but I'm 
guessing it would probably make sense if these changes ultimately became 
part of the core StrField ... there shouldn't be any downside (as long as 
it doesn't adversely affect performance for people who don't want to 
use the feature)

:   http://wiki.apache.org/solr/HowToContribute


-Hoss






Re: Solr: delta-import, help needed

2009-03-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
Are you sure your schema.xml defines a uniqueKey field? You need one for docs to be UPDATEd rather than added.

To remove deleted docs you must have a deletedPkQuery attribute on the root entity.
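For illustration, a typical delta setup on trunk/1.4-style DIH looks roughly like the sketch below. By convention, deltaQuery returns only the primary keys of changed rows, and deltaImportQuery fetches each changed document. The TEST_DELETED table used by deletedPkQuery is a hypothetical audit table (deleted rows can no longer be selected from TEST itself):

```xml
<entity name="test_item" pk="URI"
    query="select URI, CONTENT from TEST"
    deltaQuery="select URI from TEST
                where CREATION_TIME &gt;
                      to_date('${dataimporter.last_index_time}',
                              'YYYY-MM-DD HH24:MI:SS')"
    deltaImportQuery="select URI, CONTENT from TEST
                      where URI = '${dataimporter.delta.URI}'"
    deletedPkQuery="select URI from TEST_DELETED
                    where DELETE_TIME &gt;
                          to_date('${dataimporter.last_index_time}',
                                  'YYYY-MM-DD HH24:MI:SS')">
  <field column="URI" name="URI"/>
  <field column="CONTENT" name="CONTENT"/>
</entity>
```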

On Tue, Mar 17, 2009 at 8:48 PM, Giovanni De Stefano
 wrote:
> Hello all,
>
> I have a table TEST in an Oracle DB with the following columns: URI
> (varchar), CONTENT (varchar), CREATION_TIME (date).
>
> The primary key both in the DB and Solr is URI.
>
> Here is my data-config.xml:
>
> <dataConfig>
>   <dataSource
>     driver="oracle.jdbc.driver.OracleDriver"
>     url="jdbc:oracle:thin:@localhost:1521/XE"
>     user="username"
>     password="password"
>   />
>   <document>
>     <entity
>         name="test_item"
>         pk="URI"
>         query="select URI,CONTENT from TEST"
>         deltaQuery="select URI,CONTENT from TEST where
>             TO_CHAR(CREATION_TIME,'YYYY-MM-DD HH:MI:SS') >
>             '${dataimporter.last_index_time}'"
>     >
>       <field column="URI" name="URI" />
>       <field column="CONTENT" name="CONTENT" />
>     </entity>
>   </document>
> </dataConfig>
>
> The problem is that anytime I perform a delta-import, the index keeps being
> populated as if new documents were added. In other words, I am not able to
> UPDATE an existing document or REMOVE a document that is not anymore in the
> DB.
>
> What am I missing? How should I specify my deltaQuery?
>
> Thanks a lot in advance!
>
> Giovanni
>



-- 
--Noble Paul


How to index in Solr?

2009-03-17 Thread Gosavi.Shyam

Hi,
I am a new user of Solr and I don't know how to index.
Can anyone tell me the settings so that I can build an index and search,
and also how to crawl a web site and the local file system using Solr?

Thanks in advance.

-Sanjshra

-- 
View this message in context: 
http://www.nabble.com/How-to-index-in-Solr--tp22573301p22573301.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to index in Solr?

2009-03-17 Thread Shalin Shekhar Mangar
On Wed, Mar 18, 2009 at 11:42 AM, Gosavi.Shyam wrote:

>
> Hi,
> I am a new user of Solr and I don't know how to index.
> Can anyone tell me the settings so that I can build an index and search,
> and also how to crawl a web site and the local file system using Solr?
>

I think it will be best to start with the Solr tutorial --
http://lucene.apache.org/solr/tutorial.html

Set up an instance of Solr and index the example data provided with the Solr
download to understand how it is used. Also, take a look at the wiki which
has a lot of useful documentation -- http://wiki.apache.org/solr

Solr is a search server. It is not a crawler. You'd need to use an external
crawler such as Nutch, Heritrix, or Droids to crawl websites. Search the
mailing list archives for past discussions on this topic.

-- 
Regards,
Shalin Shekhar Mangar.