why is query so slow
hello, I created an index with 1.5M docs. When I post a query without facets it returns in a moment. When I post a query with one facet it takes 14s. The response header shows status 0, QTime 14263, with facet=true, q=zamok, facet.limit=-1, facet.field=wasCreatedBy_fct, rows=10, version=2.2.

When I add a filter that returns only one doc it takes the same time: status 0, QTime 13249, with the same params plus the filter wasCreatedBy_fct="Martin Benka".

Why? Can anybody explain what I am doing wrong and how to speed up the response time?

Peter
Re: why is query so slow
How many terms are in the wasCreatedBy_fct field? How is that field and its type configured? Solr 1.3? Or trunk? Trunk contains massive faceting speed improvements.

	Erik

On Mar 17, 2009, at 4:21 AM, pcurila wrote:

> hello, I created an index with 1.5M docs. When I post a query without facets it returns in a moment. When I post a query with one facet it takes 14s (QTime 14263, faceting on wasCreatedBy_fct). When I add a filter that returns only one doc it takes the same time (QTime 13249). Why? Can anybody explain what I am doing wrong and how to speed up the response time?
>
> Peter
Re: why is query so slow
Peter,

If possible try running a 1.4-snapshot of Solr; the faceting improvements are quite remarkable. However, if you can't run unreleased code, it might be an idea to try reducing the number of unique terms (try indexing surnames only?).

Toby.

On 17 Mar 2009, at 10:01, pcurila wrote:

> I am using 1.3
>
>> How many terms are in the wasCreatedBy_fct field? How is that field and its type configured?
>
> The field contains author names and there are lots of them. Here is the type configuration:
>
>   <fieldType ... positionIncrementGap="100">
>   <field ... stored="true" multiValued="true"/>

Toby Cole
Software Engineer, Semantico
E: toby.c...@semantico.com
W: www.semantico.com
Re: why is query so slow
I am using 1.3

> How many terms are in the wasCreatedBy_fct field? How is that field and its type configured?

The field contains author names and there are lots of them. Here is the type configuration:

  <fieldType ... positionIncrementGap="100">
  <field ... stored="true" multiValued="true"/>
Lemmatisation search in Solr
Hi,

I am implementing lemmatisation in Solr, meaning that if a user searches for "Mouse" it should return results for both Mouse and Mice. I understand that this is a kind of context-aware search. I thought of using synonyms for this, but then synonyms.txt would need a huge and ever-growing number of entries.

Please suggest how I can implement it some other way.

Thanks,
Amit Garg
Special Characters search in solr
Hi,

I am searching with query strings that contain special characters like è. For example, if I search for tèst then it should return all the results that contain tèst as well as test. There are other special characters too.

I have updated the server.xml file of my Tomcat server and included UTF-8 as the encoding type in the server entry, but it is still not working.

Please suggest.

Thanks,
Amit Garg
stop word search
Hi,

I have a query like this:

  content:the AND user_id:5

which means: return all docs of user id 5 which have the word "the" in content. Since 'the' is a stop word, this query executes as just user_id:5 in spite of the AND clause, whereas the expected result is that since there are no matches for "the", no results should be returned.

Am I missing anything here?

Regards
Re: Indexing the directory
Victor, I'd recommend looking at the tutorial at http://lucene.apache.org/solr/tutorial.html and using the list for more specific questions. Also, there's a list of companies (as well as mine!) that do support of Solr at http://wiki.apache.org/solr/Support that eTrade can contract with to provide in-depth support.

Eric Pugh

On Mar 16, 2009, at 6:25 PM, Huang, Zijian(Victor) wrote:

> Hi, all:
>    I am new to SOLR, can anyone please tell me what I need to do to index some text files in a local directory?
>
> Thanks
>
> Victor

-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal
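[For a quick start, the example distribution ships a simple poster; plain text files first need wrapping into Solr's XML update format (<add><doc>...</doc></add> documents matching your schema). Assuming the stock example server on port 8983:

  cd apache-solr-1.3.0/example/exampledocs
  java -jar post.jar *.xml
]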
Re: Lemmatisation search in Solr
Have you looked for any open source lemmatizers? I didn't find any in a quick search, but there probably are some out there.

Also, is there a particular reason you are after lemmatization instead of stemming? Maybe a "light" stemmer plus synonyms might suffice?

On Mar 17, 2009, at 6:02 AM, dabboo wrote:

> I am implementing lemmatisation in Solr, meaning that if a user searches for "Mouse" it should return results for both Mouse and Mice. I thought of using synonyms for this, but then synonyms.txt would need a huge and ever-growing number of entries. Please suggest how I can implement it some other way.
>
> Amit Garg
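[A minimal sketch of the stemmer-plus-synonyms idea, using the stock Porter stemmer from the 1.3 example schema (a lighter stemmer could be swapped in); the type name is illustrative:

  <fieldType name="text_stem_syn" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    </analyzer>
  </fieldType>

synonyms.txt then only has to cover irregular forms the stemmer cannot conflate, e.g.:

  mouse, mice

Regular forms (box/boxes, search/searches) are already handled by the stemmer, so the file stays small.]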
Re: Special Characters search in solr
You will need to create a field type that handles the accents in order to do this. Start by looking at the ISOLatin1AccentFilter.

-Grant

On Mar 17, 2009, at 7:31 AM, dabboo wrote:

> Hi,
>
> I am searching with query strings that contain special characters like è. For example, if I search for tèst then it should return all the results that contain tèst as well as test. There are other special characters too.
>
> I have updated the server.xml file of my Tomcat server and included UTF-8 as the encoding type in the server entry, but it is still not working.
>
> Please suggest.
>
> Thanks,
> Amit Garg
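[A minimal sketch of such a field type, using the accent filter factory that ships with Solr 1.3 (the type name is illustrative); because the filter runs at both index and query time, tèst and test normalize to the same token. Fields using it must be reindexed after the change:

  <fieldType name="text_noaccents" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
]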
Re: Lemmatisation search in Solr
Stemming and synonyms are working fine in the application, but they work individually. I guess I will need to add the values in synonyms.txt to achieve this. Am I right?

Actually it's a project requirement to implement lemmatisation. I also looked for a lemmatizer but couldn't find any.

Thanks,
Amit

Grant Ingersoll-6 wrote:
> Have you looked for any open source lemmatizers? I didn't find any in a quick search, but there probably are some out there.
>
> Also, is there a particular reason you are after lemmatization instead of stemming? Maybe a "light" stemmer plus synonyms might suffice?
Re: Special Characters search in solr
I have added this filter factory in my schema.xml too, but it is still not working. I am sorry, but I didn't understand how to create the field to handle the accents.

Please help.

Grant Ingersoll-6 wrote:
> You will need to create a field type that handles the accents in order to do this. Start by looking at the ISOLatin1AccentFilter.
>
> -Grant
Re: Special Characters search in solr
This is the entry in schema.xml:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      <filter class="solr.ShingleFilterFactory" outputUnigramIfNoNgram="true" maxShingleSize="99"/>
    </analyzer>
  </fieldType>

dabboo wrote:
> I have added this filter factory in my schema.xml too, but it is still not working. I am sorry, but I didn't understand how to create the field to handle the accents.
>
> Please help.
>
> Grant Ingersoll-6 wrote:
>> You will need to create a field type that handles the accents in order to do this. Start by looking at the ISOLatin1AccentFilter.
>>
>> -Grant
Re: stop word search
Well, by definition, using an analyzer that removes stopwords *should* do this at query time. This assumes that you used an analyzer that removed stopwords at index and query time. The stopwords are not in the index.

You can get the behavior you expect by using an analyzer at query time that does NOT remove stopwords, and one at indexing time that *does* remove them. But I'm having a hard time imagining that this would result in a good user experience. I mean, any time you had a stopword in the query where the stopword was required, no results would be returned, which would be hard to explain to a user.

What is it you're trying to accomplish?

Best
Erick

On Tue, Mar 17, 2009 at 7:40 AM, revas wrote:
> Hi,
>
> I have a query like this:
>
>   content:the AND user_id:5
>
> which means return all docs of user id 5 which have the word "the" in content. Since 'the' is a stop word, this query executes as just user_id:5 in spite of the AND clause, whereas the expected result is that since there are no matches for "the", no results should be returned.
>
> Am I missing anything here?
>
> Regards
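[A sketch of the asymmetric analyzer Erick describes, with stopwords removed only at index time (type name illustrative). Note the consequence he warns about: "the" then produces a query term that exists in no document, so content:the AND user_id:5 returns nothing, even for documents that originally contained "the":

  <fieldType name="text_stopidx" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
]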
Re: Special Characters search in solr
Did you reindex after you incorporated the ISOLatin... filter?

On Tue, Mar 17, 2009 at 8:40 AM, dabboo wrote:
> This is the entry in schema.xml:
>
> [...]
>
> dabboo wrote:
> > I have added this filter factory in my schema.xml too, but it is still not working. I am sorry, but I didn't understand how to create the field to handle the accents.
> >
> > Please help.
> >
> > Grant Ingersoll-6 wrote:
> >> You will need to create a field type that handles the accents in order to do this. Start by looking at the ISOLatin1AccentFilter.
> >>
> >> -Grant
Re: Solr search with Auto Spellchecker
I have the same question in mind. How can I configure the same standard request handler to handle the spell check for a given query? I mean, instead of calling

  http://localhost:8983/solr/spellCheckCompRH?q=*:*&spellcheck.q=globl

for spell checking, the following query request should take care of both querying and spell checking:

  http://localhost:8983/solr/select?q=globl

--- On Wed, 3/11/09, Shalin Shekhar Mangar wrote:

On Wed, Mar 11, 2009 at 7:00 PM, Narayanan, Karthikeyan <karthikeyan.naraya...@gs.com> wrote:
> Is it possible to get the search results from the spell corrected word in a single solr search query? Like I search for the word "globl" and the correct spelling is "global"... the query should return results matching the word "global". Would appreciate any ideas.

No, you'll need to make two queries.

--
Regards,
Shalin Shekhar Mangar.
Re: How to use spell checker
How can I configure the same standard request handler to handle the spell check for a given query? I mean, instead of calling

  http://localhost:8983/solr/spellCheckCompRH?q=*:*&spellcheck.q=elepents

for spell checking, the following query request should take care of both querying and spell checking:

  http://localhost:8983/solr/select?q=elepents

Thanks

--- On Tue, 3/3/09, Grant Ingersoll wrote:

See http://wiki.apache.org/solr/SpellCheckComponent

On Mar 3, 2009, at 1:23 AM, dabboo wrote:
> Hi,
>
> I am trying to implement the spell check feature in solr with lucene. For example, if a record contains "elephants" and the user enters "elepents", even then it should return the results with the correct spelling, i.e. "elephants".
>
> Please suggest.
>
> Thanks,
> Amit Garg

--
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
spellchecker: returning results even with misspelt words
Hi all,

I'd like to achieve the following: when searching for e.g. two words, one of them being spelt correctly and the other one misspelt, I'd like to receive results for the correct word but would still like to get spelling suggestions for the wrong word.

Currently when I search for misspelt words I get suggestions, but no results at all, although there would be results when searching for the correct word only.

Hope you understand what I want to achieve as it's a little hard to explain.

all the best
Ingo

--
Ingo Renner
TYPO3 Core Developer, Release Manager TYPO3 4.2
Re: Solr search with Auto Spellchecker
On 17.03.2009, at 14:39, Shyamsunder Reddy wrote:

> I have the same question in mind. How can I configure the same standard request handler to handle the spell check for a given query? [...]

In your solrconfig.xml, add the spellcheck component to your standard request handler:

  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

best
Ingo

--
Ingo Renner
TYPO3 Core Developer, Release Manager TYPO3 4.2
Re: Special Characters search in solr
Yes, I did, and below is my debugQuery result:

  status: 0, QTime: 47, rows: 10, start: 0, indent: on, q: Colo�, qt: dismaxrequest, debugQuery: true, version: 2.2

  rawquerystring: Colo�
  querystring: Colo�
  parsedquery: +DisjunctionMaxQuery((programJacketImage_program_s:colo | courseCodeSeq_course_s:colo | authorLastName_product_s:colo | era_product_s:colo | Index_Type_s:colo | prdMainTitle_s:colo | discCode_course_s:colo | sourceGroupName_course_s:colo | indexType_course_s:colo | prdMainTitle_product_s:colo | isbn10_product_s:colo | displayName_course_s:colo | groupNm_program_s:colo | discipline_product_s:colo | courseJacketImage_course_s:colo | imprint_product_s:colo | introText_program_s:colo | productType_product_s:colo | isbn13_product_s:colo | copyrightYear_product_s:colo | prdPubDate_product_s:colo | programType_program_s:colo | editor_product_s:colo | courseType_course_s:colo | courseId_course_s:colo | categoryIds_product_s:colo | contentType_product_s:colo | indexType_program_s:colo | strapline_product_s:colo | subCompany_course_s:colo | aluminator_product_s:colo | readBy_product_s:colo | subject_product_s:colo | edition_product_s:colo | IndexId_s:colo | programId_program_s:colo)~0.01) () all:english^90.0 all:hindi^123.0 all:glorious^2000.0 all:highlight^1.0E7 all:math^100.0 all:ab^12.0 all:erer^4545.0
  QParser: DismaxQParser

It is actually converting "Coloèr" to "Colo�" and hence not searching. It behaves the same even before adding the ISOLatin1AccentFilter.

Please suggest.

Thanks,
Amit Garg

Erick Erickson wrote:
> Did you reindex after you incorporated the ISOLatin... filter?
>
> On Tue, Mar 17, 2009 at 8:40 AM, dabboo wrote:
>> This is the entry in schema.xml:
>>
>> [...]
Solr: delta-import, help needed
Hello all,

I have a table TEST in an Oracle DB with the following columns: URI (varchar), CONTENT (varchar), CREATION_TIME (date). The primary key both in the DB and in Solr is URI. Here is my data-config.xml: [...]

The problem is that any time I perform a delta-import, the index keeps growing as if new documents were being added. In other words, I am not able to UPDATE an existing document or REMOVE a document that is no longer in the DB. What am I missing? How should I specify my deltaQuery?

Thanks a lot in advance!
Giovanni
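[The attached data-config.xml did not survive the archive. For reference, a delta-capable DataImportHandler entity generally needs a deltaQuery returning the primary keys of rows changed since ${dataimporter.last_index_time}, a deltaImportQuery fetching each changed row by key, and a deletedPkQuery for removals (deletes usually require a tombstone table or flag, since a plain DELETE leaves nothing to query). A minimal sketch under those assumptions; the DELETED_TEST tombstone table and connection details are illustrative:

  <dataConfig>
    <dataSource driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@..." user="..." password="..."/>
    <document>
      <entity name="test" pk="URI"
              query="SELECT URI, CONTENT, CREATION_TIME FROM TEST"
              deltaQuery="SELECT URI FROM TEST WHERE CREATION_TIME &gt; '${dataimporter.last_index_time}'"
              deltaImportQuery="SELECT URI, CONTENT, CREATION_TIME FROM TEST WHERE URI = '${dataimporter.delta.URI}'"
              deletedPkQuery="SELECT URI FROM DELETED_TEST WHERE DELETED_TIME &gt; '${dataimporter.last_index_time}'"/>
    </document>
  </dataConfig>

Oracle may need an explicit to_date() around the timestamp comparison. With URI as the uniqueKey, re-imported rows then overwrite their earlier documents instead of piling up.]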
Re: Is optimize always a commit?
Hi,

How can I commit without optimizing? I see this:

> start commit(optimize=true,waitFlush=false,waitSearcher=true)

but I don't want to optimize, otherwise my replication will transfer the full index folder every time.

Thanks a lot guys for your help,

ryantxu wrote:
> yes. optimize also commits
>
> Maximilian Hütter wrote:
>> Hi,
>>
>> maybe this is a stupid question, but is an optimize always a commit? In the log it looks like it:
>>
>> start commit(optimize=true,waitFlush=false,waitSearcher=true)
>>
>> I just wanted to be sure.
>>
>> Best regards,
>>
>> Max
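[For reference, commit and optimize are separate update messages; posting the XML below to the update handler commits pending changes without merging segments, so only new segment files need replicating:

  <commit waitFlush="false" waitSearcher="true"/>

An <optimize/> message is what triggers the full merge that rewrites the whole index directory.]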
Re: spellchecker: returning results even with misspelt words
I think if you use spellcheck.collate=true, you will still receive the results for the correct word and a suggestion for the wrong word.

I have a name field (first name + last name) configured for spell check, with the entry GUY SHUMAKER. I am trying to find person names where either 'GUY' or 'SHUMAKER' or both are spelled wrong.

1. Last name spelled wrong as 'SHAMAKER':
   http://localhost:8090/solr/select?q=NAME:GUY%20SHAMAKER&fq=TYPE:PERSON&spellcheck=true&spellcheck.collate=true
   It returns all results that match 'GUY' and the spelling suggestion 'SHUMAKER' for 'SHAMAKER'.

2. First name spelled wrong as 'GYY':
   http://localhost:8090/solr/select?q=NAME:GYY SHUMAKER&fq=TYPE:PERSON&spellcheck=true&spellcheck.collate=true
   It returns NO results, the spelling suggestion 'GUY' for 'GYY', and the collation NAME:guy SHUMAKER.
   Note: here I expected results that match SHUMAKER.

3. Both first name and last name spelled wrong, as GYY SHAMAKER:
   http://localhost:8090/solr/select?q=NAME:GYY%20SHAMAKER&fq=TYPE:PERSON&spellcheck=true&spellcheck.collate=true
   No results, but suggestions for both words and a collation.

Is this similar to your scenario? Also, why are NO results returned for case 2?

--- On Tue, 3/17/09, Ingo Renner wrote:

> I'd like to achieve the following: when searching for e.g. two words, one of them being spelt correctly and the other one misspelt, I'd like to receive results for the correct word but would still like to get spelling suggestions for the wrong word. Currently when I search for misspelt words I get suggestions, but no results at all, although there would be results when searching for the correct word only.
>
> Ingo
Re: stemming (maybe?) question
Yonik Seeley wrote:
> Not sure... I just took the stock solr example, and it worked fine.
>
> I inserted "o'meara" into example/exampledocs/solr.xml:
>
>   <field name="features">Advanced o'meara Full-Text Search Capabilities using Lucene</field>
>
> then indexed everything:
>
>   ./post.sh *.xml
>
> Then queried in various ways:
>
>   q=o'meara
>   q=omeara
>   q=o%20meara
>
> All of the queries found the solr doc.

I grabbed the original example schema.xml and made my username field use the following definition:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

I removed the stopwords and porter stuff because for proper names I don't want that. It seems to work fine now, thanks!

-jsd-
Re: optimize an index as fast as possible
Thanks Mark, that really did the job! The speed loss in update time is more than compensated at optimize time!

Now I am trying another test... but I'm not sure if Lucene has this option; I am using Lucene 2.9-dev. I am working with a 3G index and always have to optimize (as I said before, I tried not optimizing to send my index via rsync faster, but the speed loss serving requests in the slaves was huge). I wonder if it's possible to do "block optimizing" (a term I have just invented). The example would be: I have an optimized 3G index and start executing updates to it. Would it be possible to keep optimizing just the newly created segments? That way I would still have the 3G index and would be building another big segment from the segments created by the updates. Then I would have to send via rsync to the slaves just the new "block" (supposing the slaves already had the 3G index because I would have sent it before).

Is there any way to do something similar to that? This has come to mind because I have to ship the index to the slaves as often as possible, and optimizing the index into just one "block" makes rsync's job take a long time.

Thanks in advance

markrmiller wrote:
> Marc Sturlese wrote:
>> Hey there,
>> I am creating an index of 3G... it's fast indexing but optimization takes about 10 min. I need to optimize it every time I update, as if I don't, search requests will be much slower.
>> Which parameter configuration would be best to make optimize as fast as possible? (I don't mind using a lot of memory, at least for testing, if I can speed up the process.)
>> Actually I am using for the IndexWriter: 1024, 2147483647, 1, 1000, 1, 10
>> Am I missing any important parameter for that job?
>> Thanks in advance
>
> How about using a merge factor of 2? This way you are pretty much always optimized (old large segment, new small segment at most) - you pay a bit in update speed, but I've found it to be very reasonable for many applications.
>
> --
> - Mark
>
> http://www.lucidimagination.com
Re: Shard Query Problem
: here is the whole file, if it helps

as i said before, i don't know much about the inner workings of distributed search, but nothing about your config seems odd to me. it seems like it should work fine.

a wild shot in the dark: instead of using a requestHandler named "standard" and urls that start with "http://lca2-s5-pc04:8080/solr/select?", try registering a handler whose name starts with a slash and use that in your urls (ie: "http://lca2-s5-pc04:8080/solr/foo?shards=...")

This suggestion is based on the supposition that *maybe* the legacy support for /select and the qt param doesn't play nicely with distributed searching ... but as i said, this is really just a wild guess.

: [full solrconfig.xml quoted; snipped]
:
: Thanks,
: Cheers,
: Anshul
:
: On Fri, Mar 6, 2009 at 7:53 PM, Anshul jain wrote:
: > Hi Chris,
: >
: > Thanks for the reply. Here are the request handlers from solrconfig.xml: [...]

-Hoss
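[A sketch of the slash-named handler idea; the handler name and defaults are illustrative, not from the thread:

  <requestHandler name="/distrib" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>
  </requestHandler>

which would then be queried as http://lca2-s5-pc04:8080/solr/distrib?shards=...&q=... instead of going through /select with qt.]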
More replication questions
Hello,

I have a couple of questions relating to replication in Solr. As far as I understand it, the replication approach for both 1.3 and 1.4 involves having the slaves poll the master for updates to the index. We're curious to know if it's possible to have a more dynamic/quicker way to propagate updates.

1. Is there a built-in mechanism for pushing out updates (/inserts/deletes) received by the master to the slaves?
2. Is it discouraged to post updates to multiple Solr instances? (all instances can receive updates and fulfill query requests)
3. If that sort of capability is not supported, why was it not implemented this way? (So that we don't repeat any mistakes.)
4. Has anyone else on the list attempted to do this?

The intent here is to achieve optimal performance while having the freshest data possible.

Thanks,
Laurent
Solr SpellCheker configuration for multiple fields same time
My advanced search option allows users to search three different fields at the same time: first name, last name and org name. Now I have to add spell checking for these fields. When a wrong spelling is entered for each of these words, like first name: jahn, last name: smath, and org name: bpple, the search result should return a suggestion (collation) like firstname:john AND lastname:smith AND orgname:apple.

What is the best approach to implement spell checking for these three different fields?

1. Build a single dictionary for all fields by copying them into a 'spell' field.

schema.xml configuration:

  <field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>
  <copyField source="FIRST_NAME" dest="spell"/>
  <copyField source="LAST_NAME" dest="spell"/>
  <copyField source="ORG_NAME" dest="spell"/>

solrconfig.xml configuration:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spell</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
    </lst>
  </searchComponent>

Now the queries:

1a. /select?q=FIRST_NAME:jahn&LAST_NAME:smath&ORG_NAME:bpple&spellcheck=true
The spell check searches against the './spellchecker' dictionary and returns the suggestions FIRST_NAME:john, LAST_NAME:smith and ORG_NAME:apple. Works as expected.

1b. /select?q=LAST_NAME:jahn&spellcheck=true
The spell check searches against the './spellchecker' dictionary and returns the suggestion 'john' for LAST_NAME. But there is no last name 'john' in the LAST_NAME field, so the subsequent search returns NO results, which is not acceptable. So this approach seems wrong for me.

2. Build a separate dictionary for each field.

solrconfig.xml configuration:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
      <str name="name">firstname</str>
      <str name="field">spell_fname</str>
      <str name="spellcheckIndexDir">./fname_spellchecker</str>
    </lst>
    <lst name="spellchecker">
      <str name="name">lastname</str>
      <str name="field">spell_lname</str>
      <str name="spellcheckIndexDir">./lname_spellchecker</str>
    </lst>
    <lst name="spellchecker">
      <str name="name">oname</str>
      <str name="field">spell_org_name</str>
      <str name="spellcheckIndexDir">./orgname_spellchecker</str>
    </lst>
  </searchComponent>

Now the queries:

2a. /select?q=FIRST_NAME:jahn&LAST_NAME:smath&ORG_NAME:bpple&spellcheck=true
How can I tell the query to search different dictionaries for different fields, like FIRST_NAME in fname_spellchecker, LAST_NAME in lname_spellchecker and ORG_NAME in orgname_spellchecker? Or can I make the spell checker store the field names along with their values?

Please discuss my approaches and suggest a solution?
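[For approach 2, note that the component selects a dictionary through the spellcheck.dictionary request parameter, and only one dictionary applies per request, so a single query can't check each field against its own dictionary; one request per field would look like (names taken from the config above):

  /select?q=FIRST_NAME:jahn&spellcheck=true&spellcheck.dictionary=firstname
  /select?q=LAST_NAME:smath&spellcheck=true&spellcheck.dictionary=lastname
  /select?q=ORG_NAME:bpple&spellcheck=true&spellcheck.dictionary=oname
]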
NPE creating EmbeddedSolrServer
Hello,

I am trying to create a basic single-core embedded Solr instance. I figured out how to set up a single-core instance and got (I believe) all files in the right places. However, I am unable to run trivial code without an exception:

  SolrServer solr = new EmbeddedSolrServer(
      new CoreContainer(
          "D:\\Projects\\FutureTerm\\apache-solr-1.3.0\\futureterm",
          new File("D:\\Projects\\FutureTerm\\apache-solr-1.3.0\\futureterm\\solr.xml")),
      "core");

The exception (with context) is:

  WARNING: No queryConverter defined, using default converter
  Mar 17, 2009 6:15:01 PM org.apache.solr.core.QuerySenderListener newSearcher
  INFO: QuerySenderListener sending requests to searcher@b02928 main
  Mar 17, 2009 6:15:01 PM org.apache.solr.common.SolrException log
  SEVERE: java.lang.NullPointerException
      at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:147)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1228)
      at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:50)
      at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1034)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:619)
  Mar 17, 2009 6:15:02 PM org.apache.solr.core.SolrCore execute
  INFO: [core] webapp=null path=null params={start=0&q=fast_warm&rows=10} status=500 QTime=47

I am not sure where to look further. The source code (at my level of knowledge) is not very helpful.

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/
Research group: http://www.clt.mq.edu.au/Research/
Re: Indexing issue
: I have two cores on different machines which are referring to the same data directory.

this isn't really considered a supported configuration ... both solr instances are going to try and "own" the directory for updating, and unless you do something special to ensure only one has control you are going to have problems...

: below error. HTTP Status 500 - java.io.FileNotFoundException:
: \\SolrShare\CollectionSet2\English\Auction\Auction0\index\_c.fdt (The
: system cannot find the file specified) java.lang.RuntimeException:
: java.io.FileNotFoundException:
: \\SolrShare\CollectionSet2\English\Auction\Auction0\index\_c.fdt (The

...like this. one core is mucking with the files in a way the other core doesn't know about.

: I have changed lockType to simple and none, but still no luck…
: Could you please correct me if I am doing wrong?

"none" isn't going to help you -- it's just going to make the problem worse (two misconfigured instances of Solr in the same JVM could corrupt each other with lockType=none). "simple" is only going to help you on some filesystems -- since you said these two solr instances are running on different machines, that implies NFS (or something like it), and SimpleFSLockFactory doesn't work reliably in those cases.

If you want to get something like this working, you'll probably need to set up your own network based lockType (instead of relying on the filesystem)

-Hoss
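[For reference, the lock type being discussed is set per index in the mainIndex section of solrconfig.xml; the fragment the poster would have been changing looks something like:

  <mainIndex>
    ...
    <lockType>simple</lockType>
  </mainIndex>
]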
Re: Relevancy and date sorting in same search?
: I'm trying to think of a way to use both relevancy and date sorting in
: the same search. If documents are recent (say published within the last
: 2 years), I want to use all of the boost functions, BQ parameters, and
: normal Lucene scoring functions, but for documents older than two years,
: I don't want any of those scores to apply - only a sort by date.

Yonik recently committed some code that makes it possible to express a function query string that refers to an arbitrary param name which is evaluated as a query, with the scores for each document used as a ValueSource input for the function. I imagine you could combine this with a custom function that returns the value from one ValueSource if it's low enough (the date field) and otherwise returns the value from an alternate ValueSource (the query)

-Hoss
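[For illustration, the param-dereferencing form that feature enables looks roughly like the following on trunk at the time (the exact syntax is an assumption, and the boost expression is illustrative):

  q={!func}product(query($qq), recip(ms(NOW,pubdate_dt),3.16e-11,1,1))&qq=title:solr

where query($qq) evaluates the qq parameter as a normal query and exposes its scores to the surrounding function.]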
Re: muticore setup with tomcat
You haven't really given us a lot of information to work with...

what shows up in your logs?
what did you name the context fragment file?
where did you put the context fragment file?
where did you put the multicore directory?

sharing *exact* directory listings and the *exact* commands you've executed is much more likely to help people understand what you're seeing.

For example: the SolrTomcat wiki page shows an exact set of shell commands to install solr and tomcat on linux or cygwin and get it running against a simple example ... if you can provide a similar set of commands showing *exactly* what you've done, people might be able to spot the problem (or try the steps themselves and reproduce the problem)

http://wiki.apache.org/solr/SolrTomcat

: Date: Mon, 9 Mar 2009 14:55:47 +0530
: Hi,
:
: I am trying to do a multicore set up..
:
: I added the following from the 1.3 solr download to a new dir called multicore:
:
: core0, core1, solr.xml and solr.war
:
: in the tomcat context fragment i have defined as [...]
:
: http://localhost:8080/multicore/admin
: http://localhost:8080/multicore/admin/core0
:
: The above 2 urls give me a resource not found error
:
: the solr.xml is the default one from the download.
:
: Please tell me what needs to be changed to make this work in tomcat
:
: Regards
: Sujatha

-Hoss
Re: multicore file path
This is a "feature" of the ShowFileRequestHandler -- it doesn't let people browse files outside of hte conf directory. I suppose this behavior could be made configurable (right now the only config option is "hidden" for excluding specific files ... we could have an option to "allow" files that would normally be hidden) : Date: Mon, 9 Mar 2009 07:33:48 -0400 : From: "Gargate, Siddharth" : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: multicore file path : : I am trying out multicore environment with single schema and solrconfig : file. Below is the folder structure : : Solr/ :conf/ : schema.xml : solrconfig.xml :core0/ :data/ :core1/ :data/ :tomcat/ : : The solrhome property is set in tomcat as -Dsolr.solr.home=../.. : : And the solr.xml file is : : : : : : : : : Everything is working fine, but when I try to access schema file from : admin UI I am getting following error : http://localhost:8080/solr/core0/admin/file/?file=../../conf/schema.xml : HTTP Status 403 - Invalid path: ../../conf/schema.xml : description Access to the specified resource (Invalid path: : ../../conf/schema.xml) has been forbidden. : : -Hoss
NPE in MultiSegmentReader$MultiTermDocs.doc
I've recently upgraded to Solr 1.3 using Lucene 2.4. One of the reasons I upgraded was the nicer SearchComponent architecture that let me add a needed feature to the default request handler. Simply put, I needed to filter a query based on some additional parameters. So I subclassed QueryComponent and replaced the line

  rb.setQuery( parser.getQuery() );

with one that wrapped the parsed query in a FilteredQuery:

  Query query = parser.getQuery();
  String arguments = params.get("param-name");
  if (arguments != null) {
      query = new FilteredQuery(query, new MyCustomFilter(arguments));
  }
  rb.setQuery(query);

The filter class I used can be seen here: http://privatepaste.com/021ZH27tKG. It is nearly verbatim from the Lucene in Action book, from the section describing a way to do security filtering.

This seems to work fine, although I'm getting some strange behavior when exercising this code through some unit tests from my Rails app. Sometimes I get an NPE when doing the filtering:

  at org.apache.lucene.index.MultiSegmentReader$MultiTermDocs.doc(MultiSegmentReader.java:533)
  at $MyCustomFilter.bits(Unknown Source)
  at org.apache.lucene.search.Filter.getDocIdSet(Filter.java:49)
  at org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:105)
  at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
  at org.apache.lucene.search.Searcher.search(Searcher.java:126)

After some detective work I decided the problem had to do with an empty index: the termDocs iterator has 0 elements to iterate over and was throwing this error. Since there is no size() method or analogous method on the TermDocs iterator, I decided to use docFreq(term), as you can see in the source code. But this didn't solve my problem either. This error is thrown even when docFreq(term) returns more than 0 documents.

I can't for the life of me figure out why this iterator's doc() method is throwing an NPE. (Well, I can deduce that the current member is null, but I don't know why.) Is the index corrupted? I can see records in the index that should match my Term through the solr /admin/ interface, and docFreq(term) returns a number > 0. Yet this NPE keeps showing up.

Any help or guidance would be appreciated. If you need to see more source I'd be happy to provide it, but I'm sure that's all the relevant stuff. (I cross-posted this to both the Solr and Lucene lists.)

Comron
Re: muticore setup with tomcat
below is my setup: under /home/zhangyongjiang/applications/solr I have solr.xml as below [...], and under /home/zhangyongjiang/applications/solr I created core1/, core2/, core3/, core4/ subdirectories. hope it helps.

----- Original Message ----
From: Chris Hostetter
To: solr-user@lucene.apache.org
Sent: Tuesday, March 17, 2009 3:46:11 PM
Subject: Re: muticore setup with tomcat

> You haven't really given us a lot of information to work with...
>
> what shows up in your logs?
> what did you name the context fragment file?
> where did you put the context fragment file?
> where did you put the multicore directory?
>
> sharing *exact* directory listings and the *exact* commands you've executed is much more likely to help people understand what you're seeing.
>
> For example: the SolrTomcat wiki page shows an exact set of shell commands to install solr and tomcat on linux or cygwin and get it running against a simple example ... http://wiki.apache.org/solr/SolrTomcat
>
> [...]
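[The solr.xml itself was stripped from the archive; one matching the directory layout described (core names and instanceDirs inferred from the post, adminPath illustrative) would look something like:

  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <core name="core1" instanceDir="core1"/>
      <core name="core2" instanceDir="core2"/>
      <core name="core3" instanceDir="core3"/>
      <core name="core4" instanceDir="core4"/>
    </cores>
  </solr>
]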
Re: custom hitcollector example
: Can someone point to or provide an example of how to incorporate a
: custom hitcollector when using Solr?

this is somewhat hard to do in non-trivial ways because you would need to bypass a lot of the Solr code that builds up DocLists and DocSets ... if however you don't need either of those, or things that depend on them, you can write a custom RequestHandler or SearchComponent and call any method you want on the (Solr)IndexSearcher and pass in your HitCollector.

depending on what your HitCollector does, you could also use it to build up a DocSet that you then pass as a filter to the existing Solr methods ... assuming the reason you want to use it is to filter results. if you want to use it to "visit" every match, you could let solr do the search, get the resulting DocSet, and then iterate over each match calling your HitCollector.

-Hoss
Re: Re[2]: the time factor
: How come if bq doesn't influence what matches -- that's q -- bq only
: influences the scores of existing matches if they also match the bq

because that's the way it was designed ... "bq" is "boost query": it's designed to boost the scores of documents that match the "q" param.

: when I put
:
: as bq=(country:FR)^2 (status_official:1 status_new:1)^2.5
: I've no result
:
: if I put just bq=(country:FR)^2 Or bq=(status_official:1 status_new:1)^2.5
: or even bq=(country:FR)^2 OR (status_official:1 status_new:1)^2.5
: I will have one result.

i can't explain that ... you'd need to post all of the things people usually ask about to troubleshoot what might be happening (configs for request handler, full query string, debugQuery=true output, etc...)

-Hoss
Re: How to correctly boost results in Solr Dismax query
: bq works only with q.alt query and not with q queries. So, in your case you
: would be using qf parameter for field boosting, you will have to give both
: the fields in qf parameter i.e. both title and media.

FWIW: that statement is false. the "boost query" (bq) is added to the query regardless of whether "q" or "q.alt" is ultimately used.

if you turn on debugQuery=true and look at your resulting query string, you can see exactly what the resulting query is (parsedquery)

Using the example setup, compare the output from these examples...

http://localhost:8983/solr/select/?q.alt=baz&q=solr&defType=dismax&qf=name+cat&bq=foo&debugQuery=true
http://localhost:8983/solr/select/?q.alt=solr&q=&defType=dismax&qf=name+cat&bq=foo&debugQuery=true

-Hoss
Re: How to correctly boost results in Solr Dismax query
: Is not particularly helpful. I tried adding a bq argument to my
: search:
:
: &bq=media:DVD^2
:
: (yes, this is an index of films!) but I find when I start adding more
: and more:
:
: &bq=media:DVD^2&bq=media:BLU-RAY^1.5
:
: I find the negative results - e.g. films that are DVD but are not
: BLU-RAY get negatively affected in their score. In the end it all seems

that shouldn't be happening ... the outermost BooleanQuery (that the main "q" and all of the "bq" queries are added to) has its "coordFactor" disabled, so documents aren't penalized for not matching bq clauses.

What you may be seeing is that the raw numeric score values returned by Solr are lower for documents that match "DVD" when you add the "BLU-RAY" bq ... that's totally possible because *absolute* scores from one query can't be compared to scores from another query -- what's important is that the *relative* order of scores for doc1 and doc2 stays consistent (ie: the score for a doc matching DVD might go down when you add the BLU-RAY bq, but the scores for *all* documents not matching BLU-RAY should go down some)

The important thing to look for is:

1) are DVD docs sorting higher than they would without the DVD bq?
2) are BLU-RAY docs sorting higher than they would without the BLU-RAY bq?
3) are two docs that are equivalent except for a DVD/BLU-RAY distinction sorting such that the BLU-RAY doc comes first?

...the answers to all of those should be yes. if you're seeing otherwise, please post the query toStrings for both queries, and the score explanations for the docs in question against both queries.

-Hoss
Re: fl wildcards
FWIW: there has been a lot of discussion around how wildcards should work in various params that involve field names in the past: search the archives for "glob" or "globbing" and you'll find several.

: That makes sense, since hl.fl probably can get away with calculating in the
: writer, and not as part of the core. However, I really need wildcard (or
: globbing) support for field lists as part of the common query parameter "fl".
: Again, if someone can just point me to where the Solr core is using the
: contents of the fl param, I am happy to implement this, if only locally for my
: purposes.

It's complicated... the SolrQueryResponse has a setReturnFields method where the RequestHandler can specify which fields should be returned, and the ResponseWriters use that when outputting a DocList (the writer fetches the Document by internal id) ... but with the addition of distributed searching there are now also SolrDocumentList objects, and whoever puts the SolrDocumentList in the response decides which fields to populate.

if you grep the code base for CommonParams.FL and setReturnFields you should find all of the various touch points.

if you're really interested in pursuing this, some brainstorming on how to deal with field globs in a very general and robust way was discussed about a year ago, and i posted notes on the wiki...

http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams

...but no one has actively pursued it enough to figure out what the real ramifications/benefits could be.

-Hoss
Re: Compound word search (maybe DisMaxQueryPaser problem)
: My original assumption for the DisMax Handler was, that it will just take the
: original query string and pass it to every field in its fieldlist using the
: fields configured analyzer stack. Maybe in the end add some stuff for the
: special options and so ... and then send the query to lucene. Can you explain
: why this approach was not chosen?

because then it wouldn't be the DisMaxRequestHandler.

seriously: the point of dismax is to build up a DisjunctionMaxQuery for each "chunk" in the query string and populate those DisjunctionMaxQueries with the Queries produced by analyzing that "chunk" against each field in the qf -- then all of the DisjunctionMaxQueries are grouped into a BooleanQuery with a minNrShouldMatch.

if you look at the query toString from debugQuery (using a non-trivial qf param and a q string containing more than one "chunk") you can see what i mean. your example shows it pretty well actually...

: > ((category:blue | name:blue)~0.1 (category:tooth | name:tooth)~0.1)

the point is to build those DisjunctionMaxQueries -- so that each "chunk" only contributes significantly based on the highest scoring field that chunk appears in ... in your example someone typing "blue tooth" can get a match when a doc matches blue in one field and tooth in another -- that wouldn't be possible with the approach you describe. the Query structure also means that a doc where "tooth" appears in both the category and name fields but "blue" doesn't appear at all won't score as high as a doc that matches "blue" in category and "tooth" in name (although you have to look at the score explanations to really see what i mean by that)

There are certainly a lot of improvements that could be made to dismax ... more customization in terms of how the query string is parsed before building up the DisjunctionMaxQueries and calling the individual field analyzers would certainly be one way it could improve ... but so far no one has attempted anything like that.

-Hoss
Re: muticore setup with tomcat
: below is my setup, [...]

you provided that information before, but you still haven't answered most of the questions i asked you...

: You haven't really given us a lot of information to work with...
:
: what shows up in your logs?
: what did you name the context fragment file?
: where did you put the context fragment file?
: where did you put the multicore directory?

...the answer to that last question is the only new piece of information you provided. My other comments also still hold true...

: sharing *exact* directory listings and the *exact* commands you've
: executed is much more likely to help people understand what you're seeing.

...please cut and paste directory listings of the directories in question, cut/paste how you are running tomcat, which directory you are running tomcat in, what log messages you are getting, etc...

: For example: the SolrTomcat wiki page shows an exact set of shell commands
: to install solr and tomcat on linux or cygwin and get it running against a
: simple example ... if you can provide a similar set of commands showing
: *exactly* what you've done, people might be able to spot the problem (or
: try the steps themselves and reproduce the problem)
:
: http://wiki.apache.org/solr/SolrTomcat

-Hoss
Re: Solr 1.4: filter documens using fields
: I'm using StandardRequestHandler and I wanted to filter results by two fields
: in order to avoid duplicate results (in this case the documents are very
: similar, with differences in fields that are not returned in a query
: response).
...
: I managed to do the filtering in the client, but then the paging doesn't work
: as it should (some pages may contain more duplicated results than others).
: Is there a way (query or other RequestHandler) to do this?

not at the moment, but some people have been working on trying to solve the broader problem in an efficient way...

https://issues.apache.org/jira/browse/SOLR-236

-Hoss
Re: Operators and Minimum Match with Dismax handler
: I have an index for which we are setting the default operator to AND.
: Am I right in saying that using the dismax handler, the default operator in
: the schema file is effectively ignored? (This is the conclusion I've made
: from testing myself)

correct.

: The issue I have with this, is that if I want to include an OR in my phrase,
: these are effectively getting ignored. The parser is still trying to match
: 100% of the search terms
:
: e.g. 'lucene OR query' still only finds matches for 'lucene AND query'
: the parsed query is: +(((drug_name:lucen) (drug_name:queri))~2) ()

correct. dismax isn't designed to be used that way (it's a fluke of the implementation that using " AND " works at all)

: Does anyone have any advice as to how I could deal with this kind of
: problem?

i would set your mm to something smaller and let your users use "+" when they want to make something required. if you really want to support the AND/OR/NOT type syntax ... don't use dismax: that type of syntax is what the standard parser is for.

-Hoss
Re: Custom handler that forwards a request to another core
: My problem was that the XMLResponseWriter is using the searcher of the
: original request to get the matching documents (in the method writeDocList
: of the class XMLWriter). Since the DocList contains ids from the index of
: the second core, they were not valid in the index of the core receiving
: the request.

correct. to deal with this type of problem in distributed search, the SolrDocumentList class was introduced -- if you call getSearcher() on your LocalSolrQueryRequest you can use that to build up a SolrDocumentList from the DocList, and then add that to your response.

BTW...

: > public void handleRequestBody(SolrQueryRequest request, SolrQueryResponse
: > response)
...
: > request = new LocalSolrQueryRequest(coreToRequest, params);
: >
: > SolrRequestHandler mlt = coreToRequest.getRequestHandler("/mlt");
: > coreToRequest.execute(mlt, request, response);

this doesn't look safe ... SolrQueryRequest objects need to be closed when they're finished, and you aren't doing that here. as a result the searcher ref obtained for the life of the request won't be closed.

-Hoss
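A minimal sketch of a fix for that last point, reworking the quoted snippet so the request is always closed (variable names follow the quoted code; error handling is elided):

    // inside handleRequestBody, after building params for the second core
    LocalSolrQueryRequest coreRequest =
        new LocalSolrQueryRequest(coreToRequest, params);
    try {
      SolrRequestHandler mlt = coreToRequest.getRequestHandler("/mlt");
      coreToRequest.execute(mlt, coreRequest, response);
    } finally {
      // releases the searcher ref held for the life of the request
      coreRequest.close();
    }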
Re: com.ctc.wstx.exc.WstxLazyException exception while passing the text content of a word doc to SOLR
: I am using Apache POI parser to parse a Word Doc and extract the text
: content. Then i am passing the text content to SOLR. The Word document has
: many pictures, graphs and tables. But when i am passing the content to
: SOLR, it fails. Here is the exception trace.
:
: 09:31:04,516 ERROR [STDERR] Mar 14, 2009 9:31:04 AM
: org.apache.solr.common.SolrException log
: SEVERE: [com.ctc.wstx.exc.WstxLazyException]
: com.ctc.wstx.exc.WstxParsingException: Illegal character
: entity: expansion character (code 0x7) not a valid XML character
: at [row,col {unknown-source}]: [40,18]

the error string is fairly self explanatory: on line 40, column 18 you have a character that isn't legal in XML (0x7). (not all UTF-8 characters are legal in XML)

if you search the solr archives for "Illegal character" you'll find lots of discussion about this and how to deal with it in general.

You might also want to check out this comment pointing out some advantages in using Tika instead of using POI directly...

https://issues.apache.org/jira/browse/LUCENE-1559?#action_12681347

...lastly you might want to check out this plugin and do all the hard work server side...

http://wiki.apache.org/solr/ExtractingRequestHandler

-Hoss
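One common remedy from those archive threads is to strip the XML-illegal control characters client-side before posting -- a minimal sketch that filters by the XML 1.0 character ranges within the BMP (as written it also drops surrogate pairs, so supplementary characters would be lost):

    // strip characters that are not legal in XML 1.0 before posting to Solr
    public static String stripInvalidXmlChars(String in) {
      StringBuilder out = new StringBuilder(in.length());
      for (int i = 0; i < in.length(); i++) {
        char c = in.charAt(i);
        if (c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD)) {
          out.append(c);
        }
      }
      return out.toString();
    }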
RE: Operators and Minimum Match with Dismax handler
I'm using dismax with the default operator set to AND, and don't use Minimum Match (it's commented out in solrconfig.xml), meaning 100% of the terms must match. Then in my application logic I use a regex that checks whether the query contains " OR ", and if it does I add &mm=1 to the Solr request to effectively turn the query into an OR. This trick doesn't work for complex boolean queries, but it works for a simple xxx OR yyy.
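A sketch of that trick, reduced to a contains() check (the base URL and handler name are illustrative, not from the original message):

    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;

    public class OrToMm {
      // add mm=1 only when the user's query contains a bare OR
      public static String buildSolrUrl(String query)
          throws UnsupportedEncodingException {
        String mm = query.contains(" OR ") ? "&mm=1" : "";
        return "http://localhost:8983/solr/select?qt=dismax&q="
            + URLEncoder.encode(query, "UTF-8") + mm;
      }
    }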
Re: Phrase slop / Proximity search
: Can I set the phrase slop value to standard request handler? I want it
: to be configurable in solrconfig.xml file.

if you mean when a user enters a query like...

  +fieldA:"some phrase" +(fieldB:true fieldC:1234)

...you want to be able to control what slop value gets used for "some phrase", then at the moment the only way to configure that is to put it in the query string...

  +fieldA:"some phrase"~3 +(fieldB:true fieldC:1234)

...it's the kind of thing that could be set as a property on the query parser, but no one has implemented that. (patches welcome!)

-Hoss
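For what it's worth, the underlying Lucene QueryParser already exposes the hook such a solrconfig property would wire up to. A sketch against the Lucene 2.x API of that era (the analyzer choice is illustrative):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;

    public class SlopSketch {
      public static void main(String[] args) throws Exception {
        QueryParser qp = new QueryParser("fieldA", new StandardAnalyzer());
        qp.setPhraseSlop(3); // default slop for any phrase without an explicit ~n
        Query q = qp.parse("+fieldA:\"some phrase\" +(fieldB:true fieldC:1234)");
        System.out.println(q); // the phrase clause prints as fieldA:"some phrase"~3
      }
    }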
Re: More replication questions
On Wed, Mar 18, 2009 at 12:34 AM, Vauthrin, Laurent wrote:
> Hello,
>
> I have a couple of questions relating to replication in Solr. As far as
> I understand it, the replication approach for both 1.3 and 1.4 involves
> having the slaves poll the master for updates to the index. We're
> curious to know if it's possible to have a more dynamic/quicker way to
> propagate updates.
>
> 1. Is there a built-in mechanism for pushing out
> updates (inserts/deletes) received by the master to the slaves?

The pull mechanism in 1.4 can be good enough. The 'pollInterval' can be as small as 1 sec, so you will get the updates within a second. Isn't that good enough?

> 2. Is it discouraged to post updates to multiple Solr instances?
> (all instances can receive updates and fulfill query requests)

This is prone to serious errors; the solr instances may not stay in sync.

> 3. If that sort of capability is not supported, why was it not
> implemented this way? (So that we don't repeat any mistakes)

A push-based replication is in the cards, but the implementation is not trivial. In Solr, commits are already expensive, so a second's delay may be alright.

> 4. Has anyone else on the list attempted to do this? The intent
> here is to achieve optimal performance while having the freshest data
> possible, if that's possible.
>
> Thanks,
> Laurent

--Noble Paul
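For reference, the 1.4 pull setup being described is configured on the slave roughly like this -- a sketch with an illustrative master URL; pollInterval is in HH:MM:SS:

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master-host:8983/solr/replication</str>
        <!-- poll the master for index changes every second -->
        <str name="pollInterval">00:00:01</str>
      </lst>
    </requestHandler>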
Re: CJKAnalyzer and Chinese Text sort
Created SOLR-1073 in JIRA with the class file: https://issues.apache.org/jira/browse/SOLR-1073

-- Original Message --
From: Chris Hostetter
To: solr-user@lucene.apache.org
Subject: Re: CJKAnalyzer and Chinese Text sort
Date: Mon, 16 Mar 2009 21:34:09 -0700 (PDT)

: Thanks Hoss for your comments! I don't mind submitting it as a patch,
: shall I create an issue in Jira and submit the patch with that? Also, I

yep, just attach the patch file.

: didn't modify the core solr for locale based sorting; I just created a
: jar file with the class file & copied it over to the lib folder. As part
: of the patch, shall I add it to the core solr code-base (users who want
: to use this don't need anything extra to do) or add it as a contrib
: module (they need to compile it as a jar and copy it over to the lib
: folder)?

go ahead and attach what you've got (Yonik's Law of Patches), but i'm guessing it would probably make sense if these changes ultimately became part of the core StrField ... there shouldn't be any downside (as long as it doesn't adversely affect the performance for people that don't want to use the feature)

: http://wiki.apache.org/solr/HowToContribute

-Hoss
Re: Solr: delta-import, help needed
are you sure your schema.xml has URI set up as the uniqueKey field, so that re-imported docs UPDATE rather than add? to remove deleted docs you must have a deletedPkQuery attribute on the root entity

On Tue, Mar 17, 2009 at 8:48 PM, Giovanni De Stefano wrote:
> Hello all,
>
> I have a table TEST in an Oracle DB with the following columns: URI
> (varchar), CONTENT (varchar), CREATION_TIME (date).
>
> The primary key both in the DB and Solr is URI.
>
> Here is my data-config.xml:
>
> <dataConfig>
>   <dataSource driver="oracle.jdbc.driver.OracleDriver"
>               url="jdbc:oracle:thin:@localhost:1521/XE"
>               user="username"
>               password="password"/>
>   <document>
>     <entity name="test_item"
>             pk="URI"
>             query="select URI,CONTENT from TEST"
>             deltaQuery="select URI,CONTENT from TEST where
>                         TO_CHAR(CREATION_TIME,'YYYY-MM-DD HH:MI:SS') >
>                         '${dataimporter.last_index_time}'">
>     </entity>
>   </document>
> </dataConfig>
>
> The problem is that anytime I perform a delta-import, the index keeps
> being populated as if new documents were added. In other words, I am not
> able to UPDATE an existing document or REMOVE a document that is not
> anymore in the DB.
>
> What am I missing? How should I specify my deltaQuery?
>
> Thanks a lot in advance!
>
> Giovanni

--Noble Paul
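To spell out the two fixes Noble is pointing at: schema.xml needs <uniqueKey>URI</uniqueKey> so a re-imported row replaces the existing doc, and the entity needs a deletedPkQuery returning the primary keys of removed rows. A sketch of the entity, assuming a hypothetical TEST_DELETED audit table with a DELETED_AT timestamp (neither exists in the original setup -- deletions have to be tracked somehow, e.g. with a trigger):

    <entity name="test_item" pk="URI"
            query="select URI,CONTENT from TEST"
            deltaQuery="select URI,CONTENT from TEST where
                        TO_CHAR(CREATION_TIME,'YYYY-MM-DD HH:MI:SS') >
                        '${dataimporter.last_index_time}'"
            deletedPkQuery="select URI from TEST_DELETED where
                        TO_CHAR(DELETED_AT,'YYYY-MM-DD HH:MI:SS') >
                        '${dataimporter.last_index_time}'"/>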
How to index in Solr?
Hi,
I am a new user of Solr and I don't know how to build an index. Can anyone tell me the setup I need so that I can index and search, and also how to crawl a web site and the local filesystem using Solr?
Thanks in advance.
-Sanjshra
--
View this message in context: http://www.nabble.com/How-to-index-in-Solr--tp22573301p22573301.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to index in Solr?
On Wed, Mar 18, 2009 at 11:42 AM, Gosavi.Shyam wrote:
>
> Hi,
> I am a new user of Solr and I don't know how to build an index.
> Can anyone tell me the setup I need so that I can index and search,
> and also how to crawl a web site and the local filesystem using Solr?
>

I think it will be best to start with the Solr tutorial -- http://lucene.apache.org/solr/tutorial.html

Set up an instance of Solr and index the example data provided with the Solr download to understand how it is used. Also, take a look at the wiki, which has a lot of useful documentation -- http://wiki.apache.org/solr

Solr is a search server. It is not a crawler. You'd need to use an external crawler such as Nutch or Heritrix or Droids to crawl websites. Search the mailing list archives for past discussions on this topic.

--
Regards,
Shalin Shekhar Mangar.