Re: Problem with Query Parser
> Hi everybody,
>
> I have a simple but (for me) annoying problem. I'm a happy user of Solr
> 1.4 with a small collection of documents. Today one of the users
> reported that a query returns documents that are not pertinent to the
> expression. I have Spanish, Portuguese and English text inside the
> collection. Using the Solr administration interface I found that
> she was right: if I search for the Spanish term "represion", only the
> word root is matched, that is, it returns every document with the
> term "repres". Using the admin debug search I found this:
>
> <str name="rawquerystring">description:represion</str>
> <str name="querystring">description:represion</str>
> <str name="parsedquery">description:repres</str>
> <str name="parsedquery_toString">description:repres</str>
>
> The "ion" part of the term was deleted by the query parser.
> The first question is: I don't know where I should look to
> correct this, in schema.xml or in solrconfig.xml.
> The only thing that looks suspicious to me is the EnglishPorter.

Yes, you are right. The "ion" part of the term was deleted by it. You can verify this using the /admin/analysis.jsp page. It will tell you which TokenFilterFactory removes it.

> I've deleted it from the configuration but nothing changes. Should
> I reindex the collection to see the changes?

Yes, a re-index is necessary: the terms already in the index were stemmed at index time.

> Should I also delete it from the index section?

You should remove the English Porter filter from both the query and the index analyzer.

> What will I lose by deleting the English Porter filter?

You will lose stemming functionality. But since you have Spanish, Portuguese and English documents, using the English Porter stemmer for all of them is not meaningful.
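
For illustration, here is a rough sketch of what the change in schema.xml might look like. The original poster's configuration was not shown, so the type name `text_nostem`, the tokenizer choice and the field attributes are assumptions; only the `description` field name comes from the debug output above.

```xml
<!-- schema.xml: a text type with no stemmer. The name "text_nostem" is hypothetical. -->
<fieldType name="text_nostem" class="solr.TextField" positionIncrementGap="100">
  <!-- Index-time analysis: tokenize and lowercase only. -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- solr.EnglishPorterFilterFactory intentionally left out. -->
  </analyzer>
  <!-- Query-time analysis mirrors the index side so that terms match. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Point the field at the new type (indexed/stored values are assumptions). -->
<field name="description" type="text_nostem" indexed="true" stored="true"/>
```

With a type like this, `description:represion` should only match documents containing the full term, once Solr has been restarted and the collection re-indexed.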
Re: Problem with Query Parser
Thanks Ahmet. The analysis page definitely shows the English Porter filter as the killer ;)

Regards
German

On Sun, Oct 18, 2009 at 7:30 AM, AHMET ARSLAN wrote:
> Yes, you are right. The "ion" part of the term was deleted by it. You can verify
> this using the /admin/analysis.jsp page. It will tell you which
> TokenFilterFactory removes it.
Re: Problem with Query Parser
Another way to do multilingual indexing is to have a separate field for each language. Solr/Lucene have custom processing (language-specific analysis, such as stemmers) for some languages.

On Sun, Oct 18, 2009 at 12:25 PM, Germán Biozzoli wrote:
> Thanks Ahmet. The analysis page definitely shows the English Porter filter as
> the killer ;)
> Regards
> German

--
Lance Norskog
goks...@gmail.com
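
A minimal sketch of the separate-field-per-language idea in schema.xml, under some assumptions: the type and field names (`text_es`, `description_es`, and so on) are hypothetical, and the Snowball stemmer filter is just one of the language-aware filters available in Solr 1.4.

```xml
<!-- schema.xml: one analysis chain per language. All names here are hypothetical. -->
<fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
  <!-- A single <analyzer> without a type is used for both indexing and querying. -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
  </analyzer>
</fieldType>

<fieldType name="text_pt" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Portuguese"/>
  </analyzer>
</fieldType>

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

<!-- Each document fills only the field that matches its language. -->
<field name="description_es" type="text_es" indexed="true" stored="true"/>
<field name="description_pt" type="text_pt" indexed="true" stored="true"/>
<field name="description_en" type="text_en" indexed="true" stored="true"/>
```

The indexing client then has to know each document's language, and a query such as `description_es:represion` is stemmed with Spanish rules against Spanish text only.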
Seattle / NW Hadoop, Lucene, Apache "Cloud Stack" Meetup, Wed Oct 28 6:45pm
Greetings,

(You're receiving this e-mail because you're on a DL or I think you'd be interested)

It's time for another Hadoop/Lucene/Apache "Cloud" stack meetup! This month it'll be on Wednesday, the 28th, at 6:45 pm.

A *huge* thanks to everyone who showed up last month, and to Facebook for sending someone awesome to speak about Hive. We learned quite a bit!

For October, we will have someone speaking about Cascading, and how it helps workflow abstraction with MapReduce. Very useful stuff to know.

We've had great attendance in the past few months, let's keep it up! I'm always amazed by the things I learn from everyone.

We're at the University of Washington, Allen Computer Science Center (not Electrical Engineering)
Map: http://www.washington.edu/home/maps/?CSE
Room: 303 -or- the Entry level. If there are changes, signs will be posted.

More Info:

The meetup is about 2 hours (and there's usually food): we'll have two in-depth talks, and then several "lightning talks" of 5 minutes. We'll then have discussion and 'social time'. Let me know if you're interested in speaking or attending. We'd like to focus on education, so feel free to ask questions.

Contact: Bradford Stephens, 904-415-3009, bradfordsteph...@gmail.com

--
http://www.drawntoscaleconsulting.com - Scalability, Hadoop, HBase, and Distributed Lucene Consulting
http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
Re: Solr 1.4 release candidate
FYI, the latest nightly includes more Lucene bug fixes targeted toward Lucene 2.9.1.
The (current) full list is here:
http://svn.apache.org/viewvc/lucene/java/branches/lucene_2_9/CHANGES.txt?view=markup&pathrev=826563

-Yonik
http://www.lucidimagination.com

On Wed, Oct 14, 2009 at 10:01 AM, Yonik Seeley wrote:
> Folks, we've been in code freeze since Monday and a test release
> candidate was created yesterday, however it already had to be updated
> last night due to a serious bug found in Lucene.
>
> For now you can use the latest nightly build to get any recent changes
> like this:
> http://people.apache.org/builds/lucene/solr/nightly/
>
> We'll probably release the final bits next week, so in the meantime,
> download the latest nightly build and give it a spin!
>
> -Yonik
> http://www.lucidimagination.com
Re: Boosting of words
Hi Arslan,

Yes, I am using Solr as an input to Carrot.
Yes, I am using org.carrot2.source.solr.SolrDocumentSource just to cluster search results.

Currently we are focusing on Solr search results only. In the future we will focus on clustered search results.

I am currently using Solr 1.3.

Regards
Bhaskar

--- On Sat, 10/17/09, AHMET ARSLAN wrote:

From: AHMET ARSLAN
Subject: Re: Boosting of words
To: solr-user@lucene.apache.org
Date: Saturday, October 17, 2009, 1:55 PM

> I am using Solr 1.3.
> I access Solr through carrot and use Java.

What is the meaning of accessing Solr through Carrot? Are you using Solr as an input to Carrot? Using org.carrot2.source.solr.SolrDocumentSource just to cluster search results? Can we say that you are interested in clustered search results rather than the search results themselves?

If yes, Solr 1.4 will have Grant Ingersoll's ClusteringComponent [1], which uses Carrot2 to cluster search results.

[1] http://wiki.apache.org/solr/ClusteringComponent
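
For readers who want to try the ClusteringComponent route, the sketch below shows roughly how it is wired up in solrconfig.xml. It is modeled from memory on the Solr 1.4 example configuration, so please check it against the wiki page [1]; the field names `title`, `description` and `id`, and the handler path, are assumptions for illustration.

```xml
<!-- solrconfig.xml: register the clustering search component (Solr 1.4 contrib). -->
<searchComponent name="clustering"
                 class="org.apache.solr.handler.clustering.ClusteringComponent">
  <lst name="engine">
    <!-- One engine may be named "default". -->
    <str name="name">default</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
  </lst>
</searchComponent>

<!-- A dedicated handler that runs clustering after the normal search. -->
<requestHandler name="/clustering" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <str name="clustering.engine">default</str>
    <bool name="clustering.results">true</bool>
    <!-- Hypothetical field names: adjust to the fields in your own schema. -->
    <str name="carrot.title">title</str>
    <str name="carrot.snippet">description</str>
    <str name="carrot.url">id</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
```

With something like this in place, a request to the `/clustering` handler should return the usual search results plus a clusters section, so Solr no longer needs to be fed into Carrot2 from the client side.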