date:20091018

Re: Problem with Query Parser

2009-10-18 Thread AHMET ARSLAN


> Hi everybody,
>
> I have a simple but (for me) annoying problem. I'm a happy user of Solr
> 1.4 with a small collection of documents. Today one of the users
> reported that a query returns documents that are not pertinent to the
> expression. I have Spanish, Portuguese and English text inside the
> collection. Using the Solr administration interface I found that
> she was right: if I search for the Spanish term "represion", only the
> word root is matched, I mean it returns every document with the
> term "repres". Using the admin debug search I found this:
>
> <str name="rawquerystring">description:represion</str>
> <str name="querystring">description:represion</str>
> <str name="parsedquery">description:repres</str>
> <str name="parsedquery_toString">description:repres</str>
>
> The "ion" part of the term was deleted by the query parser. The first
> question is: I don't know where I should look to correct this, in
> schema.xml or in solrconfig.xml.

> The only thing that looks suspicious to me is the EnglishPorter.

Yes, you are right. The "ion" part of the term was removed by it. You can
verify this using the /admin/analysis.jsp page, which will show you which
TokenFilterFactory removes it.

> I've deleted it from the configuration but nothing changes. Should
> I reindex the collection to see the changes?

Yes, a re-index is necessary.

> Should I also delete it from the index section?

You should remove the English Porter filter from both the query and index
analyzers.

> What will I lose by deleting the English Porter filter?

You will lose stemming functionality. But since you have Spanish, Portuguese
and English documents, using the English Porter stemmer for all of them is
not meaningful.
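
For illustration, a field type with the stemmer removed might look roughly
like the sketch below. The type name "text_nostem" and the particular
tokenizer and lowercase filter are assumptions, not taken from the poster's
schema. The only point is that solr.EnglishPorterFilterFactory is absent from
both the index and the query analyzer. A full re-index is still needed after
a change like this.

```xml
<!-- schema.xml sketch; "text_nostem" is a hypothetical type name -->
<fieldType name="text_nostem" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- no solr.EnglishPorterFilterFactory here -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- and none here, so index and query terms stay consistent -->
  </analyzer>
</fieldType>

<!-- the field from the debug output, switched to the unstemmed type -->
<field name="description" type="text_nostem" indexed="true" stored="true"/>
```

With a type like this, "represion" should be indexed and queried as
"represion" rather than "repres".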

Re: Problem with Query Parser

2009-10-18 Thread Germán Biozzoli

Thanks Ahmet. Using the analysis page, the English Porter filter definitely
shows up as the killer ;)
Regards
German

Re: Problem with Query Parser

2009-10-18 Thread Lance Norskog

Another way to do multilingual indexing is to have a separate field
for each language. Solr/Lucene include language-specific processing
(stemmers, for example) for some languages.
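
A minimal sketch of the per-language-field idea, assuming the Snowball
stemmer filter that ships with Solr. The type and field names (text_es,
description_es, and so on) are hypothetical, and the indexing application
would have to route each document's text to the field for its language and
query the matching field.

```xml
<!-- schema.xml sketch; all type and field names are hypothetical -->
<fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
  <!-- a single <analyzer> without a type is used for both index and query -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
  </analyzer>
</fieldType>

<fieldType name="text_pt" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Portuguese"/>
  </analyzer>
</fieldType>

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

<!-- one description field per language -->
<field name="description_es" type="text_es" indexed="true" stored="true"/>
<field name="description_pt" type="text_pt" indexed="true" stored="true"/>
<field name="description_en" type="text_en" indexed="true" stored="true"/>
```

This keeps stemming, but applies each language's rules only to text in that
language.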

-- 
Lance Norskog
goks...@gmail.com

Seattle / NW Hadoop, Lucene, Apache "Cloud Stack" Meetup, Wed Oct 28 6:45pm

2009-10-18 Thread Bradford Stephens

Greetings,

(You're receiving this e-mail because you're on a DL or I think you'd
be interested)

It's time for another Hadoop/Lucene/Apache "Cloud" stack meetup! This
month it'll be on Wednesday, the 28th, at 6:45 pm.

A *huge* thanks to everyone who showed up last month, and to Facebook
for sending someone awesome to speak about Hive. We learned quite a
bit!

For October, we will have someone speaking about Cascading, and how it
helps workflow abstraction with MapReduce. Very useful stuff to know.

We've had great attendance in the past few months, let's keep it up!
I'm always amazed by the things I learn from everyone.

We're at the University of Washington, Allen Computer Science Center
(not Electrical Engineering)

Map: http://www.washington.edu/home/maps/?CSE

Room: 303 -or- the Entry level. If there are changes, signs will be posted.

More Info:

The meetup is about 2 hours (and there's usually food): we'll have two
in-depth talks, and then several "lightning talks" of 5 minutes. We'll
then have discussion and 'social time'. Let me know if you're
interested in speaking or attending. We'd like to focus on education,
so feel free to ask questions.

Contact: Bradford Stephens, 904-415-3009, bradfordsteph...@gmail.com

-- 
http://www.drawntoscaleconsulting.com - Scalability, Hadoop, HBase,
and Distributed Lucene Consulting

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science

Re: Solr 1.4 release candidate

2009-10-18 Thread Yonik Seeley

FYI, the latest nightly includes more Lucene bug fixes targeted at
Lucene 2.9.1.
The (current) full list is here:
http://svn.apache.org/viewvc/lucene/java/branches/lucene_2_9/CHANGES.txt?view=markup&pathrev=826563

-Yonik
http://www.lucidimagination.com


On Wed, Oct 14, 2009 at 10:01 AM, Yonik Seeley
 wrote:
> Folks, we've been in code freeze since Monday and a test release
> candidate was created yesterday, however it already had to be updated
> last night due to a serious bug found in Lucene.
>
> For now you can use the latest nightly build to get any recent changes
> like this:
> http://people.apache.org/builds/lucene/solr/nightly/
>
> We'll probably release the final bits next week, so in the meantime,
> download the latest nightly build and give it a spin!
>
> -Yonik
> http://www.lucidimagination.com
>

Re: Boosting of words

2009-10-18 Thread bhaskar chandrasekar

Hi Arslan,

Yes, I am using Solr as an input to Carrot2.
Yes, I am using org.carrot2.source.solr.SolrDocumentSource just to cluster
search results.
Currently we are focusing on Solr search results only.
In the future we will focus on clustered search results.
For now I am using Solr 1.3.

Regards
Bhaskar
--- On Sat, 10/17/09, AHMET ARSLAN  wrote:

From: AHMET ARSLAN 
Subject: Re: Boosting of words
To: solr-user@lucene.apache.org
Date: Saturday, October 17, 2009, 1:55 PM

> I am using Solr 1.3.
> I access Solr through carrot and use Java.

What do you mean by accessing Solr through Carrot2?
Are you using Solr as an input to Carrot2, using
org.carrot2.source.solr.SolrDocumentSource just to cluster search results?
Can we say that you are interested in clustered search results rather than
the search results themselves? If yes, Solr 1.4 will have Grant Ingersoll's
ClusteringComponent [1], which uses Carrot2 to cluster search results.

[1] http://wiki.apache.org/solr/ClusteringComponent
