Re: Problem using stop words

Alexei Martchenko Mon, 22 Aug 2011 08:29:20 -0700

That very file says: "A Spanish stop word list. Comments begin with vertical
bar. Each stop word is at the start of a line."


Solr's stopwords files use # for comments, not pipes.
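
If you'd rather convert the Snowball-style list than retype it, here is a
minimal sketch in Python (not something Solr ships; the names
snowball_to_plain and convert_file are made up for illustration). It assumes
that everything after the first vertical bar on a line is a comment, as the
file header says, and that the source file is already UTF-8.

```python
def snowball_to_plain(text):
    """Extract stop words from Snowball-format text (hypothetical helper)."""
    words = []
    for line in text.splitlines():
        # Per the file header, a vertical bar starts a comment.
        entry = line.split("|", 1)[0]
        # Whatever is left before the bar is the stop word; split() also
        # drops the column padding and any trailing spaces.
        words.extend(entry.split())
    return words


def convert_file(src_path, dst_path):
    """Rewrite src_path as one word per line, no comments, saved as UTF-8."""
    # Assumption: the source file is UTF-8 encoded.
    with open(src_path, encoding="utf-8") as src:
        words = snowball_to_plain(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(words) + "\n")
```

Usage would be something like convert_file("spanish_stop.txt",
"spanish_stop_plain.txt") (the second file name is just an example), then
point the words= attribute at the new file.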

The Brazilian stopwords file is kind of raw...
http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/br/stopwords.txt

2011/8/22 Alexei Martchenko <ale...@superdownloads.com.br>

> Funny thing is that the stopwords files in the examples shown in
> http://wiki.apache.org/solr/LanguageAnalysis#Spanish actually use
> pipes and other terms. See the Spanish one in
> http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt
>
> I never saw this format before.
>
> Lucas, try to use only one word per line, no pipes, no trailing spaces, and
> you can use all the Spanish accents too. Don't forget to save the file encoded
> as UTF-8... you can do that in Eclipse, or even Windows Word can open and save
> txts in UTF-8.
>
>
>
> 2011/8/22 Erick Erickson <erickerick...@gmail.com>
>
>> What does the admin/analysis page show? And if you're really
>> putting the pipe symbol (|) in your stopwords file, I have no clue what
>> Solr will make of it. The stopwords file format is usually just one
>> word per line.
>>
>> I'm assuming your name of "string" for the field type is just a
>> placeholder, or you've replaced the example "string" fieldType, right?
>>
>>
>> Best
>> Erick
>>
>> On Mon, Aug 22, 2011 at 6:24 AM, Lucas Miguez <lucas.mig...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I am trying to use Spanish stop words, but the stop words are not working:
>> >
>> > Part of the schema.xml file:
>> >
>> > <fieldtype name="string" class="solr.TextField"
>> >     positionIncrementGap="100" autoGeneratePhraseQueries="true">
>> >   <analyzer type="index">
>> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >     <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
>> >     <filter class="solr.StopFilterFactory" words="spanish_stop.txt"
>> >         enablePositionIncrements="true" ignoreCase="true"/>
>> >   </analyzer>
>> >   <analyzer type="query">
>> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >     <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
>> >     <filter class="solr.StopFilterFactory" words="spanish_stop.txt"
>> >         enablePositionIncrements="true" ignoreCase="true"/>
>> >   </analyzer>
>> > </fieldtype>
>> >
>> > ________________________________________________________________________
>> >
>> > A piece of the stopwords file:
>> >
>> > de             |  from, of
>> > la             |  the, her
>> > que            |  who, that
>> > el             |  the
>> > en             |  in
>> > y              |  and
>> > a              |  to
>> > los            |  the, them
>> > del            |  de + el
>> > se             |  himself, from him etc
>> > las            |  the, them
>> > por            |  for, by, etc
>> > un             |  a
>> > para           |  for
>> > con            |  with
>> > no             |  no
>> > una            |  a
>> > su             |  his, her
>> > al             |  a + el
>> >  | es         from SER
>> > lo             |  him
>> >
>> >
>> > Any idea? Thanks!
>> >
>>
>
>
>
> --
>
> *Alexei Martchenko* | *CEO* | Superdownloads
> ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
> 5083.1018/5080.3535/5080.3533
>
>


-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533
