No, I think you're right; I've never seen pipes used as comments before...

2011/8/22 Erick Erickson <erickerick...@gmail.com>

> Ahh, you're right. I was waaaay off base there....
>
> So I guess the question is: how do you know the words aren't being removed?
> A common problem is looking at *stored* fields rather than at what's actually
> in the inverted index.
> The TermsComponent can help here:
> http://wiki.apache.org/solr/TermsComponent
>
> Erick
>
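To check what actually ended up in the inverted index, a TermsComponent request can list the indexed terms of a field. Below is a minimal Java sketch that only builds such a request URL. It assumes a Solr at `http://localhost:8983/solr`, the `/terms` request handler as configured in the example solrconfig.xml, and a hypothetical field name `titulo`, so substitute your own. The parameters `terms.fl`, `terms.prefix` and `terms.limit` come from the TermsComponent wiki page.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class TermsQuery {
    // Builds a TermsComponent request that lists indexed terms of `field`
    // starting with `prefix`. Base URL, "/terms" path and field name are
    // assumptions for illustration, not taken from the thread.
    static String termsUrl(String solrBase, String field, String prefix, int limit) {
        return solrBase + "/terms"
            + "?terms=true"
            + "&terms.fl=" + URLEncoder.encode(field, StandardCharsets.UTF_8)
            + "&terms.prefix=" + URLEncoder.encode(prefix, StandardCharsets.UTF_8)
            + "&terms.limit=" + limit;
    }

    public static void main(String[] args) {
        // Paste the printed URL into a browser or fetch it with any HTTP client.
        System.out.println(termsUrl("http://localhost:8983/solr", "titulo", "de", 20));
    }
}
```

If the stop filter is doing its job, the term `de` itself should not appear in the response, although longer terms that merely start with "de" may. Requires Java 10+ for the `URLEncoder.encode(String, Charset)` overload.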
> On Mon, Aug 22, 2011 at 11:28 AM, Alexei Martchenko
> <ale...@superdownloads.com.br> wrote:
> > That very .txt file says: "A Spanish stop word list. Comments begin with
> > vertical bar. Each stop word is at the start of a line."
> >
> > Solr's comments are #s, not pipes.
> >
> > The Brazilian stopwords file is kinda raw...
> >
> > http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/br/stopwords.txt
> >
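The two comment conventions explain why pipe-formatted lines never match a token. The Java below is a simplified model, not Lucene's or Solr's actual loader code: a `#`-style loader keeps every non-comment line whole, so `de             |  from, of` becomes a single entry that no one-word token can equal, while a snowball-style reader drops everything after `|` and keeps only the word(s). Class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class StopwordFormats {
    // Simplified "#"-comment loader: skip blank lines and lines starting
    // with '#', keep every other (trimmed) line as ONE stop-word entry.
    static Set<String> loadHashFormat(List<String> lines) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            String t = line.trim();
            if (!t.isEmpty() && !t.startsWith("#")) {
                words.add(t);
            }
        }
        return words;
    }

    // Simplified snowball loader: text after '|' is a comment; each
    // whitespace-separated token before it is a stop word.
    static Set<String> loadSnowballFormat(List<String> lines) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            int bar = line.indexOf('|');
            String content = (bar >= 0 ? line.substring(0, bar) : line).trim();
            for (String w : content.split("\\s+")) {
                if (!w.isEmpty()) {
                    words.add(w);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Lines copied from the sample in this thread.
        List<String> sample = List.of(
            "de             |  from, of",
            " | es         from SER",
            "lo             |  him");
        System.out.println(loadHashFormat(sample));     // whole lines, pipes included
        System.out.println(loadSnowballFormat(sample)); // just the words
    }
}
```

Note that the commented-out ` | es` line yields no word at all under the snowball reading, which matches the file's intent.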
> > 2011/8/22 Alexei Martchenko <ale...@superdownloads.com.br>
> >
> >> The funny thing is that the stopwords files in the examples shown at
> >> http://wiki.apache.org/solr/LanguageAnalysis#Spanish actually use
> >> pipes and other terms. See the Spanish one in
> >>
> >> http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt
> >>
> >> I never saw this format before.
> >>
> >> Lucas, try using only one word per line, with no pipes and no trailing
> >> spaces, and you can use all the Spanish accents too. Don't forget to save
> >> the file encoded as UTF-8... you can do that in Eclipse, and even Word on
> >> Windows can open and save .txt files in UTF-8.
> >>
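Following the advice above, the snowball-style file can be converted once into a plain one-word-per-line list. A minimal Java 11+ sketch, assuming the input is already valid UTF-8 (`Files.readAllLines` throws `MalformedInputException` otherwise) and has at most one word before each `|`; the default file names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ConvertStopwords {
    // Turns lines like "de   |  from, of" into "de": drops the '|' comment,
    // strips surrounding whitespace, and skips empty or comment-only lines.
    static List<String> toPlainLines(List<String> snowballLines) {
        List<String> out = new ArrayList<>();
        for (String line : snowballLines) {
            int bar = line.indexOf('|');
            String word = (bar >= 0 ? line.substring(0, bar) : line).strip();
            if (!word.isEmpty()) {
                out.add(word);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default names; pass real input/output paths as arguments.
        Path in = Path.of(args.length > 0 ? args[0] : "spanish_stop.txt");
        Path out = Path.of(args.length > 1 ? args[1] : "stopwords_es.txt");
        List<String> lines = Files.readAllLines(in, StandardCharsets.UTF_8);
        // One word per line, no trailing spaces, written as UTF-8.
        Files.write(out, toPlainLines(lines), StandardCharsets.UTF_8);
    }
}
```

The converted file can then be referenced from the `words` attribute of `StopFilterFactory`.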
> >>
> >>
> >> 2011/8/22 Erick Erickson <erickerick...@gmail.com>
> >>
> >>> What does the admin/analysis page show? And if you're really
> >>> putting the pipe symbol (|) in your stopwords file, I have no clue what
> >>> Solr will make of it. The stopwords file format is usually just one
> >>> word per line...
> >>>
> >>> I'm assuming your name "string" for the field type is just a
> >>> placeholder, or that you've replaced the example "string" fieldType, right?
> >>>
> >>>
> >>> Best
> >>> Erick
> >>>
> >>> On Mon, Aug 22, 2011 at 6:24 AM, Lucas Miguez <lucas.mig...@gmail.com>
> >>> wrote:
> >>> > Hi,
> >>> >
> >>> > I am trying to use Spanish stop words, but the stop words are not
> >>> > working:
> >>> >
> >>> > Part of the schema.xml file:
> >>> >
> >>> > <fieldtype name="string" class="solr.TextField"
> >>> >     positionIncrementGap="100" autoGeneratePhraseQueries="true">
> >>> >   <analyzer type="index">
> >>> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>> >     <filter class="solr.LowerCaseFilterFactory" />
> >>> >     <filter class="solr.SnowballPorterFilterFactory" language="Spanish" />
> >>> >     <filter class="solr.StopFilterFactory" words="spanish_stop.txt"
> >>> >             enablePositionIncrements="true" ignoreCase="true" />
> >>> >   </analyzer>
> >>> >   <analyzer type="query">
> >>> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>> >     <filter class="solr.LowerCaseFilterFactory" />
> >>> >     <filter class="solr.SnowballPorterFilterFactory" language="Spanish" />
> >>> >     <filter class="solr.StopFilterFactory" words="spanish_stop.txt"
> >>> >             enablePositionIncrements="true" ignoreCase="true" />
> >>> >   </analyzer>
> >>> > </fieldtype>
> >>> >
> >>> > ___________________________________________________________________________
> >>> >
> >>> > A piece of the stopwords file:
> >>> >
> >>> > de             |  from, of
> >>> > la             |  the, her
> >>> > que            |  who, that
> >>> > el             |  the
> >>> > en             |  in
> >>> > y              |  and
> >>> > a              |  to
> >>> > los            |  the, them
> >>> > del            |  de + el
> >>> > se             |  himself, from him etc
> >>> > las            |  the, them
> >>> > por            |  for, by, etc
> >>> > un             |  a
> >>> > para           |  for
> >>> > con            |  with
> >>> > no             |  no
> >>> > una            |  a
> >>> > su             |  his, her
> >>> > al             |  a + el
> >>> >  | es         from SER
> >>> > lo             |  him
> >>> >
> >>> >
> >>> > Any idea? Thanks!
> >>> >
> >>>
> >>
> >>
> >>
> >> --
> >>
> >> *Alexei Martchenko* | *CEO* | Superdownloads
> >> ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
> >> 5083.1018/5080.3535/5080.3533
> >>
> >>
> >
>



-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533
