Ahh, you're right. I was way off base there. So I guess the question is how you know the words aren't being removed. A common mistake is to look at the *stored* field values, which always hold the original text verbatim, rather than at what's actually in the inverted index. The TermsComponent can help here: http://wiki.apache.org/solr/TermsComponent
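To look at the indexed terms rather than the stored text, a TermsComponent request can be built along these lines. This is only a sketch: the base URL, the `/terms` handler path and the field name `text` are assumptions (they follow the stock example setup, not necessarily yours), and `terms_url` is a hypothetical helper. The `terms`, `terms.fl`, `terms.prefix` and `terms.limit` parameters are the ones described on the wiki page above.

```python
from urllib.parse import urlencode


def terms_url(base_url, field, prefix="", limit=20):
    """Build a TermsComponent URL listing indexed terms for one field.

    base_url and field are placeholders; adjust them to your own core
    and schema. The "/terms" handler path is assumed, not guaranteed.
    """
    params = {
        "terms": "true",       # switch the component on
        "terms.fl": field,     # field whose indexed terms we want
        "terms.limit": limit,  # cap the number of terms returned
        "wt": "json",
    }
    if prefix:
        # Only list terms starting with this prefix, e.g. a stop word.
        params["terms.prefix"] = prefix
    return base_url.rstrip("/") + "/terms?" + urlencode(params)


# Hypothetical host and field name, for illustration only.
print(terms_url("http://localhost:8983/solr", "text", prefix="de"))
```

If a stop word such as "de" still shows up in the term list that comes back, the filter is not removing it at index time.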
Erick

On Mon, Aug 22, 2011 at 11:28 AM, Alexei Martchenko <ale...@superdownloads.com.br> wrote:
> That very txt said "A Spanish stop word list. Comments begin with vertical
> bar. Each stop word is at the start of a line."
>
> Solr's comments are #s not pipes.
>
> Brazilian stopwords file is kinda raw...
> http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/br/stopwords.txt
>
> 2011/8/22 Alexei Martchenko <ale...@superdownloads.com.br>
>
>> Funny thing is that stopwords files in the examples shown in
>> http://wiki.apache.org/solr/LanguageAnalysis#Spanish are actually using
>> pipe and other terms. See the Spanish one in
>> http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt
>>
>> I never saw this format before.
>>
>> Lucas, try to use only one word per line, no pipes, no trailing spaces, and
>> you can use all Spanish accents too. Don't forget to save encoded as
>> UTF-8... u can do that in Eclipse or even Windows Word can open and save
>> txts in UTF-8.
>>
>> 2011/8/22 Erick Erickson <erickerick...@gmail.com>
>>
>>> What does the admin/analysis page show? And if you're really
>>> putting the pipe symbol (|) in your stopwords file, I have no clue what
>>> Solr will make of it. The stopwords file format is usually just one
>>> word per line.....
>>>
>>> I'm assuming your name of "string" for the field type is just a placeholder
>>> or you've replaced the example "string" fieldType, right?
>>>
>>> Best
>>> Erick
>>>
>>> On Mon, Aug 22, 2011 at 6:24 AM, Lucas Miguez <lucas.mig...@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > I am trying to use spanish stop words, but the stop words are not working:
>>> >
>>> > Part of the schema.xml file:
>>> >
>>> > <fieldtype name="string" class="solr.TextField"
>>> >     positionIncrementGap="100" autoGeneratePhraseQueries="true">
>>> >   <analyzer type="index">
>>> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>> >     <filter class="solr.LowerCaseFilterFactory" />
>>> >     <filter class="solr.SnowballPorterFilterFactory" language="Spanish" />
>>> >     <filter class="solr.StopFilterFactory" words="spanish_stop.txt"
>>> >         enablePositionIncrements="true" ignoreCase="true" />
>>> >   </analyzer>
>>> >   <analyzer type="query">
>>> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>> >     <filter class="solr.LowerCaseFilterFactory" />
>>> >     <filter class="solr.SnowballPorterFilterFactory" language="Spanish" />
>>> >     <filter class="solr.StopFilterFactory" words="spanish_stop.txt"
>>> >         enablePositionIncrements="true" ignoreCase="true" />
>>> >   </analyzer>
>>> > </fieldtype>
>>> >
>>> > ___________________________________________________________________________
>>> >
>>> > A piece of the stopwords file:
>>> >
>>> > de | from, of
>>> > la | the, her
>>> > que | who, that
>>> > el | the
>>> > en | in
>>> > y | and
>>> > a | to
>>> > los | the, them
>>> > del | de + el
>>> > se | himself, from him etc
>>> > las | the, them
>>> > por | for, by, etc
>>> > un | a
>>> > para | for
>>> > con | with
>>> > no | no
>>> > una | a
>>> > su | his, her
>>> > al | a + el
>>> > | es from SER
>>> > lo | him
>>> >
>>> > Any idea? Thanks!
>> --
>>
>> *Alexei Martchenko* | *CEO* | Superdownloads
>> ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
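
The advice in the quoted thread (one word per line, no pipes, saved as UTF-8) can be sketched as a small conversion script. It's a minimal illustration, assuming the Snowball-style layout shown in Lucas's sample, where everything after a vertical bar is a comment. The function names and file paths are hypothetical.

```python
def snowball_to_plain(lines):
    """Return the stop words from Snowball-style lines, comments removed.

    Everything after the first '|' on a line is treated as a comment.
    Lines that are blank or comment-only contribute nothing.
    """
    words = []
    for raw in lines:
        before_comment = raw.split("|", 1)[0]
        # split() also drops leading and trailing whitespace.
        words.extend(before_comment.split())
    return words


def convert_file(src_path, dst_path):
    """Rewrite a Snowball-style file as one word per line, in UTF-8."""
    with open(src_path, encoding="utf-8") as src:
        words = snowball_to_plain(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(words) + "\n")


# A few lines taken from the sample in the original message.
sample = [
    "de | from, of",
    "la | the, her",
    " | es from SER",
    "lo | him",
]
print(snowball_to_plain(sample))
```

The commented-out "es" entry is dropped along with the comments, so it would have to be added back by hand if it's wanted. The Solr documentation for StopFilterFactory also describes a format="snowball" attribute for reading this layout directly; whether it's available depends on the Solr version in use, so check before relying on it.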