No, I think you're right, I've never seen pipes as comments before...

2011/8/22 Erick Erickson <erickerick...@gmail.com>
> Ahh, you're right. I was waaaay off base there....
>
> So I guess the question is how you know the words aren't being removed? A
> common problem is to look at *stored* fields rather than what's actually in
> the inverted index.
> The TermsComponent can help here:
> http://wiki.apache.org/solr/TermsComponent
>
> Erick
>
> On Mon, Aug 22, 2011 at 11:28 AM, Alexei Martchenko
> <ale...@superdownloads.com.br> wrote:
> > That very txt said "A Spanish stop word list. Comments begin with
> > vertical bar. Each stop word is at the start of a line."
> >
> > Solr's comments are #s not pipes.
> >
> > Brazilian stopwords file is kinda raw...
> > http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/br/stopwords.txt
> >
> > 2011/8/22 Alexei Martchenko <ale...@superdownloads.com.br>
> >
> >> Funny thing is that stopwords files in the examples shown in
> >> http://wiki.apache.org/solr/LanguageAnalysis#Spanish are actually using
> >> pipe and other terms. See the Spanish one in
> >> http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt
> >>
> >> I never saw this format before.
> >>
> >> Lucas, try to use only one word per line, no pipes, no trailing spaces,
> >> and you can use all Spanish accents too. Don't forget to save encoded as
> >> UTF-8... you can do that in Eclipse or even Windows Word can open and
> >> save txts in UTF-8.
> >>
> >> 2011/8/22 Erick Erickson <erickerick...@gmail.com>
> >>
> >>> What does the admin/analysis page show? And if you're really
> >>> putting the pipe symbol (|) in your stopwords file, I have no clue what
> >>> Solr will make of it. The stopwords file format is usually just one
> >>> word per line.....
> >>>
> >>> I'm assuming your name of "string" for the field type is just a
> >>> placeholder or you've replaced the example "string" fieldType, right?
> >>>
> >>> Best
> >>> Erick
> >>>
> >>> On Mon, Aug 22, 2011 at 6:24 AM, Lucas Miguez <lucas.mig...@gmail.com>
> >>> wrote:
> >>> > Hi,
> >>> >
> >>> > I am trying to use spanish stop words, but the stop words are not
> >>> > working:
> >>> >
> >>> > Part of the schema.xml file:
> >>> >
> >>> > <fieldtype name="string" class="solr.TextField"
> >>> >     positionIncrementGap="100" autoGeneratePhraseQueries="true">
> >>> >   <analyzer type="index">
> >>> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>> >     <filter class="solr.LowerCaseFilterFactory" />
> >>> >     <filter class="solr.SnowballPorterFilterFactory" language="Spanish" />
> >>> >     <filter class="solr.StopFilterFactory" words="spanish_stop.txt"
> >>> >         enablePositionIncrements="true" ignoreCase="true" />
> >>> >   </analyzer>
> >>> >   <analyzer type="query">
> >>> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>> >     <filter class="solr.LowerCaseFilterFactory" />
> >>> >     <filter class="solr.SnowballPorterFilterFactory" language="Spanish" />
> >>> >     <filter class="solr.StopFilterFactory" words="spanish_stop.txt"
> >>> >         enablePositionIncrements="true" ignoreCase="true" />
> >>> >   </analyzer>
> >>> > </fieldtype>
> >>> >
> >>> > ___________________________________________________________________________
> >>> >
> >>> > A piece of the stopwords file:
> >>> >
> >>> > de   | from, of
> >>> > la   | the, her
> >>> > que  | who, that
> >>> > el   | the
> >>> > en   | in
> >>> > y    | and
> >>> > a    | to
> >>> > los  | the, them
> >>> > del  | de + el
> >>> > se   | himself, from him etc
> >>> > las  | the, them
> >>> > por  | for, by, etc
> >>> > un   | a
> >>> > para | for
> >>> > con  | with
> >>> > no   | no
> >>> > una  | a
> >>> > su   | his, her
> >>> > al   | a + el
> >>> >      | es from SER
> >>> > lo   | him
> >>> >
> >>> >
> >>> > Any idea? Thanks!
--
*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533
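
For anyone who ends up with a pipe-commented file like the one quoted in this thread, here is a minimal Python sketch of the cleanup suggested above (one word per line, no pipes, no trailing spaces, saved as UTF-8). It assumes the input follows the format described in that txt: the stop word sits at the start of the line and everything after the vertical bar is a comment. The function names and the file paths are hypothetical, and it assumes the source file is already UTF-8; adjust the encoding if yours is not.

```python
from typing import List


def parse_snowball_stopwords(text: str) -> List[str]:
    """Return the stop words from pipe-commented text, one entry per word."""
    words = []
    for line in text.splitlines():
        # Everything after the first vertical bar is treated as a comment.
        entry = line.split("|", 1)[0].strip()
        # Skip blank lines, comment-only lines, and '#' comment lines.
        if not entry or entry.startswith("#"):
            continue
        # Keep only the first token, so no stray trailing text survives.
        words.append(entry.split()[0])
    return words


def convert_file(src_path: str, dst_path: str) -> None:
    """Rewrite src_path as a plain one-word-per-line UTF-8 stop word file."""
    # Assumption: the source file is UTF-8 encoded.
    with open(src_path, encoding="utf-8") as src:
        words = parse_snowball_stopwords(src.read())
    with open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        dst.write("\n".join(words) + "\n")
```

Calling convert_file("spanish_stop.txt", "spanish_stop_plain.txt") (both paths are placeholders) would write the plain list, which can then be referenced from the words attribute of the StopFilterFactory in schema.xml.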