https://github.com/apache/lucene-solr/blob/lucene_solr_4_9_0/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/StopAnalyzer.java#L51
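For reference, here is a rough standalone sketch of what that hardwired default set does to a phrase. It is not the Lucene code itself: the class and method names are made up for illustration, the whitespace split is a simplification of real tokenization, and the word list is my transcription of ENGLISH_STOP_WORDS_SET from the linked file, so please verify it against the source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; nothing here is part of Lucene or Solr.
final class StopWordSketch {

    // Assumed copy of Lucene's default English stop words (check the linked StopAnalyzer.java).
    static final Set<String> DEFAULT_STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by",
        "for", "if", "in", "into", "is", "it",
        "no", "not", "of", "on", "or", "such",
        "that", "the", "their", "then", "there", "these",
        "they", "this", "to", "was", "will", "with"));

    // Splits on whitespace and drops any token found in the default set,
    // comparing case-insensitively (similar in spirit to ignoreCase="true").
    static List<String> removeStopWords(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (!DEFAULT_STOP_WORDS.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Prints [knowledge, science]
        System.out.println(removeStopWords("knowledge of science"));
    }
}
```

With a set like this in effect, "of" and "to" never reach the index, so a phrase query that depends on them has nothing to match at that position.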
If you don't set the attribute in the XML file, it falls back to the default definitions.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On Tue, Jul 15, 2014 at 3:16 PM, Aman Tandon <amantandon...@gmail.com> wrote:
> Hi jack,
>
> it will use the internal *Lucene hardwired list* of stop words
>
> I am unaware of this, could you please provide more information about
> this.
>
> With Regards
> Aman Tandon
>
> On Tue, Jul 15, 2014 at 7:21 AM, Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
>
>> You could try experimenting with CommonGramsFilterFactory and
>> CommonGramsQueryFilter (slightly different). There are actually a lot
>> of cool analyzers bundled with Solr. You can find the full list on my site
>> at: http://www.solr-start.com/info/analyzers
>>
>> Regards,
>> Alex.
>> Personal: http://www.outerthoughts.com/ and @arafalov
>> Solr resources: http://www.solr-start.com/ and @solrstart
>> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>>
>> On Tue, Jul 15, 2014 at 8:42 AM, Teague James <teag...@insystechinc.com>
>> wrote:
>> > Alex,
>> >
>> > Thanks! Great suggestion. I figured out that it was the
>> > EdgeNGramFilterFactory. Taking that out of the mix did it.
>> >
>> > -Teague
>> >
>> > -----Original Message-----
>> > From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>> > Sent: Monday, July 14, 2014 9:14 PM
>> > To: solr-user
>> > Subject: Re: Of, To, and Other Small Words
>> >
>> > Have you tried the Admin UI's Analyze screen? It will show you
>> > what happens to the text as it progresses through the tokenizers and
>> > filters. No need to reindex.
>> >
>> > Regards,
>> > Alex.
>> > Personal: http://www.outerthoughts.com/ and @arafalov
>> > Solr resources: http://www.solr-start.com/ and @solrstart
>> > Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>> >
>> > On Tue, Jul 15, 2014 at 8:10 AM, Teague James <teag...@insystechinc.com>
>> > wrote:
>> >> Hi Anshum,
>> >>
>> >> Thanks for replying and suggesting this, but the field type I am using
>> >> (a modified text_general) in my schema has the file set to 'stopwords.txt'.
>> >>
>> >> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>> >>   <analyzer type="index">
>> >>     <tokenizer class="solr.StandardTokenizerFactory"/>
>> >>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>> >>     <!-- in this example, we will only use synonyms at query time
>> >>     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>-->
>> >>     <filter class="solr.LowerCaseFilterFactory"/>
>> >>     <!-- CHANGE: The NGramFilterFactory was added to provide partial word search.
>> >>          This can be changed to EdgeNGramFilterFactory side="front" to only
>> >>          match front sided partial searches if matching any part of a word
>> >>          is undesirable.-->
>> >>     <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="10" />
>> >>     <!-- CHANGE: The PorterStemFilterFactory was added to allow matches
>> >>          for 'cat' and 'cats' by searching for 'cat' -->
>> >>     <filter class="solr.PorterStemFilterFactory"/>
>> >>   </analyzer>
>> >>   <analyzer type="query">
>> >>     <tokenizer class="solr.StandardTokenizerFactory"/>
>> >>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>> >>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>> >>     <filter class="solr.LowerCaseFilterFactory"/>
>> >>     <!-- CHANGE: The PorterStemFilterFactory was added to allow matches
>> >>          for 'cat' and 'cats' by searching for 'cat' -->
>> >>     <filter class="solr.PorterStemFilterFactory"/>
>> >>   </analyzer>
>> >> </fieldType>
>> >>
>> >> Just to be double sure I cleared the list in stopwords_en.txt,
>> >> restarted Solr, re-indexed, and searched with still zero results. Any other
>> >> suggestions on where I might be able to control this behavior?
>> >>
>> >> -Teague
>> >>
>> >> -----Original Message-----
>> >> From: Anshum Gupta [mailto:ans...@anshumgupta.net]
>> >> Sent: Monday, July 14, 2014 4:04 PM
>> >> To: solr-user@lucene.apache.org
>> >> Subject: Re: Of, To, and Other Small Words
>> >>
>> >> Hi Teague,
>> >>
>> >> The StopFilterFactory (which I think you're using) by default uses
>> >> lang/stopwords_en.txt (which wouldn't be empty if you check).
>> >> What you're looking at is the stopword.txt. You could either empty that
>> >> file out or change the field type for your field.
>> >>
>> >> On Mon, Jul 14, 2014 at 12:53 PM, Teague James <teag...@insystechinc.com>
>> >> wrote:
>> >>> Hello all,
>> >>>
>> >>> I am working with Solr 4.9.0 and am searching for phrases that
>> >>> contain words like "of" or "to" that Solr seems to be ignoring at
>> >>> index time.
>> >>> Here's what I tried:
>> >>>
>> >>> curl http://localhost/solr/update?commit=true -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">100</field><field name="content">blah blah blah knowledge of science blah blah blah</field></doc></add>'
>> >>>
>> >>> Then, using a browser:
>> >>>
>> >>> http://localhost/solr/collection1/select?q="knowledge+of+science"&fq=id:100
>> >>>
>> >>> I get zero hits. Search for "knowledge" or "science" and I'll get hits.
>> >>> "knowledge of" or "of science" and I get zero hits. I don't want to
>> >>> use proximity if I can avoid it, as this may introduce too many
>> >>> undesirable results. Stopwords.txt is blank, yet clearly Solr is
>> >>> ignoring "of" and "to" and possibly more words that I have not
>> >>> discovered through testing yet. Is there some other configuration
>> >>> file that contains these small words? Is there any way to force Solr
>> >>> to pay attention to them and not drop them from the phrase? Any
>> >>> advice is appreciated! Thanks!
>> >>>
>> >>> -Teague
>> >>
>> >> --
>> >>
>> >> Anshum Gupta
>> >> http://www.anshumgupta.net