Beating Hossman to the punch....
http://people.apache.org/~hossman/#threadhijack

Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email. Even if you change the
subject line of your email, other mail headers still track which thread
you replied to and your question is "hidden" in that thread and gets less
attention. It makes following discussions in the mailing list archives
particularly difficult.

See Also: http://en.wikipedia.org/wiki/Thread_hijacking

On Feb 18, 2008 10:48 AM, Reece <[EMAIL PROTECTED]> wrote:

> Hello Everyone,
>
> I'm having some issues getting SOLR to work with our data. I'm using
> it to index incident data for our technical support department. The
> two main issues:
>
> 1) As an example, searching for "binarydata_groupdocument_fk" returns
> nothing, while searching for "BinaryData_GroupDocument_FK" returns
> results. I have the lowercasefilterfactory applied to both the index
> and query analyzers. Does this not actually set everything to lower
> case? From the wiki at
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters, it says
> "Creates tokens by lowercasing all letters and dropping non-letters"
> but that does not seem to be happening here.
>
> 2) Some of our data is one sentence. Some is over 5 MB of text. When
> searching for a term, it's returning the one sentence data first
> because the fieldNorm is so different (0.4 for one, 0.002 for others).
> Is there a way to disable using the fieldnorm in the score
> calculation? An alternative I tried was posting parts of the data in
> as different values of the field (so having multiple tags of that
> field-name in the add xml post), but that appeared to have zero effect
> on the results - even the querydebugger showed the exact same
> calculation for the search. Does anyone know how to disable the
> fieldnorm, or have the score created from adding the scores from each
> value of a multivalued field?
>
> 3) I discovered that searching for '"certificate not found"' (using
> the double quotes for a phrase here) did not return any results, even
> though the phrase did exist (and was lower case originally too, so
> different than my first issue). I discovered it was because of the
> stopword "not", but the same stopfilterfactory was applied to both the
> index and query analyzers. Am I doing something wrong there? As a
> workaround I'm having php manually removing stopwords from the
> querystring, which is a real pain.
>
> Here is my fieldtype I do the actual searches on:
>
> <fieldType name="text" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <!-- in this example, we will only use synonyms at query time
>     <filter class="solr.SynonymFilterFactory"
>             synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>     -->
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory"
>             protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory"
>             synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory"
>             protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> Any help or advice would be greatly appreciated, thanks!
>
> -Reece
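
For anyone wondering how "other mail headers still track which thread you replied to": here is a rough Python sketch of the idea, using only the standard library. The helper names (new_message, threads_under) and the example.org domain are made up for illustration, and real mail clients and archives use more elaborate threading logic than this. It only shows that a reply with a changed subject still carries the parent's Message-ID in its In-Reply-To and References headers.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def new_message(subject, body, parent=None):
    """Build a message; if parent is given, mimic what 'Reply' does to the headers."""
    msg = EmailMessage()
    # example.org is a placeholder domain, not anything from the list.
    msg["Message-ID"] = make_msgid(domain="example.org")
    msg["Subject"] = subject
    if parent is not None:
        # A mail client adds these on reply, whatever you type in the subject line.
        msg["In-Reply-To"] = str(parent["Message-ID"])
        msg["References"] = str(parent["Message-ID"])
    msg.set_content(body)
    return msg


def threads_under(msg, parent):
    """Simplified check: does msg point back at parent via its threading headers?"""
    parent_id = str(parent["Message-ID"])
    linked = str(msg.get("In-Reply-To", "")).split()
    linked += str(msg.get("References", "")).split()
    return parent_id in linked


original = new_message("Some existing thread", "First post.")
hijack = new_message("Brand new question", "Unrelated question.", parent=original)
fresh = new_message("Brand new question", "Unrelated question.")

print(threads_under(hijack, original))
print(threads_under(fresh, original))
```

The hijacked message gets filed under the original thread despite the new subject line, while the fresh email does not, which is why starting a new message is the safer route.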