Also, avoid stemming URLs. I used a stemmer that turned my "best.com" URL into "good.com". The Lucene StandardAnalyzer works pretty hard to avoid that.

--wunder
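
To make the lower-casing advice quoted below concrete, here is a minimal sketch in plain Python. It is a toy stand-in, not Lucene or Solr code: the `analyze`, `build_index`, and `search` helpers, the token regex, and the address `Someone@Example.COM` are all made up for illustration. The only point it shows is that the same lower-casing analysis has to run at index time and at query time, and that hostnames and email addresses are kept whole rather than stemmed.

```python
import re

# Toy stand-in for an analyzer; this is NOT Lucene's API.
# Email addresses and dotted hostnames are kept as single tokens.
TOKEN_RE = re.compile(
    r"[A-Za-z0-9_.+-]+@[A-Za-z0-9.-]+"      # email-like token
    r"|[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*"     # word or dotted hostname
)


def analyze(text):
    """Split text into tokens and lower-case each one (no stemming)."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def build_index(docs):
    """Map each analyzed token to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for tok in analyze(text):
            index.setdefault(tok, set()).add(doc_id)
    return index


def search(index, query):
    """Analyze the query exactly like the documents, then intersect postings."""
    postings = [index.get(tok, set()) for tok in analyze(query)]
    return set.intersection(*postings) if postings else set()


# Hypothetical document modelled on the sentence in the quoted message;
# the email address is invented because the original one was redacted.
docs = {
    1: "A sentence containing CamelCase words by Someone@Example.COM "
       "is found at StudlyCaps.org"
}
index = build_index(docs)

print(search(index, "studlycaps.org"))
print(search(index, "someone@example.com"))
```

If `analyze` lower-cased only on the indexing side (or only on the query side), the mixed-case hostname and address would stop matching, which is the symptom described in the original question.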
On 12/13/06 9:33 PM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:

> When indexing (and searching), make sure you are using an Analyzer that
> lower-cases (or upper-cases) tokens.
> These are from Lucene, so Solr has them, too:
> ./src/java/org/apache/lucene/analysis/LowerCaseTokenizer.java
> ./src/java/org/apache/lucene/analysis/LowerCaseFilter.java
>
> Otis
>
> ----- Original Message ----
> From: Wade Leftwich <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, December 13, 2006 11:32:11 PM
> Subject: Case sensitivity on hostnames and email addresses
>
> I've run into some unexpected case sensitivity on searches, at least
> unexpected by me.
>
> If you index a text field containing this sentence:
>
> A sentence containing CamelCase words by [EMAIL PROTECTED] is found
> at StudlyCaps.org
>
> The document will be found by searching for "camelcase" but not for
> "[EMAIL PROTECTED]" or "studlycaps.org".
>
> This happens with the Standard or the DisMax query handler.
>
> A bit of a problem for me, because I'm indexing a bunch of business
> magazines, and domain names are frequently capitalized, often in CamelCase.
>
> Is this maybe a bug? Or a WAD?
>
> -- Wade Leftwich
> Ithaca, NY