Also, avoid stemming URLs. I used a stemmer that turned my
"best.com" URL into "good.com". The Lucene StandardAnalyzer
works pretty hard to avoid that. --wunder
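
To make the two points in this thread concrete, here is a rough sketch in plain Python. It is not Lucene or Solr code. It assumes a toy analyzer that keeps email addresses and host names as single tokens, lower-cases every token, and does no stemming, and it runs that same analyzer at index time and at query time. The names TOKEN, analyze, build_index and search, the regular expression, and the address Wade@Example.com are made up for illustration.

```python
import re

# Hypothetical token pattern (an assumption, not Lucene's grammar):
# try an email address first, then a dotted host name, then a plain word.
TOKEN = re.compile(
    r"[A-Za-z0-9_.+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"  # email address
    r"|[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"                  # host name
    r"|[A-Za-z0-9]+"                                       # plain word
)


def analyze(text):
    """Split text into tokens and lower-case each one. No stemming."""
    return [m.group(0).lower() for m in TOKEN.finditer(text)]


def build_index(docs):
    """Map each analyzed token to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every analyzed query token."""
    tokens = analyze(query)  # same analysis as at index time
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Example document; the email address is a made-up placeholder.
docs = {
    1: "A sentence containing CamelCase words by Wade@Example.com "
       "is found at StudlyCaps.org",
}
index = build_index(docs)
```

With this in place, search(index, "studlycaps.org") and search(index, "wade@example.com") both find document 1, because the indexed tokens and the query tokens are lower-cased the same way. A host name such as "best.com" goes through analyze unchanged apart from case, since nothing stems it.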

On 12/13/06 9:33 PM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:

> When indexing (and searching), make sure you are using an Analyzer that
> lower-cases (or upper-cases) tokens.
> These are from Lucene, so Solr has them, too:
>   ./src/java/org/apache/lucene/analysis/LowerCaseTokenizer.java
>   ./src/java/org/apache/lucene/analysis/LowerCaseFilter.java
> 
> Otis
> 
> ----- Original Message ----
> From: Wade Leftwich <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, December 13, 2006 11:32:11 PM
> Subject: Case sensitivity on hostnames and email addresses
> 
> I've run into some unexpected case sensitivity on searches, at least
> unexpected by me.
> 
> If you index a text field containing this sentence:
> 
> A sentence containing CamelCase words by [EMAIL PROTECTED] is found
> at StudlyCaps.org
> 
> The document will be found by searching for "camelcase" but not for
> "[EMAIL PROTECTED]" or "studlycaps.org".
> 
> This happens with the Standard or the DisMax query handler.
> 
> A bit of a problem for me, because I'm indexing a bunch of business
> magazines, and domain names are frequently capitalized, often in CamelCase.
> 
> Is this maybe a bug? Or a WAD (working as designed)?
> 
> -- Wade Leftwich
> Ithaca, NY
