Re: Special character and wildcard matching

2015-02-24 Thread Alexandre Rafalovitch
On 24 February 2015 at 15:50, Jack Krupansky wrote: > It's a string field, so there shouldn't be any analysis. (read back in the > thread for the field and field type.) It's a multi-term expansion. There is _some_ analysis one way or another :-) Solr Analyzers, Tokenizers, Filters, URPs and

Re: Special character and wildcard matching

2015-02-24 Thread Jack Krupansky
It's a string field, so there shouldn't be any analysis. (read back in the thread for the field and field type.) -- Jack Krupansky On Tue, Feb 24, 2015 at 3:19 PM, Alexandre Rafalovitch wrote: > What happens if the query does not have wildcard expansion (*)? If the > behavior is correct, then t

Re: Special character and wildcard matching

2015-02-24 Thread Alexandre Rafalovitch
What happens if the query does not have wildcard expansion (*)? If the behavior is correct, then the issue is somehow with the MultitermQueryAnalysis (a hidden automatically generated analyzer chain): http://wiki.apache.org/solr/MultitermQueryAnalysis Which would still make it a bug, but at least
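To run the comparison suggested above, the same request can be issued with and without the trailing `*`. This is a minimal sketch of building those request paths: the `raw_name` field and the `wt`/`fl` parameters come from the query posted in this thread, while the class and method names are hypothetical, and host, port and core are left out because the thread does not state them.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryUrlDemo {
    // Hypothetical helper: builds the /select path for a given q parameter,
    // percent-encoding it as UTF-8 so an accented character survives transport.
    static String selectPath(String q) {
        try {
            return "/select?q=" + URLEncoder.encode(q, "UTF-8") + "&wt=json&fl=raw_name";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Wildcard (prefix) form, as in the query posted in the thread.
        System.out.println(selectPath("raw_name:beyonce*"));
        // Same term without wildcard expansion, for comparison.
        System.out.println(selectPath("raw_name:beyonce"));
        // Accented variant; \u00e9 is the precomposed e-acute.
        System.out.println(selectPath("raw_name:beyonc\u00e9"));
    }
}
```

If the non-wildcard query returns only exact matches while the wildcard form also returns the accented value, that would point at the multi-term path rather than at the stored data.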

Re: Special character and wildcard matching

2015-02-24 Thread Arun Rangarajan
Thanks, Jack. I have filed a ticket: https://issues.apache.org/jira/browse/SOLR-7154 On Tue, Feb 24, 2015 at 11:43 AM, Jack Krupansky wrote: > Thanks. That at least verifies that the accented e is stored in the field. > I don't see anything wrong here, so it is as if the Lucene prefix query was >

Re: Special character and wildcard matching

2015-02-24 Thread Jack Krupansky
Thanks. That at least verifies that the accented e is stored in the field. I don't see anything wrong here, so it is as if the Lucene prefix query was mapping the accented characters. It's not supposed to do that, but... Go ahead and file a Jira bug. Include all of the details that you provided in

Re: Special character and wildcard matching

2015-02-24 Thread Arun Rangarajan
Exact query: /select?q=raw_name:beyonce*&wt=json&fl=raw_name Response: { "responseHeader": {"status": 0,"QTime": 0,"params": { "fl": "raw_name", "q": "raw_name:beyonce*", "wt": "json" } }, "response": {"numFound": 2,"start": 0,"docs": [ {"raw_n

Re: Special character and wildcard matching

2015-02-24 Thread Jack Krupansky
Please post the info I requested - the exact query, and the Solr response. -- Jack Krupansky On Tue, Feb 24, 2015 at 12:45 PM, Arun Rangarajan wrote: > In our case, the lower-casing is happening in a custom Java indexer code, > via Java's String.toLowerCase() method. > > I used the analysis too

Re: Special character and wildcard matching

2015-02-24 Thread Arun Rangarajan
In our case, the lower-casing happens in custom Java indexer code, via Java's String.toLowerCase() method. I used the analysis tool in Solr admin (with Jetty). I believe the raw bytes explain this. Attached are the results for beyonce in file beyonce_no_spl_chars.JPG and beyoncé in file b
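For reference, a small sketch of what String.toLowerCase() does to an accented name. The class and method names are hypothetical stand-ins for the custom indexer step, which is not shown in the thread, and the input is assumed to use the precomposed é (U+00E9).

```java
class LowerCaseDemo {
    // Hypothetical stand-in for the indexer's lower-casing step.
    static String lowerCaseForIndex(String rawName) {
        return rawName.toLowerCase();
    }

    public static void main(String[] args) {
        // Assumption: the name uses the precomposed e-acute (U+00E9).
        String indexed = lowerCaseForIndex("Beyonc\u00e9");

        // Lower-casing changes case only; the accent is kept.
        System.out.println(indexed.equals("beyonc\u00e9")); // prints true

        // So the lower-cased value does not begin with the unaccented "beyonce".
        System.out.println(indexed.startsWith("beyonce")); // prints false
    }
}
```

Under that assumption, lower-casing alone would not make a `beyonce*` prefix match the accented value.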

Re: Special character and wildcard matching

2015-02-23 Thread Jack Krupansky
But how is that lowercasing occurring? I mean, solr.StrField doesn't do that. Some containers default to automatically mapping accented characters, so that the accented "e" would then get indexed as a normal "e", and then your wildcard would match it, and an accented "e" in a query would get mappe
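The effect described here, an accented "e" being mapped to a plain "e" so that the wildcard then matches, can be illustrated outside Solr. This is a rough sketch only: `foldAccents` is a hypothetical helper built on java.text.Normalizer, not what any container or Solr component actually runs, and `String.startsWith` is a simplified stand-in for a prefix query on an unanalyzed term.

```java
import java.text.Normalizer;

class AccentFoldingDemo {
    // Hypothetical folding step: decompose to NFD, then drop combining marks.
    static String foldAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Simplified stand-in for a prefix query against a single indexed term.
    static boolean prefixMatches(String indexedTerm, String prefix) {
        return indexedTerm.startsWith(prefix);
    }

    public static void main(String[] args) {
        String stored = "beyonc\u00e9"; // precomposed e-acute

        // Without any mapping, the unaccented prefix does not match.
        System.out.println(prefixMatches(stored, "beyonce")); // prints false

        // If something folds the accent first, the same prefix matches.
        System.out.println(prefixMatches(foldAccents(stored), "beyonce")); // prints true
    }
}
```

Comparing the two results against what Solr returns is one way to tell whether some layer is folding accents before the term reaches the index or the query.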

Re: Special character and wildcard matching

2015-02-23 Thread Arun Rangarajan
Yes, it is a string field and not a text field. The lower-casing is done to support case-insensitive matching. On Mon, Feb 23, 2015 at 4:01 PM, Jack Krupansky wrote: > Is it really a string field - as opposed to a text field? Show us the field > and field type. > > Besides, if it really were a "raw" nam

Re: Special character and wildcard matching

2015-02-23 Thread Jack Krupansky
Is it really a string field - as opposed to a text field? Show us the field and field type. Besides, if it really were a "raw" name, wouldn't that be a capital "B"? -- Jack Krupansky On Mon, Feb 23, 2015 at 6:52 PM, Arun Rangarajan wrote: > I have a string field raw_name like this in my docume