Thanks Shawn. I just ran the analysis on both 4.6 and 4.10. The only difference between the outputs seems to be that the positionLength value is set in 4.10. Does that mean anything?
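
To make the comparison concrete, here is a minimal sketch that diffs the per-token attributes reported by each version. The dictionaries only restate the analysis output shown below, and the helper name `diff_token_attrs` is hypothetical, not part of Solr.

```python
def diff_token_attrs(old, new):
    """Compare two token-attribute dicts and report what differs."""
    # Attributes present only in the newer output.
    added = {k: new[k] for k in new.keys() - old.keys()}
    # Attributes present only in the older output.
    removed = {k: old[k] for k in old.keys() - new.keys()}
    # Attributes present in both but with different values.
    changed = {k: (old[k], new[k])
               for k in old.keys() & new.keys() if old[k] != new[k]}
    return added, removed, changed


# Values copied by hand from the analysis page output for the token "message".
token_46 = {
    "text": "message",
    "raw_bytes": "[6d 65 73 73 61 67 65]",
    "type": "ALNUM",
    "start": 0,
    "end": 7,
    "position": 1,
}
token_410 = dict(token_46, positionLength=1)

added, removed, changed = diff_token_attrs(token_46, token_410)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

For this single token, the only reported difference is the extra positionLength attribute in 4.10.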

Version 4.10
SF  text     raw_bytes               start  end  positionLength  type   position
    message  [6d 65 73 73 61 67 65]  0      7    1               ALNUM  1

Version 4.6
SF  text     raw_bytes               type   start  end  position
    message  [6d 65 73 73 61 67 65]  ALNUM  0      7    1

Thanks,
Rishi.

-----Original Message-----
From: Shawn Heisey <apa...@elyograg.org>
To: solr-user <solr-user@lucene.apache.org>
Sent: Fri, Feb 20, 2015 6:51 pm
Subject: Re: Strange search behaviour when upgrading to 4.10.3

On 2/20/2015 4:24 PM, Rishi Easwaran wrote:
> Also, the tokenizer we use is very similar to the following.
> ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalTokenizer.java
> ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalLexer.jflex
>
>
> From the looks of it the text is being indexed as a single token and not broken across whitespace.

I can't claim to know how analyzer code works. I did manage to see the code, but it doesn't mean much to me.

I would suggest using the analysis tab in the Solr admin interface. On that page, select the field or fieldType, set the "verbose" flag and type the actual field contents into the "index" side of the page. When you click the Analyze Values button, it will show you what Solr does with the input at index time.

Do you still have access to any machines (dev or otherwise) running the old version with the custom component? If so, do the same things on the analysis page for that version that you did on the new version, and see whether it does something different. If it does do something different, then you will need to track down the problem in the code for your custom analyzer.

Thanks,
Shawn