Re: Tokenising on Each Letter

2010-08-24 Thread Erick Erickson
> Nikolas, thanks a lot for that, I've just given it a quick test and it definitely seems to work for the examples I gave.
>
> Thanks again,
>
> Scott
>
> From: Nikolas Tautenhahn [via Lucene]
> Sent: Monday, August 23, 2010 3:14 PM
> To: Scottie
> Subject: Re: Tokenising on Each Letter
>
> Hi Scotti

Re: Tokenising on Each Letter

2010-08-23 Thread Scottie
Nikolas, thanks a lot for that, I've just given it a quick test and it definitely seems to work for the examples I gave.

Thanks again,

Scott

From: Nikolas Tautenhahn [via Lucene]
Sent: Monday, August 23, 2010 3:14 PM
To: Scottie
Subject: Re: Tokenising on Each Letter

Re: Tokenising on Each Letter

2010-08-23 Thread Nikolas Tautenhahn
Hi Scottie,

> Could you elaborate about n-gram for me, based on my schema?

Just a quick reply. The quoted schema shows a field type with `positionIncrementGap="100"` and a filter configured with `generateNumberParts="0" catenateWords="1" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0
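Nikolas's n-gram suggestion can be sketched as a Solr field type. This is a hedged illustration, not the schema actually posted in the thread: the field type name, tokenizer choice, and gram sizes are assumptions.

```xml
<!-- Hypothetical fieldType illustrating per-letter prefix matching via edge
     n-grams. Name and gram sizes are assumptions, not from the thread. -->
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emits every prefix of each token, e.g. "ads12" ->
         "a", "ad", "ads", "ads1", "ads12" -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- Query side stays un-grammed so a partial query matches the stored grams -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Applying the n-gram filter only at index time is the usual pattern: the index stores every prefix, and the query term is matched against them directly.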

Re: Tokenising on Each Letter

2010-08-23 Thread Scottie
Probably a good idea to post the relevant information! I guess I thought it would be a really obvious answer, but it seems it's a bit more complex ;) It seems you may be correct about the catenat

Re: Tokenising on Each Letter

2010-08-22 Thread Erick Erickson
I suspect (though I can't say for sure, since you didn't include your schema definition, both the field type and the actual field definition) that your problem stems from WordDelimiterFilterFactory options. The default in the schema usually has catenateAll="0", in which case you have the tokens "ads" and "12" but not "ads12".
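Erick's point about catenateAll can be illustrated with a sketch of the filter configuration. This is an assumed example, not the poster's actual schema; the surrounding attribute values are guesses.

```xml
<!-- With catenateAll="0" (the common default), "ads12" is split into "ads"
     and "12" only. Setting catenateAll="1" additionally emits the joined
     token "ads12", so a query for the whole term can match. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="1"
        splitOnCaseChange="1"/>
```

Note that changing index-time analysis like this requires re-indexing before the new tokens appear in the index.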