The main one is that you can get an explosion in the number of terms, depending on your input, especially if you have things that aren't regular text. Imagine:

partone-1 partone-2 partone-3
parttwo-1 parttwo-2 parttwo-3

If catenateAll is set to 0, you'd get 5 distinct terms here. If it were set to 1, you'd get 11. That doesn't seem like a lot until you have hundreds of thousands of patterns like this.

So give it a whirl and see what pops out with your particular corpus, but keep an eye on the number of unique terms that end up in the field.

Best,
Erick

On Thu, Nov 17, 2011 at 12:18 PM, Brendan Grainger <brendan.grain...@gmail.com> wrote:
> Hi,
>
> The default for catenateAll is 0 which we've been using on the
> WordDelimiterFilter. What would be the possibly negative implications of
> setting this to 1? So that:
>
> wi-fi-800
>
> would produce the tokens:
>
> wi, fi, wifi, 800, wifi800
>
> for example?
>
> Thanks
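To see where the extra terms come from, here is a toy model in plain Java. It is NOT Lucene's WordDelimiterFilter API; it only mimics a few of its behaviors under stated assumptions: subword parts are generated (as with generateWordParts/generateNumberParts on), splitting happens only on non-alphanumeric characters and letter/digit transitions (case changes, possessives, and position handling are ignored), catenateWords joins adjacent alphabetic parts, and catenateAll joins all parts. The class and method names (CatenateAllSketch, analyze, uniqueTerms) are hypothetical.

```java
import java.util.*;

// Hypothetical, simplified model of word-delimiter splitting; not the Lucene implementation.
public class CatenateAllSketch {

    // Split a token on non-alphanumerics and on letter<->digit transitions.
    static List<String> subwords(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int prevType = 0; // 0 = delimiter/none, 1 = letter, 2 = digit
        for (char c : token.toCharArray()) {
            int type = Character.isLetter(c) ? 1 : Character.isDigit(c) ? 2 : 0;
            // Flush the current part on a delimiter or a letter/digit transition.
            if (type == 0 || (prevType != 0 && type != prevType)) {
                if (cur.length() > 0) { parts.add(cur.toString()); cur.setLength(0); }
            }
            if (type != 0) cur.append(c);
            prevType = type;
        }
        if (cur.length() > 0) parts.add(cur.toString());
        return parts;
    }

    // Emit the parts, plus optional catenated forms (assumed semantics, simplified).
    static List<String> analyze(String token, boolean catenateWords, boolean catenateAll) {
        List<String> parts = subwords(token);
        List<String> out = new ArrayList<>(parts);
        if (catenateWords) {
            // Join each run of two or more adjacent alphabetic parts, e.g. wi + fi -> wifi.
            StringBuilder run = new StringBuilder();
            int runLen = 0;
            for (String p : parts) {
                if (Character.isLetter(p.charAt(0))) { run.append(p); runLen++; }
                else { if (runLen > 1) out.add(run.toString()); run.setLength(0); runLen = 0; }
            }
            if (runLen > 1) out.add(run.toString());
        }
        // catenateAll: one extra term joining every part, e.g. wifi800 or partone1.
        if (catenateAll && parts.size() > 1) out.add(String.join("", parts));
        return out;
    }

    // Count distinct terms a field would end up with for a whitespace-tokenized corpus.
    static int uniqueTerms(List<String> docs, boolean catenateAll) {
        Set<String> terms = new HashSet<>();
        for (String doc : docs)
            for (String tok : doc.split("\\s+"))
                terms.addAll(analyze(tok, true, catenateAll));
        return terms.size();
    }

    public static void main(String[] args) {
        System.out.println(analyze("wi-fi-800", true, true)); // [wi, fi, 800, wifi, wifi800]
        List<String> corpus = List.of("partone-1 partone-2 partone-3",
                                      "parttwo-1 parttwo-2 parttwo-3");
        System.out.println(uniqueTerms(corpus, false) + " vs " + uniqueTerms(corpus, true)); // 5 vs 11
    }
}
```

In this model, every delimited token with more than one part contributes one new catenated term, and since values like partone1 and partone2 are all distinct, the unique-term count grows roughly with the number of distinct multi-part tokens in the corpus rather than with the number of distinct subwords.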