Re: Concatenate multiple tokens into one

2010-11-11 Thread Robert Gründler
this is the full source code, but be warned, I'm not a Java developer, and I have no background in Lucene/Solr development: // ConcatFilter import java.io.IOException; import org.apache.lucene.analysis.Token; import org.apache.lucene.analysis.TokenFilter; import org.apache.lucene.analysis.TokenS
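The source above is cut off by the archive, so here is a minimal, self-contained Java sketch of the core idea such a ConcatFilter implements: collapsing a stream of tokens into a single token. This is my own illustration, not the poster's code; a real Lucene filter would extend `org.apache.lucene.analysis.TokenFilter` and work with a `TokenStream` instead of a `List`.

```java
import java.util.List;

// Illustrative sketch only: joins a stream of tokens into one token,
// mimicking what the ConcatFilter discussed in this thread emits.
public class ConcatSketch {

    // Concatenate all tokens into a single string, the way the filter
    // collapses "foo", "bar", "baz" into "foobarbaz".
    static String concat(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            sb.append(t);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concat(List.of("foo", "bar", "baz"))); // foobarbaz
    }
}
```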

Re: Concatenate multiple tokens into one

2010-11-11 Thread Nick Martin
Thanks Robert, I had been trying to get your ConcatFilter to work, but I'm not sure what I need in the classpath and where Token comes from. Will check the thread you mention. Best Nick On 11 Nov 2010, at 18:13, Robert Gründler wrote: > I've posted a ConcatFilter in my previous mail which does

Re: Concatenate multiple tokens into one

2010-11-11 Thread Robert Gründler
I've posted a ConcatFilter in my previous mail which does concatenate tokens. This works fine, but I realized that what I wanted to achieve is implemented more easily in another way (by using 2 separate field types). Have a look at a previous mail I wrote to the list and the reply from Ahmet Arslan (

Re: Concatenate multiple tokens into one

2010-11-11 Thread Nick Martin
Hi Robert, All, I have a similar problem; here is my fieldType: http://paste.pocoo.org/show/289910/ I want to include stopword removal and lowercase the incoming terms, the idea being to take "Foo Bar Baz Ltd" and turn it into "foobarbaz" for the EdgeNGram filter factory. If anyone can tell me
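Assuming the goal stated above (lowercase, stop-word removal, then concatenation before edge n-grams), the whole analysis chain can be sketched in plain Java as follows. The stop-word set and method names are my own illustration, not taken from the pasted fieldType; a real Solr schema would configure this with `LowerCaseFilterFactory`, `StopFilterFactory`, a concat filter, and `EdgeNGramFilterFactory`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch of the desired pipeline: "Foo Bar Baz Ltd" -> "foobarbaz" -> edge n-grams.
public class AutocompletePipelineSketch {

    // Hypothetical stop-word set; a real schema would load stopwords.txt.
    static final Set<String> STOPWORDS = Set.of("ltd", "the", "a", "an");

    // Tokenize on whitespace, lowercase, drop stop words, concatenate.
    static String normalize(String input) {
        StringBuilder sb = new StringBuilder();
        for (String token : input.split("\\s+")) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOPWORDS.contains(lower)) {
                sb.append(lower);
            }
        }
        return sb.toString();
    }

    // Leading edge n-grams from minGram up to the full length,
    // analogous to what EdgeNGramFilterFactory produces.
    static List<String> edgeNGrams(String s, int minGram) {
        List<String> grams = new ArrayList<>();
        for (int i = minGram; i <= s.length(); i++) {
            grams.add(s.substring(0, i));
        }
        return grams;
    }

    public static void main(String[] args) {
        String norm = normalize("Foo Bar Baz Ltd");
        System.out.println(norm);                 // foobarbaz
        System.out.println(edgeNGrams(norm, 3));  // [foo, foob, fooba, ...]
    }
}
```

Indexing the n-grams of the concatenated form is what makes prefix autocomplete match across the original word boundaries.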

Re: Concatenate multiple tokens into one

2010-11-10 Thread Robert Gründler
On Nov 11, 2010, at 1:12 AM, Jonathan Rochkind wrote: > Are you sure you really want to throw out stopwords for your use case? I > don't think autocompletion will work how you want if you do. In our case I think it makes sense. The content is targeting the electronic music / dj scene, so we

RE: Concatenate multiple tokens into one

2010-11-10 Thread Jonathan Rochkind
Are you sure you really want to throw out stopwords for your use case? I don't think autocompletion will work how you want if you do. And if you don't... then why use the WhitespaceTokenizer and then try to jam the tokens back together? Why not just NOT tokenize in the first place. Use the Ke
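The message is truncated, but the alternative it argues for (skip whitespace tokenization entirely and normalize the whole input as a single token, rather than splitting and jamming the pieces back together) can be sketched like this. The regex-based normalization is my own illustration of the idea, not a quote of any suggested schema:

```java
import java.util.Locale;

// Sketch of the "don't tokenize in the first place" alternative:
// treat the whole input as one token, then lowercase and strip
// whitespace, instead of splitting and re-concatenating.
public class SingleTokenSketch {

    static String normalize(String input) {
        return input.toLowerCase(Locale.ROOT).replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        System.out.println(normalize("Foo Bar Baz")); // foobarbaz
    }
}
```

Note that this variant keeps every word, which matches the message's other point: if stop words matter for autocompletion, nothing is thrown away.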