Can't you use solr.PatternTokenizerFactory for this task?
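
A rough sketch of what I mean, in case it helps. PatternTokenizerFactory takes a `pattern` attribute and a `group` attribute, and with a non-negative group it emits that capture group of every regex match as a token. The standalone Java below only mimics that behaviour with java.util.regex so the pattern can be tried outside Solr. The class name D1Extractor and the pattern <d1>(.*?)</d1> are my own assumptions for illustration, not something taken from your schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: imitates a pattern tokenizer that emits one capture group per match.
class D1Extractor {
    // Assumed pattern: group 1 is the text inside a single <d1>...</d1> element.
    private static final Pattern D1 = Pattern.compile("<d1>(.*?)</d1>");

    // Returns the contents of every <d1> element, in order of appearance.
    static List<String> extractD1(String dialogue) {
        List<String> tokens = new ArrayList<>();
        Matcher m = D1.matcher(dialogue);
        while (m.find()) {
            tokens.add(m.group(1)); // use m.group(0) to keep the surrounding tags
        }
        return tokens;
    }

    public static void main(String[] args) {
        String dialogue = "<d1>Hello</d1><d2>Hello</d2><d1>How are you ?</d1>"
                + "<d2>Fine and you're?</d2> ....";
        System.out.println(extractD1(dialogue)); // prints [Hello, How are you ?]
    }
}
```

In a field type the same idea would presumably be the tokenizer with pattern="&lt;d1&gt;(.*?)&lt;/d1&gt;" and group="1" (the angle brackets need escaping inside the XML attribute). I have not tested that against a running Solr, so treat it as a starting point only.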

On Friday, January 9, 2015 1:48 PM, tomas.kalas <kala...@email.cz> wrote:

Hello,

I have a question: should I use a tokenizer or a filter? I need to separate two channels. I asked about this here earlier, but it is probably not possible with Solr's basic tools, so I am trying to write my own tool for this task.

I have this input:

<d1>Hello</d1><d2>Hello</d2><d1>How are you ?</d1><d2>Fine and you're?</d2> ....

d1 - direction1
d2 - direction2

I want to output only d1 and then search for some words within that result. For example, the output should be:

Output: [<d1>Hello</d1>,<d1>How are you?</d1><d1>....</d1>....]

I wrote my idea in Java, but I don't know where to incorporate it: in a filter or in a tokenizer? Any advice on how to start? I probably have to extend some Lucene class and plug my modification in there, don't I?

Here is my code:

    package test1;

    import java.util.Arrays;

    public class Test1 {

        public static void main(String[] args) {
            String dialogue = "<d1>Hello</d1><d2>Hello</d2><d1>How are you ?</d1><d2>Fine and you're?</d2> ....";

            // Split between a closing tag and the next opening tag.
            String[] input = dialogue.split("(?<=</d[12]>)\\d*(?=<d[12]>)");

            // First pass: count the d1 elements so the result array can be sized.
            int countD1 = 0;
            for (String input1 : input) {
                if (input1.startsWith("<d1>")) {
                    countD1++;
                }
            }

            // Second pass: copy the d1 elements into the result array.
            String[] d1 = new String[countD1];
            int index = 0;
            for (String input1 : input) {
                if (input1.startsWith("<d1>")) {
                    d1[index] = input1;
                    index++;
                }
            }

            String d1Out = Arrays.toString(d1);
            System.out.println(d1Out);
            // Return d1Out
        }
    }

Thanks for your advice.

--
View this message in context: http://lucene.472066.n3.nabble.com/Tokenizer-or-Filter-tp4178346.html
Sent from the Solr - User mailing list archive at Nabble.com.