Can't you use solr.PatternTokenizerFactory for this task?
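
To make the suggestion concrete: PatternTokenizerFactory takes a `pattern` attribute and a `group` attribute, and with a non-negative `group` it emits the text matched by that capture group as tokens. The sketch below is plain Java, not Solr code. It only imitates that behaviour with java.util.regex on the input quoted below, so you can try a pattern before putting it in the schema. The class name `D1Extractor`, the method `extractD1`, and the pattern `<d1>(.*?)</d1>` are my own assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr or Lucene. It imitates what a pattern
// tokenizer would emit for pattern="<d1>(.*?)</d1>" with group="1".
class D1Extractor {

    // Non-greedy match of everything between <d1> and </d1>.
    // DOTALL lets a single turn span line breaks.
    private static final Pattern D1 =
            Pattern.compile("<d1>(.*?)</d1>", Pattern.DOTALL);

    // Returns the text of every <d1> turn, in order of appearance.
    static List<String> extractD1(String dialogue) {
        List<String> tokens = new ArrayList<>();
        Matcher m = D1.matcher(dialogue);
        while (m.find()) {
            tokens.add(m.group(1)); // capture group 1 = text inside the tags
        }
        return tokens;
    }

    public static void main(String[] args) {
        String dialogue = "<d1>Hello</d1><d2>Hello</d2>"
                + "<d1>How are you ?</d1><d2>Fine and you're?</d2>";
        System.out.println(extractD1(dialogue)); // prints [Hello, How are you ?]
    }
}
```

Note that each d1 turn comes out as one whole token here. If you want to search for single words inside the d1 text, you would probably need a further step that splits those turns into words, and I have not tested that part.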


On Friday, January 9, 2015 1:48 PM, tomas.kalas <kala...@email.cz> wrote:
Hello, I have a question: should I use a tokenizer or a filter?
I need to separate two channels. I wrote about this here earlier, but it is
probably not possible with Solr's basic tools, so I am trying to write my own
tool for this task.
I have this input:
<d1>Hello</d1><d2>Hello</d2><d1>How are you ?</d1><d2>Fine and you're?</d2> ....
d1 - direction1
d2 - direction2
I want to output only d1 and then search for some words within that result.
For example, the output should be:
Output: [<d1>Hello</d1>,<d1>How are you?</d1><d1>....</d1>....] 

I wrote my idea in Java, but I don't know where to incorporate it: in a
Filter or in a Tokenizer? Do you have any advice on how to start? I probably
have to extend some Lucene class and plug my modification in there, don't I?

Here is my code:

package test1;

import java.util.ArrayList;
import java.util.List;

public class Test1 {

    public static void main(String[] args) {
        String dialogue = "<d1>Hello</d1><d2>Hello</d2><d1>How are you ?</d1>"
                + "<d2>Fine and you're?</d2> ....";

        // Split at each boundary between a closing tag and the next opening tag.
        // The lookbehind and lookahead keep the tags themselves in the pieces.
        String[] segments = dialogue.split("(?<=</d[12]>)\\d*(?=<d[12]>)");

        // Keep only the segments that belong to direction 1.
        List<String> d1 = new ArrayList<>();
        for (String segment : segments) {
            if (segment.startsWith("<d1>")) {
                d1.add(segment);
            }
        }

        // This is the value the final component should return.
        String d1Out = d1.toString();
        System.out.println(d1Out); // prints [<d1>Hello</d1>, <d1>How are you ?</d1>]
    }
}

Thanks for your advice.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tokenizer-or-Filter-tp4178346.html
Sent from the Solr - User mailing list archive at Nabble.com.
