Is there any workaround in Solr/Carrot2 so that we can pass tokens that have been filtered by a custom tokenizer/filters, instead of the raw text that it currently uses for clustering?
I also read about this issue at the following link: https://issues.apache.org/jira/browse/SOLR-2917 Is writing our own parsers to filter the text documents before indexing them into Solr the only viable approach at the moment? Please let me know if anyone has come across this issue and has better suggestions.
-- Chandan Tamrakar