Hello everyone,

The standard way of tokenizing in Solr splits the text on whitespace.
Is there a way to index multi-term phrases like "Machine Learning" as a single token instead of "Machine" and "Learning"? Is it possible to create a specific field type for such phrases, with its own indexing pipeline?

I am open to storing n-grams, but they would need to be n-grams across terms (word-level), not character n-grams within a single term. In other words, I don't want to store n-grams of the term "machine"; I want to store n-grams for a sentence like the one below:

"I like machine learning" --> "I like", "like machine", "machine learning" and so on.

It seems like the Shingle Filter (https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-ShingleFilter) may be usable for this. Is there a better alternative?

I want to use this field as an input to the Semantic Knowledge Graph. The plugin works great for single words, but now I want to use it for phrases. Any ideas around this would be really helpful.

Thanks a lot!
- Pratik
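
P.S. To make the expected output concrete, here is a minimal Python sketch of the word-level n-grams I have in mind. This is not Solr code and not how the Shingle Filter is implemented; the function name `word_shingles` and its `min_size` / `max_size` parameters are made up purely for illustration, and it assumes plain whitespace tokenization.

```python
def word_shingles(text, min_size=2, max_size=2):
    """Return word-level n-grams (shingles) of a whitespace-tokenized text.

    Hypothetical helper for illustration only; it mimics the kind of
    output I would like to end up with in the index.
    """
    tokens = text.split()  # assumption: tokens are separated by whitespace
    shingles = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            end = start + size
            if end <= len(tokens):
                # Join adjacent tokens into a single multi-term token
                shingles.append(" ".join(tokens[start:end]))
    return shingles


if __name__ == "__main__":
    print(word_shingles("I like machine learning"))
    # ['I like', 'like machine', 'machine learning']
```

With `max_size=3` the same sentence would also produce "I like machine" and "like machine learning", which is what I mean by "and so on".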