Hello Everyone,

The standard way of tokenizing in Solr splits the text on whitespace.

Is there a way by which we can index multi-term phrases like "Machine
Learning" instead of "Machine", "Learning"?
Is it possible to create a specific field type for such phrases which has
its own indexing pipeline? I am open to storing n-grams, but these n-grams
would have to span multiple terms rather than stay within one term. In other
words, I don't want to store n-grams of the term "machine"; I want to store
n-grams for a sentence like the one below.

"I like machine learning" --> "I like", "like machine", "machine learning",
and so on.
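
To make the desired output concrete, here is a minimal Python sketch of
word-level n-grams (shingles). It is plain Python, not Solr code; the function
name word_shingles and its min_size/max_size parameters are hypothetical and
only illustrate the tokens I would like to end up with in the index.

```python
def word_shingles(text, min_size=2, max_size=2):
    """Return word-level n-grams (shingles) of text, split on whitespace."""
    words = text.split()
    shingles = []
    for start in range(len(words)):
        # Emit every shingle of the allowed sizes that starts at this word.
        for size in range(min_size, max_size + 1):
            end = start + size
            if end <= len(words):
                shingles.append(" ".join(words[start:end]))
    return shingles


print(word_shingles("I like machine learning"))
# ['I like', 'like machine', 'machine learning']
```

Raising max_size to 3 would additionally produce three-word phrases such as
"I like machine".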

It seems like Shingle Filter (
https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-ShingleFilter)
may be used for this. Is there a better alternative?

I want to use this field as an input to the Semantic Knowledge Graph plugin.
The plugin works great for single words, but now I want to use it for
phrases. Any ideas on this would be really helpful.

Thanks a lot!

- Pratik
