Extracting important multi term phrases from the text

Pratik Patel Thu, 15 Nov 2018 08:01:09 -0800

Hello Everyone,

The standard way of tokenizing in Solr splits the text on whitespace.


Is there a way to index multi-term phrases like "Machine Learning" as
single tokens instead of "Machine" and "Learning"?
Is it possible to create a specific field type for such phrases with its
own indexing pipeline? I am open to storing n-grams, but they would need to
be word-level n-grams that span terms, not n-grams within a single term. In
other words, I don't want to store n-grams of the term "machine"; I want to
store n-grams for a sentence like the one below.

"I like machine learning" --> "I like", "like machine", "machine learning"
and so on.
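
To make the desired output concrete, here is a minimal Python sketch of
word-level n-grams (shingles). It only illustrates the idea and is not how
Solr implements it; the function name word_shingles and its parameters
min_size and max_size are made up for this example.

```python
def word_shingles(text, min_size=2, max_size=2):
    """Return word-level n-grams (shingles), ordered by start position.

    Hypothetical helper for illustration only; not part of Solr or Lucene.
    """
    tokens = text.split()  # plain whitespace tokenization
    shingles = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            end = start + size
            # Skip windows that would run past the end of the text.
            if end <= len(tokens):
                shingles.append(" ".join(tokens[start:end]))
    return shingles


print(word_shingles("I like machine learning"))
```

With the default size of 2 this produces the three two-word phrases listed
above; raising max_size to 3 would also emit three-word phrases such as
"I like machine".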

It seems like the Shingle Filter (
https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-ShingleFilter)
could be used for this. Is there a better alternative?

I want to use this field as an input to the Semantic Knowledge Graph
plugin. It works great for single words, but now I want to use it for
phrases. Any ideas on this would be really helpful.

Thanks a lot!

- Pratik
