Well, you have a crawling and extraction pipeline. You can probably inject
a classification algorithm somewhere in there, possibly an NLP classifier
trained on a manually labeled seed set. Or, as a start, just a list of
typical words.
This is a pre-Solr stage, though.
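For the "list of typical words" starting point, a minimal sketch in Python
might look like the one below. Everything in it is hypothetical: the
CrawledDoc shape, the SEED_TERMS list and the min_score threshold are
illustrative assumptions, not part of Solr or of any crawler's API. It
counts how many seed terms appear in each crawled doc's title, URL or body
and drops the low scorers before anything is sent to Solr.

```python
import re
from dataclasses import dataclass


@dataclass
class CrawledDoc:
    # Hypothetical shape of a document coming out of the crawler.
    url: str
    title: str
    body: str


# Assumed hand-made seed list of "typical words" for the wanted topic.
SEED_TERMS = {"solr", "lucene", "search", "index"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


def relevance_score(doc: CrawledDoc, terms: set[str] = SEED_TERMS) -> int:
    """Count the distinct seed terms found in the title, URL or body."""
    tokens = tokenize(doc.title) | tokenize(doc.url) | tokenize(doc.body)
    return len(tokens & terms)


def filter_relevant(docs, min_score: int = 2, terms: set[str] = SEED_TERMS):
    """Keep only docs scoring at least min_score; the rest are not indexed."""
    return [d for d in docs if relevance_score(d, terms) >= min_score]


docs = [
    CrawledDoc("https://example.com/solr-tuning", "Tuning the Solr index", "..."),
    CrawledDoc("https://example.com/recipes", "Chocolate cake", "Mix flour and sugar"),
]

kept = filter_relevant(docs)
print([d.title for d in kept])  # prints ['Tuning the Solr index']
```

Because the filtering decision is isolated in relevance_score, you could
later swap it for a trained classifier without touching the rest of the
pipeline.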
Regards,
Alex
On 4 Jan 2016 7:37 pm, wrote:
> Hi
> There is no way that you can do that in Solr.
> You'll have to write something at the app level, where you're crawling
> your docs, or write a custom update handler that will preprocess the crawled
> docs and throw out the irrelevant ones.
> One way you can do that is to look at the doc title and the url fo