Well, you have a crawling and extraction pipeline. You can probably inject
a classification step somewhere in there, possibly an NLP classifier trained
on a manually labelled seed set. Or just a list of typical words as a start.

This is a pre-Solr stage though: it runs before the documents get indexed.
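
A rough sketch of the "list of typical words" option, assuming the extraction
step can call a bit of Python before a document is sent to Solr. The word list,
the function names and the 0.05 threshold are all made up for illustration;
you would tune them against a sample of your own crawl.

```python
import re

# Hypothetical seed vocabulary (an assumption, not a vetted list).
RECIPE_TERMS = {
    "recipe", "ingredients", "tablespoon", "teaspoon", "cup", "oven",
    "bake", "simmer", "preheat", "stir", "minutes", "serves",
}


def recipe_score(text):
    """Return the fraction of word tokens found in the seed vocabulary."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in RECIPE_TERMS)
    return hits / len(tokens)


def is_recipe(text, threshold=0.05):
    """Keep a document only if enough of its words look recipe-related.

    The threshold is an arbitrary starting point, not a recommended value.
    """
    return recipe_score(text) >= threshold
```

Documents for which is_recipe() returns False would simply be skipped (or
flagged in a field) instead of being indexed. Once you have a few hundred
manually labelled pages, the same hook is where a trained classifier could
replace the word list.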

Regards,
    Alex
On 4 Jan 2016 7:37 pm, <liviuchrist...@yahoo.com.invalid> wrote:

> Hi everyone, I'm working on a search engine based on Solr which indexes
> documents from a large variety of websites.
> The engine is focused on cooking recipes. However, one problem is that these
> websites provide not only content related to cooking recipes but also
> content related to fashion, travel, politics, civil liberties, etc., which
> is not what the user expects to find on a search engine dedicated to
> cooking recipes.
> Is there any way to filter out content which is not related to the core
> business of the search engine?
> Something like parental control software maybe?
> Kind regards,
> Christian
> Christian Fotache Tel: 0728.297.207 Fax: 0351.411.570
