A tokenizer plugin is probably not what you want. You probably want
something more like an UpdateProcessor, which can manipulate the whole
document as it comes into Solr. Or you may want to avoid having a Solr
plugin call out to an API at all and do this work outside of Solr (for
example: what happens when the API is down, should document updates fail?).
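
For example, here's a very rough sketch of what an UpdateRequestProcessor
along those lines might look like. The class names, the "content" and
"ai_tags" field names, and the fetchTags stub are all placeholders, not a
real implementation -- only the Solr plugin API itself is real:

import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class AiTaggingProcessorFactory extends UpdateRequestProcessorFactory {

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                            SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new AiTaggingProcessor(next);
  }

  static class AiTaggingProcessor extends UpdateRequestProcessor {

    AiTaggingProcessor(UpdateRequestProcessor next) {
      super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
      SolrInputDocument doc = cmd.getSolrInputDocument();
      Object content = doc.getFieldValue("content"); // source text field (placeholder name)
      if (content != null) {
        // Placeholder for the call to the internal AI platform. Decide here
        // what should happen when that service is down: fail the update, or
        // index the document without tags.
        for (String tag : fetchTags(content.toString())) {
          doc.addField("ai_tags", tag); // put the tags in their own field (placeholder name)
        }
      }
      super.processAdd(cmd); // hand the (possibly modified) doc down the chain
    }

    private List<String> fetchTags(String text) {
      // Stub -- replace with an HTTP call to the AI platform.
      return Collections.emptyList();
    }
  }
}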

A tokenizer plugin would definitely not be recommended. Tokenizers need to
be fast, low-level code that splits text into tokens based on readily
accessible config & data. The overhead of a network call would be far too
high.

You probably want to put your extracted tags into a different field anyway,
and a tokenizer only works on text within a single field.
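
You'd then wire something like the factory above into an update chain in
solrconfig.xml, roughly along these lines (the chain name and class name
are placeholders):

<updateRequestProcessorChain name="ai-tagging">
  <processor class="com.example.AiTaggingProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">ai-tagging</str>
  </lst>
</requestHandler>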

-Doug

On Wed, Dec 6, 2017 at 10:57 PM Sreenivas.T <sree...@gmail.com> wrote:

> All,
>
> I need help from the experts. We are trying to build a cognitive search
> platform with enterprise content from sources like SharePoint, file
> shares, etc. Before the content gets indexed into Solr, I need to call our
> internal AI platform to get additional metadata such as classification
> tags.
>
> I'm planning to leverage ManifoldCF for getting the content from the
> sources, and to write a custom tokenizer plugin to send the content to the
> AI platform, which in turn returns the additional tags. I'll index the
> additional tags dynamically through the plugin code.
>
> Is it a feasible solution? Is there any other way to achieve the same? I
> was planning not to customize ManifoldCF.
>
> Please suggest
>
>
>
> Regards,
> Sreenivas
>
-- 
Consultant, OpenSource Connections. Contact info at
http://o19s.com/about-us/doug-turnbull/; Free/Busy (http://bit.ly/dougs_cal)
