Thanks, Doug. Now I think it's better to customize ManifoldCF's output
connector for Solr.

Sreenivas
On Thu, Dec 7, 2017 at 10:01 AM Doug Turnbull <dturnb...@opensourceconnections.com> wrote:

> A tokenizer plugin is probably not what you want; you probably want
> something more like an UpdateProcessor that can manipulate the whole
> document as it comes into Solr. Or you may want to avoid having a Solr
> plugin call an API at all and do this work outside of Solr instead (for
> example, what happens when the API is down? Should document updates fail?).
>
> A tokenizer plugin would definitely not be recommended. Tokenizers need to
> be fast, low-level code that splits text into tokens based on readily
> accessible config and data. The overhead of a network call would be far
> too high.
>
> You probably want to put your extracted tags into a different field anyway,
> and a tokenizer only works on text within a single field.
>
> -Doug
>
> On Wed, Dec 6, 2017 at 10:57 PM Sreenivas.T <sree...@gmail.com> wrote:
>
> > All,
> >
> > I need help from experts. We are trying to build a cognitive search
> > platform with enterprise content from content sources like SharePoint,
> > file shares, etc. Before content gets indexed to Solr, I need to call our
> > internal AI platform to get additional metadata like classification tags,
> > etc.
> >
> > I'm planning to leverage ManifoldCF to get the content from the sources
> > and to write a custom tokenizer plugin that sends the content to the AI
> > platform, which in turn returns additional tags. I'll index the additional
> > tags dynamically through the plugin code.
> >
> > Is this a feasible solution? Is there any other way to achieve the same? I
> > was planning not to customize ManifoldCF.
> >
> > Please suggest.
> >
> > Regards,
> > Sreenivas
> >
> --
> Consultant, OpenSource Connections. Contact info at
> http://o19s.com/about-us/doug-turnbull/; Free/Busy (http://bit.ly/dougs_cal)
>
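For illustration, here is a minimal sketch of the "do this work outside of Solr" option Doug describes. It is not from ManifoldCF or Solr. Each document is enriched before it is sent to Solr, and the tags go into their own field instead of passing through a tokenizer. An unavailable AI platform does not block the update. Several pieces are hypothetical assumptions: the tagger callable standing in for the internal AI platform, the `tags` and `enrichment_status` field names, and the "flag and re-enrich later" failure policy.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a client of the internal AI platform:
# it takes the document text and returns a list of classification tags.
Tagger = Callable[[str], List[str]]


def enrich(doc: Dict[str, Any], tagger: Tagger) -> Dict[str, Any]:
    """Return a copy of doc with AI-generated tags in a separate field.

    The "content" field is left untouched; tags go into "tags" (a
    hypothetical field name). If the tagging call raises, the document is
    still returned, flagged via "enrichment_status" so it can be
    re-enriched later instead of failing the whole update (one possible
    policy, chosen here as an assumption).
    """
    out = dict(doc)  # shallow copy so the caller's document is not mutated
    try:
        out["tags"] = list(tagger(doc.get("content", "")))
        out["enrichment_status"] = "ok"
    except Exception:
        out["enrichment_status"] = "failed"
    return out


# Example taggers: one healthy, one simulating the AI platform being down.
def tagger_up(text: str) -> List[str]:
    return ["finance", "report"]


def tagger_down(text: str) -> List[str]:
    raise ConnectionError("AI platform unavailable")


doc = {"id": "doc-1", "content": "Quarterly budget report"}

if __name__ == "__main__":
    print(enrich(doc, tagger_up))
    print(enrich(doc, tagger_down))
```

Because the enrichment step is separate from indexing, the documents it returns could then be sent to Solr in a normal update request. Whether a failed call should block, skip, or flag the document is a design decision that Doug's question about the API being down raises, and it is kept explicit here rather than hidden inside analysis code.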
