A tokenizer plugin is probably not what you want. You probably want something more like an UpdateProcessor, which can manipulate the whole document as it comes into Solr. Or you may want to avoid having a Solr plugin call out to an API at all and do this work outside of Solr (what happens when the API is down, for example; should doc updates fail?).
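For reference, a rough sketch of what such an UpdateRequestProcessor might look like. TaggingClient and the "content" / "ai_tags" field names are placeholders for whatever your AI platform client and schema actually use:

import java.io.IOException;
import java.util.List;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

// Sketch of an update processor that calls an external tagging service
// and puts the returned tags into a separate field on the incoming document.
public class AiTaggingProcessorFactory extends UpdateRequestProcessorFactory {

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                            SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new AiTaggingProcessor(next);
  }

  static class AiTaggingProcessor extends UpdateRequestProcessor {

    AiTaggingProcessor(UpdateRequestProcessor next) {
      super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
      SolrInputDocument doc = cmd.getSolrInputDocument();
      Object body = doc.getFieldValue("content");      // source field (placeholder)
      if (body != null) {
        try {
          // Hypothetical client wrapping the AI platform's HTTP API
          List<String> tags = TaggingClient.classify(body.toString());
          for (String tag : tags) {
            doc.addField("ai_tags", tag);              // destination field (placeholder)
          }
        } catch (Exception e) {
          // Decide what should happen when the API is down: fail the update,
          // or (as here) index the document without tags.
        }
      }
      super.processAdd(cmd);                           // hand off to the rest of the chain
    }
  }
}

The factory would then be registered in an updateRequestProcessorChain in solrconfig.xml and referenced from the update handler that ManifoldCF posts to, so the tagging happens for every incoming document regardless of which connector sent it.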
A tokenizer plugin would definitely not be recommended. Tokenizers need to be fast, low-level code that splits text into tokens based on readily accessible config & data. The overhead of a network call would be far too high. You probably want to put your extracted tags into a different field anyway, and a tokenizer only works on text within a single field.

-Doug

On Wed, Dec 6, 2017 at 10:57 PM Sreenivas.T <sree...@gmail.com> wrote:

> All,
>
> I need help from experts. We are trying to build a cognitive search
> platform with enterprise content from content sources like SharePoint,
> file share, etc. Before content gets indexed into Solr, I need to call our
> internal AI platform to get additional metadata like classification tags.
>
> I'm planning to leverage ManifoldCF for getting the content from sources
> and planning to write a custom tokenizer plugin to send the content to the
> AI platform, which in turn returns additional tags. I'll index the
> additional tags dynamically through plugin code.
>
> Is it a feasible solution? Is there any other way to achieve the same? I
> was planning not to customize ManifoldCF.
>
> Please suggest.
>
> Regards,
> Sreenivas

--
Consultant, OpenSource Connections. Contact info at http://o19s.com/about-us/doug-turnbull/; Free/Busy (http://bit.ly/dougs_cal)