Hello David,

If JSON serialization is too bulky, we could also opt for SimplePreAnalyzed, right? It is at least possible as a FieldType; if the URP does not support it, that just needs some work.

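To make the size concern concrete, here is a minimal sketch of what a JSON pre-analyzed value looks like for a tiny input. It uses only the JDK, not Solr or Lucene. The key names ("v", "str", "tokens", "t", "s", "e", "i") are written from my understanding of the JSON pre-analyzed format described in the Solr Reference Guide and should be checked against your Solr version. The class, the method names, and the whitespace splitter (a stand-in for a real analysis chain) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-rolled sketch of a JSON pre-analyzed value; no Solr or Lucene classes involved. */
class PreAnalyzedJsonSketch {

    /** One token: term text, start/end character offsets, position increment. */
    static final class Token {
        final String term;
        final int start;
        final int end;
        final int posInc;

        Token(String term, int start, int end, int posInc) {
            this.term = term;
            this.start = start;
            this.end = end;
            this.posInc = posInc;
        }
    }

    /** Splits on whitespace and records offsets; a stand-in for a real analyzer. */
    static List<Token> whitespaceTokens(String text) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) out.add(new Token(text.substring(start, i), start, i, 1));
        }
        return out;
    }

    /** Escapes quotes, backslashes and control characters for a JSON string. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Serializes the stored value plus its tokens (key names are an assumption, see above). */
    static String toJson(String stored, List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"v\":\"1\",\"str\":\"").append(escape(stored)).append("\",\"tokens\":[");
        for (int k = 0; k < tokens.size(); k++) {
            Token t = tokens.get(k);
            if (k > 0) sb.append(',');
            sb.append("{\"t\":\"").append(escape(t.term))
              .append("\",\"s\":").append(t.start)
              .append(",\"e\":").append(t.end)
              .append(",\"i\":").append(t.posInc)
              .append('}');
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        String text = "hello world";
        String json = toJson(text, whitespaceTokens(text));
        System.out.println(json);
        System.out.println("raw chars: " + text.length() + ", serialized chars: " + json.length());
    }
}
```

Every token repeats its keys and offsets, so the serialized form grows much faster than the raw text. That per-token overhead is what a more compact format would reduce.
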
Regarding results: we haven't done it yet, and won't for some time, but we will when we reintroduce OpenNLP in the analysis chain. We tried to introduce POS tagging on our own two years ago, but it wasn't suited for production because it was too heavy on the CPU. Indexing suddenly took eight to ten times longer in a SolrCloud environment with three replicas.

If we offload our current chains without OpenNLP, the only benefit comes when large fields pass through a regex, and from decompounding the Germanic languages we ingest. Offloading just this cost is a micro-optimization; offloading the various OpenNLP char and token filters is really beneficial.

Regarding a dependency on Lucene core and analysis-common: it would be helpful, but we'll manage.

Thanks again,
Markus

-----Original message-----
> From:David Smiley <david.w.smi...@gmail.com>
> Sent: Thursday 12th April 2018 19:16
> To: solr-user@lucene.apache.org
> Subject: Re: PreAnalyzed URP and SchemaRequest API
>
> Ah ok.
> I've wondered how much value there is in pre-analysis. The serialization
> of the analyzed form in JSON is bulky. If you can share any results, I'd
> be interested to hear how it went. It's an optimization so you should be
> able to know how much better it is. Of course it isn't for everybody --
> only when the analysis chain is sufficiently complex.
>
> On Mon, Apr 9, 2018 at 9:45 AM Markus Jelsma <markus.jel...@openindex.io>
> wrote:
>
> > Hello David,
> >
> > The remote client has everything on the class path but just calling
> > setTokenStream is not going to work. Remotely, all I get from SchemaRequest
> > API is an AnalyzerDefinition. I haven't found any Solr code that allows me
> > to transform that directly into an analyzer. If I had that, it would make
> > things easy.
> >
> > As far as I see it, I need to reconstruct a real Analyzer using
> > AnalyzerDefinition's information. It won't be a problem, but it is
> > cumbersome.
> >
> > Thanks anyway,
> > Markus
> >
> > -----Original message-----
> > > From:David Smiley <david.w.smi...@gmail.com>
> > > Sent: Thursday 5th April 2018 19:38
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: PreAnalyzed URP and SchemaRequest API
> > >
> > > Is this really a problem when you could easily enough create a TextField
> > > and call setTokenStream?
> > >
> > > Does your remote client have Solr-core and all its dependencies on the
> > > classpath? That's one way to do it... and presumably the direction you
> > > are going because you're asking how to work with PreAnalyzedParser, which
> > > is in solr-core. *Alternatively*, only bring in Lucene core and construct
> > > things yourself in the right format. You could copy PreAnalyzedParser into
> > > your codebase so that you don't have to reinvent any wheels, even though
> > > that's awkward. Perhaps that ought to be in SolrJ? But no, we don't want
> > > SolrJ depending on Lucene-core, though it'd make a fine "optional"
> > > dependency.
> > >
> > > On Wed, Apr 4, 2018 at 4:53 AM Markus Jelsma <markus.jel...@openindex.io>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > We intend to move to PreAnalyzed URP for analysis offloading. Browsing the
> > > > Javadocs I came across the SchemaRequest API looking for a way to get a
> > > > Field object remotely, which I seem to need for
> > > > JsonPreAnalyzedParser.toFormattedString(Field f). But all I can get from
> > > > SchemaRequest API is FieldTypeRepresentation, which offers me
> > > > getIndexAnalyzer() but won't allow me to construct a Field object.
> > > >
> > > > So, to analyze remotely I do need an index-time analyzer. I can get it,
> > > > but not turn it into a Field object, which the PreAnalyzedParser for some
> > > > reason wants.
> > > >
> > > > Any hints here? I must be looking the wrong way.
> > > >
> > > > Many thanks!
> > > > Markus
> > > >
> > > --
> > > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> > > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> > > http://www.solrenterprisesearchserver.com
> > >
> >
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com