Hello David,

If JSON serialization is too bulky, we could also opt for the SimplePreAnalyzed
format, right? At least it is possible with the FieldType, if not with the URP;
it just needs some work.
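
To get a feel for the size difference, here is a minimal sketch that writes the
same tokens in both formats. It assumes the JSON keys (`v`, `str`, `tokens`,
`t`, `s`, `e`, `i`) and the Simple layout (`1 =stored=term,s=..,e=..,i=..`) as
described in the Solr Reference Guide for JsonPreAnalyzedParser and
SimplePreAnalyzedParser. `PreAnalyzedFormats` and `Token` are hypothetical
names, not Solr API, and payloads, types and flags are left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Sketch only: writes already-analyzed tokens in the two PreAnalyzed formats,
// assuming the JSON keys (v, str, tokens, t, s, e, i) and the Simple layout
// ("1 =stored=term,s=..,e=..,i=.. term2,...") from the Solr Reference Guide.
// PreAnalyzedFormats and Token are hypothetical names, not Solr API.
public class PreAnalyzedFormats {

    // Hypothetical token holder: term text, start/end offsets, position increment.
    public static final class Token {
        final String term;
        final int start, end, posInc;

        public Token(String term, int start, int end, int posInc) {
            this.term = term;
            this.start = start;
            this.end = end;
            this.posInc = posInc;
        }
    }

    // JSON string escaping: quotes and backslashes, control chars as unicode escapes.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.toString();
    }

    // Backslash escapes for the Simple format; inside a token, ',' and ' '
    // are separators, so they are escaped there but not in the stored part.
    static String simpleEscape(String s, boolean inToken) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                case '\\': case '=': sb.append('\\').append(c); break;
                case ',': case ' ':
                    if (inToken) sb.append('\\');
                    sb.append(c);
                    break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static String toJson(String stored, List<Token> tokens) {
        StringBuilder sb = new StringBuilder("{\"v\":\"1\"");
        if (stored != null) {
            sb.append(",\"str\":\"").append(jsonEscape(stored)).append('"');
        }
        sb.append(",\"tokens\":[");
        for (int k = 0; k < tokens.size(); k++) {
            Token t = tokens.get(k);
            if (k > 0) sb.append(',');
            sb.append("{\"t\":\"").append(jsonEscape(t.term))
              .append("\",\"s\":").append(t.start)
              .append(",\"e\":").append(t.end)
              .append(",\"i\":").append(t.posInc).append('}');
        }
        return sb.append("]}").toString();
    }

    public static String toSimple(String stored, List<Token> tokens) {
        StringBuilder sb = new StringBuilder("1 "); // version number, then a space
        if (stored != null) {
            sb.append('=').append(simpleEscape(stored, false)).append('=');
        }
        for (int k = 0; k < tokens.size(); k++) {
            Token t = tokens.get(k);
            if (k > 0) sb.append(' ');
            sb.append(simpleEscape(t.term, true))
              .append(",s=").append(t.start)
              .append(",e=").append(t.end)
              .append(",i=").append(t.posInc);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(new Token("hello", 0, 5, 1), new Token("world", 6, 11, 1));
        String json = toJson("Hello World", tokens);
        String simple = toSimple("Hello World", tokens);
        System.out.println(json);
        System.out.println(simple);
        System.out.println("JSON: " + json.getBytes(StandardCharsets.UTF_8).length
            + " bytes, Simple: " + simple.getBytes(StandardCharsets.UTF_8).length + " bytes");
    }
}
```

For the two tokens in `main` this gives 105 bytes of JSON against 51 bytes in
the Simple format, mostly because the quoted key names disappear. Real savings
depend on the number of tokens and attributes, so it is worth measuring on
actual data.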

Regarding results: we haven't done it yet, and won't for some time, but we will
when we reintroduce OpenNLP in the analysis chain. We tried to introduce
POS-tagging on our own two years ago, but it wasn't suited for production
because it was too heavy on the CPU. Indexing data suddenly took eight to ten
times longer in a SolrCloud environment with three replicas.

If we offload our current chains without OpenNLP, it will only pay off when
large fields pass through a regex, and for decompounding the Germanic languages
we ingest. Offloading just that cost is a micro-optimization; offloading the
various OpenNLP char and token filters is where it would really be beneficial.

Regarding an (optional) SolrJ dependency on Lucene core and analysis-common: it
would be helpful, but we'll manage.

Thanks again,
Markus
 
-----Original message-----
> From:David Smiley <david.w.smi...@gmail.com>
> Sent: Thursday 12th April 2018 19:16
> To: solr-user@lucene.apache.org
> Subject: Re: PreAnalyzed URP and SchemaRequest API
> 
> Ah ok.
> I've wondered how much value there is in pre-analysis.  The serialization
> of the analyzed form in JSON is bulky.  If you can share any results, I'd
> be interested to hear how it went.  It's an optimization so you should be
> able to know how much better it is.  Of course it isn't for everybody --
> only when the analysis chain is sufficiently complex.
> 
> On Mon, Apr 9, 2018 at 9:45 AM Markus Jelsma <markus.jel...@openindex.io>
> wrote:
> 
> > Hello David,
> >
> > The remote client has everything on the classpath, but just calling
> > setTokenStream is not going to work. Remotely, all I get from the
> > SchemaRequest API is an AnalyzerDefinition. I haven't found any Solr code
> > that allows me to transform that directly into an Analyzer. If I had that,
> > it would make things easy.
> >
> > As far as I can see, I need to reconstruct a real Analyzer using the
> > AnalyzerDefinition's information. It won't be a problem, but it is
> > cumbersome.
> >
> > Thanks anyway,
> > Markus
> >
> > -----Original message-----
> > > From:David Smiley <david.w.smi...@gmail.com>
> > > Sent: Thursday 5th April 2018 19:38
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: PreAnalyzed URP and SchemaRequest API
> > >
> > > Is this really a problem when you could easily enough create a TextField
> > > and call setTokenStream?
> > >
> > > Does your remote client have solr-core and all its dependencies on the
> > > classpath? That's one way to do it... and presumably the direction you
> > > are going because you're asking how to work with PreAnalyzedParser, which
> > > is in solr-core. *Alternatively*, only bring in Lucene core and construct
> > > things yourself in the right format. You could copy PreAnalyzedParser
> > > into your codebase so that you don't have to reinvent any wheels, even
> > > though that's awkward. Perhaps that ought to be in SolrJ? But no, we don't
> > > want SolrJ depending on Lucene core, though it'd make a fine "optional"
> > > dependency.
> > >
> > > On Wed, Apr 4, 2018 at 4:53 AM Markus Jelsma <markus.jel...@openindex.io>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > We intend to move to the PreAnalyzed URP for analysis offloading.
> > > > Browsing the Javadocs, I came across the SchemaRequest API while looking
> > > > for a way to get a Field object remotely, which I seem to need for
> > > > JsonPreAnalyzedParser.toFormattedString(Field f). But all I can get from
> > > > the SchemaRequest API is a FieldTypeRepresentation, which offers me
> > > > getIndexAnalyzer() but won't allow me to construct a Field object.
> > > >
> > > > So, to analyze remotely I do need an index-time analyzer. I can get it,
> > > > but I can't turn it into a Field object, which the PreAnalyzedParser for
> > > > some reason wants.
> > > >
> > > > Any hints here? I must be looking at this the wrong way.
> > > >
> > > > Many thanks!
> > > > Markus
> > > >
> > > --
> > > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> > > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> > > http://www.solrenterprisesearchserver.com
> > >
> >
> 
