David Smiley had a place name and general tagging engine that for the life of me I can't find.
It didn't do NER for you (I'm not sure you want to do that in the search engine), but it helps you tag entities in a search engine based on a predefined list. At least, that's what I remember.

On Wed, Nov 4, 2015 at 3:05 PM, <liviuchrist...@yahoo.com.invalid> wrote:
> Hi everyone,
>
> I need to install a plugin to extract Location (Country/State/City) from
> free text documents - any professional advice?!? Does OpenNLP really do
> the job? Is it English only? US only? Or does it cover worldwide place
> names?
> Could someone help me with this job - installation, configuration,
> model training, etc.?
>
> Please help,
> Kind regards,
> Christian
> Christian Fotache Tel: 0728.297.207 Fax: 0351.411.570
>
>
> From: Upayavira <u...@odoko.co.uk>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, November 3, 2015 12:13 PM
> Subject: Re: language plugin
>
> Looking at the code, this is not going to work without modifications to
> Solr (or at least a custom component).
>
> The atomic update code is closely embedded into the Solr
> DistributedUpdateProcessor, which expands the atomic update into a full
> document and then posts it to the shards.
>
> You need to do the update expansion before your lang detect processor,
> but there is no gap between them.
>
> From my reading of the code, you could create an AtomicUpdateProcessor
> that simply expands updates, and insert that before the
> LangDetectUpdateProcessor.
>
> Upayavira
>
> On Tue, Nov 3, 2015, at 06:38 AM, Chaushu, Shani wrote:
> > Hi,
> > When I do an atomic update (set field) on the content field and also on
> > another field, the language field becomes generic. That is, it doesn't
> > work with set, only on the first insert. Even if the language was
> > detected the first time, it just becomes generic after the update.
> > Any idea?
> >
> > The chain is:
> >
> > <updateRequestProcessorChain name="aa_chain">
> >   <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
> >     <str name="langid.fl">title,content,text</str>
> >     <str name="langid.langField">language_t</str>
> >     <str name="langid.langsField">language_all_t</str>
> >     <str name="langid.fallback">generic</str>
> >     <str name="langid.overwrite">false</str>
> >     <str name="langid.threshold">0.8</str>
> >   </processor>
> >   <processor class="solr.LogUpdateProcessorFactory" />
> >   <processor class="solr.RunUpdateProcessorFactory" />
> > </updateRequestProcessorChain>
> >
> > Thanks,
> > Shani
> >
> > -----Original Message-----
> > From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
> > Sent: Thursday, October 29, 2015 17:04
> > To: solr-user@lucene.apache.org
> > Subject: Re: language plugin
> >
> > Are you trying to do an atomic update without the content field? If so,
> > it sounds like Solr needs an enhancement (bug fix?) so that language
> > detection would be skipped if the input field is not present. Or maybe
> > that could be an option.
> >
> > -- Jack Krupansky
> >
> > On Thu, Oct 29, 2015 at 3:25 AM, Chaushu, Shani <shani.chau...@intel.com> wrote:
> >
> > > Hi,
> > > I'm using the Solr language detection plugin on the field named "content"
> > > (Solr 4.10, plugin LangDetectLanguageIdentifierUpdateProcessorFactory).
> > > When I index the first time it works fine, but if I set one field
> > > again (regardless of whether it's the content or not), it goes to its
> > > default language. If I'm setting another field, I would like the
> > > language to stay the way it was before, and I don't want to insert all
> > > the content again. Is there an option to set the plugin so that it
> > > won't calculate the language again?
> > > (Setting langid.overwrite to false didn't work.)
> > >
> > > Thanks,
> > > Shani
> > >
> > > ---------------------------------------------------------------------
> > > Intel Electronics Ltd.
> > >
> > > This e-mail and any attachments may contain confidential material for
> > > the sole use of the intended recipient(s). Any review or distribution
> > > by others is strictly prohibited. If you are not the intended
> > > recipient, please contact the sender and delete all copies.

--
Doug Turnbull | Search Relevance Consultant | OpenSource Connections <http://opensourceconnections.com>, LLC | 240.476.9983
Author: Relevant Search <http://manning.com/turnbull>

This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.
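For the location-extraction question, the dictionary-based approach Doug describes (tagging text against a predefined list of names rather than running NER) can be illustrated with a small standalone Java sketch. It is not David Smiley's engine or any Solr API: the class `GazetteerTagger`, the lowercase/punctuation tokenizer, and the greedy longest-match rule are all assumptions made for illustration.

```java
import java.util.*;

/**
 * Hypothetical sketch of dictionary ("gazetteer") tagging: find occurrences
 * of names from a predefined list in free text. Not the API of any Solr plugin.
 */
final class GazetteerTagger {
    /** One match: token offsets [start, end) and the matched (normalized) name. */
    record Tag(int start, int end, String name) {}

    // Each dictionary entry is stored as its list of normalized tokens.
    private final Set<List<String>> entries = new HashSet<>();
    private int maxLen = 0;

    GazetteerTagger(Collection<String> names) {
        for (String name : names) {
            List<String> toks = tokenize(name);
            if (toks.isEmpty()) continue;
            entries.add(toks);
            maxLen = Math.max(maxLen, toks.size());
        }
    }

    /** Assumed tokenizer: lowercase, split on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Scan left to right, taking the longest dictionary entry that starts at each token. */
    List<Tag> tag(String text) {
        List<String> toks = tokenize(text);
        List<Tag> tags = new ArrayList<>();
        int i = 0;
        while (i < toks.size()) {
            int matched = 0;
            for (int len = Math.min(maxLen, toks.size() - i); len > 0; len--) {
                if (entries.contains(toks.subList(i, i + len))) {
                    matched = len;
                    break;
                }
            }
            if (matched > 0) {
                tags.add(new Tag(i, i + matched, String.join(" ", toks.subList(i, i + matched))));
                i += matched; // skip past the match so tags never overlap
            } else {
                i++;
            }
        }
        return tags;
    }
}
```

For example, with the names "New York", "York" and "Bucharest", tagging "Flights from Bucharest to New York." yields two tags, "bucharest" and "new york"; the longest-match rule is why "York" is not reported on its own. A real list of worldwide place names would need a much larger dictionary and a more compact structure (such as a trie or finite-state automaton) than the `HashSet` used here.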
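Upayavira's suggestion, expanding the atomic update into a full document before language detection runs, can be modeled in plain Java to show why the order matters. This is a hypothetical sketch, not Solr's `DistributedUpdateProcessor` or any real update processor: documents are plain `Map`s, only the `set` operation is handled, the stored document is passed in rather than looked up, and `detectionInput` merely gathers the text a langid-style step would read from the fields listed in `langid.fl`.

```java
import java.util.*;

/** Hypothetical model of "expand the atomic update, then detect language". */
final class AtomicExpansionSketch {

    /**
     * Apply atomic-update operations to a copy of the stored document.
     * Assumption for this sketch: a value that is a single-entry Map such as
     * {"set": v} is an operation; any other value simply replaces the field.
     * Only "set" is modeled; other atomic-update modifiers are rejected.
     */
    static Map<String, Object> expand(Map<String, Object> stored, Map<String, Object> update) {
        Map<String, Object> full = new LinkedHashMap<>(stored);
        for (Map.Entry<String, Object> e : update.entrySet()) {
            Object value = e.getValue();
            if (value instanceof Map<?, ?> op && op.size() == 1 && op.containsKey("set")) {
                Object newValue = op.get("set");
                if (newValue == null) {
                    full.remove(e.getKey()); // setting to null clears the field
                } else {
                    full.put(e.getKey(), newValue);
                }
            } else if (value instanceof Map<?, ?>) {
                throw new IllegalArgumentException("unsupported operation on field " + e.getKey());
            } else {
                full.put(e.getKey(), value);
            }
        }
        return full;
    }

    /** Concatenate the values of the configured fields (cf. langid.fl), in field order. */
    static String detectionInput(Map<String, Object> doc, List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (String f : fields) {
            Object v = doc.get(f);
            if (v != null) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(v);
            }
        }
        return sb.toString();
    }
}
```

With a stored document holding `title` and `content`, an update that only sets a hypothetical `category` field gives an empty detection input, which is consistent with the `generic` fallback Shani reports, while the expanded document still exposes the stored title and content text. In Solr itself this step would have to be a custom processor placed ahead of the langid processor in the chain, as Upayavira describes.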