RE: language identification during solrj indexing

2015-07-02 Thread Markus Jelsma
https://wiki.apache.org/solr/LanguageDetection -Original message- > From: Alessandro Benedetti > Sent: Thursday 2nd July 2015 11:06 > To: solr-user@lucene.apache.org > Subject: Re: language identification during solrj indexing > > SolrJ is simply a java client to ac

Re: language identification during solrj indexing

2015-07-02 Thread Alessandro Benedetti
SolrJ is simply a Java client to access the Solr REST API. This means that "indexing through SolrJ" doesn't exist as a separate mode. You simply need to add the proper chain to the update request handler you are using. Taking a look at the code, by default the SolrJ UpdateRequest refers to the "/update" endpoint. Have you
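
The "proper chain" above is an `updateRequestProcessorChain` in solrconfig.xml. A minimal sketch, assuming a chain named `langid`, a source field `text`, and a language field `language_s` (these names are illustrative, not from the thread):

```xml
<!-- solrconfig.xml: language-identification update chain (sketch) -->
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
    <str name="langid.fl">text</str>              <!-- field(s) to detect language on -->
    <str name="langid.langField">language_s</str> <!-- field that receives the language code -->
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

From SolrJ the chain can then be selected per request, e.g. `updateRequest.setParam("update.chain", "langid")`, or attached as a default to the "/update" handler in solrconfig.xml.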

Re: Language Identification and Stemming

2013-03-02 Thread Jan Høydahl
In addition to the text_lang fields you can of course have a text_general field which is unstemmed, where you put documents that you don't yet have language-specific handling for. One potential issue of multi-language search is detecting the language of the query itself. Sometimes your search pag
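
One way to get that unstemmed fallback, assuming the langid processor with field mapping enabled (the parameter values here are illustrative):

```xml
<!-- Inside the langid processor definition (sketch) -->
<str name="langid.fl">text</str>
<bool name="langid.map">true</bool>         <!-- rename text -> text_<lang> -->
<str name="langid.whitelist">en,de,fr</str> <!-- languages with dedicated fields -->
<str name="langid.fallback">general</str>   <!-- anything else maps to text_general -->
```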

Re: Language Identification and Stemming

2013-03-01 Thread vybe3142
From your response, I gather that there's no way to maintain a single set of fields for multiple languages, i.e. I can't use a field "text" for the body text. Instead, I would have to define text_en, text_fr, text_ru etc., each mapped to its specific language.

Re: Language Identification and Stemming

2013-03-01 Thread Jan Høydahl
Hi, Q1. You use langid for the detection, and your chosen field(s) can be mapped to new names such as title->title_en or title_de. Thus you need to configure your schema with a separate fieldType for every language you want to support if you'd like to use language-specific stemming and stopwords e
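
The schema side of that setup might look like the following sketch; the analyzer chain shown is the stock English example that ships with Solr, and the field names assume title is being mapped per language as described above:

```xml
<!-- schema.xml: one fieldType per language (English shown) -->
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
<!-- fields that langid's mapping (title -> title_en, title_de, ...) writes into -->
<field name="title_en" type="text_en" indexed="true" stored="true"/>
```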

Re: Language Identification in index time

2013-01-20 Thread Jack Krupansky
It sounds like you want an update request processor: http://wiki.apache.org/solr/UpdateRequestProcessor But it also sounds like you should probably be normalizing the encoding before sending the data to Solr. -- Jack Krupansky -Original Message- From: Yewint Ko Sent: Sunday, Januar
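
The client-side normalization Jack suggests can be sketched in plain Java (the class and method names here are hypothetical, not from the thread): decode the raw bytes with their true source charset so the String handed to the Solr client is clean Unicode, which SolrJ then serializes as UTF-8 on the wire.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NormalizeEncoding {
    // Decode raw bytes using their actual source charset; decoding with the
    // wrong charset is what produces mojibake such as "Ã©" instead of "é".
    static String toUnicode(byte[] raw, Charset source) {
        return new String(raw, source);
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE9}; // "é" encoded in ISO-8859-1
        System.out.println(toUnicode(latin1, StandardCharsets.ISO_8859_1)); // prints é
    }
}
```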

Re: Language Identification

2012-04-23 Thread Jan Høydahl
I think nothing has "moved". We just offer Solr users the option to do language detection inside Solr, using either of these two libs. If you choose to do language detection on the client side instead, using either of them, what is stopping you? -- Jan Høydahl, search solution architect Cominvent AS - www.cominv

Re: Language Identification

2012-04-23 Thread Robert Muir
On Mon, Apr 23, 2012 at 1:27 PM, Bai Shen wrote: > I was under the impression that solr does Tika and the language identifier > that Shuyo did. The page at > http://wiki.apache.org/solr/LanguageDetection lists them both. > > class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProc

Re: Language Identification

2012-04-23 Thread Bai Shen
I was under the impression that solr does Tika and the language identifier that Shuyo did. The page at http://wiki.apache.org/solr/LanguageDetection lists them both. Again, I'm just trying to understand why it was moved to solr. On Fri, Apr 20, 2012 at 6:02 PM, Jan Høydahl wrote: > Hi, > >
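
For reference, the wiki page discussed in this thread documents two processor factories, one per detector; choosing between them is just a matter of which class the update chain names:

```xml
<!-- Tika's built-in language identifier -->
<processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory"/>

<!-- Shuyo's language-detection library -->
<processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory"/>
```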

Re: Language Identification

2012-04-20 Thread Jan Høydahl
Hi, Solr just reuses Tika's language identifier. But you are of course free to do your language detection on the Nutch side if you choose and not invoke the one in Solr. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 20. apr.