Sandeep,

As Jack mentioned, it would be useful to know the use case and what kind of 
queries you will be executing, since you may need to handle sentence boundaries 
on the query side as well, not just on the indexing side.  For integrating with 
NLTK there are a few options, such as calling NLTK out of process or using 
jythonc to generate Java classes.
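
For the out-of-process option, a rough sketch of the Python side might look 
like the following. This is only an illustration under assumptions, not a 
tested Solr integration: the function names (regex_sentences, load_tokenizer, 
split_sentences) are made up for this example, and the regex splitter is a 
naive stand-in used when NLTK or its punkt data is not installed. The only 
NLTK call used is nltk.tokenize.sent_tokenize.

```python
import re


def regex_sentences(text):
    """Naive stand-in: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def load_tokenizer():
    """Prefer NLTK's sent_tokenize; fall back to the naive splitter."""
    try:
        from nltk.tokenize import sent_tokenize
        # Probe call: raises LookupError if the punkt data is missing.
        sent_tokenize("Probe. Text.")
        return sent_tokenize
    except (ImportError, LookupError):
        return regex_sentences


def split_sentences(text, tokenize=None):
    """Return the sentences in text, one string per sentence."""
    tokenize = tokenize or load_tokenizer()
    return [s.strip() for s in tokenize(text) if s.strip()]


# Example with the stand-in splitter (hypothetical input text):
for sentence in split_sentences(
    "Solr is a search server. Is it fast? Yes!", tokenize=regex_sentences
):
    print(sentence)
```

A crawler or indexing client could run something like this before posting 
documents to Solr, for example sending each sentence as its own value of a 
multi-valued field. Whether that is the right shape depends on the queries 
you need to support.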

Thanks

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Monday, September 08, 2014 7:52 AM
To: solr-user@lucene.apache.org
Subject: Re: Is there any sentence tokenizers in sold 4.9.0?

Out of curiosity, what would be an example query for your application that 
would depend on sentence tokenization, as opposed to simple term tokenization? 
I mean, there are no sentence-based query operators in the Solr query parsers.

-- Jack Krupansky

-----Original Message-----
From: Sandeep B A
Sent: Monday, September 8, 2014 12:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Is there any sentence tokenizers in sold 4.9.0?

Hi Susheel,
Thanks for the information.
I have crawled a few websites, and all I need is a sentence tokenizer for the 
data I have collected.
These websites are English only.

I don't have experience writing custom sentence tokenizers for Solr. Is 
there a tutorial that explains how to do it?

Is it possible to integrate NLTK with Solr? If so, how? I ask because I found 
sentence tokenizers for English in NLTK.

Thanks,
Sandeep
On Sep 5, 2014 8:10 PM, "Sandeep B A" <belgavi.sand...@gmail.com> wrote:

> Sorry for the typo: it is Solr 4.9.0, not sold 4.9.0.
>
> On Sep 5, 2014 7:48 PM, "Sandeep B A" <belgavi.sand...@gmail.com> wrote:
>
>> Hi,
>>
>> I was looking for a sentence tokenizer that ships with Solr by default
>> but could not find one. Has anyone used one, or integrated a tokenizer
>> from another language (Python, for example) into Solr? Please let me know.
>>
>>
>> Thanks and regards,
>> Sandeep
>>
>
