[jira] [Updated] (LUCENE-9044) Currently Lucene doesn't have an analyzer for Sinhala. We have built an analyzer which consists of a language-dependent tokenizer, a stemming algorithm and a list of stop words.

pavithra kariyawasam (Jira) Wed, 13 Nov 2019 05:10:48 -0800


     [ https://issues.apache.org/jira/browse/LUCENE-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


pavithra kariyawasam updated LUCENE-9044:
-----------------------------------------
    Issue Type: New Feature  (was: Improvement)

> Currently Lucene doesn't have an analyzer for Sinhala. We have built an
> analyzer which consists of a language-dependent tokenizer, a stemming
> algorithm and a list of stop words.
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9044
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9044
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 8.3
>         Environment: Lucene
>            Reporter: pavithra kariyawasam
>            Priority: Major
>              Labels: features
>             Fix For: 5.5.6
>
>         Attachments: SinhalaAnalyzer.java, SinhalaStemmer.java, 
> SinhalaTokenizer.java, stopwords.txt
>
>
> This component was developed based on three main research studies.
> Lucene did not have a component for analyzing Sinhala documents, so our
> intention is to fill that gap with an Analyzer that can analyze Sinhala
> documents. The Sinhala Analyzer has been implemented by performing Sinhala
> morphological analysis. Its three main functions are tokenizing the
> document content precisely, removing stopwords, and accurately converting
> terms to their base/root form. These are built by considering the
> grammatical rules of Sinhala.
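The three stages described above (tokenizing, stopword removal, and reducing terms to a root form) can be illustrated with a minimal plain-Java sketch. It is not the attached SinhalaAnalyzer/SinhalaStemmer/SinhalaTokenizer code: the class name `SinhalaPipelineSketch`, the two stopwords (සහ and හා, both "and") and the two suffixes (genitive ගේ and dative ට) are hypothetical, illustrative choices, not taken from the attached stopwords.txt or the stemmer's actual rules. The one Sinhala-specific point it does demonstrate is in the tokenizer: vowel signs and the al-lakuna (virama) are Unicode combining marks rather than letters, and conjuncts may contain ZERO WIDTH JOINER, so splitting on `Character.isLetter` alone would break words apart. Sinhala literals are written as `\u` escapes so the file compiles under any source encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Minimal sketch of a tokenize -> stopword removal -> stem pipeline for Sinhala.
// Hypothetical, illustrative data only; not the attached stopwords.txt or stemmer rules.
public class SinhalaPipelineSketch {

    // Hypothetical stopwords: "saha" and "haa" (both meaning "and").
    static final Set<String> STOPWORDS = Set.of("\u0DC3\u0DC4", "\u0DC4\u0DCF");

    // Hypothetical case suffixes, checked in order: genitive "ge" and dative "ta".
    static final List<String> SUFFIXES = List.of("\u0D9C\u0DDA", "\u0DA7");

    // Sinhala vowel signs and the al-lakuna (virama) are combining marks, not letters,
    // and conjuncts may use ZERO WIDTH JOINER, so all of these must stay inside a token.
    static boolean isWordPart(int cp) {
        int type = Character.getType(cp);
        return Character.isLetterOrDigit(cp)
                || type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || cp == 0x200D;
    }

    // Splits text into maximal runs of word characters; everything else is a separator.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (isWordPart(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        });
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Strips the first matching suffix, never reducing a token to the empty string.
    static String stem(String token) {
        for (String suffix : SUFFIXES) {
            if (token.length() > suffix.length() && token.endsWith(suffix)) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Full pipeline: tokenize, drop stopwords, then stem what remains.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (!STOPWORDS.contains(token)) {
                terms.add(stem(token));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // "ammaage gedarata saha paasalata." (mother's, to home, and, to school)
        String text = "\u0D85\u0DB8\u0DCA\u0DB8\u0DCF\u0D9C\u0DDA \u0D9C\u0DD9\u0DAF\u0DBB\u0DA7 "
                + "\u0DC3\u0DC4 \u0DB4\u0DCF\u0DC3\u0DBD\u0DA7.";
        System.out.println(analyze(text));
    }
}
```

Running `main` drops the stopword සහ and strips the suffixes, leaving the three stems අම්මා, ගෙදර and පාසල. In Lucene itself these stages would typically be chained in an `Analyzer`'s `createComponents` method as a `Tokenizer` followed by a stop filter and a stemming `TokenFilter`; the sketch keeps them as plain methods so it runs without the Lucene jar.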



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

