[ 
https://issues.apache.org/jira/browse/LUCENE-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pavithra kariyawasam updated LUCENE-9044:
-----------------------------------------
    Issue Type: New Feature  (was: Improvement)

> Currently Lucene doesn't have an analyzer for Sinhala. We have built an 
> analyzer which consists of a language-dependent tokenizer, a stemming 
> algorithm, and a list of stop words.
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9044
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9044
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 8.3
>         Environment: Lucene
>            Reporter: pavithra kariyawasam
>            Priority: Major
>              Labels: features
>             Fix For: 5.5.6
>
>         Attachments: SinhalaAnalyzer.java, SinhalaStemmer.java, 
> SinhalaTokenizer.java, stopwords.txt
>
>
> This component was developed based on three main research studies.
> Lucene did not have a component for analyzing Sinhala documents, so our 
> intention is to fill that gap with an Analyzer that can analyze Sinhala 
> documents. The Sinhala Analyzer is implemented by performing Sinhala 
> morphological analysis. Its three main functions are tokenizing the 
> document content precisely, removing stop words, and converting terms to 
> their base/root form accurately. These are built on the grammatical rules 
> of Sinhala.
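
The three stages described above (tokenize, remove stop words, reduce to a
base form) can be sketched in plain Java. This is only an illustration of
the pipeline shape, not the code in the attached SinhalaAnalyzer.java,
SinhalaTokenizer.java or SinhalaStemmer.java. The class name
AnalyzerPipelineSketch, the suffix-stripping rule, and the Latin-script
placeholder data are all assumptions; a real Sinhala stemmer would follow
the language's grammatical rules, and a Lucene contribution would normally
wire equivalent stages into an Analyzer subclass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a tokenize -> stop-word filter -> stem pipeline.
// It does not use Lucene and does not reproduce the attached classes.
final class AnalyzerPipelineSketch {

    // Stage 1: split on any run of characters that is not a letter,
    // combining mark, digit, or zero-width joiner (U+200D). Marks and the
    // joiner are kept so that Indic-script words are not broken apart.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{M}\\p{N}\\u200D]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stage 3 helper: strip the longest matching suffix, but never reduce
    // a token to the empty string. The suffix list is supplied by the
    // caller; this simple rule is an assumption, not the attached stemmer.
    static String stem(String token, List<String> suffixes) {
        String best = "";
        for (String s : suffixes) {
            if (s.length() > best.length()
                    && token.length() > s.length()
                    && token.endsWith(s)) {
                best = s;
            }
        }
        return token.substring(0, token.length() - best.length());
    }

    // Full pipeline: tokenize, drop stop words (stage 2), then stem.
    static List<String> analyze(String text, Set<String> stopWords,
                                List<String> suffixes) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (stopWords.contains(token)) {
                continue;
            }
            out.add(stem(token, suffixes));
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder English data; real use would load stopwords.txt
        // and Sinhala suffix rules instead.
        List<String> terms = analyze("the walkers walked, walking",
                Set.of("the"), List.of("ers", "ed", "ing"));
        System.out.println(terms); // prints [walk, walk, walk]
    }
}
```

Keeping the stop-word set and suffix list as parameters mirrors the split
between code and the separate stopwords.txt attachment, so the word lists
can change without touching the pipeline.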



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
