[ https://issues.apache.org/jira/browse/LUCENE-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
pavithra kariyawasam updated LUCENE-9044:
-----------------------------------------
    Issue Type: New Feature  (was: Improvement)

> Currently Lucene doesn't have an analyzer for Sinhala. We have built an analyzer
> which consists of a language-dependent tokenizer, a stemming algorithm and a list of
> stop words.
> ------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9044
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9044
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 8.3
>         Environment: Lucene
>            Reporter: pavithra kariyawasam
>            Priority: Major
>              Labels: features
>             Fix For: 5.5.6
>
>         Attachments: SinhalaAnalyzer.java, SinhalaStemmer.java,
> SinhalaTokenizer.java, stopwords.txt
>
>
> This component was developed based on three main research studies.
> Lucene did not have a component to analyze Sinhala documents, so our intention
> is to fill that gap with an analyzer that can analyze Sinhala documents.
> The Sinhala Analyzer is implemented by performing Sinhala morphological
> analysis. Its three main functions are tokenizing the document content
> precisely, removing stop words, and converting terms to their base/root
> form accurately. These are built by considering the grammatical rules of
> Sinhala.
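
The three stages named in the description (tokenizing, stop-word removal, reduction to a base form) can be sketched as a plain Java pipeline. This is only an illustration and is not taken from the attached SinhalaAnalyzer.java, SinhalaStemmer.java or SinhalaTokenizer.java. The class name AnalyzerPipelineSketch, the regex-based tokenizer, the longest-suffix stemmer and the ASCII sample data are all assumptions, and the sketch deliberately does not use the Lucene analysis API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a tokenize -> stop-word filter -> stem pipeline.
// Not the attached implementation; names and rules are placeholders.
final class AnalyzerPipelineSketch {

    // Step 1: split on runs of characters that are not letters, combining
    // marks, digits, or the zero-width (non-)joiners used in some scripts.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{M}\\p{Nd}\\u200C\\u200D]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Step 3 helper: strip the longest listed suffix, but never the whole token.
    // A real stemmer would encode the language's grammatical rules instead.
    static String stem(String token, List<String> suffixes) {
        String best = "";
        for (String suffix : suffixes) {
            if (suffix.length() > best.length()
                    && token.length() > suffix.length()
                    && token.endsWith(suffix)) {
                best = suffix;
            }
        }
        return token.substring(0, token.length() - best.length());
    }

    // Full pipeline: tokenize, drop stop words (step 2), then stem what remains.
    static List<String> analyze(String text, Set<String> stopWords, List<String> suffixes) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (stopWords.contains(token)) {
                continue;
            }
            terms.add(stem(token, suffixes));
        }
        return terms;
    }

    public static void main(String[] args) {
        // ASCII placeholder data; a real run would load stopwords.txt and
        // language-specific suffix rules.
        Set<String> stopWords = Set.of("the", "and");
        List<String> suffixes = List.of("s");
        System.out.println(analyze("the cats and dogs", stopWords, suffixes)); // prints [cat, dog]
    }
}
```

In this sketch the stop-word set and suffix list are passed in as parameters, so the same pipeline could be fed language-specific resources without changing the code.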