Hi, Andreas,

<< I don't quite understand what you mean by "I’m having a hard time thinking through the process of generating the candidate suffix set using set forms" >>

It is my usual roundabout way of saying "I don't know how to do this." ;)

I'm looking at your code as we speak.

Thanks,
Tuba

On Mon, Aug 8, 2011 at 1:13 AM, Andreas Kostler < [email protected]> wrote:
> Hi Tuba,
> I don't quite understand what you mean by "I’m having a hard time
> thinking through the process of generating the
> candidate suffix set using set forms" but I have created a Porter
> stemmer for English in the past.
> I understand that's not what you're looking for, but it is more of a
> framework for building stemmers:
>
> You specify rules like:
> {:c? condition :s1 "abc" :s2 "efg" :a action}
> reading: if condition is met, replace s1 with s2 and execute action,
> where s1 could be a suffix etc. All you need to do is specify these rules.
> Have a browse:
> https://github.com/AndreasKostler/Stout
>
> Cheers
> Andreas
>
>
> On 8 August 2011 16:16, Tuba Lambanog <[email protected]> wrote:
> >
> > Hello,
> >
> > I’m doing a word stemmer for a non-English language. A stemmer parses
> > a word into its word parts: prefixes, roots, suffixes. The input word
> > is at least a root word (English example would be ‘cloud’), but can be
> > any combination of prefix(es) and a root (e.g., 'pre-nuptial'), or a
> > root and suffix(es) (‘cloudy’), or all three ('unidirection'). A
> > sequence of more than one prefix in a word is considered one
> > occurrence of a prefix, and similarly for complex suffixes; thus,
> > ‘directional’ is considered to have the ‘single’ suffix ‘ional’. The
> > prefixes, roots, and suffixes are each in their own set data structure.
> >
> > The approach I am pursuing is to create a set of potential suffixes
> > that the input word contains. Assume, for simplicity, that the suffix
> > set consists of #{-or, -er, -al, -ion, -ional, -able}. The input
> > ‘directional’ would have the candidate suffix set #{-al -ional}. Now,
> > drop the longest suffix (‘ional’) from the input, then check whether
> > the remaining string (‘direct’) is a root; if it is, done. If not,
> > try the next suffix (‘-al’) in the potential suffix set. Prefixes
> > will be similarly processed. Input words with both prefixes and
> > suffixes will be fun to do ;)
> >
> > I’m having a hard time thinking through the process of generating the
> > candidate suffix set using set forms, and I’m beginning to think I
> > have selected an arduous path (for me).
> >
> > Thoughts?
> >
> > Thanks.
> > Tuba
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Clojure" group.
> > To post to this group, send email to [email protected]
> > Note that posts from new members are moderated - please be patient with
> > your first post.
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/clojure?hl=en
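
For what it's worth, the candidate-suffix step described in the original question can be sketched with clojure.set along these lines. This is only a rough sketch, not anyone's actual stemmer: the names candidate-suffixes and strip-suffix are made up for illustration, the suffix set is the one from the message with the hyphens dropped, the root set is invented, and clojure.string/ends-with? assumes Clojure 1.8 or later (on older versions, (.endsWith word suffix) does the same job).

```clojure
(require '[clojure.set :as set]
         '[clojure.string :as str])

;; Illustrative data only: the suffix set from the message (hyphens dropped)
;; and a tiny, made-up root set.
(def suffixes #{"or" "er" "al" "ion" "ional" "able"})
(def roots #{"direct" "cloud"})

(defn candidate-suffixes
  "Returns the subset of `suffixes` that `word` ends with."
  [suffixes word]
  (set/select #(str/ends-with? word %) suffixes))

(defn strip-suffix
  "Tries the candidate suffixes longest first. Returns [root suffix] for the
  first remainder that is a known root, or nil if no candidate works."
  [roots suffixes word]
  (->> (candidate-suffixes suffixes word)
       (sort-by count >)
       (some (fn [suffix]
               (let [stem (subs word 0 (- (count word) (count suffix)))]
                 (when (contains? roots stem)
                   [stem suffix]))))))

(candidate-suffixes suffixes "directional")
;; => #{"al" "ional"}

(strip-suffix roots suffixes "directional")
;; => ["direct" "ional"]
```

A word that is already a root (such as "cloud") has no candidates here and yields nil, so a caller would presumably check the root set first. Prefixes could be handled the same way with str/starts-with? and by dropping characters from the front instead.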
