Re: Synonym file for American-British words

SUJIT PAL Tue, 07 Aug 2012 11:29:25 -0700

Hi Alex,

I implemented something similar using the rules described in this page:

http://en.wikipedia.org/wiki/American_and_British_English_spelling_differences 

The idea is to normalize the British spelling to the American spelling at both 
index and query time, using a tokenizer that takes in a word and, if it matches 
one of the rules, returns the converted form.

My rules were modeled as a chain of transformations. Each transformation had a 
set of (pattern, action) pairs. The transformations were:
a) word replacement rules (e.g. artefact => artifact) - in this case the source 
word is normalized directly into the specified target word.
b) prefix rules (e.g. anae => ane, so anaemic becomes anemic) - in this case 
the prefix characters of the word, if matched, are transformed into the target.
c) suffix rules (e.g. tre => ter, so centre becomes center) - similar to prefix 
rules, except they work on the suffix.
d) infix rules (e.g. moeb => meb, so amoeba becomes ameba) - these replace 
characters in the middle of the word.
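
As a rough illustration of the chain described above, here is a minimal Python 
sketch. It is not the actual implementation: the rule tables contain only the 
four examples from this mail, and the function names, the lowercasing step, and 
the choice to keep running later transformations after an earlier one matches 
are all assumptions made for the example.

```python
# Illustrative rule tables only; the real rules were not shared.
WORD_RULES = {"artefact": "artifact"}
PREFIX_RULES = [("anae", "ane")]
SUFFIX_RULES = [("tre", "ter")]
INFIX_RULES = [("moeb", "meb")]


def replace_word(word):
    """Whole-word replacement: return the target if the word is listed."""
    return WORD_RULES.get(word, word)


def replace_prefix(word):
    """Rewrite the start of the word using the first matching prefix rule."""
    for src, dst in PREFIX_RULES:
        if word.startswith(src):
            return dst + word[len(src):]
    return word


def replace_suffix(word):
    """Rewrite the end of the word using the first matching suffix rule."""
    for src, dst in SUFFIX_RULES:
        if word.endswith(src):
            return word[:-len(src)] + dst
    return word


def replace_infix(word):
    """Rewrite a pattern strictly inside the word (not at either end)."""
    for src, dst in INFIX_RULES:
        pos = word.find(src, 1)
        if pos > 0 and pos + len(src) < len(word):
            return word[:pos] + dst + word[pos + len(src):]
    return word


# The transformations run in this order (assumed, not from the original).
CHAIN = [replace_word, replace_prefix, replace_suffix, replace_infix]


def normalize(word):
    """Lowercase the word (an assumption) and pass it through the chain."""
    word = word.lower()
    for transform in CHAIN:
        word = transform(word)
    return word


if __name__ == "__main__":
    for w in ["artefact", "anaemic", "centre", "amoeba", "solr"]:
        print(w, "=>", normalize(w))
```

Rules this naive will over-match on some words, so a real rule set would 
likely need exception lists alongside the patterns.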

I cannot share the actual rules, but they should be relatively simple to figure 
out from the wiki page, if you want to go that route.

HTH
Sujit

On Aug 7, 2012, at 7:08 AM, Alexander Cougarman wrote:

> Dear friends,
> 
> Is there a downloadable synonym file for American-British words? This page 
> has some, for example the VarCon file, but it's not in the Solr synonym.txt 
> file. 
> 
> We need something that can normalize words like "center" to "centre". The 
> VarCon file has it, but it's in the wrong format.
> 
> Thank you in advance :)
> 
> Sincerely,
> Alex 
>
