In an older wiki post (which I know is outdated...but still...) located here:
https://wiki.apache.org/solr/FileBasedSpellChecker, the end of the first
paragraph (the intro) states that "...it isn't all that hard to create an
index from a file and have it weight the terms."
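
For reference, the stock FileBasedSpellChecker setup in solrconfig.xml looks
roughly like the snippet below (the spellchecker name, file name, and index
directory are just placeholder values), and as far as I can tell the source
file is a plain one-word-per-line list with nowhere to put a weight:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="name">file</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./spellcheckerFile</str>
  </lst>
</searchComponent>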

I've been thinking about this, and it would be PERFECT for what we need
(because we HAVE a dictionary from a different search engine WITH weights on it
that have been crafted/tweaked over a long period of time). However, having
read the spellcheck documentation, I'm really not certain what this person
(Mark Bennett) was talking about.

Question #1: Has anyone pursued this possibility? If so, can you offer some 
insight as to how you approached this?

Question #2: I'd be grateful for a hint about where to start looking to build
this workaround, if you have some experience with it. My guess is that the only
way to do this is to dive into the FileBasedSpellChecker code and then
clone/modify what is there. I was just wondering if I was missing something, or
if someone was aware of a quicker way to scale this technological hill. I
realize I could dissect the format of the spellcheck dictionary and load it as
I desire...but I'm a bit concerned about that, because this is a long-lived
project that is expected to live a lot longer, and the approach seems a bit
"brittle": it will break the first time Apache makes the slightest change to
the structure of the spellcheck index.
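
Just to make Question #2 concrete, below is the kind of thing I imagine the
wiki remark might have meant. It is a rough, untested sketch (not taken from
the Solr code) that reads a hypothetical "word<TAB>weight" dictionary file and
builds an ordinary Lucene index in which each word is added once per unit of
weight. The idea is that document frequency would then carry the hand-tuned
weight, so a frequency-aware checker (e.g. IndexBasedSpellChecker with
spellcheck.onlyMorePopular) could favor heavier terms. The file format, the
field name, and the repeat-per-weight trick are all my own assumptions, and
the exact Lucene API calls vary by version:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class WeightedDictionaryIndexer {

    public static void main(String[] args) throws IOException {
        // Assumed format: one "word<TAB>weight" pair per line, integer weights.
        String dictionaryFile = args[0];
        String indexDir = args[1];

        try (FSDirectory dir = FSDirectory.open(Paths.get(indexDir));
             IndexWriter writer = new IndexWriter(dir,
                     new IndexWriterConfig(new WhitespaceAnalyzer()));
             BufferedReader reader = Files.newBufferedReader(
                     Paths.get(dictionaryFile), StandardCharsets.UTF_8)) {

            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t");
                if (parts.length != 2) {
                    continue; // skip malformed lines
                }
                String word = parts[0].trim();
                int weight = Integer.parseInt(parts[1].trim());

                // Crude weighting: add one tiny document per unit of weight,
                // so the document frequency of the term reflects the
                // hand-tuned weight from the old search engine's dictionary.
                for (int i = 0; i < weight; i++) {
                    Document doc = new Document();
                    doc.add(new TextField("spell", word, Field.Store.NO));
                    writer.addDocument(doc);
                }
            }
            writer.commit();
        }
    }
}

If that is roughly what was meant, it would sidestep my "brittleness" worry,
since it only touches the public IndexWriter API rather than the internal
layout of the spellcheck index...but I'd still appreciate confirmation from
anyone who has actually done it.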

If anyone has any insights to offer, I'd appreciate it.

Thanks.

Peter S. Lee
