Hi all,
I've been very busy the last few days, so I haven't replied to this
email until now. A while ago I modified SpellCheckerHandler to support
multiword queries. To be honest, I didn't have time to write unit tests
for the code. However, I deployed it in a production environment and it
has been working for me so far. My version makes two assumptions:
1) I assume that when a user enters a misspelled multiword query, we
should only correct the words that are actually misspelled. For
example, if a user enters "life expectancy calculatar", in which
"calculator" is misspelled, we should only spellcheck "calculatar".
2) I only return the single best string for a misspelled query.
I guess I can just paste the code here directly so that others can
adapt it for their own purposes. If you have any questions, just send
me an email. I'll be happy to help.
StringBuffer buf = null;
if (null != words && !"".equals(words.trim())) {
    // Tokenize the query with the analyzer configured for the field.
    Analyzer analyzer =
        req.getSchema().getField(field).getType().getAnalyzer();
    TokenStream source =
        analyzer.tokenStream(field, new StringReader(words));
    Token t;
    boolean hasSuggestion = false;
    boolean termExists = false;
    while (true) {
        try {
            t = source.next();
        } catch (IOException e) {
            t = null;
        }
        if (t == null)
            break;
        String termText = t.termText();
        String[] suggestions = spellChecker.suggestSimilar(termText,
            numSug, req.getSearcher().getReader(), restrictToField, true);
        if (suggestions != null && suggestions.length > 0) {
            // Use the best suggestion for this word.
            if (!suggestions[0].equals(termText)) {
                hasSuggestion = true;
            }
            if (buf == null) {
                buf = new StringBuffer(suggestions[0]);
            } else
                buf.append(" ").append(suggestions[0]);
        } else if (spellChecker.exist(termText)) {
            // No suggestion, but the word is known: keep it unchanged.
            termExists = true;
            if (buf == null) {
                buf = new StringBuffer(termText);
            } else
                buf.append(" ").append(termText);
        } else {
            // Unknown word with no suggestion: give up on the whole query.
            hasSuggestion = false;
            termExists = false;
            break;
        }
    }
    try {
        source.close();
    } catch (IOException e) {
        // ignore
    }
    // String[] suggestions = spellChecker.suggestSimilar(words, numSug,
    //     nullReader, restrictToField, onlyMorePopular);
    // buf is non-null whenever either flag is still set at this point.
    if (hasSuggestion || termExists)
        rsp.add("suggestions", buf.toString());
    else
        rsp.add("suggestions", null);
}
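
For anyone who wants to try the per-word idea outside Solr, here is a
minimal self-contained sketch. It is not the handler code: the KNOWN
and BEST tables are hypothetical stand-ins for the Lucene SpellChecker
index, whitespace splitting replaces the field analyzer, and the class
and method names are made up for illustration. Unlike the handler
code, it looks a word up in the known-word table first and only asks
for a correction when the word is missing, which is one way to read
assumption 1.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class MultiWordSpellSketch {

    // Hypothetical dictionary of correctly spelled words.
    static final Set<String> KNOWN =
        new HashSet<>(Arrays.asList("life", "expectancy", "calculator"));

    // Hypothetical table mapping a misspelling to its best suggestion.
    static final Map<String, String> BEST = new HashMap<>();
    static {
        BEST.put("calculatar", "calculator");
    }

    /**
     * Corrects a query word by word. Returns the rebuilt query, or null
     * when the query is empty or some word is neither known nor
     * correctable (mirroring the "give up" branch in the handler code).
     */
    public static String correct(String words) {
        if (words == null || words.trim().isEmpty()) {
            return null;
        }
        StringBuilder buf = new StringBuilder();
        for (String term : words.trim().split("\\s+")) {
            String replacement;
            if (KNOWN.contains(term)) {
                replacement = term;            // spelled correctly, keep it
            } else if (BEST.containsKey(term)) {
                replacement = BEST.get(term);  // best suggestion only
            } else {
                return null;                   // nothing useful to offer
            }
            if (buf.length() > 0) {
                buf.append(' ');
            }
            buf.append(replacement);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(correct("life expectancy calculatar"));
        System.out.println(correct("life expectancy xyzzy"));
    }
}
```

Running main prints "life expectancy calculator" for the first query
and "null" for the second, since "xyzzy" is in neither table.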
On 10/11/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
Hoss,
I had a feeling someone would be quoting Yonik's Law of Patches! ;-)
For now, this is done.
I made the changes, wrote JavaDoc comments on the various settings and
their expected output, and created a JUnit test for the
SpellCheckerRequestHandler that tests various components of the
handler. I also created the supporting configuration files for the
JUnit tests (schema and solrconfig files).
I attached the patch to the JIRA issue, so now we just have to wait
until it gets added to the main code stream.
For anyone who is interested, here is a link to the JIRA:
https://issues.apache.org/jira/browse/SOLR-375
Could someone please give me a hint on how to update the wiki or any
other documentation that could benefit from being updated? I'd like to
help out as much as possible, but first I need to know "how". ;-)
When these changes do get committed to the daily build, please review
the generated JavaDoc for information on how to use these new
features.
If anyone has any questions or comments, please do not hesitate to
ask.
As a general note of self-critique on these changes, I am not 100%
sure about the way I implemented the "nested" structure that is
returned when the "multiWords" parameter is used. My goal is for it to
work smoothly with other technology such as Prototype using the JSON
output type. Unfortunately, I will not get a chance to start on that
coding until next week, so it is still up in the air whether this
structure will be suitable. I am planning to provide more details in
the documentation on how to use these modifications with Prototype and
Ajax when I get a chance (and even provide links to a production site
so you can see it in action and view the source if interested). So
stay tuned...
Thanks for everyone's time,
Scott Tabar
---- Chris Hostetter <[EMAIL PROTECTED]> wrote:
: If you like, I can post the source code changes that I made to the
: SpellCheckerRequestHandler, but at this time I am not ready to open a
: JIRA issue and submit the changes back through the subversion. I will
: need to do a little more testing, documentation, and create some unit
: tests to cover all of these changes, but what I have been able to
: perform, it is working very well.
Keep in mind "Yonik's Law Of Patches" ...
"A half-baked patch in Jira, with no documentation, no tests
and no backwards compatibility is better than no patch at all."
http://wiki.apache.org/solr/HowToContribute
...even if you don't think the code is "solid" yet, if you want to
eventually make it available to people, making a "rough" version
available to people early gives other people the opportunity to help
you make it solid (by writing unit tests, fixing bugs, and adding
documentation).
-Hoss
--
Regards,
Cuong Hoang