Hi,
The suggested approach with a TokenFilter extending the BufferedTokenStream
class works fine, performance is OK - the external stemmer is now invoked
only once for the complete search text. Also, from a functional point of
view, the approach is useful, because it allows for other filtering (i.
Thanks for these suggestions, will try it in the coming days and post my
findings in this thread.
Bye,
Jaco.
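
The batching idea described above (buffer the tokens, then call the expensive stemmer once for the whole text instead of once per token) can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from this thread: CountingStemmer is a hypothetical stand-in for the external COM stemmer, its "strip a trailing s" rule is a toy, and the sketch assumes tokens contain no spaces and that the stemmer returns exactly one word per input word. It deliberately avoids the Solr/Lucene classes so that it runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: buffer all tokens, then call an expensive stemmer once per batch. */
final class BatchStemmingSketch {

    /** Hypothetical stand-in for the external stemmer; counts its invocations. */
    static final class CountingStemmer {
        int calls = 0;

        /** Stems every space-separated word in one call (toy rule: drop a trailing "s"). */
        String stemAll(String text) {
            calls++;
            StringBuilder out = new StringBuilder();
            for (String word : text.split(" ")) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                boolean plural = word.length() > 1 && word.endsWith("s");
                out.append(plural ? word.substring(0, word.length() - 1) : word);
            }
            return out.toString();
        }
    }

    /** Slow variant: one stemmer invocation per token. */
    static List<String> stemPerToken(List<String> tokens, CountingStemmer stemmer) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            result.add(stemmer.stemAll(token));
        }
        return result;
    }

    /** Batched variant: join the buffered tokens, invoke once, split the result. */
    static List<String> stemBatched(List<String> tokens, CountingStemmer stemmer) {
        if (tokens.isEmpty()) {
            return new ArrayList<>();
        }
        String stemmed = stemmer.stemAll(String.join(" ", tokens));
        return new ArrayList<>(Arrays.asList(stemmed.split(" ")));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("cats", "dogs", "run");

        CountingStemmer perToken = new CountingStemmer();
        System.out.println(stemPerToken(tokens, perToken) + " calls=" + perToken.calls);

        CountingStemmer batched = new CountingStemmer();
        System.out.println(stemBatched(tokens, batched) + " calls=" + batched.calls);
    }
}
```

Both variants produce the same stems; only the number of stemmer invocations differs, which is where the per-call overhead mentioned in this thread would be saved.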
2008/9/26 Grant Ingersoll <[EMAIL PROTECTED]>
>
> On Sep 26, 2008, at 12:05 PM, Jaco wrote:
>
> Hi Grant,
>>
>> In reply to your questions:
>>
>> 1. Are you having to restart/initialize
On Sep 26, 2008, at 12:05 PM, Jaco wrote:
Hi Grant,
In reply to your questions:
1. Are you having to restart/initialize the stemmer every time for your
"slow" approach? Does that really need to happen?
It is invoking a COM object in Windows. The object is instantiated once for
a token
The overhead is not in the instantiation, but in the actual call to the COM
object. The approach with one-time instantiation in the TokenFilterFactory,
and the use of that object in the TokenFilter, is exactly what I tried. There
is a factor of 10 performance gain when being able to do a single call
: It is invoking a COM object in Windows. The object is instantiated once for
: a token stream, and then invoked once for each token. The invoke always has
: an overhead, not much to do about that (sigh...)
I also know nothing about COM, but based on your comments it sounds like
instantiating yo
Hi Grant,
In reply to your questions:
1. Are you having to restart/initialize the stemmer every time for your
"slow" approach? Does that really need to happen?
It is invoking a COM object in Windows. The object is instantiated once for
a token stream, and then invoked once for each token. The i
On Sep 26, 2008, at 9:40 AM, Jaco wrote:
Hi,
Here's some of the code of my Tokenizer:
public class MyTokenizerFactory extends BaseTokenizerFactory
{
    public WhitespaceTokenizer create(Reader input)
    {
        String text, normalizedText;
        try {
            text = IOUtils.toString(input);
            normalizedText= *i
How are you creating the tokens? What are you setting for the offsets
and the positions?
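
To make the question about offsets and positions concrete, here is a minimal, self-contained sketch of whitespace tokenization that records a start offset, an end offset and a position for each token. It is not the tokenizer from this thread and does not use the Lucene classes; OffsetSketch and Tok are hypothetical names, positions are counted from 1 as an assumption, and end offsets are exclusive (one past the last character).

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: whitespace tokenization that tracks offsets and positions. */
final class OffsetSketch {

    /** Hypothetical token record: text, [start, end) character offsets, 1-based position. */
    static final class Tok {
        final String text;
        final int start;
        final int end;
        final int position;

        Tok(String text, int start, int end, int position) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.position = position;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")@" + position;
        }
    }

    /** Splits on whitespace, keeping offsets relative to the string passed in. */
    static List<Tok> tokenize(String input) {
        List<Tok> tokens = new ArrayList<>();
        int position = 0;
        int i = 0;
        int n = input.length();
        while (i < n) {
            // Skip any run of whitespace before the next token.
            while (i < n && Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the token itself.
            while (i < n && !Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            if (i > start) {
                position++;
                tokens.add(new Tok(input.substring(start, i), start, i, position));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Tok tok : tokenize("hello  world")) {
            System.out.println(tok);
        }
    }
}
```

Note that the offsets are relative to whatever string is handed to tokenize. If the text is normalized before tokenizing, as in the factory quoted in this thread, the offsets would refer to the normalized text rather than to the original input.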
One thing that is helpful is Solr's built-in Analysis tool via the
Admin interface (http://localhost:8983/solr/admin/). From there, you
can plug in verbose mode, and see what the position and offsets a