On 03.10.2010 09:20, Andy wrote:
> NGramFilterFactory would then take that one token ("electric guitar")
> and generate N-grams out of it. One of the ngrams would be "guit"
> because "guit" is a substring of "electric guitar".
AFAIK it only produces prefix-strings like
gui
guit
guita
guitar
etc.
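For illustration, a rough sketch of an analyzer chain that emits only such prefix grams, keeping the whole value as one token (using EdgeNGramFilterFactory; the names and gram sizes are only assumptions):

<fieldType name="text_prefix" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emits only front-anchored grams: "ele", "elec", ... -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>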
So
On 28.05.2010 22:06, Chris Hostetter wrote:
> and one "text_prefix"
> defined similarly but with an additional EdgeNGramTokenFilter used when
> indexing to generate "prefix" tokens. then search those fields using
> dismax...
To be sure that I understand this right:
Am I right that I should not stopwor
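For reference, a rough sketch of the suggested two-field setup as a dismax handler in solrconfig.xml (the handler name and the stemmed-field name are only assumptions, not from Chris' mail):

<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- query the stemmed field and the edge-ngram "prefix" field together,
         weighting full-word matches higher than prefix matches -->
    <str name="qf">text_stemmed^2.0 text_prefix^0.5</str>
  </lst>
</requestHandler>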
Thank you, Chris and Erick, for the answers,
it was new to me that "the*" is expanded to all known the* words in the
index. Good to know.
And yes, the AND operation between the query terms is certainly the
problem. (I would like to switch to OR instead. The result set will grow
the more wo
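If it helps: in the stock Solr 1.4 schema.xml the default operator is set in a single place, e.g.

<solrQueryParser defaultOperator="OR"/>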
Hello,
I am having some problems with Solr 1.4. I am indexing and querying data
using the following fieldType:
The ap
revathy arun wrote:
> Is there any way to check the encoding of a text/pdf document or convert
> them to UTF-8 encoding?
If you are using pdftotext, you could set the enc parameter:
pdftotext -enc UTF-8 filename
How can you convert PDFs to text via xpdf programmatically?
Greetings,
Gert
sunnyfr wrote:
> Yes, the average is 12 docs per second updated.
In our case, indexing normal web pages on a normal workstation, we get
about 10 docs per second (updating + committing). This feels quite slow.
But if this is normal... ok.
> I actually reduced warmup and cache, it works fine now, I
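For reference, the caches and their autowarming live in solrconfig.xml; a minimal sketch (the sizes here are only example values):

<!-- smaller caches and autowarmCount="0" shorten the warm-up after a commit -->
<filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>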
Mark Miller wrote:
>> Currently I am thinking about dropping the stemming and only using
>> prefix-search. But as highlighting does not work with a prefix like
>> "house*", this is a problem for me. The hint to use "house?*" instead
>> does not work here.
>>
> That's because wildcard queries are also not highlighted
Mark Miller wrote:
> Try hitting /solr/admin/luke and see what it says.
Oh, interesting. I think I have to check the stopword list. Is there a
way to filter out single characters like "h"?
[pasted /solr/admin/luke statistics for the field text_de_de (index flags ITS--, term and document counts) omitted]
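One way to drop such single-character tokens without listing them all as stopwords would be a length filter in the analyzer chain of that field; a sketch (the min/max values are only assumptions):

<!-- discards tokens shorter than 2 characters, e.g. a stray "h" -->
<filter class="solr.LengthFilterFactory" min="2" max="100"/>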
Mark Miller wrote:
> Yeah, sounds small. It's odd you would see such slow performance. It
> depends though. You may still have a *lot* of unique terms in there.
Is there a way to retrieve the list of terms in the index?
Gert
Thanks, Mark, for your answer,
Mark Miller wrote:
> Truncation queries and stemming are difficult partners. You likely have
> to accept compromise. You can try using multiple fields like you are,
I already have multiple fields, one per language, to be able to use
different stemmers. Wouldn't bec
Gert Brinkmann wrote:
>> A) fuzzy search
>>
>> What can I do to speed up the fuzzy query?
Setting ramBufferSizeMB to a higher value seems to speed up the query
slightly. I have to continue with tuning though.
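For reference, that setting lives in the indexDefaults section of solrconfig.xml (the value below is only an example):

<indexDefaults>
  <ramBufferSizeMB>64</ramBufferSizeMB>
</indexDefaults>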
>> B) combine stemming, prefix and fuzzy search
>>
Shalin Shekhar Mangar wrote:
> Quite the opposite, you are actually working with some advanced stuff :)
Thank you for the response.
> Please have some patience, someone is
Ok, I will have (what else could I do? ;) ). Meanwhile I will try some
things and continue to search the web.
Greetings
Hello again,
is there nobody who could help me with this? Or is it an FAQ and my
questions are dumb somehow? Maybe I should try to shorten the questions: ;)
> A) fuzzy search
>
> What can I do to speed up the fuzzy query?
> B) combine stemming, prefix and fuzzy search
>
> Is there a way to
Hello,
I am trying to get Solr to work properly. I have set up a Solr test
server (using Jetty, as mentioned in the tutorial). I also had to modify
the schema.xml so that I have different fields for the different languages
(each with its own stemmer) that occur in the content management system
that I am
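For reference, a rough sketch of what one of those per-language fields could look like in schema.xml (the German variant; the names and the exact filter chain are only assumptions):

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords_de.txt" ignoreCase="true"/>
    <!-- German stemmer; other languages get their own fieldType and stemmer -->
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
  </analyzer>
</fieldType>
<field name="text_de_de" type="text_de" indexed="true" stored="true"/>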