On 8/29/2010 2:17 PM, Erick Erickson wrote:
<<>>
Try putting this after any instances of, say, WhiteSpaceTokenizerFactory
in your analyzer definition, and I believe you'll see that this is not
true. At least, looking at this in the analysis page from the SOLR admin
sure doesn't seem to support that.
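For anyone trying to reproduce this, the experiment is just a matter of ordering the filter after the tokenizer in a schema.xml field type and then checking the token stream on the admin Field Analysis page. The field type name and the attributes below are placeholders rather than anyone's actual configuration, and the filter shown is WordDelimiterFilterFactory only because that's what the rest of the thread discusses:

  <fieldType name="text_test" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- split the input on whitespace first -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- then run the filter in question over those tokens -->
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" catenateWords="1" preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

The analysis page then shows the tokens emitted after each stage, which makes it easy to confirm or refute claims about what a given filter does.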
There's nothing built into SOLR that I know of that'll deal with
auto-detecting multiple languages and "doing the right thing". I know
there's been discussion of that; searching the users' list might help...
You may have to write your own analyzer that tries to do this, but I
have no clue how you'd...
Thank you for taking the time to help. The way I've got the word
delimiter index filter set up with only one pass, "wolf-biederman" will
result in wolf, biederman, wolfbiederman, and wolf-biederman. With two
passes, the last one is not present. One pass changes "gremlin's" to
gremlin and gr...
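For context, getting wolf, biederman, wolfbiederman, and the original wolf-biederman out of a single pass corresponds to a word delimiter configuration roughly like the one below. This is a sketch of a typical setup, not necessarily the exact options in use here:

  <!-- "wolf-biederman" -> wolf, biederman (generateWordParts),
       wolfbiederman (catenateWords), plus the untouched original (preserveOriginal);
       stemEnglishPossessive is what turns "gremlin's" into gremlin -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1"
          catenateWords="1"
          preserveOriginal="1"
          stemEnglishPossessive="1"/>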
Look at the tokenizer/filter chain that makes up your analyzers, and see:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
for other tokenizer/analyzer/filter options.
You're on the right track looking at the various choices provided, and
I suspect you'll find what you need...
On 8/28/2010 7:59 PM, Shawn Heisey wrote:
The only drop in term quality that I noticed was that possessive words
(apostrophe-s) no longer have the original preserved. I haven't yet
decided whether that's a problem.
I finally did notice another drop in term quality from the dual pass - words...
It's metadata for a collection of 45 million documents that is mostly
photos, with some videos and text. The data is imported from a MySQL
database and split among six large shards (each nearly 13GB) and a small
shard with data added in the last week. That works out to between
300,000 and 50...
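For readers unfamiliar with that kind of setup: an import from MySQL is usually driven by a DataImportHandler data-config.xml along the lines of the sketch below. The connection details, table, and column names are invented placeholders, not the actual configuration behind the index described above:

  <dataConfig>
    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://dbhost/metadata" user="solr" password="secret"/>
    <document>
      <!-- each shard would import its own slice of the table,
           e.g. selected by an id or date range -->
      <entity name="item"
              query="SELECT id, title, description FROM items WHERE shard_id = 3">
        <field column="id" name="id"/>
        <field column="title" name="title"/>
        <field column="description" name="description"/>
      </entity>
    </document>
  </dataConfig>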
I agree with Marcus, the usefulness of passing through WDF twice
is suspect. You can always do a copyField to a completely different
field and do whatever you want there; copyField forks the raw input
to the second field, not the analyzed stream...
What is it you're really trying to accomplish?
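In schema.xml terms, that suggestion looks roughly like the snippet below; the field and type names are only illustrative:

  <!-- copyField forks the raw input: both fields receive the original text
       at index time and each runs its own analyzer over it -->
  <field name="catchall"     type="text_wdf"   indexed="true" stored="false"/>
  <field name="catchall_alt" type="text_plain" indexed="true" stored="false"/>

  <copyField source="catchall" dest="catchall_alt"/>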
It's just a configured filter, so you should be able to define it twice. Have
you tried it? But it might be tricky: the output from the first will be the
input of the second, so I doubt the usefulness of this approach.
On Thursday 26 August 2010 17:45:45 Shawn Heisey wrote:
> Can I pass my dat...
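Since the original question appears to be about passing data through WordDelimiterFilterFactory twice, "define it twice" would amount to something like the analyzer below. The attributes are placeholders for whatever each pass is supposed to do; as noted above, the second filter only ever sees the tokens produced by the first, never the raw input:

  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- first pass over the whitespace tokens -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1" preserveOriginal="1"/>
    <!-- second pass operates on the first pass's output, not the original text -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>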