Have you tried just using a copyField? For example, I had a similar use
case where I needed a particular field (f1) to be tokenized but also
needed to facet on its complete contents.
For that, I created a copyField:
<copyField source="f1" dest="f2" />
f1 used tokenizers and filters, but f2 was just a plain string. You then
facet on f2.
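
In case it helps, here is a minimal schema.xml sketch of that setup. The
field names f1 and f2 are the ones above; the type names text_general and
string are assumptions borrowed from the stock example schema, so substitute
whatever types your own schema defines.

```xml
<!-- f1: analyzed (tokenized and filtered) field used for searching.
     "text_general" is an assumed type name; use your own analyzed type. -->
<field name="f1" type="text_general" indexed="true" stored="true" />

<!-- f2: untokenized copy used only for faceting.
     "string" is assumed to map to solr.StrField, as in the example schema. -->
<field name="f2" type="string" indexed="true" stored="false" />

<!-- Copy the raw input of f1 into f2 at index time, before any analysis. -->
<copyField source="f1" dest="f2" />
```

Faceting would then be requested with facet=true&facet.field=f2 while
queries still run against f1. Because copyField copies the original text
rather than the analyzed tokens, f2 facets on the whole value. If f1 is
multiValued, f2 needs to be multiValued as well.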
... just an idea
On 02/28/2014 04:54 AM, epnRui wrote:
Hi Ahmet!!
I went ahead and did something I thought was not a clean solution, and
then when I read your post I found we had thought of the same solution,
including the European_Parliament with the _ :)
So I guess there is no way to do this more cleanly, except maybe by
implementing my own Tokenizer and Filters, but I honestly couldn't find a
tutorial for implementing a customized Solr Tokenizer. If I end up needing to
do it I will write a tutorial.
So for now I'm using PatternReplaceCharFilterFactory to replace "European
Parliament" with <MD5Hash>European_Parliament (initially I didn't use the
MD5 hash prefix).
Then I replace it back into "European Parliament" after the
StandardTokenizerFactory has run. Well, I guess I just found a way to do a
two-word token :)
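
A rough sketch of what such an analyzer chain might look like is below. The
type name text_multiword is hypothetical, and the MD5 hash prefix is left
out for brevity, so this shows only the plain underscore variant. It relies
on StandardTokenizerFactory treating the underscore as a word-joining
character, so "European_Parliament" survives tokenization as one token.

```xml
<!-- "text_multiword" is a hypothetical type name used for illustration. -->
<fieldType name="text_multiword" class="solr.TextField">
  <analyzer>
    <!-- Before tokenizing: glue the two words together with an underscore. -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="European Parliament"
                replacement="European_Parliament" />
    <!-- The underscore keeps the phrase together as a single token. -->
    <tokenizer class="solr.StandardTokenizerFactory" />
    <!-- After tokenizing: restore the space inside the single token. -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="European_Parliament"
            replacement="European Parliament"
            replace="all" />
  </analyzer>
</fieldType>
```

Each additional phrase would need its own charFilter and filter pair (or a
more general pattern), which is why a list-driven filter would be nicer.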
I had seen the ShingleFilterFactory, but the problem is that I don't need the
whole phrase split into two-word tokens, which is what I understood it does.
Of course, I would need some filter that reads a .txt file listing the tokens
to merge, like "European" and "Parliament".
I still have another problem now, but maybe I will find a solution after I
read the page you linked, which seems great. Solr is treating #European
as both #European and European, meaning it creates 2 facets for one token. I
want it to be treated only as #European. I ran the analyzer debugger in my
Solr admin console and I don't see how it can be doing that.
Would you know of a reason for this?
Thanks for your reply. The page you linked seems excellent and I'll read
it through.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Facets-termvectors-relevancy-and-Multi-word-tokenizing-tp4120101p4120361.html
Sent from the Solr - User mailing list archive at Nabble.com.