I don't think I stated my problem very well, so allow me to rephrase my case here:
1. We have over ten million news articles to build into a Solr index.
2. We copy several fields, such as title, author, body, and the captions of
attached photos, into a new field for default search.
3. We then want to use the shingle filter on this new field.
4. We can't predict which new single-word nouns our users may be interested
in, because it's "news", you know. For example, "ECFA" has only recently
become a very popular word in the news here, so I want users to be able to
type in 'ECFA' and have Solr return some relevant news articles.
5. I want to keep the index as small as possible.
6. I also want to do the same thing described in 5 when I search by
explicitly specifying the names of those individual fields.
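For reference, the setup described in points 2 and 3 might look roughly like the following schema.xml excerpt. It is only a sketch: the field names (title, author, body, caption, text_shingle), the type name shingle_text, and the tokenizer choice are assumptions, not the actual schema. With outputUnigrams="false" the field holds only two-word shingles, so a one-word query such as 'ECFA' produces no query terms for this field at all.

```xml
<!-- schema.xml excerpt (sketch). All field and type names are hypothetical. -->
<types>
  <!-- Catch-all type for default search: lowercased word bigrams only -->
  <fieldType name="shingle_text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- outputUnigrams="false": single words are NOT indexed, only bigrams -->
      <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
    </analyzer>
  </fieldType>
</types>

<fields>
  <!-- Not stored: this field only exists to be searched -->
  <field name="text_shingle" type="shingle_text" indexed="true" stored="false" multiValued="true"/>
</fields>

<!-- Copy the searchable source fields into the catch-all field -->
<copyField source="title"   dest="text_shingle"/>
<copyField source="author"  dest="text_shingle"/>
<copyField source="body"    dest="text_shingle"/>
<copyField source="caption" dest="text_shingle"/>
```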
I don't quite understand the "additional field" approach. Do you mean
creating another field that stores only the special words but is not indexed?
Scott
----- Original Message -----
From: "MitchK" <mitc...@web.de>
To: <solr-user@lucene.apache.org>
Sent: Sunday, August 22, 2010 11:48 PM
Subject: Re: Doing Shingle but also keep special single word
Hi,
The keep-word filter is no solution for this problem, since it would mean
having to maintain a word dictionary. As explained, that would take too much
effort.
You can easily add outputUnigrams="true" and check the output for this field
in analysis.jsp. That way you can see how much bigger a single field becomes
with this option.
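To compare the two settings in analysis.jsp, only the shingle filter line needs to change. A sketch of a field type with unigrams switched on (the type name shingle_text_uni and the tokenizer choice are assumptions; note that outputUnigrams defaults to true if the attribute is omitted):

```xml
<!-- schema.xml excerpt (sketch). Type name is hypothetical. -->
<fieldType name="shingle_text_uni" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emits single words as well as bigrams, e.g. "ecfa", "ecfa talks", "talks", ... -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```

For the text "ECFA talks resume", the lowercased tokens are ecfa, talks, and resume. With outputUnigrams="false" the filter emits only the bigrams "ecfa talks" and "talks resume"; with outputUnigrams="true" it also emits the three single words. In general, a value of n words yields n - 1 tokens without unigrams and n + (n - 1) = 2n - 1 tokens with them, roughly twice as many, which is where the "about twice" estimate later in the thread comes from. The actual growth of the index on disk also depends on the term dictionary and postings, so analysis.jsp and a test index are the way to measure it.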
However, I am quite sure that the difference between using
outputUnigrams="true" and indexing into a separate field is not significant.
I would suggest the additional-field approach, since it gives you more
flexibility in boosting the different fields.
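One way to read the additional-field suggestion is to index the same copied text twice, once as plain single words and once as bigrams, and weight the two fields separately at query time. The sketch below assumes hypothetical names throughout (word_text, bigram_text, text_unigram, text_shingle, the /news handler, the source field body) and a placeholder boost of 2.0. Putting the bigram field in pf rather than qf is a deliberate choice here: the dismax parser splits the user's query on whitespace before field analysis, so a bigram-only field in qf would rarely see two words together.

```xml
<!-- schema.xml excerpt (sketch). All names are hypothetical. -->
<types>
  <!-- Single words only: lets a one-word query like "ECFA" match -->
  <fieldType name="word_text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <!-- Bigrams only -->
  <fieldType name="bigram_text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
    </analyzer>
  </fieldType>
</types>

<fields>
  <field name="text_unigram" type="word_text"   indexed="true" stored="false" multiValued="true"/>
  <field name="text_shingle" type="bigram_text" indexed="true" stored="false" multiValued="true"/>
</fields>

<!-- The same source text feeds both fields; repeat for title, author, caption -->
<copyField source="body" dest="text_unigram"/>
<copyField source="body" dest="text_shingle"/>

<!-- solrconfig.xml excerpt (sketch): weight the fields independently -->
<requestHandler name="/news" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- Single-word matching -->
    <str name="qf">text_unigram</str>
    <!-- Extra boost when the query's words occur together; weight is a placeholder -->
    <str name="pf">text_shingle^2.0</str>
  </lst>
</requestHandler>
```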
Unfortunately, I didn't understand your explanation of the use case, but it
sounds a little bit like tagging?
Kind regards,
- Mitch
iorixxx wrote:
Won't setting outputUnigrams="true" make the index about twice as large as
when it's set to false?
Sure, the index will be bigger; I didn't know that this was a problem for
you. But if you have a list of special single words that you want to keep,
KeepWordFilter can eliminate all other tokens, so the index size will be
okay.
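A keep-word field like this would probably need to be separate from the shingle field, because KeepWordFilter drops every token not in its list, shingles included. A sketch with hypothetical names (keep_text, special_words, and the source field body); keepwords.txt is assumed to sit in the core's conf directory with one word per line:

```xml
<!-- schema.xml excerpt (sketch). Names are hypothetical. -->
<fieldType name="keep_text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Keep only tokens listed in conf/keepwords.txt, one per line (e.g. IBM, Microsoft);
         ignoreCase="true" makes "ibm" and "IBM" both match -->
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

<field name="special_words" type="keep_text" indexed="true" stored="false" multiValued="true"/>

<!-- Feed the same text that goes into the shingle field -->
<copyField source="body" dest="special_words"/>
```

The downside is the one Mitch points out: every new word such as 'ECFA' has to be added to keepwords.txt by hand, and documents indexed before the word was added only pick it up after they are reindexed.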
Scott
----- Original Message -----
From: "Ahmet Arslan" <iori...@yahoo.com>
To: <solr-user@lucene.apache.org>
Sent: Saturday, August 21, 2010 1:15 AM
Subject: Re: Doing Shingle but also keep special single word
>> I am building an index with the shingle filter. We know it produces a
>> minimum of 2-grams, but I also want to keep some special single words,
>> e.g. IBM, Microsoft, etc. That is, I want a minimum of 2-grams but also
>> want to have these single words in my index. Is this possible?
>
> Doesn't the outputUnigrams="true" parameter work for you?
>
> After that you can add <filter class="solr.KeepWordFilterFactory"
> words="keepwords.txt" ignoreCase="true"/>, with keepwords.txt containing
> IBM and Microsoft.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Doing-Shingle-but-also-keep-special-single-word-tp1241204p1276506.html
Sent from the Solr - User mailing list archive at Nabble.com.