I don't think I stated my problem very well, so allow me to rephrase my case here:

1. We have over ten million news articles to build into a Solr index.
2. We copy several fields, such as the title, author, body, and captions of attached photos, into a new field for default search.
3. We then want to use the shingle filter on this new field.
4. We can't predict which new single-word nouns our users may be interested in, because it's "news", you know. For example, the word "ECFA" has only recently become a very popular word in the news here, so I want users to be able to type 'ECFA' into search and have Solr return relevant news articles.
5. I wish to keep the index as small as possible.
6. I also wish to do the same thing described in 5 when I search by explicitly specifying the field name of one of those fields.

I don't quite understand the additional-field way. Do you mean creating another field that stores only the special words, but is not indexed?

Scott

----- Original Message ----- From: "MitchK" <mitc...@web.de>
To: <solr-user@lucene.apache.org>
Sent: Sunday, August 22, 2010 11:48 PM
Subject: Re: Doing Shingle but also keep special single word



Hi,

The KeepWordFilter is no solution for this problem, since it would mean having
to manage a word dictionary. As explained, that would take too much effort.

You can easily add outputUnigrams="true" and check the analysis page
(analysis.jsp) for this field to see how much bigger a single field becomes
with this option.
However, I am quite sure that the difference between using
outputUnigrams="true" and indexing into a separate field is not significant.
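For reference, a single-field setup with outputUnigrams might look like the following schema.xml sketch. The field type name text_shingle, the tokenizer, and maxShingleSize="2" are assumptions for illustration; only ShingleFilterFactory and its outputUnigrams option come from this thread.

```xml
<!-- Hypothetical field type: 2-word shingles plus every single word in one field -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- tokenizer choice is an assumption; pick one suited to the articles' language -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- outputUnigrams="true" emits single words such as "ecfa" alongside the shingles -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```

Because a single <analyzer> without a type attribute is used at both index and query time, a one-word query such as ECFA is lowercased to ecfa and can match the unigram tokens.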

I would suggest doing it the additional-field way, since it gives you more
flexibility in boosting the different fields.
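One reading of the additional-field way is two indexed copies of the same text: one analyzed into shingles only and one into single words only, both filled by copyField. The sketch below uses hypothetical names (text_unigram, text_shingle_only, all_unigram, all_shingle) and assumes source fields called title and body; author, caption and any other fields would get the same copyField lines. Both copy fields are indexed, because that is what makes them searchable, but not stored, so the article text is not kept a second time.

```xml
<!-- Hypothetical type: single words only -->
<fieldType name="text_unigram" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical type: 2-word shingles only, no single words -->
<fieldType name="text_shingle_only" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
  </analyzer>
</fieldType>

<!-- Indexed so they can be searched; not stored, so the text is not kept twice -->
<field name="all_unigram" type="text_unigram" indexed="true" stored="false" multiValued="true"/>
<field name="all_shingle" type="text_shingle_only" indexed="true" stored="false" multiValued="true"/>

<!-- Feed both copy fields from each source field (title and body shown) -->
<copyField source="title" dest="all_unigram"/>
<copyField source="title" dest="all_shingle"/>
<copyField source="body" dest="all_unigram"/>
<copyField source="body" dest="all_shingle"/>
```

In a Solr 1.4-style schema.xml, the fieldType entries go inside <types>, the field entries inside <fields>, and the copyField lines directly under <schema>. With the dismax query parser, the per-field boosting could then be expressed as something like defType=dismax&qf=all_unigram^1.0 all_shingle^2.0, where the weights are placeholders.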

Unfortunately, I didn't understand your explanation of the use case, but it
sounds a little like tagging?

Kind regards,
- Mitch


iorixxx wrote:

Won't setting outputUnigrams="true" make the index about twice as large
as when it's set to false?

Sure, the index will be bigger. I didn't know that this was a problem for you.
But if you have a list of special single words that you want to keep, the
KeepWordFilter can eliminate all other tokens, so the index size will be okay.


Scott

----- Original Message ----- From: "Ahmet Arslan" <iori...@yahoo.com>
To: <solr-user@lucene.apache.org>
Sent: Saturday, August 21, 2010 1:15 AM
Subject: Re: Doing Shingle but also keep special single word


>> I am building an index with the Shingle filter. We know it's a
>> minimum 2-gram, but I also want to keep some special single words,
>> e.g. IBM, Microsoft, etc. That is, I want to do a minimum 2-gram but
>> also have these single words in my index. Is that possible?
>
> Doesn't the outputUnigrams="true" parameter work for you?
>
> After that you can add <filter class="solr.KeepWordFilterFactory"
> words="keepwords.txt" ignoreCase="true"/> with keepwords.txt listing
> IBM and Microsoft.
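The KeepWordFilter suggestion could be sketched as a separate field type that indexes only listed words. It sits in its own field here because, placed in the same chain after ShingleFilterFactory, it would also drop every shingle that is not itself listed in keepwords.txt. The type name text_keepword is hypothetical; keepwords.txt is read as one word per line, e.g. a line IBM and a line Microsoft.

```xml
<!-- Hypothetical type: keeps only the tokens listed in keepwords.txt -->
<fieldType name="text_keepword" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- drops every token not in keepwords.txt, comparing case-insensitively -->
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

Since this filter runs at index time, a new term such as ECFA would only become searchable in this field after it is added to keepwords.txt and the affected articles are reindexed, which is the dictionary-maintenance cost raised earlier in the thread.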







--
View this message in context: http://lucene.472066.n3.nabble.com/Doing-Shingle-but-also-keep-special-single-word-tp1241204p1276506.html
Sent from the Solr - User mailing list archive at Nabble.com.



