Re: multilingual list of stopwords

Maria Mosolova Thu, 18 Oct 2007 07:48:43 -0700

Thanks a lot to everyone who responded. Yes, I agree that eventually
we need to use separate stopword lists for different languages.
Unfortunately, the data we are trying to index at the moment does not
contain any direct country/language information, and we need to create
the first version of the index quickly. It does not look like
analyzing documents to determine their language is something that
could be accomplished in a very limited timeframe. Or am I wrong here,
and are there existing analyzers one could use?
Maria


On 10/18/07, Walter Underwood <[EMAIL PROTECTED]> wrote:
> Also "die" in German and English. --wunder
>
> On 10/18/07 4:16 AM, "Andrzej Bialecki" <[EMAIL PROTECTED]> wrote:
>
> > One example that I'm familiar with: words "is" and "by" in English and
> > in Swedish. Both words are stopwords in English, but they are content
> > words in Swedish (ice and village, respectively). Similarly, "till" in
> > Swedish is a stopword (to, towards), but it's a content word in English.
>
>
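
The collisions quoted above can be illustrated with a minimal sketch. It
is not tied to any particular indexing library: the STOPWORDS table, the
MERGED set, and the remove_stopwords helper are hypothetical names, and
the word lists contain only the examples mentioned in this thread.

```python
# Hypothetical per-language stopword sets, limited to the words
# mentioned in this thread (real lists would be much longer).
STOPWORDS = {
    "en": {"is", "by"},   # English stopwords
    "sv": {"till"},       # Swedish stopword ("to, towards")
}

# A single multilingual list is the union of all per-language lists.
MERGED = set().union(*STOPWORDS.values())


def remove_stopwords(tokens, stopwords):
    """Return the tokens that are not in the given stopword set."""
    return [t for t in tokens if t.lower() not in stopwords]


# Swedish tokens: "is" (ice) and "by" (village) are content words.
swedish_tokens = ["is", "till", "by"]

# With the Swedish list only, the content words survive.
per_language = remove_stopwords(swedish_tokens, STOPWORDS["sv"])

# With the merged list, the Swedish content words are dropped as well.
merged = remove_stopwords(swedish_tokens, MERGED)

print(per_language)
print(merged)
```

With the merged list, the Swedish content words "is" and "by" are
removed along with the real stopword "till", which is the cost of
indexing with a single list when the document language is unknown.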
