Hi, I haven't heard of a multilingual stop words list before. What would be the purpose of it? This seems odd to me :-) Stop words are used to cut down the size of the index.
One way you can go about this is to create your own list: index your documents (without removing stop words), look at the most frequent words, and pick some of them for the list. This could work if you are indexing a static set of documents (you know what your content is about, so you can drop some words without losing any important information). But I think the preferred way is to identify the language first and then use a language-specific stop list.

If you can't use language identification, you can try creative approaches, like employing some kind of document classification algorithm and creating a stop list for each class. Then for every new document you would first determine which class it belongs to and then apply that particular stop list. I am just sucking the wind here...

Regards,
Lukas

On 10/18/07, Joseph Doehr <[EMAIL PROTECTED]> wrote:
>
> Hi Maria,
>
> this is a "me too". ;)
> At the moment I'll take the approach of merging the various language
> stopword files I need into one and using that. But the main problem in
> this case is collisions: words that are stopwords in one language but
> not in another.
>
> Cheers,
> Joe
>
>
> Maria Mosolova wrote:
> > I am looking for a multilingual list of stopwords to use with
> > Solr/Lucene and would greatly appreciate advice on where I could
> > find it.

-- 
http://blog.lukas-vlcek.com/
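P.S. A minimal sketch of the frequency-based idea, outside of Solr/Lucene (the tiny corpus, the tokenizer regex, and the top-N cutoff are all made-up placeholders; in practice you would run this over your real indexed documents and eyeball the result before using it):

```python
# Sketch: derive stopword candidates from a collection by ranking terms
# by document frequency (how many documents contain the term).
from collections import Counter
import re

def top_frequent_terms(documents, top_n=10):
    """Return the top_n terms that occur in the most documents.

    These are candidate stopwords; a human should still review them,
    since very frequent terms can carry domain meaning.
    """
    doc_freq = Counter()
    for doc in documents:
        # Count each term at most once per document.
        terms = set(re.findall(r"[a-zA-Z']+", doc.lower()))
        doc_freq.update(terms)
    return [term for term, _ in doc_freq.most_common(top_n)]

# Illustrative stand-in corpus:
docs = [
    "the quick brown fox jumps over the lazy dog",
    "the dog sleeps in the sun",
    "a fox and a dog are friends",
]

print(top_frequent_terms(docs, top_n=3))  # order of ties may vary
```

For a multilingual collection, you would run this per language (or per document class, as suggested above) rather than over the mixed corpus, to avoid the cross-language collision problem Joe mentions.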