Re: French synonyms & Online synonyms

Pierre Auslaender Tue, 30 Sep 2008 11:36:37 -0700

True, synonyms can be grouped in cliques based on the strength of their "resemblance" given a specific context.

But what I'm indexing is the text content of TV programs produced by a public television, so the context is very large and non-specific. What I want is to find "automobile" for "car", "motorcycle" for "bike", "pub" for "restaurant", "woman" for "lady", and the like.

There actually are free on-line resources for most European languages (of course, English included), check these out:

http://dico.isc.cnrs.fr/dico_html/en/index.html
http://www.crisco.unicaen.fr/alexandria2.html

Would you mind commenting on the following plan for a special synonym analyzer?

1/ We would start with an empty synonyms file.

2/ For each indexing request, the analyser looks up the file for synonyms. If it finds synonyms, it proceeds normally.

3/ Otherwise, it checks an online resource for synonyms, updates the synonyms file, and proceeds.
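To make the three steps concrete, here is a minimal sketch in Python rather than as a real Solr/Lucene analyser. The class name `CachedSynonyms`, the `fetch_online` callable and the one-line-per-term `term => a, b` file layout are all assumptions made for illustration; the actual online lookup is left to whatever function is passed in.

```python
import os
from typing import Callable, Dict, List


class CachedSynonyms:
    """File-backed synonym cache with an online fallback (hypothetical design)."""

    def __init__(self, path: str, fetch_online: Callable[[str], List[str]]):
        self.path = path
        # Stand-in for the real online resource; supplied by the caller.
        self.fetch_online = fetch_online
        self.table: Dict[str, List[str]] = {}
        # Step 1: the file may not exist yet, i.e. we start empty.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                for line in fh:
                    line = line.strip()
                    if not line or line.startswith("#"):
                        continue
                    term, _, rest = line.partition("=>")
                    self.table[term.strip()] = [
                        s.strip() for s in rest.split(",") if s.strip()
                    ]

    def lookup(self, term: str) -> List[str]:
        # Step 2: the term is already in the file, proceed normally.
        if term in self.table:
            return self.table[term]
        # Step 3: ask the online resource, record the answer, proceed.
        synonyms = self.fetch_online(term)
        self.table[term] = synonyms
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(f"{term} => {', '.join(synonyms)}\n")
        return synonyms
```

Terms with no synonyms are written too (with an empty right-hand side), so the online resource is queried at most once per term. Whether such a file could be handed to Solr unchanged is not something this sketch checks.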

If you think this is workable, there are two problems left: which terms to look up for online synonyms, and how to select the "synonymity" clique.

For the first issue, I would definitely only search for synonyms of nouns, verbs and adjectives, so some stemming is required initially. For the second issue, I'd have a cut-off value for the strength of "resemblance", if this information is available, and/or use the frequency of the synonyms in the SOLR index as a measure.
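For the second issue, the selection could look roughly like this. It is only a sketch: the function name `select_synonyms`, the threshold defaults, and the idea of passing document frequencies in as a plain dictionary are assumptions, and how those frequencies would be read from the index is left open.

```python
from typing import Dict, List, Optional, Tuple


def select_synonyms(
    candidates: List[Tuple[str, Optional[float]]],
    doc_freq: Dict[str, int],
    min_strength: float = 0.5,   # assumed cut-off, to be tuned
    min_doc_freq: int = 1,       # assumed: candidate must occur in the index
    limit: int = 5,
) -> List[str]:
    """Keep candidates that pass the strength cut-off and occur in the index."""
    kept = []
    for word, strength in candidates:
        # Apply the cut-off only when the resource reports a strength.
        if strength is not None and strength < min_strength:
            continue
        freq = doc_freq.get(word, 0)
        if freq < min_doc_freq:
            continue
        kept.append((freq, word))
    # Most frequent in the index first; ties broken alphabetically.
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in kept[:limit]]
```

For example, with candidates `[("automobile", 0.9), ("vehicle", 0.3), ("motorcar", 0.8), ("auto", None)]` and frequencies `{"automobile": 120, "vehicle": 500, "auto": 40}`, "vehicle" fails the cut-off and "motorcar" never occurs in the index, so the result is `["automobile", "auto"]`.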

Building the synonyms file that way would make the system quicker over time, and for a specific domain (chemistry, biology, sports, etc) the process would be auto-adaptive - perhaps with some human help from time to time.


Thanks,
Pierre

Walter Underwood wrote:

Synonyms are domain-specific, so general-purpose lists are not very useful.

Ultraseek shipped a British-American synonym list as an example, but even
that wasn't very general. One of our customers was a chemical company and
was very surprised when the search "rocket fuel" suggested "arugula",
even though "rocket" is a perfectly good synonym for "arugula".

wunder

On 9/30/08 10:14 AM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:

Pierre,

1) I don't know, but a good place to check and see what previous answers to
this question were is markmail.org
2) I don't think there is such a thing, but I also don't think there are sites
that make this data freely available (answer to 1?)

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----

From: Pierre Auslaender <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, September 30, 2008 11:28:40 AM
Subject: French synonyms & Online synonyms

Hello,

I'm sure these questions have been raised a million times, I'll try one
more:

1/ Is there any general-purpose, free, French synonyms file out there?

2/ Is there a Solr or Lucene analyser class that could tap an on-line
resource for synonyms at index-time? And by the same token, maintain and
complete a synonyms text file?

Thanks for the great work on SOLR and for the liveliness of this list.

Pierre
