Re: UIMA without API key

Tommaso Teofili Mon, 04 Jul 2011 15:03:50 -0700

No, sorry, maybe my explanation was too abstract.
What I was suggesting is an alternative way of detecting the language, based on
stopword dictionaries (one DictionaryAnnotator instance per language) plus a
custom Annotator that evaluates which dictionary collected the most hits.
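
To make the idea concrete, here is a minimal sketch of the hit-counting logic in plain Python. It is not UIMA code: in a real pipeline each stopword set would be a dictionary file loaded by its own DictionaryAnnotator, and the comparison would live in the custom Annotator. The names STOPWORDS, count_hits and guess_language, and the tiny word lists, are assumptions made only for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword lists (assumption); a real setup would load
# one full stopword dictionary per language.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it"},
    "es": {"el", "la", "y", "es", "de", "que", "en", "los"},
    "it": {"il", "la", "e", "di", "che", "in", "un", "per"},
}


def count_hits(text, stopwords=STOPWORDS):
    """Count, per language, how many tokens of the text are stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    hits = Counter()
    for lang, words in stopwords.items():
        hits[lang] = sum(1 for token in tokens if token in words)
    return hits


def guess_language(text, stopwords=STOPWORDS):
    """Return the language whose dictionary collected the most hits.

    Returns None when no dictionary matched at all. Ties go to the
    language listed first in the stopword mapping.
    """
    hits = count_hits(text, stopwords)
    lang, best = max(hits.items(), key=lambda item: item[1])
    return lang if best > 0 else None


if __name__ == "__main__":
    print(guess_language("The cat is in the garden and it is happy"))
    print(guess_language("El perro de la casa es grande y que los gatos"))
```

With such short word lists the result is only reliable on text of reasonable length; short or mixed-language fields may need a minimum hit threshold before the guess is trusted.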
In general, detecting the language with UIMA without an internet
connection can be done in various ways. If you need help with this, however,
it may be better to ask on the UIMA mailing list ( [email protected] ).
Another option for language identification, which does not use UIMA but
relies on Tika's capabilities, is being discussed/developed at
https://issues.apache.org/jira/browse/SOLR-1979
Hope this helps,
Tommaso




2011/7/4 PacoPeralta <[email protected]>

>
>
> Sorry for my insistence...
> If I have configured the following in the uima_config in solrconfig.xml:
>
> <lst name="type">
>   <str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
>   <lst name="mapping">
>     <str name="feature">language</str>
>     <str name="field">language</str>
>   </lst>
> </lst>
>
> <lst name="type">
>   <str name="name">org.apache.uima.DictionaryEntry</str>
>   <lst name="mapping">
>     <str name="feature">coveredText</str>
>     <str name="field">tag</str>
>   </lst>
> </lst>
>
> And I follow the steps that you listed, could I extract the language and
> dictionary entries from the indexed documents?
>
> Excuse my ignorance...
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/UIMA-without-API-key-tp3135299p3137478.html
> Sent from the Lucene - General mailing list archive at Nabble.com.
>
