poor language detection

2012-09-18 Thread tomtom

Hi,

I've got a problem with language detection. There are about 120 documents in
different languages to import, mostly Chinese, English, German and others.
English and German are classified quite well, but Chinese, Japanese and
others stray into a field 'fieldname_lt', i.e. the field for Lithuanian.

Since many people here seem to have good experience with language detection,
my first question is:

Is something missing? The apache-solr-langid-3.6.1.jar, and hence the
'langdetect-profiles', are not deployed to my GlassFish server; the deployed
apache-solr-3.6.1.war doesn't contain this or the other libraries from the
'dist' directory.
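One rough way to answer the classpath part of the question is to check whether the language-identification classes can be loaded at all. The plain-Java sketch below assumes the two relevant classes are the LangDetect-based factory from the Solr langid contrib and the DetectorFactory class of the underlying langdetect library; the class name LangIdClasspathCheck is hypothetical. Solr also loads jars from lib directives in solrconfig.xml through its own class loader, so running this standalone (with the container's jars on the classpath) is only an indication, not proof.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Diagnostic sketch (hypothetical helper): reports whether the classes
 * assumed to be needed for LangDetect-based language identification can
 * be loaded from the current classpath.
 */
public class LangIdClasspathCheck {

    // Assumed class names for Solr 3.6 langid with the LangDetect library.
    static final String[] CANDIDATES = {
        "org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory",
        "com.cybozu.labs.langdetect.DetectorFactory",
    };

    /** Returns true if the named class can be found (without initializing it). */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, LangIdClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Checks each class name and keeps the input order in the result. */
    static Map<String, Boolean> check(String... classNames) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : classNames) {
            result.put(name, isOnClasspath(name));
        }
        return result;
    }

    public static void main(String[] args) {
        check(CANDIDATES).forEach((name, found) ->
            System.out.println((found ? "found   " : "MISSING ") + name));
    }
}
```

If either line prints MISSING with the jars your container actually uses, the langid contrib (or its lib directory) is probably not being picked up.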


My configuration:

  true
  attr_content, attr_dw_title
  language_s
  language_all
  true
  eu
  0.2


Experimenting with the threshold doesn't change the results much. The
fallback 'eu' only contains numbers.
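The threshold and fallback settings above can be read as a simple decision rule. The Java sketch below is a simplified model written for illustration, not Solr's actual implementation; the class and method names are hypothetical and the detector scores in main are made up. It assumes that, with field mapping (Solr's langid.map) enabled, the target field is the source field name plus "_" plus the language code.

```java
import java.util.List;

/**
 * Simplified model (NOT Solr's actual code) of langid-style routing:
 * take the best-scoring detected language, accept it only if its score
 * reaches the threshold, otherwise use the fallback language, then map
 * the field to field + "_" + lang.
 */
public class LangIdRoutingSketch {

    /** One detector result: a language code and its probability (0..1). */
    static final class Detected {
        final String lang;
        final double score;

        Detected(String lang, double score) {
            this.lang = lang;
            this.score = score;
        }
    }

    /** Candidates are assumed to be sorted by descending score. */
    static String resolveLanguage(List<Detected> candidates, double threshold, String fallback) {
        if (candidates.isEmpty() || candidates.get(0).score < threshold) {
            return fallback; // nothing detected, or best guess too uncertain
        }
        return candidates.get(0).lang;
    }

    /** Field-name mapping assumed here: source field + "_" + language code. */
    static String mappedField(String field, String lang) {
        return field + "_" + lang;
    }

    public static void main(String[] args) {
        double threshold = 0.2; // threshold value from the configuration above
        String fallback = "eu"; // fallback value from the configuration above

        // Hypothetical detector outputs, for illustration only.
        List<Detected> germanText = List.of(new Detected("de", 0.99));
        List<Detected> numbersOnly = List.of(); // no usable features detected
        List<Detected> confidentButWrong = List.of(new Detected("lt", 0.57), new Detected("lv", 0.43));

        for (List<Detected> result : List.of(germanText, numbersOnly, confidentButWrong)) {
            System.out.println(mappedField("content", resolveLanguage(result, threshold, fallback)));
        }
    }
}
```

Running main prints content_de, content_eu and content_lt. Under this model, tuning the threshold alone may not help: a wrong guess with a reasonably high score passes a low threshold, while text with nothing detectable (such as numbers only) falls through to the fallback, which would be consistent with 'eu' holding only numbers.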
The strange indexing distribution (as seen in Luke) is:
  content_de   2.52%  (seems correct)
  content_en  11.13%  (seems correct)
  content_eu   0.5%   (fallback)
  content_lt  35.5%   (not in any configuration file)

Looking at content_lt shows mostly Chinese, Japanese and other non-Latin
content.
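To confirm what actually ended up in content_lt, stored values could be spot-checked by their dominant Unicode script; text that is almost entirely Han or Hiragana is clearly not Lithuanian. This is a minimal sketch using only the JDK's Character.UnicodeScript (Java 7+); the class name DominantScript and the sample strings are illustrative, and fetching real field values from Solr is left out.

```java
import java.util.EnumMap;
import java.util.Map;

/**
 * Sketch for spot-checking field contents: returns the Unicode script that
 * occurs most often in a string. Digits, spaces and punctuation (COMMON),
 * combining marks (INHERITED) and unassigned code points (UNKNOWN) are
 * ignored; UNKNOWN is returned if nothing else is found.
 */
public class DominantScript {

    static Character.UnicodeScript dominantScript(String text) {
        Map<Character.UnicodeScript, Integer> counts = new EnumMap<>(Character.UnicodeScript.class);
        text.codePoints().forEach(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            if (script != Character.UnicodeScript.COMMON
                    && script != Character.UnicodeScript.INHERITED
                    && script != Character.UnicodeScript.UNKNOWN) {
                counts.merge(script, 1, Integer::sum);
            }
        });
        Character.UnicodeScript best = Character.UnicodeScript.UNKNOWN;
        int bestCount = 0;
        for (Map.Entry<Character.UnicodeScript, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical samples; replace with real values copied from content_lt.
        String[] samples = {
            "Hallo Welt",
            "\u4F60\u597D\u4E16\u754C",       // Chinese "hello world"
            "\u3053\u3093\u306B\u3061\u306F", // Japanese "konnichiwa" (Hiragana)
            "2012 09 18",                     // digits only
        };
        for (String s : samples) {
            System.out.println(dominantScript(s) + "\t" + s);
        }
    }
}
```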


Is this a known issue, or am I missing something?


Thank you in advance!

sincerely, tom



--
View this message in context: 
http://lucene.472066.n3.nabble.com/poor-language-detection-tp4008624.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: poor language detection

2012-09-21 Thread tomtom

Hi Markus,

thank you very much, that helped. After many attempts and some reordering
within the config files, it works now.

Greetings, tom






--
View this message in context: 
http://lucene.472066.n3.nabble.com/poor-language-detection-tp4008624p4009374.html

SolrJ Apidoc - is there any comprehensive literature ?

2012-10-09 Thread tomtom

Hi,

Is there comprehensive documentation for the SolrJ API? The available
resources are hard to read and contain little information. The guide from
Lucid Imagination helped me make some progress, but it is just a
well-organized compilation of the Apache documents. There is little help for
actual programming, and the API docs seem very sparse.

I wonder where the helpful hints given in this forum (and others) come from.
Is it really necessary to inspect the Solr source code? Does anyone know of
"readable" and reasonably complete documentation?

A sometimes frustrated Solr-User/programmer.

Thank you in advance, tom



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Apidoc-is-there-any-comprehensive-literature-tp4012673.html
