One thing you might do is use the termfreq function to see what it looks like in the index. Also, the schema/analysis page puts terms into power-of-two "buckets", so that might help too.
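[Editor's note: a sketch of such a termfreq() function query, using the collection, field, and docid values that appear later in this thread; the `tf_titi` alias is arbitrary. It needs a running Solr instance, and the field must record term frequencies (the default for text fields):]

```shell
# Return the per-document term frequency of "titi" via the termfreq()
# function query. Unlike the terms component (which reports document
# frequency), this counts occurrences *inside* each matching document,
# so for the sample document it should report 2.
curl 'http://localhost:8983/solr/alian_test/select?q=docid:666&fl=docid,tf_titi:termfreq(titi_txt_fr,titi)'
```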
Best,
Erick

On Tue, Nov 21, 2017 at 7:55 AM, Barbet Alain <alian123sol...@gmail.com> wrote:
> You rock, thank you so much for this clear answer. I lost 2 days for
> nothing as I already had the term freq, but now I have an answer :-)
> (And yes, I checked: it's the doc freq, not the term freq.)
>
> Thank you again!
>
> 2017-11-21 16:34 GMT+01:00 Emir Arnautović <emir.arnauto...@sematext.com>:
>> Hi Alain,
>> As explained in the previous mail, that is doc frequency, and each doc is
>> counted once. I am not sure whether Luke can provide information about
>> overall term frequency - the sum of the term frequencies over all docs.
>>
>> Regards,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>
>>
>>> On 21 Nov 2017, at 16:30, Barbet Alain <alian123sol...@gmail.com> wrote:
>>>
>>> $ cat add_test.sh
>>> DATA='
>>> <add>
>>> <doc>
>>> <field name="docid">666</field>
>>> <field name="titi_txt_fr">toto titi tata toto tutu titi</field>
>>> </doc>
>>> </add>
>>> '
>>> curl -X POST -H 'Content-Type: text/xml' \
>>>   'http://localhost:8983/solr/alian_test/update?commit=true' \
>>>   --data-binary "$DATA"
>>>
>>> $ sh add_test.sh
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <response>
>>> <lst name="responseHeader"><int name="status">0</int><int name="QTime">484</int></lst>
>>> </response>
>>>
>>>
>>> $ curl 'http://localhost:8983/solr/alian_test/terms?terms.fl=titi_txt_fr&terms.sort=index'
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <response>
>>> <lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst><lst name="terms"><lst name="titi_txt_fr"><int name="tata">1</int><int name="titi">1</int><int name="toto">1</int><int name="tutu">1</int></lst></lst>
>>> </response>
>>>
>>>
>>> So it's not only on the Luke side, it comes from Solr. Does that sound normal?
>>>
>>> 2017-11-21 11:43 GMT+01:00 Barbet Alain <alian123sol...@gmail.com>:
>>>> Hi,
>>>>
>>>> I built a custom analyzer and set it up in Solr, but it doesn't work as I expect.
>>>> I always get 1 as the frequency for each word, even if it's present
>>>> multiple times in the text.
>>>>
>>>> So I tried with the default analyzer and found the same behavior.
>>>> My schema:
>>>>
>>>> <fieldType name="text_ami" class="solr.TextField">
>>>>   <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
>>>> </fieldType>
>>>> <field name="docid" type="string" indexed="true" required="true" stored="true"/>
>>>> <field name="test_text" type="nametext"/>
>>>>
>>>> alian@yoda:~/solr> cat add_test.sh
>>>> DATA='
>>>> <add>
>>>> <doc>
>>>> <field name="docid">666</field>
>>>> <field name="test_text">toto titi tata toto tutu titi</field>
>>>> </doc>
>>>> </add>
>>>> '
>>>> curl -X POST -H 'Content-Type: text/xml' \
>>>>   'http://localhost:8983/solr/alian_test/update?commit=true' \
>>>>   --data-binary "$DATA"
>>>>
>>>> When I test in the Solr interface / Analysis page, I see the right
>>>> behavior (titi and toto found 2 times each).
>>>> But when I look at the Solr index with Luke or the interface / Schema
>>>> page, the top terms always show 1 as the frequency. Can someone tell
>>>> me what I'm missing?
>>>>
>>>> (Solr 6.5)
>>>>
>>>> Thank you!
>>
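[Editor's note: the distinction the thread settles on can be shown on the sample text itself. Within the single test document, "titi" occurs twice (term frequency), but only one document contains it (document frequency), which is why the terms component and Luke report 1. A minimal shell sketch:]

```shell
#!/bin/sh
# The sample document from the thread.
DOC='toto titi tata toto tutu titi'

# Term frequency: how many times "titi" occurs inside the document.
# Split on spaces, one token per line, count exact matches.
tf=$(echo "$DOC" | tr ' ' '\n' | grep -c '^titi$')
echo "term frequency of titi: $tf"        # prints 2

# Document frequency: how many documents contain "titi" at least once.
# With a single document in the "index", this is 1 -- the value the
# terms component and Luke report.
df=0
for doc in "$DOC"; do
  echo "$doc" | grep -qw titi && df=$((df + 1))
done
echo "document frequency of titi: $df"    # prints 1
```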