Thanks Toke. The issue I have is that I cannot look up one specific word, e.g. 'ddr' as in termfreq(%27name%27,%20%27ddr%27). I have to find the count of every word, and the sum of those counts.
-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Tuesday, May 10, 2016 1:52 PM
To: solr-user@lucene.apache.org
Subject: Re: Facet ignoring repeated word

On Fri, 2016-04-29 at 08:55 +0000, G, Rajesh wrote:
> I am trying to implement word
> cloud<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.google.co.uk_imgres-3Fimgurl-3Dhttps-253A-252F-252Fwww.whitehouse.gov-252Fsites-252Fdefault-252Ffiles-252Fother-252Fsotu-5Fwordle.png-26imgrefurl-3Dhttps-253A-252F-252Fwww.whitehouse.gov-252Fblog-252F2011-252F01-252F26-252Fstate-2Dunion-2Dword-2Dcloud-2Djobs-2Damerica-2Dpeople-2Dnew-26docid-3DeZ-5FHvQpd9FRBKM-26tbnid-3DqyIc-2Delv6z-2D0iM-253A-26w-3D895-26h-3D406-26bih-3D643-26biw-3D1366-26ved-3D0ahUKEwie-5F8XjurPMAhXLaRQKHWiFDFAQMwgyKAAwAA-26iact-3Dmrc-26uact-3D8&d=CwICaQ&c=zzHkMf6HMoOvCB4yTPe0Gg&r=05YCVYE-IrDXcnbr1V8J9Q&m=ZdiuXWIvnemQkwtzfuD8daMQYonM62VtPXW6Nojd__o&s=fEZWmciBUrd2RCDeqkQcv4wZx4tZlQIt_u01gB6D0VU&e=>
> using Solr. The problem I have is Solr facet query ignores repeated
> words in a document eg.

Use a combination of faceting and stats:

1) Resolve candidate words with faceting, just as you have already done.
2) Create a stats-request with the same q as you used for faceting, with a termfreq-function for each term in your facet result.

Working example from the techproducts-demo that comes with Solr:

http://localhost:8983/solr/techproducts/select
?q=name%3Addr%0A
&fl=name&wt=json&indent=true
&stats=true
&stats.field={!sum=true%20func}termfreq(%27name%27,%20%27ddr%27)
&stats.field={!sum=true%20func}termfreq(%27name%27,%20%271GB%27)

where 'name' is the field ('comments' in your setup) and 'ddr' and '1GB' are two terms ('absorbed', 'am', 'believe' etc. in your setup).

The result will be something like

"response": {
  "numFound": 3,
  ...
"stats": {
  "stats_fields": {
    "termfreq('name', 'ddr')": { "sum": 6 },
    "termfreq('name', '1GB')": { "sum": 3 }
  }
}

- Toke Eskildsen, State and University Library, Denmark
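
Since the terms are not known in advance, step 2 can be generated from whatever step 1 returns instead of being written by hand. Below is a minimal Python sketch of that idea, not a tested client. The function names are made up for illustration, rows=0 is my own addition to skip the document list, and the lookup assumes the keys under "stats_fields" are spelled exactly as in the example output above. Terms containing a quote or backslash are rejected rather than escaped.

```python
from urllib.parse import urlencode


def termfreq_func(field, term):
    """Return the function text used in the example, e.g. termfreq('name', 'ddr')."""
    # Assumption: quoting characters are rejected instead of escaped,
    # to keep the sketch simple.
    if "'" in term or "\\" in term:
        raise ValueError("unsupported character in term: %r" % term)
    return "termfreq('%s', '%s')" % (field, term)


def build_stats_query(q, field, terms):
    """Build the query string with one stats.field per facet term."""
    params = [("q", q), ("rows", "0"), ("wt", "json"), ("stats", "true")]
    for term in terms:
        params.append(("stats.field", "{!sum=true func}" + termfreq_func(field, term)))
    return urlencode(params)


def sums_from_response(body, field, terms):
    """Map each term to its summed term frequency from the parsed JSON response."""
    # Assumption: the response keys match the function text, as in the
    # example output in this thread.
    stats = body["stats"]["stats_fields"]
    return {term: stats[termfreq_func(field, term)]["sum"] for term in terms}
```

The query string would be appended to the select URL after a '?', with the facet terms from step 1 passed as the terms list. The per-word counts for the word cloud are the values of the returned dict, and the overall total is sum() over those values.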