Thanks, Toke. The issue I have is that I cannot look up one specific word, e.g. ddr in
termfreq(%27name%27,%20%27ddr%27). I need the count of every word and the sum of
its occurrences. I might have 1000+ comments, and each might contain different words.
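To make the distinction concrete: a facet count reports how many documents contain a word, while a word cloud needs every occurrence of it. A minimal Python sketch, assuming simple lowercase whitespace tokenization (the real Solr analyzer for the comments field will differ) and made-up sample comments; the function names are hypothetical.

```python
from collections import Counter


def doc_freq(comments):
    """Number of comments containing each word (what facet counts report)."""
    counts = Counter()
    for text in comments:
        # set() drops repeats within one comment, like a facet document count
        counts.update(set(text.lower().split()))
    return counts


def total_freq(comments):
    """Total occurrences of each word across all comments (what a word cloud needs)."""
    counts = Counter()
    for text in comments:
        counts.update(text.lower().split())
    return counts


# Hypothetical sample data, not taken from the thread
comments = ["ddr ddr memory", "ddr 1GB", "memory"]
print(doc_freq(comments)["ddr"])    # 2
print(total_freq(comments)["ddr"])  # 3
```

The repeated "ddr" in the first comment is counted once by doc_freq but twice by total_freq, which is exactly the gap between plain faceting and the per-word sums asked for here.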



CEB India Private Limited. Registration No: U741040HR2004PTC035324. Registered 
office: 6th Floor, Tower B, DLF Building No.10 DLF Cyber City, Gurgaon, 
Haryana-122002, India.

This e-mail and/or its attachments are intended only for the use of the 
addressee(s) and may contain confidential and legally privileged information 
belonging to CEB and/or its subsidiaries, including SHL. If you have received 
this e-mail in error, please notify the sender and immediately, destroy all 
copies of this email and its attachments. The publication, copying, in whole or 
in part, or use or dissemination in any other way of this e-mail and 
attachments by anyone other than the intended person(s) is prohibited.

-----Original Message-----
From: G, Rajesh [mailto:r...@cebglobal.com]
Sent: Tuesday, May 10, 2016 6:22 PM
To: solr-user@lucene.apache.org; t...@statsbiblioteket.dk
Subject: RE: Facet ignoring repeated word

Thanks, Toke. The issue I have is that I cannot look up one specific word, e.g. ddr in
termfreq(%27name%27,%20%27ddr%27). I need the count of every word and the sum of
its occurrences.



-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Tuesday, May 10, 2016 1:52 PM
To: solr-user@lucene.apache.org
Subject: Re: Facet ignoring repeated word

On Fri, 2016-04-29 at 08:55 +0000, G, Rajesh wrote:
> I am trying to implement a word cloud
> <https://www.whitehouse.gov/sites/default/files/other/sotu_wordle.png>
> using Solr. The problem I have is that the Solr facet query ignores repeated
> words in a document, e.g.

Use a combination of faceting and stats:

1) Resolve candidate words with faceting, just as you have already done.

2) Create a stats request with the same q as you used for faceting, with one
termfreq function for each term in your facet result.
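The two steps can be sketched in Python. This assumes the facet response uses Solr's default JSON layout for facet_fields (json.nl=flat, i.e. term, count, term, count, ...) and that each stats.field value follows the same {!sum=true func}termfreq(...) form as the techproducts example in this message; the function names and the sample response are hypothetical.

```python
def facet_terms(response, field):
    """Step 1: candidate words from a facet.field result, without their document counts."""
    flat = response["facet_counts"]["facet_fields"][field]
    # Default json.nl=flat interleaves term, count, term, count, ...
    return flat[0::2]


def stats_fields(field, terms):
    """Step 2: one summed termfreq stats.field value per candidate word."""
    # A term containing a single quote would break the quoted function argument,
    # so this sketch simply skips such terms.
    return [
        f"{{!sum=true func}}termfreq('{field}', '{term}')"
        for term in terms
        if "'" not in term
    ]


# Hypothetical facet response for a 'comments' field
facet_response = {
    "facet_counts": {
        "facet_fields": {"comments": ["absorbed", 3, "am", 2, "believe", 1]}
    }
}
terms = facet_terms(facet_response, "comments")
for value in stats_fields("comments", terms):
    print(value)
```

With 1000+ comments, note that facet results are capped by facet.limit (100 by default); facet.limit=-1 returns all terms.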


Working example from the techproducts-demo that comes with Solr:

http://localhost:8983/solr/techproducts/select
?q=name%3Addr
&fl=name&wt=json&indent=true
&stats=true
&stats.field={!sum=true%20func}termfreq(%27name%27,%20%27ddr%27)
&stats.field={!sum=true%20func}termfreq(%27name%27,%20%271GB%27)

where 'name' is the field ('comments' in your setup) and 'ddr' and '1GB'
are two terms ('absorbed', 'am', 'believe' etc. in your setup).
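Typing one stats.field per word does not scale to thousands of words, so the request can be generated instead. A minimal sketch using Python's urllib.parse.urlencode; the base URL is the techproducts demo one, while rows=0 (only the stats are needed, not the documents) and the helper name stats_url are assumptions for illustration.

```python
from urllib.parse import urlencode


def stats_url(base_url, q, field, terms):
    """Build a select URL with one summed termfreq stats.field per term (hypothetical helper)."""
    # rows=0 is an assumption: the word cloud only needs the stats section
    params = [("q", q), ("rows", "0"), ("wt", "json"), ("stats", "true")]
    for term in terms:
        params.append(("stats.field", f"{{!sum=true func}}termfreq('{field}', '{term}')"))
    # urlencode percent-encodes quotes, braces and commas; spaces become '+'
    return base_url + "?" + urlencode(params)


url = stats_url(
    "http://localhost:8983/solr/techproducts/select",
    "name:ddr",
    "name",
    ["ddr", "1GB"],
)
print(url)
```

For your setup, pass 'comments' as the field, the same q you used for faceting, and the terms from the facet result. With many terms the GET URL gets long; Solr also accepts the same parameters form-encoded in a POST body.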


The result will be something like

"response": {
    "numFound": 3,
...
"stats": {
    "stats_fields": {
      "termfreq('name', 'ddr')": {
        "sum": 6
      },
      "termfreq('name', '1GB')": {
        "sum": 3
      }
    }
  }
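The sums can then be mapped back to their words. A minimal sketch, assuming (as in the example response) that each stats_fields key is the termfreq expression exactly as it was sent; the helper name term_sums is hypothetical, and the sum may come back as a float such as 6.0 rather than 6.

```python
def term_sums(stats_response, field, terms):
    """Map each term to its summed termfreq, using the key format of the example response."""
    stats_fields = stats_response["stats"]["stats_fields"]
    sums = {}
    for term in terms:
        key = f"termfreq('{field}', '{term}')"
        if key in stats_fields:
            sums[term] = stats_fields[key]["sum"]
    return sums


# The stats part of the example response above
example = {
    "stats": {
        "stats_fields": {
            "termfreq('name', 'ddr')": {"sum": 6},
            "termfreq('name', '1GB')": {"sum": 3},
        }
    }
}
sums = term_sums(example, "name", ["ddr", "1GB"])
# Largest totals first, e.g. for sizing words in the cloud
for term, total in sorted(sums.items(), key=lambda kv: kv[1], reverse=True):
    print(term, total)
```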


- Toke Eskildsen, State and University Library, Denmark

