On Fri, Jun 12, 2015 at 10:30 AM, kingofhypocrites
<kingofhypocri...@gmail.com> wrote:
> I am migrating a database from SQL Server to Cassandra. Currently I have a
> setup as follows:
>
> - Log data in Cassandra
> - Summarize data in Spark and put into Cassandra summary tables
> - Query data in Solr
>
> Everything fits beautifully until I need to do stats on groups. I am hoping
> to get this to work with Solr so I can stick to one database, but I am not
> sure it's possible.
>
> If I had it in SQL Server, I could do it like so:
> SELECT
>     site_id,
>     keyword,
>     SUM(visits) as visits,
>     CONVERT(DECIMAL(13, 3), SUM(bounces)) / SUM(visits) as bounce_rate,
>     SUM(pageviews) as pageviews,
>     CONVERT(DECIMAL(13, 3), SUM(pageviews)) / SUM(visits) as
> avg_pages_per_visit
> FROM
>     report_all_keywords_daily
> WHERE
>     site_id = 55 AND date_key >= '20150606' AND date_key <= '20150608'
> GROUP BY
>     site_id, keyword
> ORDER BY visits DESC

This is the closest we can get with the JSON Facet API today:

json.facet={
  sites: {
    type : terms,
    field : site_id,
    sort : "visits desc",
    facet : {
      visits : "sum(visits)",
      bounces : "sum(bounces)",
      pageviews : "sum(pageviews)"
    }
  }
}

That doesn't take into account "keyword" when sorting the buckets.
You could nest a ketword facet inside a site facet and thus calculate
the stats for the top N keywords per site:

json.facet={
  sites: {
    type : terms,
    field : site_id,
    facet : {
      keywords: {
           type : terms,
           field : keyword,
           sort : "visits desc",
           facet : {
              visits : "sum(visits)",
              bounces : "sum(bounces)",
              pageviews : "sum(pageviews)"
          }
     }
  }
}

More info here:  http://yonik.com/json-facet-api/

-Yonik

Reply via email to