Histogram facet?

2014-05-05 Thread Romain
Hi,

I am trying to plot a non-date field over time in order to draw a histogram
showing its evolution during the week.

For example, if I have a tweet index:

Tweet:
  date
  retweetCount

3 tweets indexed:
Tweet | Date  | Retweet
A     | 01/01 | 100
B     | 01/01 | 100
C     | 01/02 | 100

If I want to plot the number of tweets by day: easy with a date range facet:
Day 1: 2
Day 2: 1

But summing the retweets per day is not possible natively:
Day 1: 200
Day 2: 100

One current workaround would be to run a date range facet to get the date
slots, ask only for the retweet field, and compute the sums in the
client. Other stats, such as the average, could be computed the same way.
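Here is a minimal Python sketch of that client-side workaround. It assumes the matching documents have already been fetched (for example with fl=date,retweetCount) and are available as dicts. The field names follow the example index, and the dates are the MM/DD strings from the table. Real Solr dates would first have to be truncated to the day on the client.

```python
from collections import defaultdict


def retweet_stats_by_day(docs):
    """Count tweets and sum/average retweetCount per day on the client side.

    `docs` is assumed to be a list of dicts holding the 'date' and
    'retweetCount' values returned by Solr for the chosen date range.
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for doc in docs:
        day = doc["date"]
        sums[day] += doc["retweetCount"]
        counts[day] += 1
    return {
        day: {"count": counts[day], "sum": sums[day], "avg": sums[day] / counts[day]}
        for day in sorted(sums)
    }


# The three tweets from the example above.
tweets = [
    {"id": "A", "date": "01/01", "retweetCount": 100},
    {"id": "B", "date": "01/01", "retweetCount": 100},
    {"id": "C", "date": "01/02", "retweetCount": 100},
]
print(retweet_stats_by_day(tweets))
```

This reproduces both the tweet counts (2 and 1) and the retweet sums (200 and 100). However, it requires transferring every matching document to the client, which is exactly the cost the question is trying to avoid.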

The closest I could see was
https://issues.apache.org/jira/browse/SOLR-4772, but it seems to be
slightly different.

Basically I am trying to do something very similar to the Date Histogram
Facet (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-date-histogram-facet.html#search-facets-date-histogram-facet) in
ES.

Is there a way to move the counting logic to the Solr server?

Thanks!

Romain


Query with star returns double type values equal 0

2011-10-17 Thread romain
Hello,

I am experiencing unexpected behavior using Solr 3.4.0.

If my query includes a star, all the properties of type 'long' or 'LatLon'
have 0 as their value
(ex: select/?start=0&q=way*&rows=10&version=2)

The same request without the star returns the correct values
(ex: select/?start=0&q=way&rows=10&version=2)

Does anyone have an idea?

Romain.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-with-star-returns-double-type-values-equal-0-tp3428721p3428721.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query with star returns double type values equal 0

2011-10-18 Thread romain
Hi iorixxx,

I am using the lucene query parser.

On Monday, October 17, 2011 5:58:31 PM, iorixxx [via Lucene] wrote:
> Please keep in mind that wildcard queries are not analyzed.
>
> What query parser are you using? lucene, dismax, edismax?



Re: Histogram facet?

2014-05-05 Thread Romain Rigaux
The dates won't match unless you truncate all of them to the day. But then, if
you want 15-minute slots, it won't work, because you would need to store the
dates truncated to 15 minutes in the index.

In ES, there is one field that defines the slots and one field whose values are
added into each bucket, e.g.:

{
  "query" : {
    "match_all" : {}
  },
  "facets" : {
    "histo1" : {
      "date_histogram" : {
        "key_field" : "timestamp",
        "value_field" : "price",
        "interval" : "day"
      }
    }
  }
}
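The key_field/value_field idea can be sketched in Python. Slots are computed from the timestamp at query time and the other field is summed into each slot, so the slot size becomes a parameter rather than something baked into the index. This only illustrates the semantics. It is not how ES or Solr implement it. The documents are made up, and slots are aligned on the Unix epoch in UTC, so there is no calendar or time-zone handling.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def slot_start(ts, interval):
    """Round a timezone-aware timestamp down to the start of its slot.

    Slots are aligned on the Unix epoch in UTC, so timedelta(days=1) gives
    UTC days; calendar- and time-zone-aware rounding is out of scope here.
    """
    step = int(interval.total_seconds())
    seconds = int((ts - EPOCH).total_seconds())
    return EPOCH + timedelta(seconds=seconds - seconds % step)


def date_histogram(docs, key_field, value_field, interval):
    """Sum value_field per slot of key_field, with the slot size chosen per call."""
    totals = defaultdict(float)
    for doc in docs:
        totals[slot_start(doc[key_field], interval)] += doc[value_field]
    return dict(sorted(totals.items()))


# Made-up documents using the field names from the ES example.
docs = [
    {"timestamp": datetime(2014, 5, 5, 9, 2, tzinfo=timezone.utc), "price": 10.0},
    {"timestamp": datetime(2014, 5, 5, 9, 14, tzinfo=timezone.utc), "price": 5.0},
    {"timestamp": datetime(2014, 5, 5, 9, 40, tzinfo=timezone.utc), "price": 1.0},
]

for interval in (timedelta(minutes=15), timedelta(hours=1), timedelta(days=1)):
    print(interval, date_histogram(docs, "timestamp", "price", interval))
```

With 15-minute slots the first two documents share the 09:00 slot (15.0) and the third lands in 09:30 (1.0). With one-hour or one-day slots, all three sum to 16.0.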

Romain


On Mon, May 5, 2014 at 9:05 PM, Erick Erickson wrote:

> Hmmm, I _think_ pivot faceting works here. One dimension would be day
> and the other retweet count. The response will have the number of
> retweets per day, you'd have to sum them up I suppose.
>
> Best,
> Erick
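For the pivot suggestion, the "sum them up" step has one subtlety. Each second-level pivot entry says that `count` documents share one retweetCount `value`, so each entry contributes value × count. The sketch below assumes the usual facet_pivot JSON layout (field/value/count/pivot keys) and a hypothetical `day` field that already holds the day, which, as noted above, means truncating dates at index time. Pivot levels are also cut off by facet.limit (default 100), so the sums are only exact if every distinct retweetCount value is returned.

```python
def sums_from_pivot(pivot_rows):
    """Per-day retweet sums from a day,retweetCount pivot facet.

    Each second-level entry says that `count` documents share the same
    retweetCount `value`, so that entry contributes value * count.
    """
    sums = {}
    for day in pivot_rows:
        sums[day["value"]] = sum(
            sub["value"] * sub["count"] for sub in day.get("pivot", [])
        )
    return sums


# Hypothetical facet_pivot["day,retweetCount"] for the three example tweets.
pivot_rows = [
    {"field": "day", "value": "01/01", "count": 2,
     "pivot": [{"field": "retweetCount", "value": 100, "count": 2}]},
    {"field": "day", "value": "01/02", "count": 1,
     "pivot": [{"field": "retweetCount", "value": 100, "count": 1}]},
]
print(sums_from_pivot(pivot_rows))  # {'01/01': 200, '01/02': 100}
```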


Re: Histogram facet?

2014-05-06 Thread Romain Rigaux
This looks nice!

The only missing piece for more interactivity would be to be able to map
multiple field values into the same bucket.

e.g.

http://localhost:8983/solr/query?
   q=*:*
   &facet=true
   &facet.field=round(date, '15MINUTES')
   &facet.stat=sum(retweetCount)

This is a bit similar to
SOLR-4772 (https://issues.apache.org/jira/browse/SOLR-4772) for the
rounding.

Then we could zoom out just by changing the size of the bucket, without any
index change, e.g.:
http://localhost:8983/solr/query?
   q=*:*
   &facet=true
   &facet.field=round(date, '1HOURS')
   &facet.stat=sum(retweetCount)

Romain

On Tue, May 6, 2014 at 10:09 AM, Yonik Seeley wrote:

> Check out "facet functions" in Heliosearch (an experimental fork of Solr):
> http://heliosearch.org/solr-facet-functions/
>
> All you would need to do is add:
> facet.stat=sum(retweetCount)
>
> -Yonik
> http://heliosearch.org - solve Solr GC pauses with off-heap filters
> and fieldcache
>


Re: Histogram facet?

2014-05-06 Thread Romain Rigaux
This is super nice, I tried (even without subfacets) and it works! Thanks a
lot!

Romain

facet=true&facet.range=price&facet.range.start=0&facet.range.end=1000&facet.range.gap=100&facet.stat=avg(popularity)


"facets": {
  "price": {
    "buckets": [
      { "val": "0.0", "avg(popularity)": 3.5714285714285716 },
      { "val": "100.0", "avg(popularity)": 5.5 },
      { "val": "200.0", "avg(popularity)": 6 },
      { "val": "300.0", "avg(popularity)": 7.667 },
      { "val": "400.0", "avg(popularity)": 7 },
      { "val": "500.0", "avg(popularity)": "NaN" },
      { "val": "600.0", "avg(popularity)": 7 },
      { "val": "700.0", "avg(popularity)": "NaN" },
      { "val": "800.0", "avg(popularity)": "NaN" },
      { "val": "900.0", "avg(popularity)": "NaN" }
    ],
    "gap": 100, "start": 0, "end": 1000
  }
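To plot this response, each bucket can be turned into a (start, end, value) row, skipping the buckets whose average comes back as the string "NaN" (presumably ranges with no documents). This is a Python sketch. Wrapping the posted "facets" fragment in an outer object is an assumption, since only that part of the response was shown.

```python
import json

# The range-facet response from this message, wrapped in an outer object
# (an assumption: only the "facets" part was posted).
RESPONSE = """
{"facets": {"price": {"buckets": [
  {"val": "0.0", "avg(popularity)": 3.5714285714285716},
  {"val": "100.0", "avg(popularity)": 5.5},
  {"val": "200.0", "avg(popularity)": 6},
  {"val": "300.0", "avg(popularity)": 7.667},
  {"val": "400.0", "avg(popularity)": 7},
  {"val": "500.0", "avg(popularity)": "NaN"},
  {"val": "600.0", "avg(popularity)": 7},
  {"val": "700.0", "avg(popularity)": "NaN"},
  {"val": "800.0", "avg(popularity)": "NaN"},
  {"val": "900.0", "avg(popularity)": "NaN"}],
  "gap": 100, "start": 0, "end": 1000}}}
"""


def plot_points(response_text, facet="price", stat="avg(popularity)"):
    """Return (bucket_start, bucket_end, value) rows, skipping "NaN" buckets."""
    data = json.loads(response_text)["facets"][facet]
    gap = data["gap"]
    points = []
    for bucket in data["buckets"]:
        value = bucket[stat]
        if value == "NaN":  # reported as a string, presumably for empty ranges
            continue
        start = float(bucket["val"])
        points.append((start, start + gap, float(value)))
    return points


for row in plot_points(RESPONSE):
    print(row)
```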


On Tue, May 6, 2014 at 3:15 PM, Yonik Seeley wrote:

> For this specific example, I think "map multiple field values into the
> same bucket" equates to a range facet?
>
> facet.range=mydatefield
> facet.range.start=...
> facet.range.end=...
> facet.range.gap=+1HOURS
> facet.stat=sum(retweetCount)
>
> And then if you need additional breakouts by time range, you can use
> subfacets:
>
> subfacet.mydatefield.field=mycategoryfield
>
> That will provide retweet counts broken out by "mycategoryfield" for
> every bucket produced by the range query.
>
> See http://heliosearch.org/solr-subfacets/
>
> -Yonik
> http://heliosearch.org - facet functions, subfacets, off-heap
> filters&fieldcache
>
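Yonik's range facet can be assembled as query parameters. The sketch below uses Python's urlencode and also shows the zoom-out idea from earlier in the thread: only facet.range.gap changes between requests. `mydatefield` and `mycategoryfield` are the placeholder names from Yonik's message. The start/end values NOW/DAY-7DAYS and NOW/DAY are illustrative placeholders for a one-week window, since the message leaves them as "...". One practical detail: the leading '+' in a gap such as +1HOURS must be sent as %2B, because a raw '+' in a URL is decoded as a space.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr/query"


def range_facet_params(date_field, start, end, gap, stat, subfacet_field=None):
    """Parameters for a range facet with a facet function (Heliosearch syntax)."""
    params = [
        ("q", "*:*"),
        ("facet", "true"),
        ("facet.range", date_field),
        ("facet.range.start", start),
        ("facet.range.end", end),
        ("facet.range.gap", gap),
        ("facet.stat", stat),
    ]
    if subfacet_field is not None:
        # Break every range bucket down further by another field.
        params.append((f"subfacet.{date_field}.field", subfacet_field))
    return params


# Zooming out only changes facet.range.gap; nothing is re-indexed.
# The start/end values are placeholders for a one-week window.
for gap in ("+15MINUTES", "+1HOURS"):
    params = range_facet_params("mydatefield", "NOW/DAY-7DAYS", "NOW/DAY",
                                gap, "sum(retweetCount)")
    # urlencode turns the leading '+' into %2B; a raw '+' would arrive as a space.
    print(BASE_URL + "?" + urlencode(params))
```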


Using wildcard characters in a query doesn't work with my configuration

2014-08-27 Thread Romain Pigeyre
Hi,

I have a small problem using Solr:

This query works: "lastName:HK+IE"
The result contains the following record:
{ "customerId": "0003500226598", "countryLibelle": "HONG KONG",
  "firstName1": "lC /o", "countryCode": "HK", "address1": " 1F0/",
  "address2": "11-35", "storeId": "100", "lastName1": "HK IE",
  "city": "HONG KONG", "_version_": 1477612965227135000 }
NB: lastName contains the lastName1 field.

When I add * to the same query, "lastName:*HK*+*IE*", there is no
result. I expected the * character to match 0 to n characters.

Here is my configuration:

[schema excerpt not preserved in the archive]

I'm using a WhitespaceTokenizerFactory at indexing time in order to keep
special characters (/, ?, ...).
After changing this configuration, I restarted Solr and re-indexed the data.

Does anybody have an idea how to resolve this issue?

Thanks a lot

-- 

*-Romain PIGEYRE*
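One detail worth checking does not depend on the analysis chain (which did not survive in this archive). If the query is typed into a URL, '+' decodes to a space, so the parser receives two clauses, and in Lucene query syntax the `lastName:` prefix binds only to the first one. The same split would also apply to the query that did work, lastName:HK+IE. The Python sketch below only shows the decoding and a field-grouped alternative. Whether that alone explains the empty result is not established here.

```python
from urllib.parse import unquote_plus, urlencode

raw = "lastName:*HK*+*IE*"

# Sent inside a URL, '+' decodes to a space, so the query parser receives
# two clauses. The "lastName:" prefix binds only to the first clause; the
# second one, *IE*, is searched in the default field.
decoded = unquote_plus(raw)
print(decoded)  # lastName:*HK* *IE*

# Field grouping applies lastName to both wildcard clauses. urlencode
# escapes ':', '(', '*' and sends the space as '+', which decodes back to a space.
grouped = "lastName:(*HK* *IE*)"
print(urlencode({"q": grouped}))
```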


Using def function in fl criteria,

2014-09-09 Thread Pigeyre Romain
Hi

I'm trying to use a query with
fl=name_UK,name_FRA,itemDesc:def(name_UK,name_FRA)
As you can see, the itemDesc field (built by Solr) is truncated:

{
"name_UK": "MEN S SUIT\n",
"name_FRA": "24 RELAX 2 BTS ST GERMAIN TOILE FLAMMEE LIN ET SOIE",
"itemDesc": "suit"
  }

Do you have any idea how to change this?

Thanks.

Regards,

Romain


Re: Using def function in fl criteria,

2014-09-09 Thread Pigeyre Romain
I want to return:

- the name_UK field (if it exists),
- otherwise the name_FRA field,

into an alias field (itemDesc, created at query time).

There is no schema definition for itemDesc because it is only a virtual field
declared in the fl= parameter. I don't understand why a filter is being applied
to this field.

On Tue, Sep 9, 2014 at 17:44 AM, Erick Erickson
<erickerick...@gmail.com> wrote:

> I'm really confused about what you're trying to do here. What do you
> intend the syntax
> itemDesc:def(name_UK,name_FRA)
> to do?
>
> It's also really difficult to say much of anything unless we see the
> schema definition for "itemDesc" and sample input.
>
> Likely you're somehow applying an analysis chain that is truncating
> the input. Or it's also possible that you aren't indexing quite what
> you think you are.
>
> Best,
> Erick
>
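A plausible explanation, consistent with Erick's hint about an analysis chain, is that function queries such as def() read indexed terms rather than the stored text. A tokenized, lowercased field would then yield a single token like "suit" instead of "MEN S SUIT". If the stored values are what is wanted, one option is to request both fields and apply the fallback on the client. The Python sketch below mirrors the def(name_UK,name_FRA) intent. `item_desc` is a hypothetical helper name, and it treats an empty string as missing, which is slightly broader than def()'s "exists" test.

```python
def item_desc(doc, primary="name_UK", fallback="name_FRA"):
    """Client-side fallback on stored values, mirroring def(name_UK,name_FRA).

    Returns the primary field when it is present and non-empty, otherwise
    the fallback field (or None if neither is present).
    """
    value = doc.get(primary)
    return value if value else doc.get(fallback)


doc = {"name_UK": "MEN S SUIT\n",
       "name_FRA": "24 RELAX 2 BTS ST GERMAIN TOILE FLAMMEE LIN ET SOIE"}
print(repr(item_desc(doc)))              # 'MEN S SUIT\n'
print(item_desc({"name_FRA": doc["name_FRA"]}))
```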




Scoring with wildcards

2014-09-24 Thread Pigeyre Romain
Hi,

I have two records with a name_FRA field:
one with name_FRA="un test CARREAU"
and another with name_FRA="un test CARRE".

{
"codeBarre": "1",
"name_FRA": "un test CARREAU"
  }
{
"codeBarre": "2",
"name_FRA": "un test CARRE"
  }

Configuration of these fields:

[field and fieldType definitions not preserved in the archive]

When I use this query:
http://localhost:8983/solr/cdv_product/select?q=text%3Acarre*&fl=score%2C+*&wt=json&indent=true&debugQuery=true
The result is:
{
  "responseHeader":{
    "status":0,
    "QTime":2,
    "params":{
      "debugQuery":"true",
      "fl":"score, *",
      "indent":"true",
      "q":"text:carre*",
      "wt":"json"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "codeBarre":"1",
        "name_FRA":"un test CARREAU",
        "_version_":1480150860842401792,
        "score":1.0},
      {
        "codeBarre":"2",
        "name_FRA":"un test CARRE",
        "_version_":1480150875738472448,
        "score":1.0}]
  },
  "debug":{
    "rawquerystring":"text:carre*",
    "querystring":"text:carre*",
    "parsedquery":"text:carre*",
    "parsedquery_toString":"text:carre*",
    "explain":{
      "1":"\n1.0 = (MATCH) ConstantScore(text:carre*), product of:\n  1.0 = boost\n  1.0 = queryNorm\n",
      "2":"\n1.0 = (MATCH) ConstantScore(text:carre*), product of:\n  1.0 = boost\n  1.0 = queryNorm\n"},
    "QParser":"LuceneQParser",
    "timing":{
      "time":2.0,
      "prepare":{
        "time":1.0,
        "query":{
          "time":1.0},
        "facet":{
          "time":0.0},
        "mlt":{
          "time":0.0},
        "highlight":{
          "time":0.0},
        "stats":{
          "time":0.0},
        "expand":{
          "time":0.0},
        "debug":{
          "time":0.0}},
      "process":{
        "time":1.0,
        "query":{
          "time":0.0},
        "facet":{
          "time":0.0},
        "mlt":{
          "time":0.0},
        "highlight":{
          "time":0.0},
        "stats":{
          "time":0.0},
        "expand":{
          "time":0.0},
        "debug":{
          "time":1.0}

The score is the same for both records. The CARREAU record comes first and CARRE
second. I want CARRE to rank before CARREAU because CARRE is an exact
match. Is that possible?

NB: the score for this query only uses queryNorm and the boost.
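One commonly used approach, not confirmed anywhere in this thread, is to OR a boosted exact-term clause with the wildcard clause. As the explain output above shows, the wildcard clause is constant-score, so every prefix match gets the same contribution from it. The exact clause only matches documents that contain the term carre itself and is scored normally, so CARRE should end up ahead of CARREAU. The boost of 2 and the helper name `exact_first_query` are arbitrary illustrative choices, and the resulting order should be checked with debugQuery.

```python
from urllib.parse import urlencode


def exact_first_query(field, term, boost=2):
    """OR a boosted exact-term clause with the prefix wildcard clause."""
    return f"{field}:{term}^{boost} OR {field}:{term}*"


q = exact_first_query("text", "carre")
print(q)  # text:carre^2 OR text:carre*

params = {"q": q, "fl": "score,*", "wt": "json", "debugQuery": "true"}
print("http://localhost:8983/solr/cdv_product/select?" + urlencode(params))
```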

In this test:
http://localhost:8983/solr/cdv_product/select?q=text%3Acarre&fl=score%2C*&wt=json&indent=true&debugQuery=true

only one record is found, but the scoring is more complex. Why?
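The explain output posted for this query comes from Lucene's DefaultSimilarity (classic TF-IDF). For a single term, fieldWeight = tf × idf × fieldNorm, with tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The wildcard query above skips all of this because it is rewritten to a constant-score query, which is why its explain shows only ConstantScore, boost and queryNorm. The Python sketch below plugs in the values from the explain string. The fieldNorm of 0.375 is taken as given, since it is a length normalization stored with reduced precision at index time.

```python
import math


def classic_field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight for one term under Lucene's classic TF-IDF (DefaultSimilarity)."""
    tf = math.sqrt(freq)                             # sqrt(2.0) = 1.4142135...
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # 1 + ln(2 / 2) = 1.0
    return tf * idf * field_norm


# Values copied from the explain output for document "2".
score = classic_field_weight(freq=2.0, doc_freq=1, max_docs=2, field_norm=0.375)
print(f"{score:.6f}")  # 0.530330
```

The result agrees with the reported 0.53033006 up to Lucene's 32-bit float rounding.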

{
  "responseHeader":{
    "status":0,
    "QTime":2,
    "params":{
      "debugQuery":"true",
      "fl":"score,*",
      "indent":"true",
      "q":"text:carre",
      "wt":"json"}},
  "response":{"numFound":1,"start":0,"maxScore":0.53033006,"docs":[
      {
        "codeBarre":"2",
        "name_FRA":"un test CARRE",
        "_version_":1480150875738472448,
        "score":0.53033006}]
  },
  "debug":{
    "rawquerystring":"text:carre",
    "querystring":"text:carre",
    "parsedquery":"text:carre",
    "parsedquery_toString":"text:carre",
    "explain":{
      "2":"\n0.53033006 = (MATCH) weight(text:carre in 0) [DefaultSimilarity], result of:\n  0.53033006 = fieldWeight in 0, product of:\n1.4142135 = tf(freq=2.0), with freq of:\n  2.0 = termFreq=2.0\n1.0 = idf(docFreq=1, maxDocs=2)\n0.375 = fieldNorm(doc=0)\n"},
    "QParser":"LuceneQParser",
    "timing":{
      "time":2.0,
      "prepare"