Solr - example for using percentiles
Hi,

Can you direct me to a Java example using Solr percentiles? None of the following three attempts seems to work.

Attempt 1:

    search.setParam("facet", true);
    search.setParam("percentiles", true);
    search.setParam("percentiles.field", "networkTime");
    search.setParam("percentiles.requested.percentiles", "25,50,75");
    search.setParam("percentiles.lower.fence", "0");
    search.setParam("percentiles.upper.fence", "100");
    search.setParam("percentiles.gap", "10");
    search.setParam("percentiles.averages", true);

Attempt 2:

    search.setParam("facet", true);
    search.setParam("facets.stats.percentiles", true);
    search.setParam("facets.stats.percentiles.field", "networkTime");
    search.setParam("f.networkTime.stats.percentiles.requested", "25,50,75");
    search.setParam("f.networkTime.stats.percentiles.lower.fence", "0");
    search.setParam("f.networkTime.stats.percentiles.upper.fence", "100");
    search.setParam("f.networkTime.stats.percentiles.gap", "10");
    search.setParam("facets.stats.percentiles.averages", true);

Attempt 3:

    search.setParam("facet", true);
    search.setParam("facets.stats.percentiles", true);
    search.setParam("facets.stats.percentiles.field", "networkTime");
    search.setParam("facets.networkTime.stats.percentiles.requested", "25,50,75");
    search.setParam("facets.networkTime.stats.percentiles.lower.fence", "0");
    search.setParam("facets.networkTime.stats.percentiles.upper.fence", "100");
    search.setParam("facets.networkTime.stats.percentiles.gap", "10");
    search.setParam("facets.stats.percentiles.averages", true);

Thanks,
Gilad
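[Editor's note: in case it helps, stock Solr (5.1 and later) exposes percentiles through the StatsComponent local params rather than a separate "percentiles" component. A minimal SolrJ sketch, assuming Solr 5.1+, a numeric networkTime field, and a core named "mycore" (the core name is an assumption, not from the post):]

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.FieldStatsInfo;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class PercentilesExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical core URL -- adjust to your installation.
            HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");
            try {
                SolrQuery query = new SolrQuery("*:*");
                query.setRows(0);           // only the stats section is needed
                query.setParam("stats", true);
                // Percentiles are requested via stats.field local params (Solr 5.1+).
                query.setParam("stats.field", "{!percentiles='25,50,75'}networkTime");
                QueryResponse rsp = client.query(query);
                FieldStatsInfo stats = rsp.getFieldStatsInfo().get("networkTime");
                System.out.println(stats.getPercentiles()); // e.g. {25.0=..., 50.0=..., 75.0=...}
            } finally {
                client.close();
            }
        }
    }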
Missing Segment File
Hi All,

How does one resolve the missing segments issue:

    java.nio.file.NoSuchFileException: /pathxxx/data/index/segments_1bj

It seems to occur only on large CSV imports via DIH.
Re: Referencing a !key and !stat in facet.pivot
If I'm understanding your question correctly, what you're looking for is simply...

    stats.field={!tag=pivot_stats}lastPrice
    facet.pivot={!key=pivot stats=pivot_stats}buyer,vendor

...there should only ever be one set of "{}" in the facet.pivot, defining the set of local params, and there are two param=value pairs defined inside those "{}" (just like if you wanted multiple local params for the stats.field to define which stats you want to compute).

https://cwiki.apache.org/confluence/display/solr/Local+Parameters+in+Queries
https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-CombiningStatsComponentWithPivots
https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-LocalParametersforFaceting
https://cwiki.apache.org/confluence/display/solr/The+Stats+Component#TheStatsComponent-LocalParameters

: Date: Thu, 12 Jan 2017 20:44:35 -0500
: From: John Blythe
: Reply-To: solr-user@lucene.apache.org
: To: solr-user
: Subject: Referencing a !key and !stat in facet.pivot
:
: hi all
:
: i'm having an issue with an attempt to assign a key to a facet.pivot while
: simultaneously referencing one of my stat fields.
:
: i've got something like this:
:
: stats.field={!tag=pivot_stats}lastPrice&
: > ...
: > facet.pivot={!key=pivot} {!stats=pivot_stats}buyer,vendor& ...
:
: i've attempted it without a space, wrapping the entire pivot in the !key's
: { } braces and anything else i could think of. some return errors, others
: return the query results but w an empty
:
: "facet_counts":{
: >
: > "facet_pivot":{
: >   "pivot":[]}},
:
: it will work if I totally remove the {!key=pivot} portion, however.
:
: is there any way to have both present?
:
: thanks!

-Hoss
http://www.lucidworks.com/
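[Editor's note: in SolrJ the same request could look like this. A sketch, assuming it runs inside a method with a SolrClient named client already set up; query string and field names come from the thread:]

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.response.QueryResponse;

    SolrQuery q = new SolrQuery("*:*");
    q.setRows(0);                  // only the facet/stats sections are needed
    q.setFacet(true);
    q.setParam("stats", true);
    q.setParam("stats.field", "{!tag=pivot_stats}lastPrice");
    // a single set of {} carrying both local params, as described above
    q.addFacetPivotField("{!key=pivot stats=pivot_stats}buyer,vendor");
    QueryResponse rsp = client.query(q);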
Re: Error Loading Custom Codec class with Solr Codec Factory. Class cast exception
: But when I try to load this codec directly via Solrconfig.xml CodecFactory
: as below.

...there is a difference between a (lucene layer) Codec and a (solr layer) CodecFactory.

Having the codec code in place (with the necessary SPI metadata files) lets Solr/Lucene *read* indexes written in that codec, but in order to create new indexes with your codec, you have to write a concrete implementation of the CodecFactory abstract class and provide that *Factory* class name in your config line.

There is probably no CodecFactory for your DummyEncryptedLucene60Codec defined in the patch you're trying out.

-Hoss
http://www.lucidworks.com/
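[Editor's note: for illustration, a minimal factory could look roughly like this. A sketch only: the package name is hypothetical, and it assumes the patch's codec has a no-arg constructor, which may not be true.]

    package com.example; // hypothetical

    import org.apache.lucene.codecs.Codec;
    import org.apache.solr.core.CodecFactory;

    public class DummyEncryptedCodecFactory extends CodecFactory {
        // Assumption: the patch's codec can be constructed without arguments.
        private final Codec codec = new DummyEncryptedLucene60Codec();

        @Override
        public Codec getCodec() {
            return codec;
        }
    }

...which would then be referenced in solrconfig.xml as:

    <codecFactory class="com.example.DummyEncryptedCodecFactory"/>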
Re: Referencing a !key and !stat in facet.pivot
Appreciated. Will give it a whirl tomorrow and report back!

On Sun, Jan 15, 2017 at 12:36 PM Chris Hostetter wrote:
> ...
Solr schema design: fitting time-series data
Hi,

I am trying to fit the following data in Solr to support flexible queries and would like to get your input. I have data about users, say:

contentID (assume uuid),
platform (eg. website, mobile etc),
softwareVersion (eg. sw1.1, sw2.5, ..etc),
regionId (eg. us144, uk123, etc..)

and a few more such fields. This data is partially pre-aggregated (read: Hadoop jobs), so let's assume that for "contentID = uuid123 and platform = mobile and softwareVersion = sw1.2 and regionId = ANY" I have data in this format:

    timestamp    pre-aggregated data [uniques, total]
    Jan 15       [12, 4]
    Jan 14       [4, 3]
    Jan 13       [8, 7]
    ......

And then I also have less granular data, say "contentID = uuid123 and platform = mobile and softwareVersion = ANY and regionId = ANY" (these values will be larger than the above table since granularity is reduced):

    timestamp    pre-aggregated data [uniques, total]
    Jan 15       [100, 40]
    Jan 14       [45, 30]
    ... ...

I'll get queries like "contentID = uuid123 and platform = mobile: give the sum of 'uniques' for Jan 15 - Jan 13", or "contentID = uuid123 and platform = mobile and softwareVersion = sw1.2: give the sum of 'total' for Jan 15 - Jan 01".

I was thinking of a simple schema where documents will look like this (first example above):

    {
      "contentID": "uuid12349789",
      "platform": "mobile",
      "softwareVersion": "sw1.2",
      "regionId": "ANY",
      "ts": "2017-01-15T01:01:21Z",
      "unique": 12,
      "total": 4
    }

Second example from above:

    {
      "contentID": "uuid12349789",
      "platform": "mobile",
      "softwareVersion": "ANY",
      "regionId": "ANY",
      "ts": "2017-01-15T01:01:21Z",
      "unique": 100,
      "total": 40
    }

Possible optimization:

    {
      "contentID": "uuid12349789",
      "platform.mobile.softwareVersion.sw1.2.region.us12": {
        "unique": 12,
        "total": 4
      },
      "platform.mobile.softwareVersion.sw1.2.region.ANY": {
        "unique": 100,
        "total": 40
      },
      "ts": "2017-01-15T01:01:21Z"
    }

Challenges: the number of such rows is very large and it grows exponentially with every new field. For instance, with the schema suggested above, I'll end up storing a new document for each combination of contentID, platform, softwareVersion, and regionId; throw another field into the document and the number of combinations increases exponentially. I already have more than a billion such combination rows.

I am hoping to find advice from experts on whether:

1. Multiple such fields can fit in the same document for different 'ts' values such that range queries are possible on them.
2. The time range (ts) can fit in the same document as a list(?) (to reduce the number of rows). I know multivalued fields don't support complex data types, but perhaps something else can be done with the data/schema to reduce query time and the number of rows.

The number of these rows is very large, certainly more than 1 billion with the schema I was suggesting. What schema would you suggest that fits these query requirements?

FYI: all queries will be exact matches on fields (no partial or tokenized matching), so no analysis on fields is necessary. And almost all queries are range queries.

Thanks,

KP
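[Editor's note: for concreteness, the first query above ("contentID = uuid123 and platform = mobile, sum of 'uniques' for Jan 15 - Jan 13") could be expressed against the flat per-combination schema roughly as follows. A SolrJ sketch with field names taken from the examples; using the StatsComponent sum is one possible mechanism, not something confirmed by the thread, and "client" is an already-configured SolrClient:]

    import org.apache.solr.client.solrj.SolrQuery;

    SolrQuery q = new SolrQuery("contentID:uuid123 AND platform:mobile"
            + " AND softwareVersion:ANY AND regionId:ANY");
    q.addFilterQuery("ts:[2017-01-13T00:00:00Z TO 2017-01-15T23:59:59Z]");
    q.setRows(0);                        // only the aggregate is wanted
    q.setParam("stats", true);
    q.setParam("stats.field", "unique"); // the stats section includes the sum
    Object sum = client.query(q).getFieldStatsInfo().get("unique").getSum();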
Re: Solr schema design: fitting time-series data
bq: I know multivalued fields don't support complex data types

Not sure what you're talking about here. multiValued actually has nothing to do with data types. You can have text fields which are analyzed and produce multiple tokens and are multiValued. You can have primitive types (string, int/long/float/double, boolean etc.) that are multiValued, or they can be single valued.

All "multiValued" means is that the _input_ can have the same field repeated, i.e.

    <field name="mytext">some stuff</field>
    <field name="mytext">more stuff</field>
    <field name="myint">77</field>

This doc would fail if mytext or myint were multiValued=false but succeed if multiValued=true at index time.

There are some subtleties with text (analyzed) multivalued fields having to do with token offsets, but that's not germane.

Does that change your problem? Your document could have a dozen timestamps.

However, there isn't a good way to query across multiple multivalued fields in parallel. That is, for a doc like

    myint=1
    myint=2
    myint=3
    mylong=4
    mylong=5
    mylong=6

there's no good way to say "only match this document if myint=1 AND mylong=4 AND they are both in the same position". That is, asking for myint=1 AND mylong=6 would match the above. Is that what you're wondering about?

I expect you're really asking to do the second above, in which case you might want to look at StreamingExpressions and/or ParallelSQL in Solr 6.x.

Best,
Erick

On Sun, Jan 15, 2017 at 7:31 PM, map reduced wrote:
> ...
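[Editor's note: since StreamingExpressions come up above, a hedged sketch of what an aggregation over this data could look like via the /stream handler in Solr 6.x. The collection name "metrics" and the field names are assumptions carried over from the thread, and the expression is untested:]

    curl --data-urlencode 'expr=stats(metrics,
        q="contentID:uuid123 AND platform:mobile AND ts:[2017-01-13T00:00:00Z TO 2017-01-15T23:59:59Z]",
        sum(unique))' \
      "http://localhost:8983/solr/metrics/stream"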
SOLR Installation / Configuration related
Hi,

I have a standalone installation of Solr 5.3.1. Recently I have started facing an issue: whenever the garbage collector kicks in, and there is a request to Solr at that time, Solr (http) responds with status 0 and the request is not served; it gets served after a few seconds. The PHP library catches it as:

    Exception of type Apache_Solr_HttpTransportException occurred with Message: '0' Status: Communication Error in File ..libraries/Solr/Service.php at Line ...

Any suggestions / ideas to avoid this disruption of service?

Regards,
Prasanna.
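[Editor's note: a hedged pointer, not a tested recommendation: with the standard bin/solr start scripts, JVM GC settings live in solr.in.sh via the GC_TUNE variable, so one avenue is experimenting with a low-pause collector there. The flag values below are illustrative assumptions to tune against your heap, not known-good numbers:]

    # solr.in.sh -- illustrative GC settings aimed at shorter pauses
    GC_TUNE="-XX:+UseG1GC \
      -XX:MaxGCPauseMillis=250 \
      -XX:+ParallelRefProcEnabled"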
Re: Solr schema design: fitting time-series data
I may have used the wrong terminology; by complex types I meant non-primitive types. Multivalued can be conceptualized as a list of values, for instance in your example myint = [32, 77], which you can possibly analyze and query upon. What I was trying to ask is whether a complex type can be multi-valued, or something along those lines that can be supported by range queries.

For instance, the rows below will have to be individual docs in Solr (to my knowledge) if I want a range query from ts=Jan 12 to ts=Jan 15 to give me the sum of 'unique' where contentId=1 and product=mobile:

    contentId=1, product=mobile, ts=Jan15, total=12, unique=5
    contentId=1, product=mobile, ts=Jan14, total=10, unique=3
    contentId=1, product=mobile, ts=Jan13, total=15, unique=2
    contentId=1, product=mobile, ts=Jan12, total=17, unique=4
    ..

This increases the number of documents in Solr by a lot. If only there were a way to do something like:

    {
      contentId=1
      product=mobile
      ts = [
        { time = Jan15, total = 12, unique = 15 },
        { time = Jan16, total = 10, unique = 3 },
        ..
      ]
    }

Of course the above isn't allowed, but some way to squeeze timestamps into a single document so that it doesn't increase the number of documents by a lot and I am still able to range query on 'ts'. For some rows (combinations of fields) the timestamps may go up to the last 3-6 months!

Let me know if I am still being unclear.

On Sun, Jan 15, 2017 at 8:04 PM, Erick Erickson wrote:
> ...