multivalued coordinate for geospatial search
Hello solr users! I am trying to use geospatial search to do some basic distance searches in Solr 4.10. At the moment, I have it working if I have just one set of coordinates (latitude, longitude) per document. However, I need it to work when I have an unknown number of coordinate pairs per document: the document should be returned if any of its coordinates is within the distance threshold of a given coordinate. Below is how it works when I have just one set of coordinates per document. The reason I am using the copyField is that the latitude and longitude are provided in separate fields, not in the "lat,lon" format. So far, all my attempts to use a multivalued field have failed, and I would greatly appreciate some help. Thanks! Chris
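A note for anyone hitting the same wall: in Solr 4.10 the LatLonType used in most distance examples does not support multiValued="true", but solr.SpatialRecursivePrefixTreeFieldType (RPT) does. A minimal sketch, with the field names assumed rather than taken from Chris's schema:

<!-- schema.xml: an RPT type whose values are "lat,lon" strings; a document may hold many of them -->
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>
<field name="coords" type="location_rpt" indexed="true" stored="true" multiValued="true"/>

The filter then matches a document if ANY of its coordinates falls within d kilometers of the point:

fq={!geofilt sfield=coords pt=45.15,-93.85 d=5}

One caveat on the copyField idea: copyField copies values as-is and cannot concatenate separate latitude and longitude fields into the "lat,lon" string RPT expects, so the combined value has to be assembled client-side or in an update processor before indexing.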
Conditional atomic update
(Resending because DMARC-compliant ESPs bounced the previous version) I'm looking for a way to do an atomic update, but if a certain condition exists on the existing document, abort the update. Each document has the fields id, count, and value. The source data has just id and value. When the source data is indexed, I use atomic updates to: - Increment the count value in the existing document - Add the source value to the existing document's value What I'd like to do is abort the update if the existing document has a count of 5. Is there a way to do this with a custom update processor?
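A rough Java sketch of such a processor is below. It is only a sketch: the real-time-get lookup is the version-sensitive part (RealTimeGetComponent's method signatures have changed across Solr releases), the factory plus updateRequestProcessorChain wiring in solrconfig.xml is omitted, and the field names come from the message above.

import java.io.IOException;
import org.apache.lucene.util.BytesRef;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.SolrCore;
import org.apache.solr.handler.component.RealTimeGetComponent;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

// Sketch: silently drop an atomic update when the stored document
// already has count >= 5.
public class CountGuardProcessor extends UpdateRequestProcessor {
    public CountGuardProcessor(UpdateRequestProcessor next) { super(next); }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrCore core = cmd.getReq().getCore();
        String id = cmd.getSolrInputDocument().getFieldValue("id").toString();
        // getInputDocument(core, idBytes) matches the 5.x signature;
        // newer releases take extra arguments.
        SolrInputDocument existing =
                RealTimeGetComponent.getInputDocument(core, new BytesRef(id));
        if (existing != null && existing.getFieldValue("count") != null
                && Integer.parseInt(existing.getFieldValue("count").toString()) >= 5) {
            return; // abort: do not pass the update down the chain
        }
        super.processAdd(cmd);
    }
}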
JSON facet API: exclusive lower bound, inclusive upper bound
The docs for the JSON facet API tell us that the default ranges are inclusive of the lower bounds and exclusive of the upper bounds. I'd like to do the opposite (exclusive lower, inclusive upper), but I can't figure out how to combine the 'include' parameters to make it work.
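For the record, the range facet in the JSON Facet API accepts the same include values as facet.range.include (lower, upper, edge, outer, all), so asking for only "upper" should give exclusive-lower / inclusive-upper buckets. A sketch against an assumed numeric price field:

json.facet={
  prices : {
    type : range,
    field : price,
    start : 0,
    end : 100,
    gap : 25,
    include : "upper"
  }
}

Note that with only "upper", the lower edge of the very first bucket (start) is excluded as well; add "edge" to the list if the outermost edges should still count.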
Limiting by range of sum across documents
I have documents in solr that look like this: { "id": "acme-1", "manufacturer": "acme", "product_name": "Foo", "price": 3.4 } There are about 150,000 manufacturers, each of which has between 20,000 and 1,000,000 products. I'd like to return the sum of all prices that are in the range [100, 200], faceted by manufacturer. In other words, for each manufacturer, sum the prices of all products for that manufacturer, and return the sum and the manufacturer name. For example: [ { "manufacturer": "acme", "sum": 150.5 }, { "manufacturer": "Johnson, Inc.", "sum": 167.0 }, ... ] I tried this: q=*:*&rows=0&stats=true&stats.field={!tag=piv1 sum=true}price&facet=true&facet.pivot={!stats=piv1}manufacturer which "works" on a test subset of 1,000 manufacturers. However, there are two problems: 1) This query returns all the manufacturers, so I have to iterate over the entire response object to extract the ones I want. 2) The query on the whole data set takes more than 600 seconds to return, which doesn't fit our target response time. How can I perform this query? We're using solr version 5.5.5. Thanks, Chris
Re: Limiting by range of sum across documents
Hi Emir, I can't apply filters to the original query because I don't know in advance which filters will meet the criterion I'm looking for. Unless I'm missing something obvious. I tried the JSON facet you suggested but received "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[] }, "facet_counts":{ "facet_queries":{}, "facet_fields":{}, "facet_dates":{}, "facet_ranges":{}, "facet_intervals":{}, "facet_heatmaps":{}}, "facets":{ "count":0}} > Hi Chris, > You mention it returns all manufacturers? Even after you apply filters > (don't see a filter in your example)? You can control how many facets are > returned with facet.limit and you can use facet.pivot.mincount to determine > how many facets are returned. If you calculate the sum over all manufacturers, it can take a while. > > Maybe you can try json faceting. Something like (url style): > > …&json.facet={sumByManu:{terms:{field:manufacturer,facet:{sum:"sum(price)" > > HTH, > Emir > -- > Monitoring - Log Management - Alerting - Anomaly Detection > Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > > > >> On 12 Nov 2017, at 19:09, ch...@yeeplusplus.com wrote: >> >> I have documents in solr that look like this: >> { >> "id": "acme-1", >> "manufacturer": "acme", >> "product_name": "Foo", >> "price": 3.4 >> } >> >> There are about 150,000 manufacturers, each of which has between 20,000 and 1,000,000 >> products. >> I'd like to return the sum of all prices that are in the range [100, 200], >> faceted by manufacturer. In other words, for each manufacturer, sum the >> prices of all products for that manufacturer, >> and return the sum and the manufacturer name. For example: >> [ >> { >> "manufacturer": "acme", >> "sum": 150.5 >> }, >> { >> "manufacturer": "Johnson, Inc.", >> "sum": 167.0 >> }, >> ... >> ] >> >> I tried this: >> q=*:*&rows=0&stats=true&stats.field={!tag=piv1 >> sum=true}price&facet=true&facet.pivot={!stats=piv1}manufacturer >> which "works" on a test >> subset of 1,000 manufacturers. However, there are two problems: >> 1) This query returns all the manufacturers, so I have to iterate over the >> entire response object to extract the ones I want. >> 2) The query on the whole data set takes more than 600 seconds to return, >> which doesn't fit >> our target response time >> >> How can I perform this query? >> We're using solr version 5.5.5. >> >> Thanks, >> Chris >> > >
Re: Limiting by range of sum across documents
I'm not looking for products where the price is in the range [100, 200]. I'm looking for manufacturers for which the sum of the prices of all of their products is in the range [100, 200]. > Hi Chris, > > I assumed that you apply some sort of fq=price:[100 TO 200] to focus on > the wanted products. > > Can you share the full json faceting request - numFound:0 suggests that something > is completely wrong. > > Thanks, > Emir > -- > Monitoring - Log Management - Alerting - Anomaly Detection > Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > > > >> On 13 Nov 2017, at 21:56, ch...@yeeplusplus.com wrote: >> >> Hi Emir, >> I can't apply filters to the original query because I don't know in advance >> which filters will meet the criterion I'm looking for. Unless I'm missing >> something obvious. >> >> I tried the JSON facet you suggested but received >> >> "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[] >> }, >> "facet_counts":{ >> "facet_queries":{}, >> "facet_fields":{}, >> "facet_dates":{}, >> "facet_ranges":{}, >> "facet_intervals":{}, >> "facet_heatmaps":{}}, >> "facets":{ >> "count":0}} >> >>> Hi Chris, >>> You mention it returns all manufacturers? Even after you apply filters >>> (don't see a filter in your example)? You can control how many facets >>> are returned with facet.limit and you can use facet.pivot.mincount to >>> determine how many facets are returned. If you calculate the sum over all >>> manufacturers, it can take a while. >>> >>> Maybe you can try json faceting. Something like (url style): >>> >>> …&json.facet={sumByManu:{terms:{field:manufacturer,facet:{sum:"sum(price)" >>> >>> HTH, >>> Emir >>> -- >>> Monitoring - Log Management - Alerting - Anomaly Detection >>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/ >>> >>>> On 12 Nov 2017, at 19:09, ch...@yeeplusplus.com wrote: >>>> >>>> I have documents in solr that look like this: >>>> { >>>> "id": "acme-1", >>>> "manufacturer": "acme", >>>> "product_name": "Foo", >>>> "price": 3.4 >>>> } >>>> >>>> There are about 150,000 manufacturers, each of which has between 20,000 and 1,000,000 >>>> products. >>>> I'd like to return the sum of all prices that are in the range [100, 200], >>>> faceted by manufacturer. In other words, for each manufacturer, sum the >>>> prices of all products for that manufacturer, >>>> and return the sum and the manufacturer name. For example: >>>> [ >>>> { >>>> "manufacturer": "acme", >>>> "sum": 150.5 >>>> }, >>>> { >>>> "manufacturer": "Johnson, Inc.", >>>> "sum": 167.0 >>>> }, >>>> ... >>>> ] >>>> >>>> I tried this: >>>> q=*:*&rows=0&stats=true&stats.field={!tag=piv1 >>>> sum=true}price&facet=true&facet.pivot={!stats=piv1}manufacturer >>>> which "works" on a test >>>> subset of 1,000 manufacturers. However, there are two problems: >>>> 1) This query returns all the manufacturers, so I have to iterate over the >>>> entire response object to extract the ones I want. >>>> 2) The query on the whole data set takes more than 600 seconds to return, >>>> which doesn't fit >>>> our target response time >>>> >>>> How can I perform this query? >>>> We're using solr version 5.5.5. >>>> >>>> Thanks, >>>> Chris >>>> >>> > >
Re: Limiting by range of sum across documents
Emir, It certainly seems like I'll need to use streaming expressions. Thanks for your help! Chris > Hi Chris, > I misunderstood your requirement. I am not aware of a facet result > filtering feature. What you could do is sort the facet results by sum and load > page by page, but that does not sound like a good solution. Did you try using > streaming expressions? I don't have much experience with this feature, so I would have to play with it a bit before saying whether it is possible and how to do it, but I guess someone will be able to give some pointers. > > Thanks, > Emir > -- > Monitoring - Log Management - Alerting - Anomaly Detection > Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > > > >> On 14 Nov 2017, at 16:51, ch...@yeeplusplus.com wrote: >> >> I'm not looking for products where the price is in the range [100, 200]. >> I'm looking for manufacturers for which the sum of the prices of all of >> their products is in the range [100, 200]. >> >>> Hi Chris, >>> >>> I assumed that you apply some sort of fq=price:[100 TO 200] to focus on >>> the wanted products. >>> >>> Can you share the full json faceting request - numFound:0 suggests that >>> something is completely wrong. >>> >>> Thanks, >>> Emir >>> -- >>> Monitoring - Log Management - Alerting - Anomaly Detection >>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/ >>> >>>> On 13 Nov 2017, at 21:56, ch...@yeeplusplus.com wrote: >>>> >>>> Hi Emir, >>>> I can't apply filters to the original query because I don't know in >>>> advance which filters will meet the criterion I'm looking for. Unless I'm >>>> missing something obvious. >>>> >>>> I tried the JSON facet you suggested but received >>>> >>>> "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[] >>>> }, >>>> "facet_counts":{ >>>> "facet_queries":{}, >>>> "facet_fields":{}, >>>> "facet_dates":{}, >>>> "facet_ranges":{}, >>>> "facet_intervals":{}, >>>> "facet_heatmaps":{}}, >>>> "facets":{ >>>> "count":0}} >>>> >>>>> Hi Chris, >>>>> You mention it returns all manufacturers? Even after you apply filters >>>>> (don't see a filter in your example)? You can control how many facets >>>>> are returned with facet.limit and you can use facet.pivot.mincount to >>>>> determine how many facets are returned. If you calculate the sum over >> all >>>> manufacturers, it can take a while. >>>>> >>>>> Maybe you can try json faceting. Something like (url style): >>>>> >>>>> …&json.facet={sumByManu:{terms:{field:manufacturer,facet:{sum:"sum(price)" >>>>> >>>>> HTH, >>>>> Emir >>>>> -- >>>>> Monitoring - Log Management - Alerting - Anomaly Detection >>>>> Solr & Elasticsearch Consulting S
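For the archives, the streaming expression for this could be sketched roughly as below. It needs a release with the having decorator (6.4 or later, so newer than the 5.5.5 used in this thread), docValues on the exported fields, and the collection name here (products) is an assumption:

having(
  rollup(
    search(products, q="*:*", fl="manufacturer,price", sort="manufacturer asc", qt="/export"),
    over="manufacturer",
    sum(price)
  ),
  and(gteq(sum(price), 100), lteq(sum(price), 200))
)

rollup() requires its inner stream to be sorted on the over field, which is why the search sorts on manufacturer; having() then keeps only the tuples whose sum falls in [100, 200].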
joining across sharded collection
I'm trying to figure out how to structure this query. I have two types of documents: items and sources. Previously, they were all in the same collection. I'm now testing a cluster with separate collections. The items collection has 38,034,895,527 documents, and the sources collection has 417,618,443 documents. I have all of the documents in the same collection in a solr cluster running version 6.0.1, with 100 shards and replication factor 1. The following query works as expected: q=type:source&fq={!join from=source_id to=source_id}item_category:abc&rows=0&stats=true&stats.field={!tag=pv1 count=true}source_id&facet=true&facet.pivot={!stats=pv1}source_factory&facet.sort=index&facet.limit=-1 In the source documents, the source_id identifies the source. In the items documents, the source_id identifies the unique source document related to it. There is a 1:many relationship between sources and items. The above query gets the sources that are associated with items that have item_category "abc", and then facets on the sources' source_factory field. Now, I'm testing a separate cluster that has the same data, but organized into two collections: items and sources. In order to do the same query, I have to use a cross-collection join, which requires the FROM collection to be unsharded. However, in this case, the FROM collection is the items collection, which due to its size cannot be unsharded. I'm hoping there's an easy way to restructure my data / query to accomplish the faceting I need. The data set is static so can be re-indexed and reconfigured as needed. It's also not under any load yet.
Re: joining across sharded collection
Hi Erick, No, we have not yet looked at the streaming functionality. But we've started to explore it, so we'll look at that. I briefly considered denormalizing the data, but the sources documents have ~200 fields, so it seems to me that the index size would explode. (The items documents have 65 fields.) Thank you for your help. Chris Original Message Subject: Re: joining across sharded collection From: "Erick Erickson" Date: Sat, December 9, 2017 10:16 pm To: "solr-user" -- > Have you looked at the streaming functionality (StreamingExpressions > and ParallelSQL in particular)? While it has some restrictions, it > easily handles cross-collection joins. It's generally intended for > analytic-type queries, but at your scale that may be what you need. > > At that scale denormalizing the data doesn't seem feasible > > Best, > Erick > > On Sat, Dec 9, 2017 at 6:02 PM, wrote: >> >> >> I'm trying to figure out how to structure this query. >> >> I have two types of documents: items and sources. Previously, they were all >> in the same collection. I'm now testing a cluster with separate collections. >> >> The items collection has 38,034,895,527 documents, and the sources >> collection has 417,618,443 documents. >> >> I have all of the documents in the same collection in a solr cluster running >> version 6.0.1, with 100 shards and replication factor 1. >> >> The following query works as expected: >> >> q=type:source&fq={!join from=source_id >> to=source_id}item_category:abc&rows=0&stats=true&stats.field={!tag=pv1 >> count=true}source_id&facet=true&facet.pivot={!stats=pv1}source_factory&facet.sort=index&facet.limit=-1 >> >> In the source documents, the source_id identifies the source. In the items >> documents, the source_id identifies the unique source document related to >> it. There is a 1:many relationship between sources and items. >> >> The above query gets the sources that are associated with items that have >> item_category "abc", and then facets on the sources' source_factory field. >> >> >> Now, I'm testing a separate cluster that has the same data, but organized >> into two collections: items and sources. >> >> In order to do the same query, I have to use a cross-collection join, which >> requires the FROM collection to be unsharded. However, in this case, the >> FROM collection is the items collection, which due to its size cannot be >> unsharded. >> >> I'm hoping there's an easy way to restructure my data / query to accomplish >> the faceting I need. >> >> The data set is static so can be re-indexed and reconfigured as needed. It's >> also not under any load yet. >>
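To make the streaming pointer concrete: a cross-collection join in streaming expressions does not need the FROM side to be unsharded. A hedged sketch using the collection and field names from this thread (innerJoin needs both streams sorted on the join key, and the /export handler needs docValues on the listed fields):

innerJoin(
  search(items, q="item_category:abc", fl="source_id", sort="source_id asc", qt="/export"),
  search(sources, q="*:*", fl="source_id,source_factory", sort="source_id asc", qt="/export"),
  on="source_id"
)

The source_factory facet could then be computed by re-sorting the joined tuples on source_factory and wrapping them in rollup(..., over="source_factory", count(*)); at this document count the parallel() decorator would likely be needed to spread the join across worker nodes.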
Regarding Solr Cloud issue...
Hi, I am using Solr 4.4 as a cloud. While creating shards I see that the last shard has a range of "null". I am not sure if this is a bug. I am stuck with having a null value for the range in clusterstate.json (attached below) "shard5":{ "range":null, "state":"active", "replicas":{"core_node1":{ "state":"active", "core":"Web_shard5_replica1", "node_name":"domain-name.com:1981_solr", "base_url":"http://domain-name.com:1981/solr", "leader":"true", "router":"compositeId"}, I tried to use the zookeeper cli to change this, but it was not able to. I tried to locate this file, but didn't find it anywhere. Can you please let me know how do I change the range from null to something meaningful? I have the range that I need, so if I can find the file, maybe I can change it manually. My next question is - can we have a catch-all for ranges? I mean, if things don't match any other range then insert into this shard... is this possible? Kindly advise. Chris
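For anyone else stuck here: clusterstate.json is not a file on disk, it lives in ZooKeeper, which is why it cannot be found locally. One way to edit it is with the zkcli script that ships with Solr; a hedged sketch (back up the file first, stop the nodes while editing, and the corrected range value below is only an illustration):

./zkcli.sh -zkhost localhost:2181 -cmd get /clusterstate.json > /tmp/cs.json
# hand-edit /tmp/cs.json, e.g. set "range":"4ccc0000-7fffffff" on shard5
./zkcli.sh -zkhost localhost:2181 -cmd putfile /clusterstate.json /tmp/cs.json

As for a catch-all shard: with the compositeId router every hash value maps to exactly one slice, so there is no "none of the above" bucket; routing documents to an explicit default shard would require the implicit router instead.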
Re: Regarding Solr Cloud issue...
Hi Shalin, Thank you for your quick reply. I appreciate all the help. I started the solr cloud servers first... with 5 nodes. Then I issued a command like the one below to create the shards - http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=5&replicationFactor=1 Please advise. Regards, Chris On Tue, Oct 15, 2013 at 8:07 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > How did you create these shards? Can you tell us how to reproduce the > issue? > > Any shard in a collection with compositeId router should never have null > ranges. > > > On Tue, Oct 15, 2013 at 7:07 PM, Chris wrote: > > Hi, > > > > I am using Solr 4.4 as a cloud. While creating shards I see that the last > > shard has a range of "null". I am not sure if this is a bug. > > > > I am stuck with having a null value for the range in clusterstate.json > > (attached below) > > > > "shard5":{ "range":null, "state":"active", "replicas":{"core_node1":{ > > "state":"active", "core":"Web_shard5_replica1", > > "node_name":"domain-name.com:1981_solr", "base_url":" > > http://domain-name.com:1981/solr", "leader":"true", > > "router":"compositeId"}, > > > > I tried to use the zookeeper cli to change this, but it was not able to. I > > tried to locate this file, but didn't find it anywhere. > > > > Can you please let me know how do I change the range from null to something > > meaningful? I have the range that I need, so if I can find the file, maybe > > I can change it manually. > > > > My next question is - can we have a catch-all for ranges? I mean, if things > > don't match any other range then insert into this shard... is this possible? > > > > Kindly advise. > > Chris > > > > > > -- > Regards, > Shalin Shekhar Mangar. >
Re: Regarding Solr Cloud issue...
e":"8000-b332", "state":"active", "replicas":{ "core_node4":{ "state":"active", "core":"web_shard1_replica2", "node_name":"64.251.14.47:1983_solr", "base_url":"http://64.251.14.47:1983/solr"}, "core_node2":{ "state":"active", "core":"web_shard1_replica1", "node_name":"64.251.14.47:1981_solr", "base_url":"http://64.251.14.47:1981/solr";, "leader":"true"}}}, "shard2":{ "range":"b333-e665", "state":"active", "replicas":{ "core_node6":{ "state":"active", "core":"web_shard2_replica1", "node_name":"64.251.14.47:1982_solr", "base_url":"http://64.251.14.47:1982/solr"}, "core_node7":{ "state":"active", "core":"web_shard2_replica2", "node_name":"64.251.14.47:1984_solr", "base_url":"http://64.251.14.47:1984/solr";, "leader":"true"}}}, "shard3":{ "range":"e666-1998", "state":"active", "replicas":{ "core_node9":{ "state":"active", "core":"web_shard3_replica1", "node_name":"64.251.14.47:1985_solr", "base_url":"http://64.251.14.47:1985/solr"}, "core_node1":{ "state":"active", "core":"web_shard3_replica2", "node_name":"64.251.14.47:1981_solr", "base_url":"http://64.251.14.47:1981/solr";, "leader":"true"}}}, "shard4":{ "range":"1999-4ccb", "state":"active", "replicas":{ "core_node3":{ "state":"active", "core":"web_shard4_replica2", "node_name":"64.251.14.47:1982_solr", "base_url":"http://64.251.14.47:1982/solr"}, "core_node5":{ "state":"active", "core":"web_shard4_replica1", "node_name":"64.251.14.47:1983_solr", "base_url":"http://64.251.14.47:1983/solr";, "leader":"true"}}}, "shard5":{ "range":"4ccc-7fff", "state":"active", "replicas":{ "core_node8":{ "state":"active", "core":"web_shard5_replica1", "node_name":"64.251.14.47:1984_solr", "base_url":"http://64.251.14.47:1984/solr"}, "core_node10":{ "state":"active", "core":"web_shard5_replica2", "node_name":"64.251.14.47:1985_solr", "base_url":"http://64.251.14.47:1985/solr";, "leader":"true", "router":"compositeId"}, "News":{ "shards":{ "shard1":{ "range":"8000-b332", "state":"active", "replicas":{"core_node1":{ "state":"active", "core":"News_shard1_replica1", "node_name":"64.251.14.47:1984_solr", "base_url":"http://64.251.14.47:1984/solr";, "leader":"true"}}}, "shard2":{ "range":"b333-e665", "state":"active", "replicas":{"core_node3":{ "state":"active", "core":"News_shard2_replica1", "node_name":"64.251.14.47:1983_solr", "base_url":"http://64.251.14.47:1983/solr&
Re: Regarding Solr Cloud issue...
oops, the actual url is - http://64.251.14.47:1981/solr/ Also, another issue that needs to be raised: the creation of cores from the "core admin" section of the gui doesn't really work well; it creates files but then they do not work (again, I am using 4.4) On Wed, Oct 16, 2013 at 4:12 PM, Chris wrote: > Hi, > > Please find the clusterstate.json below: > > I have created a dev environment on one of my servers so that you can see > the issue live - http://64.251.14.47:1984/solr/ > > Also, there seems to be something wrong in zookeeper; when we try to add > documents using solrj, it works fine as long as the insert load is not much, > but once we start doing many inserts, then it throws a lot of errors... > > I am doing something like - > > CloudSolrServer solrCoreCloud = new CloudSolrServer(cloudURL); > solrCoreCloud.setDefaultCollection("Image"); > UpdateResponse up = solrCoreCloud.addBean(resultItem); > UpdateResponse upr = solrCoreCloud.commit(); > > > > clusterstate.json --- > > { > "collection1":{ > "shards":{ > "shard2":{ > "range":"b333-e665", > "state":"active", > "replicas":{"core_node4":{ > "state":"active", > "core":"collection1", > "node_name":"64.251.14.47:1984_solr", > "base_url":"http://64.251.14.47:1984/solr", > "leader":"true"}}}, > "shard3":{ > "range":"e666-1998", > "state":"active", > "replicas":{"core_node5":{ > "state":"active", > "core":"collection1", > "node_name":"64.251.14.47:1985_solr", > "base_url":"http://64.251.14.47:1985/solr", > "leader":"true"}}}, > "shard4":{ > "range":"1999-4ccb", > "state":"active", > "replicas":{ > "core_node2":{ > "state":"active", > "core":"collection1", > "node_name":"64.251.14.47:1982_solr", > "base_url":"http://64.251.14.47:1982/solr"}, > "core_node6":{ > "state":"active", > "core":"collection1", > "node_name":"64.251.14.47:1981_solr", > "base_url":"http://64.251.14.47:1981/solr", > "leader":"true"}}}, > "shard5":{ > "range":"4ccc-7fff", > "state":"active", > "replicas":{"core_node3":{ > "state":"active", > "core":"collection1", > "node_name":"64.251.14.47:1983_solr", > "base_url":"http://64.251.14.47:1983/solr", > "leader":"true", > "router":"compositeId"}, > "Web":{ > "shards":{ > "shard1":{ > "range":"8000-b332", > "state":"active", > "replicas":{"core_node2":{ > "state":"active", > "core":"Web_shard1_replica1", > "node_name":"64.251.14.47:1983_solr", > "base_url":"http://64.251.14.47:1983/solr", > "leader":"true"}}}, > "shard2":{ > "range":"b333-e665", > "state":"active", > "replicas":{"core_node3":{ > "state":"active", > "core":"Web_shard2_replica1", > "node_name":"64.251.14.47:1984_solr", > "base_url":"http://64.251.14.47:1984/solr", > "leader":"true"}}}, > "shard3":{ > "range":"e666-1998", > "state"
Re: Regarding Solr Cloud issue...
Also, is there any easy way of upgrading to 4.5 without having to change most of my plugins & configuration files? On Wed, Oct 16, 2013 at 4:18 PM, Chris wrote: > oops, the actual url is - http://64.251.14.47:1981/solr/ > > Also, another issue that needs to be raised: the creation of cores from > the "core admin" section of the gui doesn't really work well; it creates > files but then they do not work (again, I am using 4.4) > > > On Wed, Oct 16, 2013 at 4:12 PM, Chris wrote: > >> Hi, >> >> Please find the clusterstate.json below: >> >> I have created a dev environment on one of my servers so that you can see >> the issue live - http://64.251.14.47:1984/solr/ >> >> Also, there seems to be something wrong in zookeeper; when we try to add >> documents using solrj, it works fine as long as the insert load is not much, >> but once we start doing many inserts, then it throws a lot of errors... >> >> I am doing something like - >> >> CloudSolrServer solrCoreCloud = new CloudSolrServer(cloudURL); >> solrCoreCloud.setDefaultCollection("Image"); >> UpdateResponse up = solrCoreCloud.addBean(resultItem); >> UpdateResponse upr = solrCoreCloud.commit(); >> >> >> >> clusterstate.json --- >> >> { >> "collection1":{ >> "shards":{ >> "shard2":{ >> "range":"b333-e665", >> "state":"active", >> "replicas":{"core_node4":{ >> "state":"active", >> "core":"collection1", >> "node_name":"64.251.14.47:1984_solr", >> "base_url":"http://64.251.14.47:1984/solr", >> "leader":"true"}}}, >> "shard3":{ >> "range":"e666-1998", >> "state":"active", >> "replicas":{"core_node5":{ >> "state":"active", >> "core":"collection1", >> "node_name":"64.251.14.47:1985_solr", >> "base_url":"http://64.251.14.47:1985/solr", >> "leader":"true"}}}, >> "shard4":{ >> "range":"1999-4ccb", >> "state":"active", >> "replicas":{ >> "core_node2":{ >> "state":"active", >> "core":"collection1", >> "node_name":"64.251.14.47:1982_solr", >> "base_url":"http://64.251.14.47:1982/solr"}, >> "core_node6":{ >> "state":"active", >> "core":"collection1", >> "node_name":"64.251.14.47:1981_solr", >> "base_url":"http://64.251.14.47:1981/solr", >> "leader":"true"}}}, >> "shard5":{ >> "range":"4ccc-7fff", >> "state":"active", >> "replicas":{"core_node3":{ >> "state":"active", >> "core":"collection1", >> "node_name":"64.251.14.47:1983_solr", >> "base_url":"http://64.251.14.47:1983/solr", >> "leader":"true", >> "router":"compositeId"}, >> "Web":{ >> "shards":{ >> "shard1":{ >> "range":"8000-b332", >> "state":"active", >> "replicas":{"core_node2":{ >> "state":"active", >> "core":"Web_shard1_replica1", >> "node_name":"64.251.14.47:1983_solr", >> "base_url":"http://64.251.14.47:1983/solr", >> "leader":"true"}}}, >> "shard2":{ >> "range":"b333-e665", >> "state":"
Re: Regarding Solr Cloud issue...
oh great. Thanks Primoz. Is there any simple way to do the upgrade to 4.5 without having to change my configurations? Update a few jar files, etc.? On Wed, Oct 16, 2013 at 4:58 PM, wrote: > >>> Also, another issue that needs to be raised is the creation of cores > from > >>> the "core admin" section of the gui, doesn't really work well, it > creates > >>> files but then they do not work (again i am using 4.4) > > From my experience the "core admin" section of the GUI does not work well in > the SolrCloud domain. If I am not mistaken this was somehow fixed in 4.5.0, > which acts much better. > > I would use only HTTP requests ("cores and collections API") with > SolrCloud and would use the GUI only for viewing the state of the cluster and > cores. > > Primoz > > >
Re: Regarding Solr Cloud issue...
very well, I will try the same; maybe an auto-update tool should also be put in somewhere down the line... just a thought... On Wed, Oct 16, 2013 at 6:20 PM, wrote: > Hm, good question. I haven't really done any upgrading yet, because I just > reinstall and reindex everything. I would replace the jars with the new ones > (if needed - check the release notes for versions 4.4.0 and 4.5.0, where all the > versions of external tools [tika, maven, etc.] are stated) and deploy the > updated WAR file to the servlet container. > > Primoz > > > > From: Chris > To: solr-user > Date: 16.10.2013 14:30 > Subject: Re: Regarding Solr Cloud issue... > > > > oh great. Thanks Primoz. > > Is there any simple way to do the upgrade to 4.5 without having to change > my configurations? Update a few jar files, etc.? > > > On Wed, Oct 16, 2013 at 4:58 PM, wrote: > > > >>> Also, another issue that needs to be raised is the creation of cores > > from > > >>> the "core admin" section of the gui, doesn't really work well, it > > creates > > >>> files but then they do not work (again i am using 4.4) > > > > From my experience the "core admin" section of the GUI does not work well in > > the SolrCloud domain. If I am not mistaken this was somehow fixed in 4.5.0, > > which acts much better. > > > > I would use only HTTP requests ("cores and collections API") with > > SolrCloud and would use the GUI only for viewing the state of the cluster and > > cores. > > > > Primoz > > > > > >
Re: Regarding Solr Cloud issue...
Wow, thanks for all that. I just upgraded, linked my plugins & it seems fine so far, but I have run into another issue: while adding a document to the solr cloud it says - org.apache.solr.common.SolrException: Unknown document router '{name=compositeId}' In the clusterstate.json I can see - "shard5":{ "range":"4ccc-7fff", "state":"active", "replicas":{"core_node4":{ "state":"active", "base_url":"http://64.251.14.47:1984/solr", "core":"web_shard5_replica1", "node_name":"64.251.14.47:1984_solr", "leader":"true", "maxShardsPerNode":"2", "router":{"name":"compositeId"}, "replicationFactor":"1"}, I am using this to add - CloudSolrServer solrCoreCloud = new CloudSolrServer(cloudURL); solrCoreCloud.setDefaultCollection("web"); UpdateResponse up = solrCoreCloud.addBean(resultItem); UpdateResponse upr = solrCoreCloud.commit(); Please advise. On Wed, Oct 16, 2013 at 9:49 PM, Shawn Heisey wrote: > On 10/16/2013 4:51 AM, Chris wrote: > > Also, is there any easy way of upgrading to 4.5 without having to change > most > > of my plugins & configuration files? > > Upgrading is something that should be done carefully. If you can, it's > always recommended that you try it out on dev hardware with your real > index data beforehand, so you can deal with any problems that arise > without causing problems for your production cluster. Upgrading > SolrCloud is particularly tricky, because for a while you will be > running different versions on different machines in your cluster. > > If you're using your own custom software to go with Solr, or you're > using third-party plugins that aren't included in the Solr download, > upgrading might take more effort than usual. Also, if you are doing > anything in your config/schema that changes the format of the Lucene > index, you may find that it can't be upgraded without completely > rebuilding the index. Examples of this are changing the postings format > or docValues format. This is a very nasty complication with SolrCloud, > because those configurations affect the entire cluster. In that case, > the whole index may need to be rebuilt without custom formats before > upgrading is attempted. > > If you don't have any of the complications mentioned in the preceding > paragraph, upgrading is usually a very simple process: > > *) Shut down Solr. > *) Delete the extracted WAR file directory. > *) Replace solr.war with the new war from dist/ in the download. > **) Usually it must actually be named solr.war, which means renaming it. > *) Delete and replace other jars copied from the download. > *) Change luceneMatchVersion in all solrconfig.xml files. ** > *) Start Solr back up. > > ** With SolrCloud, you can't actually change the luceneMatchVersion > until all of your servers have been upgraded. > > A full reindex is strongly recommended. With SolrCloud, it normally > needs to wait until all servers are upgraded. In situations where it > won't work at all without a reindex, upgrading SolrCloud can be very > challenging. > > It's strongly recommended that you look over CHANGES.txt and compare the > new example config/schema with the example from the old version, to see > if there are any changes that you might want to incorporate into your > own config. As with luceneMatchVersion, if you're running SolrCloud, > those changes might need to wait until you're fully upgraded. > > Side note: When upgrading to a new minor version, config changes aren't > normally required. They will usually be required when upgrading major > versions, such as 3.x to 4.x.
> > If you *do* have custom plugins that aren't included in the Solr > download, you may have to recompile them for the new version, or wait > for the vendor to create a new version before you upgrade. > > This is only the tip of the iceberg, but a lot of the rest of it depends > greatly on your configurations. > > Thanks, > Shawn > >
Re: Regarding Solr Cloud issue...
I am also trying with something like - java -Durl=http://domainname.com:1981/solr/web/update -Dtype=application/json -jar /solr4RA/example1/exampledocs/post.jar /root/Desktop/web/*.json but it is giving an error -

19:06:22 ERROR SolrCore org.apache.solr.common.SolrException: Unknown command: subDomain [12]
org.apache.solr.common.SolrException: Unknown command: subDomain [12]
    at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:152)
    at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:101)
    at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:65)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:722)

On Thu, Oct 17, 2013 at 6:31 PM, Chris wrote: > Wow, thanks for all that. I just upgraded, linked my plugins & it seems > fine so far, but I have run into another issue > > while adding a document to the solr cloud it says - > org.apache.solr.common.SolrException: Unknown document router > '{name=compositeId}' > > in the clusterstate.json I can see - > > "shard5":{ > "range":"4ccc-7fff", > "state":"active", > "replicas":{"core_node4":{ > "state":"active", > "base_url":"http://64.251.14.47:1984/solr", > "core":"web_shard5_replica1", > "node_name":"64.251.14.47:1984_solr", > "leader":"true", > "maxShardsPerNode":"2", > "router":{"name":"compositeId"}, > "replicationFactor":"1"}, > > I am using this to add - > > CloudSolrServer solrCoreCloud = new CloudSolrServer(cloudURL); > > solrCoreCloud.setDefaultCollection("web"); >
Two easy questions...
Hi, I am new to solr & have two questions - 1. How do I get an excerpt for a huge content field (I would love to show Google-like excerpts, where the word searched for is highlighted)? 2. If I have a field - A, is it possible to get top results with only unique values for this field on a page...? Thanks, Chris
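Hedged examples for both, with field names assumed: question 1 is standard highlighting, and question 2 is result grouping (field collapsing):

q=content:solr&hl=true&hl.fl=content&hl.snippets=3&hl.fragsize=100

q=content:solr&group=true&group.field=A&group.limit=1

The first returns up to three ~100-character fragments per document with the matched terms wrapped in <em> tags by default; the second returns the top results with at most one document per distinct value of A.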
Solr Index corrupted...
Hi, I am running solr 4.4 & one of my collections seems to have a corrupted index... I tried doing - java -cp lucene-core-4.4.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /solr2/example/solr/w1/data/index/ -fix But it didn't help... it gives -

ERROR: could not read any segments file in directory
java.io.FileNotFoundException: /solr2/example/solr/w1/data/index/segments_hid (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:193)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:318)
    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:380)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:812)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:663)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:376)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:382)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1854)

Please help. Chris
character encoding issue...
Hi All, I get characters like - �� - CTA - in the solr index. I am adding Java beans to solr via the addBean() function. This seems to be a character encoding issue. Any pointers on how to resolve this one? I have seen that this occurs mostly for Japanese and Chinese characters.
Re: character encoding issue...
Hi Rajani, I followed the steps exactly as in http://zensarteam.wordpress.com/2011/11/25/6-steps-to-configure-solr-on-apache-tomcat-7-0-20/ However, when I send a query to this new instance in tomcat, I again get the garbled text - Scheduled Groups Maintenance In preparation for the new release roll-out, Diigo groups won’t be accessible on Sept 28 (Mon) around midnight 0:00 PST for several hours. Stay tuned to say hello to Diigo V4 soon! location of the text - http://blog.diigo.com/2009/09/28/scheduled-groups-maintenance/ same problem at - http://cn.nytimes.com/business/20130926/c26alibaba/ All text in the title comes out like - - � - � Can you please advise? Chris On Tue, Oct 29, 2013 at 11:33 PM, Rajani Maski wrote: > Hi, > > If you are using Apache Tomcat Server, hope you are not missing the > below mentioned configuration: > > <Connector ... connectionTimeout="2" redirectPort="8443" URIEncoding="UTF-8"/> > > I had faced a similar issue with Chinese characters and had resolved it with the > above config. > > Links for reference : > > http://zensarteam.wordpress.com/2011/11/25/6-steps-to-configure-solr-on-apache-tomcat-7-0-20/ > > http://blog.sidu.in/2007/05/tomcat-and-utf-8-encoded-uri-parameters.html#.Um_3P3Cw2X8 > > > Thanks > > > > On Tue, Oct 29, 2013 at 9:20 PM, Chris wrote: > > Hi All, > > > > I get characters like - > > > > �� - CTA - > > > > in the solr index. I am adding Java beans to solr via the addBean() > > function. > > > > This seems to be a character encoding issue. Any pointers on how to > > resolve this one? > > > > I have seen that this occurs mostly for Japanese and Chinese characters. > >
Re: character encoding issue...
Sorry, I was away for a bit & hence the delay. I am inserting Java strings into a Java bean class, and then doing an addBean() call to insert the POJO into Solr. When I query using either tomcat/jetty, I get these special characters. But I have noticed that if I change the output to "Shift-JIS" encoding, then those characters appear as what I think are Japanese characters. But this solution doesn't work for all special characters, as I can still see some of them... isn't there an encoding that can cover all the characters, whatever they might be? Any ideas on what I should do? Regards, Chris On Mon, Nov 4, 2013 at 6:27 PM, Erick Erickson wrote: > The problem is there are about a dozen places where the character > encoding can be mis-configured. The problem you're seeing above > actually looks like a problem with the character set configured in > your browser, it may have nothing to do with what's actually in Solr. > > You might write a small SolrJ program and see if you can dump the contents > in binary and examine them to see... > > Best > Erick > > > On Sun, Nov 3, 2013 at 6:39 AM, Rajani Maski > wrote: > > > How are you extracting the text that is there in the website[1] you are > > referring to? Apache Nutch or any other crawler? If yes, initially check > > whether that crawler engine is giving you data in the correct format before you > > invoke the solr index method. > > > > [1]http://blog.diigo.com/2009/09/28/scheduled-groups-maintenance/ > > > > URI encoding should resolve this problem. > > > > > > > > > > On Fri, Nov 1, 2013 at 10:50 AM, Chris wrote: > > > > > Hi Rajani, > > > > > > I followed the steps exactly as in > > > > > > http://zensarteam.wordpress.com/2011/11/25/6-steps-to-configure-solr-on-apache-tomcat-7-0-20/ > > > > > > However, when I send a query to this new instance in tomcat, I again get > > > the garbled text - > > > > > > Scheduled Groups Maintenance > > > In preparation for the new release roll-out, Diigo groups won’t be > > > accessible on Sept 28 (Mon) around midnight 0:00 PST for several > > > hours. > > > Stay tuned to say hello to Diigo V4 soon! > > > > > > location of the text - > > > http://blog.diigo.com/2009/09/28/scheduled-groups-maintenance/ > > > > > > same problem at - http://cn.nytimes.com/business/20130926/c26alibaba/ > > > > > > All text in the title comes out like - > > > > > > - � > > > - � > > > > > > Can you please advise? > > > > > > Chris > > > > > > > > > On Tue, Oct 29, 2013 at 11:33 PM, Rajani Maski > > >wrote: > > > > Hi, > > > > > > > > If you are using Apache Tomcat Server, hope you are not missing the > > > > below mentioned configuration: > > > > > > > > <Connector ... connectionTimeout="2" redirectPort="8443" URIEncoding="UTF-8"/> > > > > > > > > I had faced a similar issue with Chinese characters and had resolved with > > > the > > > > above config. > > > > > > > > Links for reference : > > > > > > > > http://zensarteam.wordpress.com/2011/11/25/6-steps-to-configure-solr-on-apache-tomcat-7-0-20/ > > > > > > > > http://blog.sidu.in/2007/05/tomcat-and-utf-8-encoded-uri-parameters.html#.Um_3P3Cw2X8 > > > > > > > > Thanks > > > > > > > > > > > > On Tue, Oct 29, 2013 at 9:20 PM, Chris wrote: > > > > > > > > > Hi All, > > > > > > > > > > I get characters like - > > > > > > > > > > �� - CTA - > > > > > > > > > > in the solr index. I am adding Java beans to solr via the addBean() > > > > > function. > > > > > > > > > > This seems to be a character encoding issue.
Any pointers on how to > > > > > resolve this one? > > > > > > > > > > I have seen that this occurs mostly for japanese chinese > characters. > > > > > > > > > > > > > > >
Re: character encoding issue...
I tried a lot of things and am almost at my wit's end :( Here is the code I used to get the strings - String htmlContent = readPage(page.getWebURL().getURL()); I even tried - Document doc = Jsoup.parse(new URL(url).openStream(), "UTF-8", url); String htmlContent = doc.html(); & Document doc = Jsoup.parse(htmlContent,"UTF-8"); No improvement so far, any advice for me please? The function that gets the html:

public static String readPage(String urlString) {
    try {
        URL url = new URL(urlString);
        DefaultHttpClient client = new DefaultHttpClient();
        client.getParams().setParameter(ClientPNames.COOKIE_POLICY, CookiePolicy.BROWSER_COMPATIBILITY);
        HttpGet request = new HttpGet(url.toURI());
        HttpResponse response = client.execute(request);
        if (response.getStatusLine().getStatusCode() == 200
                && response.getEntity().getContentType().toString().contains("text/html")) {
            Reader reader = null;
            try {
                reader = new InputStreamReader(response.getEntity().getContent());
                StringBuffer sb = new StringBuffer();
                {
                    int read;
                    char[] cbuf = new char[1024];
                    while ((read = reader.read(cbuf)) != -1)
                        sb.append(cbuf, 0, read);
                }
                return sb.toString();
            } finally {
                if (reader != null) {
                    try {
                        reader.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        } else return "";
    } catch (Exception e) { return ""; }
}

--- On Wed, Nov 6, 2013 at 2:53 AM, T. Kuro Kurosaka wrote: > It sounds like the characters were mishandled at index build time. > I would use Luke to see if a character that appears correctly > when you change the output to be SHIFT-JIS is actually > stored as one Unicode character. I bet it's stored as two characters, > each having the value that happened > to be the high or low byte of the SHIFT-JIS character. > > There are many possible causes of this. If you are indexing > HTML documents from HTTP servers, the HTTP server may > be configured to send wrong charset= info in the Content-Type > header. If the document is directly from a file system, > and if the document doesn't have a META header declaring > the charset, then the system assumes a default charset, > which is typically ISO-8859-1 or UTF-8, and misinterprets > SHIFT-JIS encoded characters. > > You need to debug to find out where the characters > get corrupted. > > > On 11/04/2013 11:15 PM, Chris wrote: >> Sorry, I was away for a bit & hence the delay. >> >> I am inserting Java strings into a Java bean class, and then doing an >> addBean() call to insert the POJO into Solr. >> >> When I query using either tomcat/jetty, I get these special characters. >> But >> I have noticed that if I change the output to "Shift-JIS" encoding, then those >> characters appear as what I think are Japanese characters. >> >> But this solution doesn't work for all special characters, as I can >> still see some of them... isn't there an encoding that can cover all the >> characters, whatever they might be? Any ideas on what I should do? >> >> Regards, >> Chris >> >> >> On Mon, Nov 4, 2013 at 6:27 PM, Erick Erickson >> wrote: >> >> The problem is there are about a dozen places where the character >>> encoding can be mis-configured. The problem you're seeing above >>> actually looks like a problem with the character set configured in >>> your browser, it may have nothing to do with what's actually in Solr. >>> >>> You might write a small SolrJ program and see if you can dump the contents >>> in binary and examine them to see... >>> >>> Best >>> Erick >>> >>> >>> On Sun, Nov 3, 2013 at 6:39 AM, Rajani Maski >>> wrote: >>> >>> How are you extracting the text that is there in the website[1] you are >>>> referring to?
Apache Nutch or any other crawler? If yes, initially check >>>> whether that crawler engine is giving you data in correct format before >>>> >>> you >>&g
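One detail that stands out in the readPage() code quoted above: new InputStreamReader(entity.getContent()) with no charset argument decodes the bytes with the platform default encoding, no matter what the server declared. A minimal charset-aware sketch for HttpClient 4.x (the UTF-8 fallback is an assumption):

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import org.apache.http.HttpEntity;
import org.apache.http.entity.ContentType;
import org.apache.http.util.EntityUtils;

// Decode using the charset declared in the Content-Type header,
// falling back to UTF-8 when none was sent.
static String readEntity(HttpEntity entity) throws java.io.IOException {
    Charset declared = ContentType.getOrDefault(entity).getCharset();
    return EntityUtils.toString(entity, declared != null ? declared : StandardCharsets.UTF_8);
}

Pages that declare their encoding only in a <meta> tag would still be misread by this; Jsoup's URL-based parse, which Chris also tried, does sniff the meta tag, so feeding Jsoup the raw stream rather than an already-decoded string is the safer path.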
Query Relevancy tuning...
Hi Gurus, I have a relevancy ranking question - 1. I have fields - title, domain, domainrank - in the index. 2. I am looking to load a txt file of preferred domains at solr startup & boost documents from those domains if the keyword matches text in the title or domain (if it exactly matches the domain, it should rank higher than if it were a partial match). 3. Also, I would like to have at most 2-3 results per domain per page. 4. Also, is it possible to do an intersection - if all 4 words (say) match, it should rank higher than a 3-word match & so on? I would like this to be as fast as possible, so kindly suggest an optimal way of doing this. A few things that were tried (request-handler defaults): <str name="defType">edismax</str> <str name="qf">fulltxt^0.5 title^2.0 domain^3 urlKeywords^1.5 anchorText^2.0 h1Keywords^1.5 text</str> <str name="mm">100%</str> <str name="q.alt">*:*</str> <str name="rows">10</str> <str name="fl">*,score</str>
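Hedged sketches for points 2 and 3 (field names from the message; the domain list would be assembled by the client when it builds the request, or baked into the handler defaults):

bq=domain:("example.com"^10 "another.org"^6)

group=true&group.field=domain&group.limit=3&group.main=true

Point 4 largely comes for free with edismax: documents matching more of the query terms score higher, and lowering mm below 100% lets 3-of-4 and 2-of-4 matches through at a lower rank.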
Solr ranking query..
Hi, I have a document structure that looks like the below. I would like to implement something like - (urlKeywords:"+keyword+" AND domainRank:[3 TO 1] AND adultFlag:N)^60 " + "OR (title:"+keyword+" AND domainRank:[3 TO 1] AND adultFlag:N)^20 " + "OR (title:"+keyword+" AND domainRank:[10001 TO *] AND adultFlag:N)^2 " + "OR (fulltxt:"+keyword+") "); In case we have multiple words in keywords - "A B C D" then for the documents that have all the words should rank highest (Group1), then 3 words(Group2), then 2 words(Group 3) etc AND - Within each group (Group1, 2, 3) I would want the ones with the lowest domain rank value to rank higher (but within the group) How can i do this in a single query? and please advice on the fastest way possible, (open to implementing fq & other techniques to speed it up) Please advice. Document Structure in XML - www ncoah.com /links.html http://www.ncoah.com/links.html North Carolina Office of Administrative Hearings - Links North Carolina Office of Administrative Hearings - Links - http://www.ncoah.com/links.html"; title="Hearings">Hearings - http://www.ncoah.com/links.html"; title="Rules">Rules - http://www.ncoah.com/links.html"; title="Civil Rights">Civil Rights - http://www.ncoah.com/links.html"; title="Welcome">Welcome - http://www.ncoah.com/links.html"; title="General Information">General Information - http://www.ncoah.com/links.html"; title="Directions to OAH">Directions to OAH - http://www.ncoah.com/links.html"; title="Establishment of OAH">Establishment of OAH - http://www.ncoah.com/links.html"; title="G.S. 150B">G.S. 150B - http://www.ncoah.com/links.html"; title="Forms">Forms - http://www.ncoah.com/links.html"; title="Links">Links - http://www.nc.gov/"; title="Visit the North Carolina State web portal">Visit the North Carolina State web portal - http://ncinfo.iog.unc.edu/library/counties.html"; title="North Carolina Counties">North Carolina Counties - http://ncinfo.iog.unc.edu/library/cities.html"; title="North Carolina Cities & Towns">North Carolina Cities & Towns - http://www.nccourts.org/"; title="Administrative Office of the Courts">Administrative Office of the Courts - http://www.ncleg.net/"; title="North Carolina General Assembly">North Carolina General Assembly - http://www.doa.state.nc.us/"; title="Department of Administration">Department of Administration - http://www.ncagr.com/"; title="Department of Agriculture">Department of Agriculture - http://www.nccommerce.com"; title="Department of Commerce">Department of Commerce - http://www.doc.state.nc.us/"; title="Department of Correction">Department of Correction - http://www.nccrimecontrol.org/"; title="Department of Crime Control & Public Safety">Department of Crime Control & Public Safety - http://www.ncdcr.gov/"; title="Department of Cultural Resources">Department of Cultural Resources - http://www.ncdenr.gov/"; title="Department of Environment and Natural Resources">Department of Environment and Natural Resources - http://www.dhhs.state.nc.us"; title="Department of Health and Human Services">Department of Health and Human Services - http://www.ncdoi.com/"; title="Department of Insurance">Department of Insurance - http://www.ncdoj.com/"; title="Department of Justice">Department of Justice - http://www.juvjus.state.nc.us/"; title="Department of Juvenile Justice and Delinquency Prevention">Department of Juvenile Justice and Delinquency Prevention - http://www.nclabor.com/"; title="Department of Labor">Department of Labor - http://www.dpi.state.nc.us/"; title="Department of Public 
Instruction">Department of Public Instruction - http://www.dor.state.nc.us/"; title="Department of Revenue">Department of Revenue - http://www.treasurer.state.nc.us/"; title="Department of State Treasurer">Department of State Treasurer - http://www.ncdot.org/"; title="Department of Transportation">Department of Transportation - http://www.secstate.state.nc.us/"; title="Department of the Secretary of State">Department of the Secretary of State - http://www.osp.state.nc.us/"; title="Office of State Personnel">Office of State Personnel - http://www.governor.state.nc.us/"; title="Office of the Governor">Office of the Governor - http://www.ltgov.state.nc.us/"; title="Office of the Lt. Governor">Office of the Lt. Governor - http://www.ncauditor.net/"; title="Office of the State Auditor">Office of the State Auditor - http://www.osc.nc.gov/"; title="Office of the State Controller">Office of the State Controller - http://www.ncbar.org/"; title="North Carolina Bar Association">North Carolina Bar Association - http://www.ncbar.com/index.asp"; title="North Carolina State Bar">North Carolina State Bar - http://ncrules.state.nc.us/ncadministrativ_/default.htm"; title="North Carolina Administrative Code">North Carolina Administrative Code - http://www.ncoah.com/rules/register/"; title="North Carolina Register">North Carolina Register - http://www.g
Re: Solr ranking query..
Dear Varun, Thank you for your replies, I managed to get point 1 & 2 done, but for the boost query, I am unable to figure it out. Could you be kind enough to point me to an example or maybe advice a bit more on that one? Thanks for your help, Chris On Tue, Feb 4, 2014 at 3:14 PM, Varun Thacker wrote: > Hi Chris, > > I think what you are looking for could be solved using the eDismax query > parser. > > https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser > > 1. Your Query Fields ( qf ) would be - "urlKeywords^60 title^40 fulltxt^1" > 2. To check on adultFlag:N you could use &fq=adultFlag:N > 3. For Lowest Domain Rank within the same group to rank higher you could > use the "boost" parameter and use a recip ( > http://wiki.apache.org/solr/FunctionQuery#recip ) function query to > achieve > this. > > Hope this works for you > > > On Tue, Feb 4, 2014 at 12:19 PM, Chris wrote: > > > Hi, > > > > I have a document structure that looks like the below. I would like to > > implement something like - > > > > (urlKeywords:"+keyword+" AND domainRank:[3 TO 1] AND adultFlag:N)^60 > " > > + > > "OR (title:"+keyword+" AND domainRank:[3 TO 1] AND adultFlag:N)^20 > " + > > "OR (title:"+keyword+" AND domainRank:[10001 TO *] AND adultFlag:N)^2 > " + > > "OR (fulltxt:"+keyword+") "); > > > > > > In case we have multiple words in keywords - "A B C D" then for the > > documents that have all the words should rank highest (Group1), then 3 > > words(Group2), then 2 words(Group 3) etc > > AND - Within each group (Group1, 2, 3) I would want the ones with the > > lowest domain rank value to rank higher (but within the group) > > > > How can i do this in a single query? and please advice on the fastest way > > possible, > > (open to implementing fq & other techniques to speed it up) > > > > Please advice. > > > > > > Document Structure in XML - > > > > > > www > > ncoah.com > > /links.html > > http://www.ncoah.com/links.html > > North Carolina Office of Administrative Hearings > > - Links > > > > North Carolina Office of Administrative Hearings - Links > > > > - > href="http://www.ncoah.com/links.html"; title="Hearings">Hearings > > - http://www.ncoah.com/links.html"; title="Rules">Rules - > > http://www.ncoah.com/links.html"; title="Civil Rights">Civil > > Rights - http://www.ncoah.com/links.html"; > > title="Welcome">Welcome - > href="http://www.ncoah.com/links.html"; title="General > > Information">General Information - > href="http://www.ncoah.com/links.html"; title="Directions to > > OAH">Directions to OAH - http://www.ncoah.com/links.html"; > > title="Establishment of OAH">Establishment of OAH - > href="http://www.ncoah.com/links.html"; title="G.S. 150B">G.S. 
> > 150B - http://www.ncoah.com/links.html"; > > title="Forms">Forms - http://www.ncoah.com/links.html"; > > title="Links">Links - http://www.nc.gov/"; title="Visit > > the North Carolina State web portal">Visit the North Carolina State > > web portal - > href="http://ncinfo.iog.unc.edu/library/counties.html"; title="North > > Carolina Counties">North Carolina Counties - > href="http://ncinfo.iog.unc.edu/library/cities.html"; title="North > > Carolina Cities & Towns">North Carolina Cities & Towns - > href="http://www.nccourts.org/"; title="Administrative Office of the > > Courts">Administrative Office of the Courts - > href="http://www.ncleg.net/"; title="North Carolina General > > Assembly">North Carolina General Assembly - > href="http://www.doa.state.nc.us/"; title="Department of > > Administration">Department of Administration - > href="http://www.ncagr.com/"; title="Department of > > Agriculture">Department of Agriculture - > href="http://www.nccommerce.com"; title="Department of > > Commerce">Department of Commerce - > href="http://www.doc.state.nc.us/"; title="Department of > > Correction">Department of Correction - > href="http://www.nccrimecontrol.org/";
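A rough sketch of point 3 as edismax request parameters (untested; the recip() constants are placeholders that would need tuning for the actual range of domainRank values):

q=A B C D&defType=edismax&qf=urlKeywords^60 title^40 fulltxt^1&fq=adultFlag:N&boost=recip(domainRank,1,1000,1000)

Since recip(x,m,a,b) computes a/(m*x+b), documents with a lower domainRank get a larger score multiplier, and edismax already ranks documents matching more of the query words higher -- which together approximate the "group by matched words, then by domain rank" ordering described above.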
Solr the right thing for me?
Hello all, I'm searching for a possibility to: - Receive an email when a site changed/was added to a website. - Only index sites that contain a reg exp in the content. - Receive the search results in a machine-readable way (RSS/SOAP/..) This should be possible to organize in sets. (set A with 40 websites, set B with 7 websites) Does it sound possible with SOLR? Do I have to expect custom development? If so, how much? Thank you in advance Bye, Chris
Capabilities of solr
Hello, We currently have a ton of documents that we would like to index and make searchable. I came across solr and it seems like it offers a lot of nice features and would suit our needs. The documents have a structure similar to java code: blocks representing functions, variables, comment blocks etc. We would also like to provide our users the ability to "tag" a line, or multiple lines of the document, with comments that would be stored externally, for future reference or notes for enhancements. These documents are also updated frequently. I also noticed in the examples that XML documents are used to import documents into solr. If we have code-like documents vs., for example, products, is there any specific way to define the solr schema for these types of documents? Currently we maintain these documents as flat files and in MySQL. Does solr sound like a good option for what we are looking to do? If so, could anybody provide some starting points for my research? Thank you
Re: slowdown after 15K queries
Maybe your Jetty needs its memory settings tuned for your system? Can you show the process information for the java processes? Chris 2008/6/2 Bram de Jong <[EMAIL PROTECTED]>: > Hello all, > > > Still running tests on solr using the example jetty container. I've > been getting nice performance. However, suddenly between 15400 and > 15600 queries, I get a very serious drop in performance, and this > every time I run my test, independent of what I'm searching for. The > performance STAYS low and doesn't come up again until I restart > Jetty/Solr. > > This is what I'm getting. The number between parentheses is the nr of > queries done until "now". > > > average query time this batch ( 2798 ) : 21.7171502113 > average query time this batch ( 2998 ) : 21.556429863 > average query time this batch ( 3197 ) : 20.7244367456 > average query time this batch ( 3397 ) : 20.9529149532 > average query time this batch ( 3597 ) : 21.7199647427 > > > Then suddenly around 15K my average time goes up 3 fold: > > > average query time this batch ( 15183 ) : 22.5312757732 > average query time this batch ( 15383 ) : 27.6089298725 <- > average query time this batch ( 15583 ) : 66.8137800694 <- > average query time this batch ( 15783 ) : 67.5224089622 <- > average query time this batch ( 15983 ) : 68.210555315 <- > > > I tried taking another set of searches (I'm replaying searches done on > our website), but exactly the same pattern occurs. The cumulative > evictions for all caches is 0 before and after the slowdown, so my > initial thought (i.e. full cache) was not it. I did some further > investigating, and it looks like only one in every few searches becomes > slow. This is the batch of searches for block nr 15583. Every search > string is mentioned and then the query_time as reported by Solr: > > ('electricity', 19), ('killed', 16), ('radio static', 179), ('monster > killed', 15), ('heavy machinery', 16), ('killed', 179), ('chimes', > 17), ('video games', 17), ('sword', 16), ('construction machine', 17), > ('graveyard', 15), ('people', 16), ('yard', 179), ('horn', 15), > ('bugle', 14), ('trumpet', 17), ('grass walking', 177), ('walking', > 17), ('horn', 15), ('clunk', 14), ('hydraulic', 178), ('jet landing', > 16), ('o fortuna', 14), ('large crowd', 180), ('dj', 15), ('hallway', > 15), ('scrach', 13), ('jet tires screeching', 177), ('tires > screeching', 14), ('ambient', 16), ('electricity', 180), ('tires', > 15), ('hospital', 15), ('chimes', 17), ('win chimes', 178), ('wind > chimes', 15), ('sex', 15), ('SPACE', 17), ('river', 180), ('thunder > storms', 16), ('crash', 15), ('boss', 16), ('thunder', 16), ('car > braking', 16), ('vocal', 17), ('vocal dance', 182), ('vocal', 16), > ('stream', 16), ('whale', 14), ('space ambient', 183), ('animal', 16), > ('pad', 15), ('body', 16), ('crickets', 180), ('fall', 16), ('camera > flash bulb', 15), ('arctic', 178), ('flash bulb', 13), ('camera', 15), > ('drawn', 16), ('next level', 180), ('timbale', 14), ('navigation', > 14), ('bass', 14), ('blips', 179), > > > Any hints of things I can try would be superb. Having to wait 15000 * > 25ms every time I want to try something else to fix this is becoming a > bit annoying :) > > > - Bram > > PS: if you are curious: the things people search for are sounds :) > hence the very varied set of search strings. > > -- > http://freesound.iua.upf.edu > http://www.smartelectronix.com > http://www.musicdsp.org > -- Chris Lin [EMAIL PROTECTED] Taipei , Taiwan. ---
Re: CURL command problem on Solr
HTTP header names are case insensitive Original message From: simon Date: 5/29/18 12:17 PM (GMT-05:00) To: solr-user Subject: Re: CURL command problem on Solr Could it be that the header should be 'Content-Type' (which is what I see in the relevant RFC) rather than 'Content-type' as shown in your email ? I don't know if headers are case-sensitive, but it's worth checking. -Simon On Tue, May 29, 2018 at 11:02 AM, Roee Tarab wrote: > Hi , > > I am having some trouble with pushing a features file to solr while > building an LTR model. I'm trying to upload a JSON file on windows cmd > executable from an already installed CURL folder, with the command: > > curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' > --data-binary "@/path/myFeatures.json" -H 'Content-type:application/json'. > > I am receiving the following error message: > > { > "responseHeader":{ > "status":500, > "QTime":7}, > "error":{ > "msg":"Bad Request", > "trace":"Bad Request (400) - Invalid content type > application/x-www-form-urlencoded; only application/json is > supported.\r\n\tat org.apache.solr.rest.RestManager$ManagedEndpoint. > parseJsonFromRequestBody(RestManager.java:407)\r\n\tat > org.apache.solr.rest. > RestManager$ManagedEndpoint.put(RestManager.java:340) > > This is definitely a technical issue, and I have not been able to overcome > it for 2 days. > > Is there another option of uploading the file to our core? Is there > something we are missing in our command? > > Thank you in advance for any help, >
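One other thing worth ruling out (this is a guess, since the exact shell matters): Windows cmd.exe does not treat single quotes as quoting characters, so the -H value above may never reach curl intact -- which would also explain the body falling back to the application/x-www-form-urlencoded default. Re-running the command with double quotes throughout would rule that out:

curl -XPUT "http://localhost:8983/solr/techproducts/schema/feature-store" --data-binary "@/path/myFeatures.json" -H "Content-Type: application/json"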
Best practice for saving state of large cluster?
I have a cluster of 100 shards on 100 nodes, with solr 7.5, running in AWS. The use case is read-dominant, with ingestion performed about once per week. There are about 84 billion documents in the cluster. It is unused on weekends and only used during normal business hours M-F. What I do now is after each round of ingestion, create a new set of AMIs, then terminate each instance. The next morning, the cluster is restarted by creating a new set of spot requests, using the most recent AMIs. At the end of the day, the cluster is turned off by terminating the instances (if no data was changed), or by creating a new set of AMIs and then terminating the instances. Is there a better way to do this? I'm not facing any real problems with this setup, but I want to make sure I'm not missing something obvious. Thanks, Chris
Re: Expected mime type application/octet-stream but got text/html
: : Indeed, it's a doc problem. A long time ago in a Solr far away, there : was a bunch of effort to use the "default" collection (collection1). : When that was changed, this documentation didn't get updated. : : We'll update it in a few, thanks for reporting! Fixed on Erick's behalf because he had to run to a meeting... https://cwiki.apache.org/confluence/display/solr/Distributed+Requests ...I also went ahead and shifted the examples to put more emphasis on using shard ids, since that's probably safer/cleaner for most people. -Hoss http://www.lucidworks.com/
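For reference, a shard-ids style distributed request along the lines of the updated examples looks something like this (collection and shard names here are illustrative): http://localhost:8983/solr/gettingstarted/select?q=*:*&shards=shard1,shard2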
Re: Possible Bug - MDC handling in org.apache.solr.common.util.ExecutorUtil.MDCAwareThreadPoolExecutor.execut e(Runnable)
: Not sure I'm onboard with the first proposed solution, but yes, I'd open a : JIRA issue to discuss. We should standardize the context keys to use fully qualified (org.apache.solr.*) java class name prefixes -- just like we do with the logger names themselves. : : - Mark : : On Mon, Jan 11, 2016 at 4:01 AM Konstantin Hollerith : wrote: : : > Hi, : > : > I'm using SLF4J MDC to log additional Information in my WebApp. Some of my : > MDC-Parameters even include Line-Breaks. : > It seems that Solr takes _all_ MDC parameters and puts them into the : > Thread-Name, see : > : > org.apache.solr.common.util.ExecutorUtil.MDCAwareThreadPoolExecutor.execute(Runnable). : > : > When there is some logging of Solr, the log gets cluttered: : > : > [11.01.16 09:14:19:170 CET] 02a3 SystemOut O 09:14:19,169 : > [zkCallback-14-thread-1-processing-My : > Custom : > MDC : > Parameter ROraqiFWaoXqP21gu4uLpMh SANDHO] WARN : > common.cloud.ConnectionManager [session=ROraqiFWaoXqP21gu4uLpMh] : > [user=SANDHO]: zkClient received AuthFailed : > : > (some of my MDC-Parameters are only active in Email-Logs and are not : > included in the file-log) : > : > I think this is a Bug. Solr should only put its own MDC-Parameters into the : > Thread-Name. : > : > Possible Solution: Since all (as far as I can check) invocations in Solr of : > MDC.put use a Prefix like "ConcurrentUpdateSolrClient" or : > "CloudSolrClient" etc., it would be possible to put a check into : > MDCAwareThreadPoolExecutor.execute(Runnable) that processes only those : > Prefixes. : > : > Should I open a Jira-Issue for this? : > : > Thanks, : > : > Konstantin : > : > Environment: JSF-Based App with WebSphere 8.5, Solr 5.3.0, slf4j-1.7.12, : > all jars are in WEB-INF/lib. : -- : - Mark : about.me/markrmiller : -Hoss http://www.lucidworks.com/
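To make the proposal concrete, a minimal sketch of the kind of prefix check being discussed (illustrative only, not the actual patch; the class and method names are made up):

import java.util.Map;
import org.slf4j.MDC;

public class SolrMdcNames {
  // Build a thread-name suffix from only the Solr-prefixed MDC keys, so
  // application MDC values (possibly multi-line) stay out of thread names.
  public static String solrMdcSuffix() {
    Map<String, String> ctx = MDC.getCopyOfContextMap();
    if (ctx == null) return "";
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> e : ctx.entrySet()) {
      if (e.getKey().startsWith("org.apache.solr.")) {
        sb.append(' ').append(e.getValue());
      }
    }
    return sb.toString();
  }
}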
Re: Boost does not appear in solr debug explain debug
: ((attr_search:8 attr_search:gb)~2^5.0) : : I hope I'm right, but I expected to find a boost in both of the matching : values. 1) "boost" information should show up as a detail of the "queryWeight", which is itself a detail of the "weight" of term clauses -- in the output you've included below, you've replaced those details with "..." so it's not clear if they are missing or not. 2) the query syntax you posted, as far as I know, is not a valid syntax for any of the parsers Solr ships with out of the box -- for the default parser it produces a syntax error, for parsers like dismax and edismax I'm fairly certain what you are getting is a query for the term "2" that has a boost of 5.0 on it. Hard to be sure since you didn't give us the full details of your request/config... 3) based on the output you provided, you are using a custom "EpriceSimilarity" similarity class ... custom similarities are heavily involved in both the score and score explanation generation -- so even if the query syntax is valid, and meaningful for whatever query parser you are using, it's possible that the EpriceSimilarity is doing something odd with that boost info at query time. : Now, I don't understand why, even if both terms match, I don't see : the boost in the explain. : : : true : 0.10516781 : sum of: : : : true : 0.06489531 : : weight(attr_search:8 in 927) [EpriceSimilarity], result of: : : ... : : : true : 0.040272504 : : weight(attr_search:gb in 927) [EpriceSimilarity], result of: : : ... : : : : : I expected to find something like attr_search:8^5 and attr_search:gb^5 in : the explain, : or something that tells me I have both matches so there is a boost : somewhere. : What's wrong in my assumption? What am I missing? : : : -- : Vincenzo D'Amore : email: v.dam...@gmail.com : skype: free.dev : mobile: +39 349 8513251 : -Hoss http://www.lucidworks.com/
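For comparison, a boosted boolean group that the stock parsers do accept (assuming the intent was "both terms must match, boost the pair by 5") looks like: q=(attr_search:8 AND attr_search:gb)^5.0 ...the ^5.0 attaches directly to the parenthesized clause, with no ~2 marker in between.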
Re: state.json base_url has internal IP of ec2 instance set instead of 'public DNS' entry in some cases
: What I’m finding is that now and then base_url for the replica in : state.json is set to the internal IP of the AWS node. i.e.: : : "base_url":"http://10.29.XXX.XX:8983/solr”, : : On other attempts it’s set to the public DNS name of the node: : : "base_url":"http://ec2_host:8983/solr”, : : In my /etc/defaults/solr.in.sh I have: : : SOLR_HOST=“ec2_host” : : which I thought is what I needed to get the public DNS name set in base_url. I believe you are correct. The "now and then" part of your question is weird -- it seems to indicate that sometimes the "correct" thing is happening, and other times it is not. /etc/defaults/solr.in.sh isn't the canonical path for solr.in.sh according to the docs/install script for running a production solr instance... https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production#TakingSolrtoProduction-ServiceInstallationScript ...how *exactly* are you running Solr on all of your nodes? Because my guess is that you've got some kind of inconsistent setup where sometimes when you startup (or restart) a new node it does refer to your solr.in.sh file, and other times it does not -- so sometimes solr never sees your SOLR_HOST option. In those cases, when it registers itself with ZooKeeper it uses the current IP as a fallback, and then that info gets baked into the metadata for the replicas that get created on that node at that point in time. FWIW, you should be able to spot check that the SOLR_HOST is being applied correctly by looking at the java process command line args (using ps, or loading the Solr UI in your browser) and checking for the "-Dhost=..." option -- if it's not there, then your solr.in.sh probably wasn't read in correctly -Hoss http://www.lucidworks.com/
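As a concrete spot check (the grep pattern is just illustrative), something like this on each node shows whether the option made it onto the JVM command line:

ps aux | grep java | grep -o -- "-Dhost=[^ ]*"

If nothing comes back on a node, that node almost certainly never read your solr.in.sh.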
Re: Boost query vs function query in edismax query
: Boost Query (bq) accepts lucene queries. E.g. bq=price:[50 TO 100]^100 : boost and bf parameters accept Function queries, e.g. boost=log(popularity) While these statements are both true, they don't tell the full story. For example you can also specify a function as a query using the appropriate parser: bq={!func}log(popularity) or turn any query into a function that produces values according to the query score: boost=query({!lucene v='price:[50 TO 100]^100'}) The fundamental difference between bq & boost is: "bq" causes an additional 'boost query' clause to be *added* to your original query "boost" causes the scores for each doc from your original query to be *multiplied* by the results of the specified function evaluated against the same doc. (in both cases "original query" refers to your "q" param parsed with respect to qf, pf, etc...) So a query like this... q={!edismax}bar & qf=foo & bq=x:y ...is roughly equivalent to: q={!lucene}+foo:bar x:y While a query like this... q={!edismax}bar & qf=foo & boost=query({!lucene v='x:y'}) ...is roughly equivalent to... q={!func}prod(query({!edismax qf='foo' v='bar'}), query({!lucene v='x:y'})) Because of how they affect final scores, the 'boost' param is almost always what you really want, and is really nothing more than shorthand for wrapping your entire query in a "BoostQParser" ... https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BoostQueryParser -Hoss http://www.lucidworks.com/
Re: Solr trying to auto-update schema.xml
: Thanks, very helpful. I think I'm on the right track now, but when I do a : post now and my UpdateRequestProcessor extension tries to add a field to a : document, I get: : : RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=1] : Error adding field 'myField'='2234543' : : The call I'm making is SolrInputDocument.addField(field name, value). Is : that trying to add a field to the schema.xml? The field (myField) is : already defined in schema.xml. By calling SolrInputDocument.addField(), my : goal is to add the field to the document and give it a value. What is the full stack trace of that error in your logs? It's not indicating that it's trying to add a *schema* field named "myField", it's saying that it's trying to add a *document* field with the name 'myField' and the value '2234543', and some sort of problem is occurring -- it may be because the schema doesn't have that field, or because the FieldType of myField complained that the value wasn't valid for that type, etc... the stack trace has the answers. -Hoss http://www.lucidworks.com/
Re: SearchComponent does not handle negative fq ???
Concrete details are crucial -- what exactly are you trying, what results are you getting, how do those results differ from what you expect? https://wiki.apache.org/solr/UsingMailingLists Normally, even when someone only gives a small subset of the crucial details needed to answer their question, there are at least some loose threads of terms that help other folks make guesses as to what the question is about -- but in your case I really can't even begin to imagine... "SearchComponent" is an abstract class implemented by dozens of concrete classes that do everything under the sun. What aspect of SearchComponent do you think causes you problems with negative fq clauses? Or are you trying to ask a question about some specific SearchComponent that you didn't mention by name? : Date: Fri, 22 Jan 2016 15:53:20 -0700 (MST) : From: vitaly bulgakov : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: SearchComponent does not handle negative fq ??? : : From my experiments it looks like SearchComponent does not handle negative fq : correctly. : Does anybody have such experience ? : : : : -- : View this message in context: http://lucene.472066.n3.nabble.com/SearchComponent-does-not-handle-negative-fq-tp4252688.html : Sent from the Solr - User mailing list archive at Nabble.com. : -Hoss http://www.lucidworks.com/
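For anyone who lands here with an actual negative-fq problem, one common gotcha worth checking: a purely negative query nested inside parentheses matches nothing unless an explicit match-all is added. A bare fq=-inStock:false works at the top level (Solr special-cases it), but a nested clause needs the *:* spelled out, e.g. fq=type:product AND (*:* -inStock:false) (field names here are just examples).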
Near Duplicate Documents, "authorization"? tf/idf implications, spamming the index?
Hey Solr people: Suppose that we did not want to break up our document set into separate indexes, but had certain cases where many versions of a document were not relevant for certain searches. I guess this could be thought of as a "authorization" class of problem, however it is not that for us. We have a few other fields that determine relevancy to the current query, based on what page the query is coming from. It's kind of like authorization, but not really. Anyway, I think the answer for how you would do it for authorization would solve it for our case too. So I guess suppose you had 99 users and 100 documents and Document 1 everybody could see it the same, but for the 99 documents, there was a slightly different document, and it was unique for each of 99 users, but not "very" unique. Suppose for instance that the only thing different in the text of the 99 different documents was that it was watermarked with the users name. Aren't you spamming your tf/idf at that point? Is there a way around this? Is there a way to say, hey, group these 99 documents together and only count 1 of them for tf/idf purposes? When doing queries, each user would only ever see 2 documents, Document 1 , plus whichever other document they specifically owned. If there are web pages or book chapters I can read or re-read that address this class of problem, those references would be great. -Chris.
Re: Why is my index size going up (or: why it was smaller)?
: I'm testing this on Windows, so that maybe a factor too (the OS is not : releasing file handles?!) specifically: Windows won't let Solr delete files on disk that have open file handles... https://wiki.apache.org/solr/FAQ#Why_doesn.27t_my_index_directory_get_smaller_.28immediately.29_when_i_delete_documents.3F_force_a_merge.3F_optimize.3F -Hoss http://www.lucidworks.com/
Re: Retrieving 1000 records at a time
: I have a requirement where I need to retrieve 1 to 15000 records at a : time from SOLR. : With 20 or 100 records everything happens in milliseconds. : When it goes to 1000, 1 it is taking more time... like even 30 seconds. So far all you've really told us about your setup is that some queries with "rows=1000" are slow -- but you haven't really told us anything else we can help you with -- for example it's not obvious if you mean that you are using start=0 in all of those queries and they are slow, or if you mean you are paginating through results (ie: increasing start param) 1000 at a time and it starts getting slow as you page deeply. You also haven't told us anything about the fields you are returning -- how many are there? what data types are they? are they large string values? How are you measuring the time? Are you sure network lag, or client side processing of the data as solr returns it, isn't the bulk of the time you are measuring? What does the QTime in the solr responses for these slow queries say? My best guesses are that either: you are doing deep paging and conflating the increased response time for deep results with an increase in response time for large rows params (because you are getting "deeper" faster with a large rows#), or you are seeing an increase in processing time on the client due to the large volume of data being returned -- possibly even with SolrJ, which is designed to parse the entire response into java data structures by default before returning to the client. Without more concrete information, it's hard to give you advice beyond guesses. Potentially helpful links... https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results https://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/ https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions https://lucene.apache.org/solr/5_4_0/solr-solrj/org/apache/solr/client/solrj/io/stream/expr/StreamFactory.html -Hoss http://www.lucidworks.com/
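As a minimal sketch of the cursor approach from the first two links (assuming the uniqueKey field is "id"; cursorMark requires it as the final sort tiebreaker): q=*:*&sort=id asc&rows=1000&cursorMark=* ...then take the nextCursorMark value from each response and send it back as cursorMark on the next request, repeating until the value stops changing.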
[ANNOUNCE] Apache Solr 5.5.0 and Reference Guide for 5.5 available
Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites. Solr 5.5.0 is available for immediate download at: http://lucene.apache.org/solr/mirrors-solr-latest-redir.html Please read CHANGES.txt for a full list of new features and changes: https://lucene.apache.org/solr/5_5_0/changes/Changes.html This is expected to be the last 5.x feature release before Solr 6.0.0. Solr 5.5 Release Highlights: * The schema version has been increased to 1.6, and Solr now returns non-stored doc values fields along with stored fields * The PERSIST CoreAdmin action has been removed * The <mergePolicy> element is deprecated in favor of a similar <mergePolicyFactory> element, in solrconfig.xml * CheckIndex now works on HdfsDirectory * RuleBasedAuthorizationPlugin now allows wildcards in the role, and accepts an 'all' permission * Users can now choose compression mode in SchemaCodecFactory * Solr now supports Lucene's XMLQueryParser * Collections APIs now have async support * Uninverted field faceting is re-enabled, for higher performance on rarely changing indices Further details of changes are available in the change log available at: http://lucene.apache.org/solr/5_5_0/changes/Changes.html Also available is the Solr Reference Guide for Solr 5.5. This PDF serves as the definitive user's manual for Solr 5.5. It can be downloaded from the Apache mirror network: https://s.apache.org/Solr-Ref-Guide-PDF Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html) Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also applies to Maven access. -Hoss http://www.lucidworks.com/
RE: Solr debug 'explain' values differ from the Solr score
Sounds like a mismatch in the way the BooleanQuery explanation generation code is handling situations where there is/isn't a coord factor involved in computing the score itself. (the bug is almost certainly in the "explain" code, since that is less rigorously tested in most cases, and the score itself is probably correct) I tried to trivially reproduce the symptoms you described using the techproducts example and was unable to generate a discrepency using a simple boolean query w/a fuzzy clause... http://localhost:8983/solr/techproducts/query?q=ipod~%20belkin&fl=id,name,score&debug=query&debug=results&debug.explain.structured=true ...can you distill one of your problematic queries down to a shorter/simpler reproducible example, and/or provide us with the field & fieldType details for all of the fields used in your example? (i'm guessing it probably relates to your firstName_phonetic field?) : Date: Tue, 15 Mar 2016 13:17:04 -0700 : From: Rick Sullivan : Reply-To: solr-user@lucene.apache.org : To: "solr-user@lucene.apache.org" : Subject: RE: Solr debug 'explain' values differ from the Solr score : : After some digging and experimentation, here are some more details on the issue I'm seeing. : : : 1. The adjusted documents' scores are always exactly (debug_score/N), where N is the number of OR items in the query. : : For example, `&q=firstName:gabby~ firstName_phonetic:gabby firstName_tokens:(gabby)` will result in some of the documents with firstName==GABBY receiving a score 1/3 of the score of other GABBY documents, even though the debug explanation shows that they generated the same score. : : : 2. This doesn't appear to be a brand new issue, or an issue with SolrCloud. : : I've tested the problem using SolrCloud 5.5.0, Solr 5.5.0 (not cloud), and Solr 5.4.1. : : : Anyone have any ideas? : : Thanks, : -Rick : : From: r...@ricksullivan.net : To: solr-user@lucene.apache.org : Subject: Solr debug 'explain' values differ from the Solr score : Date: Thu, 10 Mar 2016 08:34:30 -0800 : : Hi, : : I'm seeing behavior in Solr 5.5.0 where the top-level values I see in the debug response don't always correspond with the scores Solr assigns to the matched documents. : : For example, here is the top-level debug information for two documents matched by a query: : : 114628: Object : description: "sum of:" : details: Array[2] : match: true : value: 20.542768 : : 357547: Object : description: "sum of:" : details: Array[2] : match: true : value: 26.517654 : : But they have scores : : 114628: 20.542767 : 357547: 13.258826 : : I expect the second document to be the most relevant for my query, and the debug values seem to agree. However, in the final score I receive, that document's score has been adjusted down. : : The relevant debug response information can be found here: http://apaste.info/mju : : Does anyone have an idea why the Solr score may differ from the debug value? : : Thanks, : -Rick -Hoss http://www.lucidworks.com/
Re: Explain style json? Without using wt=json...
: We are using Solrj to query our solr server, and it works great. : However, it uses the binary format wt=javabin, and now when I'm trying : to get better debug output, I notice a problem with this. The thing is, : I want to include the explain data for each search result, by adding : "[explain]" as a field for the fl parameter. And when using [explain : style=nl] combined with wt=json, the explain output is proper and valid : json. However, when I use something other than wt=json, the explain : output is not proper json. Forget about the wt param and/or the response writer/response parser used by SolrJ for a minute. When you use "fl=[explain style=nl]" what's happening is that a *structured* named list containing the explanation metadata is included in each document in the response -- as opposed to "style=text" in which a simple whitespace indented string representation of the score explanation is included. Now let's think about the "wt" param -- that controls how *structured* data is written over the wire -- so with "fl=[explain style=nl]" the structured score explanation is written over the wire as json with wt=json, or xml with wt=xml, or in solr's custom binary protocol with wt=javabin. As a SolrJ user, regardless of what "wt" value is used by SolrJ under the covers, SolrJ will use an appropriate response parser to recreate the structure of the data in your client application. So what you get in your java application when you access that pseudo-field on each document is going to depend on the (effective) "style" value of that transformer -- not the "wt" used. So for "style=text" your client code will find a java.lang.String containing the same simple string representation mentioned above (just like you see in your browser with wt=json or wt=xml). For "style=nl" you're going to get back an org.apache.solr.common.util.NamedList object (with the same structure as what you would see in your browser with wt=xml or wt=json) which you can traverse using the appropriate java methods to pull out the various keys & values that you want. If you simply call toString() on this object you're going to get a basic dump of the data which might look like broken JSON, but is really just an attempt at returning some useful toString() info for debugging. I suspect that "NamedList.toString()" output is what's confusing you... : And, the reason I want the explain segment in proper json format, is that : I want to turn it into a JSONObject, in order to get proper indentation : for easier reading. Because the regular output doesn't have proper : indentation. In your java code, you can walk the NamedList structure you get back recursively, and call the appropriate methods to get the list of key=>val pairs to convert it to a JSONObject. There is no server side option to write the explain data back as a "String containing a JSON representation of the structured data" which will then be passed as a raw string all the way back to the client. -Hoss http://www.lucidworks.com/
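If it helps, a small recursive walker along those lines might look like this (a sketch; the class name and indentation scheme are arbitrary):

import org.apache.solr.common.util.NamedList;

public class ExplainDumper {
  // Recursively print a structured [explain style=nl] value,
  // indenting one level per nested NamedList.
  public static void dump(Object node, String indent) {
    if (node instanceof NamedList) {
      NamedList<?> nl = (NamedList<?>) node;
      for (int i = 0; i < nl.size(); i++) {
        System.out.println(indent + nl.getName(i) + ":");
        dump(nl.getVal(i), indent + "  ");
      }
    } else {
      System.out.println(indent + node);
    }
  }
}

// usage: ExplainDumper.dump(doc.getFieldValue("[explain]"), "");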
Re: BCE dates on solr TrieDateField
BCE dates have historically been problematic because of ambiguity in both the ISO format that we use for representing dates as well as the internal java representation, more details... https://issues.apache.org/jira/browse/SOLR-1899 ..the best work around I can suggest is to use simple numeric fields to represent your dates -- either as millis since whatever epoch you want, or as distinct year, month, day fields. : Date: Mon, 21 Mar 2016 12:53:50 -0400 : From: jude mwenda : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: BCE dates on solr TrieDateField : : Hey, : : I hope this email finds you well. I have a solr.TrieDateField and I am : trying to send -ve dates to this field. Does the TrieDateField allow for : -ve dates? when I push the date -1600-01-10 to solr i get 1600-01-10 as the : date registered. Please advise. : : -- : Regards, : : Jude Mwenda : -Hoss http://www.lucidworks.com/
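For example (the field and type names here are only illustrative), a schema entry along these lines sidesteps date parsing entirely: <field name="year_i" type="int" indexed="true" stored="true"/> ...indexing year_i = -1600 for the document above, and querying with ordinary numeric ranges like year_i:[-1600 TO -1500].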
RE: PERFORMANCE WARNING: Overlapping onDeckSearchers=2
: What I'm wondering is, what should one do to fix this issue when it : happens. Is there a way to recover after the WARN appears? It's just a warning that you have a sub-optimal situation from a performance standpoint -- either committing too fast, or warming too much. It's not a failure, and Solr will continue to serve queries and process updates -- but meanwhile it's detected that the situation it's in involves wasted CPU/RAM. : In my observation, this WARN comes when I hit frequent hard commits or : hit re-load config. I'm not planning to hit frequent hard commits, : however sometimes it accidentally happens. And when it happens the : collection crashes without a recovery. If you're seeing a crash, then that's a distinct problem from the WARN -- it might be related to the warning, but it's not identical -- Solr doesn't always (or even normally) crash in the "Overlapping onDeckSearchers" situation. So if you are seeing crashes, please give us more details about these crashes: namely more details about everything you are seeing in your logs (on all the nodes, even if only one node is crashing) https://wiki.apache.org/solr/UsingMailingLists -Hoss http://www.lucidworks.com/
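For reference, these are the two solrconfig.xml knobs usually involved when that warning shows up (the values below are illustrative, not recommendations): <autoCommit> <maxTime>60000</maxTime> <openSearcher>false</openSearcher> </autoCommit> <maxWarmingSearchers>2</maxWarmingSearchers> ...i.e. either space out the commits that open new searchers, or adjust how many searchers may warm concurrently.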
Re: Is there any JIRA changed the stored order of multivalued field?
: We do POST to add data to Solr v4.7 and Solr v5.3.2 respectively. The : attachmentnames are in 789, 456, 123 sequence: ... : And we do GET to select data from solr v4.7 and solr v5.3.2 respectively: : http://host:port/solr/collection1/select?q=id:1&wt=json&indent=true ... : Is there a JIRA fix that changed this order? Thanks! https://issues.apache.org/jira/browse/SOLR-5777 The bug wasn't in returning stored fields, it was in how the JSON was parsed when a field name was specified multiple times (instead of a single time with an array of values) when adding a document. -Hoss http://www.lucidworks.com/
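To illustrate the distinction with the field from this thread: the bug affected documents sent with a repeated key, {"id":"1", "attachmentname":"789", "attachmentname":"456", "attachmentname":"123"} ...while the single-key array form was parsed correctly all along and is the safer way to send multivalued fields: {"id":"1", "attachmentname":["789","456","123"]}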
Re: Solr response error 403 when I try to index medium.com articles
403 means "forbidden" Something about the request Solr is sending -- or something about the IP address Solr is connecting from when talking to medium.com -- is causing the medium.com web server to reject the request. This is something that servers may choose to do if they detect (via headers, or missing headers, or reverse ip lookup, or other distinctive nuances of how the connection was made) that the client connecting to their server isn't a "human browser" (ie: firefox, chrome, safari) and is a Robot that they don't want to cooperate with (ie: they might be happy to serve their pages to the google-bot crawler, but not to some third-party they've never heard of). The specifics of how/why you might get a 403 for any given url are hard to debug -- it might literally depend on how many requests you've sent to that domain in the past X hours. In general Solr's ContentStream indexing from remote hosts isn't intended to be a super robust solution for crawling arbitrary websites on the web -- if that's your goal, then I would suggest you look into running a more robust crawler (nutch, droids, Lucidworks Fusion, etc...) that has more features and debugging options (notably: rate limiting) and use that code to fetch the content, then push it to Solr. : Date: Tue, 29 Mar 2016 20:54:52 -0300 : From: Jeferson dos Anjos : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: Solr response error 403 when I try to index medium.com articles : : I'm trying to index some pages of the medium. But I get error 403. I : believe it is because the medium does not accept the user-agent solr. Has : anyone ever experienced this? Do you know how to change it? : : I appreciate any help : : : 500 : 94 : : : : Server returned HTTP response code: 403 for URL: : https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1 : : : java.io.IOException: Server returned HTTP response code: 403 for URL: : https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1 : at sun.reflect.GeneratedConstructorAccessor314.newInstance(Unknown : Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown : Source) at java.lang.reflect.Constructor.newInstance(Unknown Source) : at sun.net.www.protocol.http.HttpURLConnection$10.run(Unknown Source) : at sun.net.www.protocol.http.HttpURLConnection$10.run(Unknown Source) : at java.security.AccessController.doPrivileged(Native Method) at : sun.net.www.protocol.http.HttpURLConnection.getChainedException(Unknown : Source) at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown : Source) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown : Source) at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown : Source) at org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:87) : at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:158) : at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) : at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144) : at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:291) : at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006) at : org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777) : at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413) : at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204) : at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) : at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) : at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) : at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) : at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) : at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) : at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) : at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) : at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) : at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) : at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) : at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) : at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) : at org.eclipse.jetty.server.Server.handle(Server.java:368) at : org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) : at org.eclipse.jetty.server.Bl
Re: Load Resource from within Solr Plugin
: : 1) as a general rule, if you have a <lib/> declaration which includes "WEB-INF" you are probably doing something wrong. Maybe not in this case -- maybe "search-webapp/target" is a completely distinct java application and you are just re-using its jars. But 9 times out of 10, when people have a WEB-INF path they are trying to load jars from, it's because they *first* added their jars to Solr's WEB-INF directory, and then when that didn't work they added the path to the WEB-INF dir as a <lib/> ... but now you've got those classes being loaded twice, and you've multiplied all of your problems. 2) let's ignore the fact that your path has WEB-INF in it, and just assume it's some path somewhere on disk that has nothing to do with solr, and you want to load those jars. Great -- solr will do that for you, and all of those classes will be available to plugins. Now if you want to explicitly do something classloader related, you do *not* want to be using Thread.currentThread().getContextClassLoader() ... because the threads that execute everything in Solr are a pool of worker threads that is created before solr ever has a chance to parse your <lib/> directive. You want to ensure anything you do related to a Classloader uses the ClassLoader Solr sets up for plugins -- that's available from the SolrResourceLoader. You can always get the SolrResourceLoader via SolrCore.getResourceLoader(). From there you can getClassLoader() if you really need some hairy custom stuff -- or if you are just trying to load a simple resource file as an InputStream, use openResource(String name) ... that will start by checking for it in the conf dir, and will fall back to your jar -- so you can have a default resource file shipped with your plugin, but allow users to override it in their collection configs. -Hoss http://www.lucidworks.com/
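A minimal sketch of the openResource() approach (the resource name is made up):

import java.io.IOException;
import java.io.InputStream;
import org.apache.solr.core.SolrCore;

public class MyPluginHelper {
  // openResource checks the collection's conf dir first, then falls back
  // to resources bundled in the plugin jar.
  public static InputStream openSettings(SolrCore core) throws IOException {
    return core.getResourceLoader().openResource("myplugin-settings.txt");
  }
}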
Re: Sort order for *:* query
1) The hard coded implicit default sort order is "score desc" 2) Whenever a sort results in ties, the final ordering of tied documents is non-deterministic 3) currently the behavior is that tied documents are returned in "index order" but that can change as segments are merged 4) if you wish to change the behavior when there is a tie, just add additional deterministic sort clauses to your sort param. This can be done at the request level, or as a user specified "default" for the request handler... https://cwiki.apache.org/confluence/display/solr/InitParams+in+SolrConfig : Date: Mon, 4 Apr 2016 13:34:27 -0400 : From: Steven White : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: Sort order for *:* query : : Hi everyone, : : When I send Solr the query *:* the result I get back is sorted based on : Lucene's internal DocID which is oldest to most recent (can someone correct : me if I get this wrong?) Given this, the most recently added / updated : document is at the bottom of the list. Is there a way to reverse this sort : order? If so, how can I make this the default in Solr's solrconfig.xml : file? : : Thanks : : Steve : -Hoss http://www.lucidworks.com/
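For example, to keep relevancy ordering but make ties deterministic by uniqueKey (assuming the uniqueKey field is "id"): sort=score desc,id asc The same value can also be configured as a default "sort" param on the request handler, per the InitParams link above.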
Re: Tutorial example loading of exampledocs for *.xml fails due to bad request
: When I attempt the second example, of loading the *.xml files, I receive an : error back. I tried just one of the XMLs and receive the same error. Yeah ... there's a poor assumption here in the tutorial. note in particular this paragraph... --SNIP-- Solr's install includes a handful of Solr XML formatted files with example data (mostly mocked tech product data). NOTE: This tech product data has a more domain-specific configuration, including schema and browse UI. The bin/solr script includes built-in support for this by running bin/solr start -e techproducts which not only starts Solr but also then indexes this data too (be sure to bin/solr stop -all before trying it out). However, the example below assumes Solr was started with bin/solr start -e cloud to stay consistent with all examples on this page, and thus the collection used is "gettingstarted", not "techproducts". --SNIP-- If you use "bin/solr start -e techproducts" (or explicitly create a solr collection using the "sample_techproducts" config set) then those documents will index just fine -- but the assumption written here in the tutorial that you can index those tech product documents to the same gettingstarted collection you've been indexing to earlier in the tutorial is definitely flawed -- the fieldtype deduction logic that's applied for the gettingstarted collection (and the specific type deduced from the earlier docs) won't neccessarily apply to the sample tech product documents. https://issues.apache.org/jira/browse/SOLR-8943 -Hoss http://www.lucidworks.com/
Re: Sort order for *:* query
: You can sort like this (I believe that _version_ is the internal id/index : number for the document, but you might want to verify) that is not true, and I strongly advise you not to try to sort on the _version_ field ... for some queries/testing it may deceptively *look* like it's sorting by the order the documents were added, but it will not actually sort in any useful way -- two documents added in sequence A, B may have version values that are not in ascending sequence (depending on the hash bucket their uniqueKeys fall in for routing purposes), so sorting on that field will not give you any sort of meaningful order. If you want to sort by "recency" or "date added" you need to add a date based field to capture this. See for example the TimestampUpdateProcessorFactory... https://lucene.apache.org/solr/5_5_0/solr-core/org/apache/solr/update/processor/TimestampUpdateProcessorFactory.html -Hoss http://www.lucidworks.com/
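A sketch of that processor wired into a chain (the chain and field names are just examples): <updateRequestProcessorChain name="add-timestamp" default="true"> <processor class="solr.TimestampUpdateProcessorFactory"> <str name="fieldName">indexed_at_dt</str> </processor> <processor class="solr.LogUpdateProcessorFactory"/> <processor class="solr.RunUpdateProcessorFactory"/> </updateRequestProcessorChain> ...after which sorting by recency is simply sort=indexed_at_dt desc.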
Re: Sort order for *:* query
: : Not sure I understand... _version_ is time based and hence will give : roughly the same accuracy as something like : TimestampUpdateProcessorFactory that you recommend below. Both Hmmm... last time I looked, I thought _version_ numbers were allocated & incremented on a per-shard basis and "time" was only used for initial seeding when the leader started up -- so in a stable system running for a long time, if shardA gets significantly more updates than shardB the _version_ numbers can get skewed, and a new doc in shardB might be updated with a _version_ less than the _version_ of a document added to shardA well before that. But maybe I'm remembering wrong? -Hoss http://www.lucidworks.com/
Re: Complex Sort
: I am not sure how to use "Sort By Function" for Case. : : |10#40|14#19|33#17|27#6|15#6|19#5|7#2|6#1|29#1|5#1|30#1|28#1|12#0|20#0| : : Can you tell how to fetch 40 when input is 10. Something like... if(termfreq(f,10),40,if(termfreq(f,14),19,if(termfreq(f,33),17,))) But I suspect there may be a much better way to achieve your ultimate goal if you tell us what it is. What do these fields represent? What makes these numeric values significant? Do you know which values are significant when indexing, or do they vary for every query? https://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an "XY Problem" ... that is: you are dealing with "X", you are assuming "Y" will help you, and you are asking about "Y" without giving more details about the "X" so that we can understand the full issue. Perhaps the best solution doesn't involve "Y" at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss http://www.lucidworks.com/
Re: How to use TZ parameter in a query
Please note the exact description of the property on the URL you mentioned.. "The TZ parameter can be specified to override the default TimeZone (UTC) used for the purposes of adding and rounding in date math" The newer ref guide docs for this param also explain... https://cwiki.apache.org/confluence/display/solr/Working+with+Dates "By default, all date math expressions are evaluated relative to the UTC TimeZone, but the TZ parameter can be specified to override this behaviour, by forcing all date based addition and rounding to be relative to the specified time zone." The TZ param does not change the *format* of the response in XML or JSON, which is an ISO standard format that always uses UTC for rendering as a string, because it is unambiguous regardless of the client parsing it. Just because you might want "date range faceting by day according to localtime in denver" doesn't mean your python or perl or javascript code for parsing the response will suddenly realize that the string responses are sometimes GMT-7 and sometimes GMT-8 (depending on the local daylight savings rules in colorado) -Hoss http://www.lucidworks.com/
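To make the distinction concrete (the field name is illustrative): ...&facet.range=added_dt&facet.range.start=NOW/DAY-7DAYS&facet.range.end=NOW/DAY&facet.range.gap=%2B1DAY&TZ=America/Denver ...here the TZ param moves where each /DAY boundary falls (midnight in Denver rather than midnight UTC), but every date in the response still comes back as an ISO-8601 string rendered in UTC.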
Re: Solr update fails with “Could not initialize class sun.nio.fs.LinuxNativeDispatcher”
That's a strange error to get. I can't explain why LinuxFileSystem can't load LinuxNativeDispatcher, but you can probably bypass the entire situation by explicitly configuring ConcurrentMergeScheduler with defaults so that it doesn't try to determine whether you are using an SSD or "spinning" disk... http://lucene.apache.org/core/5_5_0/core/org/apache/lucene/index/ConcurrentMergeScheduler.html https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig#IndexConfiginSolrConfig-MergingIndexSegments Something like this in your indexConfig settings... <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"> <int name="maxMergeCount">42</int> <int name="maxThreadCount">7</int> </mergeScheduler> ...will force those specific settings, instead of trying to guess defaults. I haven't tested this, but in theory you can also use something like <bool name="spins">true</bool> to indicate definitively that you are using a spinning disk (or not) but let it pick the appropriate default values for the merge count & threads accordingly. : Date: Thu, 7 Apr 2016 22:56:54 +0000 : From: David Moles : Reply-To: solr-user@lucene.apache.org : To: "solr-user@lucene.apache.org" : Subject: Solr update fails with “Could not initialize class : sun.nio.fs.LinuxNativeDispatcher” : : Hi folks, : : New Solr user here, attempting to apply the following Solr update command via curl : : curl 'my-solr-server:8983/solr/my-core/update?commit=true' \ : -H 'Content-type:application/json' -d \ : '[{"my_id_field":"some-id-value","my_other_field":{"set":"new-field-value"}}]' : : I'm getting an error response with a stack trace that reduces to: : : Caused by: java.lang.NoClassDefFoundError: Could not initialize class sun.nio.fs.LinuxNativeDispatcher : at sun.nio.fs.LinuxFileSystem.getMountEntries(LinuxFileSystem.java:81) : at sun.nio.fs.LinuxFileStore.findMountEntry(LinuxFileStore.java:86) : at sun.nio.fs.UnixFileStore.<init>(UnixFileStore.java:65) : at sun.nio.fs.LinuxFileStore.<init>(LinuxFileStore.java:44) : at sun.nio.fs.LinuxFileSystemProvider.getFileStore(LinuxFileSystemProvider.java:51) : at sun.nio.fs.LinuxFileSystemProvider.getFileStore(LinuxFileSystemProvider.java:39) : at sun.nio.fs.UnixFileSystemProvider.getFileStore(UnixFileSystemProvider.java:368) : at java.nio.file.Files.getFileStore(Files.java:1461) : at org.apache.lucene.util.IOUtils.getFileStore(IOUtils.java:528) : at org.apache.lucene.util.IOUtils.spinsLinux(IOUtils.java:483) : at org.apache.lucene.util.IOUtils.spins(IOUtils.java:472) : at org.apache.lucene.util.IOUtils.spins(IOUtils.java:447) : at org.apache.lucene.index.ConcurrentMergeScheduler.initDynamicDefaults(ConcurrentMergeScheduler.java:371) : at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:457) : at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1817) : at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2761) : at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2866) : at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2833) : at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:586) : at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95) : at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) : at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1635) : at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1612) : at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:161) : at 
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69) : at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:78) : at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) : at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064) : at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654) : at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450) : ... 22 more : : It looks like sun.nio.fs can't find its own classes, which seems odd. Solr is running with OpenJDK 1.8.0_77 on Amazon Linux AMI release 2016.03. : : Does anyone know what might be going on here? Is it an OpenJDK / Amazon Linux problem? : : -- : David Moles : UC Curation Center : California Digital Library : : : -Hoss http://www.lucidworks.com/
Re: Range filters: inclusive?
: When I perform a range query of ['' TO *] to filter out docs where a : particular field has a value, this does what I want, but I thought using the : square brackets was inclusive, so empty-string values should actually be : included? I'm not sure I understand your question ... if you are dealing with something like a StrField, then the empty string (ie: a 0 byte long string: "") is in fact a real term. You are inclusively including that term in what you match on. That is different from matching docs that do not have any values at all -- ie: they do not contain a single term. -Hoss http://www.lucidworks.com/
Re: Range filters: inclusive?
: > When I perform a range query of ['' TO *] to filter out docs where a : > particular field has a value, this does what I want, but I thought using the : > square brackets was inclusive, so empty-string values should actually be : > included? : : They should be. Are you saying that zero length values are not : included by the range query above? Oh ... maybe I misread the question ... are you saying that when you add a document you explicitly include the empty string as a field value, but later when you search for ['' TO *] those documents do not get returned? What exactly is the field type you are using, and what update processors do you have configured? If you are using a StrField (w/o any special processors) then the literal value "" should exist as a term -- but if you are using a TextField w/some analyzer then the analyzer may be throwing that input away. Likewise there are update processors that do this explicitly: https://lucene.apache.org/solr/5_5_0/solr-core/org/apache/solr/update/processor/RemoveBlankFieldUpdateProcessorFactory.html -Hoss http://www.lucidworks.com/
Re: How to set multivalued false, using SolrJ
: Can you do me a favour? I use SolrJ to index, but all of my : fields end up multivalued. How can I set my field to not be : multivalued? Can you tell me how to set this up using SolrJ. If you are using a "Managed Schema" (which was explicitly configured in most Solr 5.x example configs, and is now the implicit default in Solr 6) you can use the Schema API to make these changes. There is also a "SchemaRequest" convenience class for this if you are a SolrJ user... https://cwiki.apache.org/confluence/display/solr/Schema+API https://lucene.apache.org/solr/5_5_0/solr-solrj/org/apache/solr/client/solrj/request/schema/SchemaRequest.html SolrClient client = ...; SchemaRequest req = new SchemaRequest.ReplaceField(...); ... req.process(client) -Hoss http://www.lucidworks.com/
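Filling in that outline, a rough end-to-end sketch (the URL, field name, and attributes are illustrative):

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;

public class MakeFieldSingleValued {
  public static void main(String[] args) throws Exception {
    SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
    Map<String, Object> attrs = new HashMap<>();
    attrs.put("name", "myField");
    attrs.put("type", "string");
    attrs.put("stored", true);
    attrs.put("multiValued", false);  // the attribute in question
    new SchemaRequest.ReplaceField(attrs).process(client);
    client.close();
  }
}

Note that ReplaceField only changes the schema definition; documents indexed before the change still need to be reindexed.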
Re: Solr 6 - AbstractSolrTestCase Error Unable to build KeyStore from file: null
: I'm upgrading a plugin and use the AbstractSolrTestCase for tests. My tests : work fine in 5.X but when I upgraded to 6.X the tests sometimes throw an : error during initialization. Basically it says, : "org.apache.solr.common.SolrException: Error instantiating : shardHandlerFactory class : [org.apache.solr.handler.component.HttpShardHandlerFactory]: Unable to : build KeyStore from file: null" Ugh. And of course there are no other details to troubleshoot that because the stupid error handling doesn't wrap the original exception -- it just throws it away. I'm pretty sure the problem you are seeing (unfortunately manifested in a really confusing way) is that SolrTestCaseJ4 (and AbstractSolrTestCase which subclasses it) has randomized the use of SSL for a while, but at some point it also started randomizing the use of client auth -- but this randomization happens very infrequently. (for details, check out the SSLTestConfig and its usage in SolrTestCaseJ4) The bottom line is, in order for the (randomized) clientAuth stuff to work, SolrTestCaseJ4 assumes it can find an "../etc/test/solrtest.keystore" relative to ExternalPaths.SERVER_HOME. If you don't have that in your test setup, bad things happen. I believe the quickest way for you to resolve this failure in your own usage of AbstractSolrTestCase is to just add the @SuppressSSL annotation to your tests -- assuming you don't care about randomly testing your plugin with SSL authentication (for 99.999% of solr plugins, whether solr is being used over http or https shouldn't matter for test purposes) If you do want to include randomized SSL testing, then you need to make sure that when/how you run your tests, ExternalPaths.SERVER_HOME resolves to the correct place, and "../etc/test/solrtest.keystore" resolves to a real file solr can use as the keystore. I'll file some Jiras to try and improve the error handling in these situations. -Hoss http://www.lucidworks.com/
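Concretely, the quick fix is just an annotation on the test class (the class name here is an example):

import org.apache.solr.SolrTestCaseJ4.SuppressSSL;

@SuppressSSL
public class MyPluginTest extends AbstractSolrTestCase {
  // tests unchanged
}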
Re: Solr 6 - AbstractSolrTestCase Error Unable to build KeyStore from file: null
https://issues.apache.org/jira/browse/SOLR-8970
https://issues.apache.org/jira/browse/SOLR-8971

: Date: Mon, 11 Apr 2016 20:35:22 -0400
: From: Joe Lawson
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: Solr 6 - AbstractSolrTestCase Error Unable to build KeyStore from
:     file: null
:
: Thanks for the insight. I figured that it was something like that and
: perhaps I had thread contention on a resource that wasn't really thread
: safe.
:
: I'll give your suggestions a shot tomorrow.
:
: Regards,
:
: Joe Lawson
: On Apr 11, 2016 8:24 PM, "Chris Hostetter" wrote:
:
: > ...(full quote of the previous message trimmed)...

-Hoss
http://www.lucidworks.com/
Re: Committing with no updates
The autoCommit settings initialize trackers so that they only fire after some updates have been made -- don't think of it as a cron that fires every X seconds, think of it as an update monitor that triggers timers.  If an update comes in, and there are no timers currently active, a timer is created to do the commit in X seconds.

Independent of autocommit, there is other intelligence lower down in solr that tries to recognize when a redundant commit is fired and no changes would result in a new searcher, to prevent unnecessary object churn and cache clearing.

: My autoSoftCommit is set to 1 minute. Does this actually affect things if no
: documents have actually been updated/created? Will this also affect the
: clearing of any caches?
:
: Is this also the same for hard commits, either with autoCommit or making an
: explicit http request to commit.

-Hoss
http://www.lucidworks.com/
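For reference, these are the solrconfig.xml settings in question (the times here are just illustrative values -- 15 second hard commits, 1 minute soft commits):

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>60000</maxTime>
</autoSoftCommit>

Neither timer starts counting down until at least one uncommitted update has arrived.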
Re: Solr Support for BM25F
: a per field basis. I understand BM25 similarity is now supported in Solr

BM25 has been supported for a while; the major change recently is that it is now the underlying default in Solr 6.

: but I was hoping to be able to configure k1 and b for different fields such
: as title, description, anchor etc, as they are structured documents.

What you can do in Solr is configure different Similarity instances on a per-fieldType basis -- but you can have as many fieldTypes in your schema as you want, so you could have one type used just by your title field, and a different type used just by your description field, etc...

: Current Solr Version 5.4.1

You can download the solr reference guide for 5.4 from here...

http://archive.apache.org/dist/lucene/solr/ref-guide/

You'll want to search for Similarity and in particular "SchemaSimilarityFactory" which (in 5.4) you'll have to configure explicitly in order to use different BM25Similarity instances for each fieldType.  In 6.0, SchemaSimilarityFactory is the global default, with BM25 as the per-field default... The current (draft) guide for 6.0 (not yet released) has info on that...

https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements

-Hoss
http://www.lucidworks.com/
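A rough schema.xml sketch of that setup (the fieldType name, analyzer, and the k1/b values are all just placeholders):

<!-- make per-fieldType similarities possible -->
<similarity class="solr.SchemaSimilarityFactory"/>

<fieldType name="text_title" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
  <!-- BM25 tuned for short title text -->
  <similarity class="solr.BM25SimilarityFactory">
    <float name="k1">1.2</float>
    <float name="b">0.3</float>
  </similarity>
</fieldType>

...and then a second fieldType (e.g. "text_description") with its own k1/b values for the description field.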
RE: Shard ranges seem incorrect
: Hi - bumping this issue. Any thoughts to share?

Shawn's response to your email seemed spot-on accurate to me -- is there something about his answer that doesn't match up with what you're seeing?  Can you clarify/elaborate your concerns?

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201604.mbox/%3c570d0a03.5010...@elyograg.org%3E

:
: -----Original message-----
: > From:Markus Jelsma
: > Sent: Tuesday 12th April 2016 13:49
: > To: solr-user
: > Subject: Shard ranges seem incorrect
: >
: > Hi - i've just created a 3 shard 3 replica collection on Solr 6.0.0 and we noticed something odd, the hashing ranges don't make sense (full state.json below):
: > shard1 Range: 8000-d554
: > shard2 Range: d555-2aa9
: > shard3 Range: 2aaa-7fff
: >
: > We've also noticed ranges not going from 0 to for a 5.5 create single shard collection. Another collection created on an older (unknown) release has correct shard ranges. Any idea what's going on?
: > Thanks,
: > Markus
: >
: > {"logs":{
: >     "replicationFactor":"3",
: >     "router":{"name":"compositeId"},
: >     "maxShardsPerNode":"9",
: >     "autoAddReplicas":"false",
: >     "shards":{
: >       "shard1":{
: >         "range":"8000-d554",
: >         "state":"active",
: >         "replicas":{
: >           "core_node3":{
: >             "core":"logs_shard1_replica3",
: >             "base_url":"http://127.0.1.1:8983/solr",
: >             "node_name":"127.0.1.1:8983_solr",
: >             "state":"active"},
: >           "core_node4":{
: >             "core":"logs_shard1_replica1",
: >             "base_url":"http://127.0.1.1:8983/solr",
: >             "node_name":"127.0.1.1:8983_solr",
: >             "state":"active",
: >             "leader":"true"},
: >           "core_node8":{
: >             "core":"logs_shard1_replica2",
: >             "base_url":"http://127.0.1.1:8983/solr",
: >             "node_name":"127.0.1.1:8983_solr",
: >             "state":"active"}}},
: >       "shard2":{
: >         "range":"d555-2aa9",
: >         "state":"active",
: >         "replicas":{
: >           "core_node1":{
: >             "core":"logs_shard2_replica1",
: >             "base_url":"http://127.0.1.1:8983/solr",
: >             "node_name":"127.0.1.1:8983_solr",
: >             "state":"active",
: >             "leader":"true"},
: >           "core_node2":{
: >             "core":"logs_shard2_replica2",
: >             "base_url":"http://127.0.1.1:8983/solr",
: >             "node_name":"127.0.1.1:8983_solr",
: >             "state":"active"},
: >           "core_node9":{
: >             "core":"logs_shard2_replica3",
: >             "base_url":"http://127.0.1.1:8983/solr",
: >             "node_name":"127.0.1.1:8983_solr",
: >             "state":"active"}}},
: >       "shard3":{
: >         "range":"2aaa-7fff",
: >         "state":"active",
: >         "replicas":{
: >           "core_node5":{
: >             "core":"logs_shard3_replica1",
: >             "base_url":"http://127.0.1.1:8983/solr",
: >             "node_name":"127.0.1.1:8983_solr",
: >             "state":"active",
: >             "leader":"true"},
: >           "core_node6":{
: >             "core":"logs_shard3_replica2",
: >             "base_url":"http://127.0.1.1:8983/solr",
: >             "node_name":"127.0.1.1:8983_solr",
: >             "state":"active"},
: >           "core_node7":{
: >             "core":"logs_shard3_replica3",
: >             "base_url":"http://127.0.1.1:8983/solr",
: >             "node_name":"127.0.1.1:8983_solr",
: >             "state":"active"}}

-Hoss
http://www.lucidworks.com/
Re: UUID processor handling of empty string
I'm also confused by what exactly you mean by "doesn't work" but a general suggestion you can try is putting the RemoveBlankFieldUpdateProcessorFactory before your UUID Processor...

https://lucene.apache.org/solr/6_0_0/solr-core/org/apache/solr/update/processor/RemoveBlankFieldUpdateProcessorFactory.html

If you are also worried about strings that aren't exactly empty, but consist only of whitespace, you can put TrimFieldUpdateProcessorFactory before RemoveBlankFieldUpdateProcessorFactory ...

https://lucene.apache.org/solr/6_0_0/solr-core/org/apache/solr/update/processor/TrimFieldUpdateProcessorFactory.html

: Date: Thu, 14 Apr 2016 12:30:24 -0700
: From: Erick Erickson
: Reply-To: solr-user@lucene.apache.org
: To: solr-user
: Subject: Re: UUID processor handling of empty string
:
: What do you mean "doesn't work"? An empty string is
: different than not being present. The UUID update
: processor (I'm pretty sure) only adds a field if it
: is _absent_. Specifying it as an empty string
: fails that test so no value is added.
:
: At that point, if this uuid field is also the <uniqueKey>,
: then each doc that comes in with an empty field will replace
: the others.
:
: If it's _not_ the <uniqueKey>, the sorting will be confusing.
: All the empty string fields are equal, so the tiebreaker is
: the internal Lucene doc ID, which may change as merges
: happen. You can specify secondary sort fields to make the
: sort predictable (the <uniqueKey> field is popular for this).
:
: Best,
: Erick
:
: On Thu, Apr 14, 2016 at 12:18 PM, Susmit Shukla wrote:
: > Hi,
: >
: > I have configured solr schema to generate unique id for a collection using
: > UUIDUpdateProcessorFactory
: >
: > I am seeing a peculiar behavior - if the unique 'id' field is explicitly
: > set as empty string in the SolrInputDocument, the document gets indexed
: > with UUID update processor generating the id.
: > However, sorting does not work if uuid was generated in this way. Also
: > cursor functionality that depends on unique id sort also does not work.
: > I guess the correct behavior would be to fail the indexing if user provides
: > an empty string for a uuid field.
: >
: > The issues do not happen if I omit the id field from the SolrInputDocument .
: >
: > SolrInputDocument
: >
: > solrDoc.addField("id", "");
: >
: > ...
: >
: > I am using schema similar to below-
: >
: > id
: >
: > id
: >
: > uuid
: >
: > Thanks,
: > Susmit

-Hoss
http://www.lucidworks.com/
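In solrconfig.xml that ordering looks roughly like this (a sketch -- it assumes your uuid field is named "id" and that this chain is the one wired into your update handler):

<updateRequestProcessorChain name="uuid">
  <processor class="solr.TrimFieldUpdateProcessorFactory"/>
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

That way "" (or whitespace) is removed before the UUID processor runs, so it sees the field as absent and generates a value.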
Re: How to get stats on currency field?
The thing to remember about currency fields is that even if you tend to only put one currency value in it, any question of interpreting the values in that field has to be done relative to a specific currency, and the exchange rates may change dynamically. So use the currency function to get a numerical value in some explicit currency at the moment you execute the query, and then do stats over that function. Something like this IIRC: stats.field={!func}currency(your_field,EUR) -Hoss http://www.lucidworks.com/
Re: MiniSolrCloudCluster usage in solr 7.0.0
: At first, I saw the same exception you got ... but after a little while
: I figured out that this is because I was running the program more than
: once without deleting everything in the baseDir -- so the zookeeper
: server was starting with an existing database already containing the
: solr.xml. When MiniSolrCloudCluster is used in Solr tests, the baseDir
: is newly created for each test class, so this doesn't happen.

Yeah ... this is interesting.  I would definitely suggest that for now you *always* start with a clean baseDir.  I've opened an issue to figure out whether MiniSolrCloudCluster should fail if you don't, or make it a supported usecase...

https://issues.apache.org/jira/browse/SOLR-8999

-Hoss
http://www.lucidworks.com/
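Something like this as a minimal sketch (single node; configset/collection handling elided):

import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.solr.client.solrj.embedded.JettyConfig;
import org.apache.solr.cloud.MiniSolrCloudCluster;

// always start from a brand new baseDir so no stale ZK data gets picked up
Path baseDir = Files.createTempDirectory("mini-solr-cloud");
MiniSolrCloudCluster cluster = new MiniSolrCloudCluster(1, baseDir, JettyConfig.builder().build());
try {
  // upload a configset, create a collection, run queries ...
} finally {
  cluster.shutdown();
}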
Re: Getting duplicate output while doing auto suggestion based on multiple filed using copy filed in solr 5.5
I can't explain the results you are seeing, but you also didn't provide us with your schema.xml (ie: how are "text" and "text_auto" defined?) or enough details to try and reproduce on a small scale (ie: what does the source data look like in the documents where these suggestion values are coming from?).

If I start up the "bin/solr -e techproducts" example, which is also configured to use DocumentDictionaryFactory, I don't see any duplicate suggestions...

curl 'http://localhost:8983/solr/techproducts/suggest?suggest.dictionary=mySuggester&suggest=true&suggest.build=true&wt=json'
{"responseHeader":{"status":0,"QTime":13},"command":"build"}

curl 'http://localhost:8983/solr/techproducts/suggest?wt=json&indent=true&suggest.dictionary=mySuggester&suggest=true&suggest.q=elec'
{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "suggest":{"mySuggester":{
      "elec":{
        "numFound":3,
        "suggestions":[{
            "term":"electronics and computer1",
            "weight":2199,
            "payload":""},
          {
            "term":"electronics",
            "weight":649,
            "payload":""},
          {
            "term":"electronics and stuff2",
            "weight":279,
            "payload":""}]
...

Can you provide us with some precise (and ideally minimal) steps to reproduce the problem you are describing?  For example...

1) "Add XYZ to the 5.5 sample_techproducts_configs solrconfig.xml"
2) "Add ABC to the 5.5 sample_techproducts_configs managed-schema"
3) run this curl command to index a few sample documents...
4) run this curl command to see some suggest results that have duplicates in them based on the sample data from step #3

?

-Hoss
http://www.lucidworks.com/
Re: [Installation] Solr log directory
: I have a question for installing solr server. Using '
: install_solr_service.sh' with option -d , the solr home directory can be
: set. But the default log directory is under $SOLR_HOME/logs.
:
: Is it possible to specify the logs directory separately from solr home directory during installation?

install_solr_service.sh doesn't do anything special as far as where logs should live -- it just writes out a (default) "/etc/default/$SOLR_SERVICE.in.sh" (if it doesn't already exist) that specifies a (default) log directory for solr to use once the service starts.

You are absolutely expected to overwrite that "$SOLR_SERVICE.in.sh" file with your own specific settings after the installation script finishes -- in fact you *must*, to configure things like ZooKeeper or SSL -- and you are welcome to change the SOLR_LOGS_DIR setting to anything you want.

-Hoss
http://www.lucidworks.com/
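For example (assuming the default service name "solr", so the include file is /etc/default/solr.in.sh, and an arbitrary target directory):

# /etc/default/solr.in.sh
SOLR_LOGS_DIR=/var/log/solr

Restart the service after editing and the logs will be written there instead of under the solr home.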
Re: OOM script executed
: You could, but before that I'd try to see what's using your memory and see
: if you can decrease that. Maybe identify why you are running OOM now and
: not with your previous Solr version (assuming you weren't, and that you are
: running with the same JVM settings). A bigger heap usually means more work
: to the GC and less memory available for the OS cache.

FWIW: One of the bugs fixed in 6.0 was regarding the fact that the oom_killer wasn't being called properly on OOM -- so the fact that you are getting OOMErrors in 6.0 may not actually be a new thing; it may just be new that you are being made aware of them by the oom_killer.

https://issues.apache.org/jira/browse/SOLR-8145

That doesn't negate Tomás's excellent advice about trying to determine what is causing the OOM, but I wouldn't get too hung up on "what changed" between 5.x and 6.0 -- possibly nothing other than "now you know about it."

:
: Tomás
:
: On Sun, May 1, 2016 at 11:20 PM, Bastien Latard - MDPI AG <
: lat...@mdpi.com.invalid> wrote:
:
: > Hi Guys,
: >
: > I got several times the OOM script executed since I upgraded to Solr6.0:
: >
: > $ cat solr_oom_killer-8983-2016-04-29_15_16_51.log
: > Running OOM killer script for process 26044 for Solr on port 8983
: >
: > Does it mean that I need to increase my JAVA Heap?
: > Or should I do anything else?
: >
: > Here are some further logs:
: > $ cat solr_gc_log_20160502_0730:
: > }
: > {Heap before GC invocations=1674 (full 91):
: > par new generation total 1747648K, used 1747135K [0x0005c000,
: > 0x00064000, 0x00064000)
: > eden space 1398144K, 100% used [0x0005c000, 0x00061556,
: > 0x00061556)
: > from space 349504K, 99% used [0x00061556, 0x00062aa2fc30,
: > 0x00062aab)
: > to space 349504K, 0% used [0x00062aab, 0x00062aab,
: > 0x00064000)
: > concurrent mark-sweep generation total 6291456K, used 6291455K
: > [0x00064000, 0x0007c000, 0x0007c000)
: > Metaspace used 39845K, capacity 40346K, committed 40704K, reserved
: > 1085440K
: > class space used 4142K, capacity 4273K, committed 4368K, reserved
: > 1048576K
: > 2016-04-29T21:15:41.970+0200: 20356.359: [Full GC (Allocation Failure)
: > 2016-04-29T21:15:41.970+0200: 20356.359: [CMS:
: > 6291455K->6291456K(6291456K), 12.5694653 secs]
: > 8038591K->8038590K(8039104K), [Metaspace: 39845K->39845K(1085440K)],
: > 12.5695497 secs] [Times: user=12.57 sys=0.00, real=12.57 secs]
: >
: >
: > Kind regards,
: > Bastien
: >
:

-Hoss
http://www.lucidworks.com/
Re: Solr cloud 6.0.0 with ZooKeeper 3.4.8 Errors
: Thanks, Nick. Do we know any suggested # for file descriptor limit with
: Solr6? Also wondering why i haven't seen this problem before with Solr 5.x?

Are you running Solr6 on the exact same host OS that you were running Solr5 on?

Even if you are using the "same OS version" on a different machine, that could explain the discrepancy if you (or someone else) increased the file descriptor limit on the "old machine" but that never happened on the "new machine".

: On Wed, May 4, 2016 at 4:54 PM, Nick Vasilyev
: wrote:
:
: > It looks like you have too many open files, try increasing the file
: > descriptor limit.
: >
: > On Wed, May 4, 2016 at 3:48 PM, Susheel Kumar
: > wrote:
: >
: > > Hello,
: > >
: > > I am trying to setup 2 node Solr cloud 6 cluster with ZK 3.4.8 and used
: > the
: > > install service to setup solr.
: > >
: > > After launching Solr Admin Panel on server1, it looses connections in few
: > > seconds and then comes back and other node server2 is marked as Down in
: > > cloud graph. After few seconds its loosing the connection and comes back.
: > >
: > > Any idea what may be going wrong? Has anyone used Solr 6 with ZK 3.4.8.
: > > Have never seen this error before with solr 5.x with ZK 3.4.6.
: > >
: > > Below log from server1 & server2. The ZK has 3 nodes with chroot
: > enabled.
: > >
: > > Thanks,
: > > Susheel
: > >
: > > server1/solr.log
: > >
: > > 2016-05-04 19:20:53.804 INFO (qtp1989972246-14) [ ]
: > > o.a.s.c.c.ZkStateReader path=[/collections/collection1]
: > > [configName]=[collection1] specified config exists in ZooKeeper
: > >
: > > 2016-05-04 19:20:53.806 INFO (qtp1989972246-14) [ ]
: > o.a.s.s.HttpSolrCall
: > > [admin] webapp=null path=/admin/collections
: > > params={action=CLUSTERSTATUS&wt=json&_=1462389588125} status=0 QTime=25
: > >
: > > 2016-05-04 19:20:53.859 INFO (qtp1989972246-19) [ ]
: > > o.a.s.h.a.CollectionsHandler Invoked Collection Action :list with params
: > > action=LIST&wt=json&_=1462389588125 and sendToOCPQueue=true
: > >
: > > 2016-05-04 19:20:53.861 INFO (qtp1989972246-19) [ ]
: > o.a.s.s.HttpSolrCall
: > > [admin] webapp=null path=/admin/collections
: > > params={action=LIST&wt=json&_=1462389588125} status=0 QTime=2
: > >
: > > 2016-05-04 19:20:57.520 INFO (qtp1989972246-13) [ ]
: > o.a.s.s.HttpSolrCall
: > > [admin] webapp=null path=/admin/cores
: > > params={indexInfo=false&wt=json&_=1462389588124} status=0 QTime=0
: > >
: > > 2016-05-04 19:20:57.546 INFO (qtp1989972246-15) [ ]
: > o.a.s.s.HttpSolrCall
: > > [admin] webapp=null path=/admin/info/system
: > > params={wt=json&_=1462389588126} status=0 QTime=25
: > >
: > > 2016-05-04 19:20:57.610 INFO (qtp1989972246-13) [ ]
: > > o.a.s.h.a.CollectionsHandler Invoked Collection Action :list with params
: > > action=LIST&wt=json&_=1462389588125 and sendToOCPQueue=true
: > >
: > > 2016-05-04 19:20:57.613 INFO (qtp1989972246-13) [ ]
: > o.a.s.s.HttpSolrCall
: > > [admin] webapp=null path=/admin/collections
: > > params={action=LIST&wt=json&_=1462389588125} status=0 QTime=3
: > >
: > > 2016-05-04 19:21:29.139 INFO (qtp1989972246-5980) [ ]
: > > o.a.h.i.c.DefaultHttpClient I/O exception (java.net.SocketException)
: > caught
: > > when connecting to {}->http://server2:8983: Too many open files
: > >
: > > 2016-05-04 19:21:29.139 INFO (qtp1989972246-5983) [ ]
: > > o.a.h.i.c.DefaultHttpClient I/O exception (java.net.SocketException)
: > caught
: > > when connecting to {}->http://server2:8983: Too many open files
: > >
: > > 2016-05-04 19:21:29.139 INFO (qtp1989972246-5984) [ ]
: > >
o.a.h.i.c.DefaultHttpClient I/O exception (java.net.SocketException) : > caught : > > when connecting to {}->http://server2:8983: Too many open files : > > : > > 2016-05-04 19:21:29.141 INFO (qtp1989972246-5984) [ ] : > > o.a.h.i.c.DefaultHttpClient Retrying connect to {}->http://server2:8983 : > > : > > 2016-05-04 19:21:29.141 INFO (qtp1989972246-5984) [ ] : > > o.a.h.i.c.DefaultHttpClient I/O exception (java.net.SocketException) : > caught : > > when connecting to {}->http://server2:8983: Too many open files : > > : > > 2016-05-04 19:21:29.142 INFO (qtp1989972246-5984) [ ] : > > o.a.h.i.c.DefaultHttpClient Retrying connect to {}->http://server2:8983 : > > : > > 2016-05-04 19:21:29.142 INFO (qtp1989972246-5984) [ ] : > > o.a.h.i.c.DefaultHttpClient I/O exception (java.net.SocketException) : > caught : > > when connecting to {}->http://server2:8983: Too many open files : > > : > > 2016-05-04 19:21:29.142 INFO (qtp1989972246-5984) [ ] : > > o.a.h.i.c.DefaultHttpClient Retrying connect to {}->http://server2:8983 : > > : > > 2016-05-04 19:21:29.140 INFO (qtp1989972246-5983) [ ] : > > o.a.h.i.c.DefaultHttpClient Retrying connect to {}->http://server2:8983 : > > : > > 2016-05-04 19:21:29.140 INFO (qtp1989972246-5980) [ ] : > > o.a.h.i.c.DefaultHttpClient Retrying connect to {}->http://server2:8983 : > > : > > 2016-05-04 19:21:29.143 INFO (qtp1989972246-598
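To see whether that's the problem, and to raise the limit, something like this on a typical Linux install (the "solr" user name and the 65000 value are just examples):

# as the user running solr, check the current open-file limit:
ulimit -n

# raise it persistently via /etc/security/limits.conf:
solr  soft  nofile  65000
solr  hard  nofile  65000

Then log out/in (or restart the service) so the new limits take effect.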
Re: shareSchema property unknown in new solr.xml format
: > I’m getting this error on startup:
: >
: > <solrcloud> section of solr.xml contains 1 unknown config parameter(s): [shareSchema]

Pretty sure that's because it was never a supported property of the <solrcloud> section -- even in the old format of solr.xml.  It's just a top level property -- ie: create a child node for it directly under <solr>, outside of <solrcloud>.

Ah ... I see, this page is giving an incorrect example...

https://cwiki.apache.org/confluence/display/solr/Moving+to+the+New+solr.xml+Format

...I'll fix that.

-Hoss
http://www.lucidworks.com/
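In other words, a new-format solr.xml roughly like this:

<solr>
  <str name="shareSchema">true</str>
  <solrcloud>
    <!-- zkHost, hostPort, etc... -->
  </solrcloud>
</solr>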
Re: set the param [facet.offset] for EVERY [facet.pivot]
: HI All: I need a pagination with facet offset.
: There are two or more fields in [facet.pivot], but only one value
: for [facet.offset], eg: facet.offset=10&facet.pivot=field_1,field_2.
: In this condition, field_2 is 10's offset and then field_1 is 10's
: offset. But what I want is field_2 is 1's offset and field_1 is 10's
: offset. How can I fix this problem or try another way to complete?

As noted in the ref guide...

https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Thefacet.offsetParameter

...facet.offset supports per-field overriding, just like most (all?) facet options...

facet.pivot=field_1,field_2
f.field_2.facet.offset=10

...or using localparams (in case you are using field_2 in another facet.pivot param)...

facet.pivot={!key=pivot2}field_0,field_2
facet.pivot={!key=pivot1 f.field_2.facet.offset=10}field_1,field_2

-Hoss
http://www.lucidworks.com/
Re: Multiple boost queries on a specific field
: /?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:A^2.0/
: My first results have provider A.
: ?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:B^1.5
: My first results have provider B.

Good!

: /?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:(A^2.0 B^1.5)/
: Then my first results have provider B. It's not logical.

Why is that not logical?  If you provide us with the details from your schema about the provider field, and the debug=true output from your query showing the score explanations for the top doc of that query (and for the first "provider A" doc so we can compare) then we might be able to help explain why a "B" doc shows up before an "A" doc -- but you haven't provided nearly enough info for anything other than a wild guess...

https://wiki.apache.org/solr/UsingMailingLists

...my best wild guess is that it has to do with either the IDF of those two terms, or the lengthNorm of the "provider" field for the various docs.

Most likely "bq" isn't even remotely what you want however, since it's an *additive* boost, and will be affected by the overall queryNorm of the query it's a part of -- so even if you get things dialed in just like you want them with a "*:*" query, you might find yourself with totally different results once you start using a "real" query.

Assuming every document has at most 1 "provider" then what would probably work best for you is to use (edismax with) something like this...

boost=max(prod(2.0, termfreq(provider,'A')),
          prod(1.5, termfreq(provider,'B')),
          prod(..., termfreq(provider,...)),
          ...)

...or if you don't want to use edismax, then instead wrap the "boost" QParser around your dismax query...

q={!boost b=$boost v=$qq defType=dismax}
qq=...whatever your normal dismax query is...

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BoostQueryParser
https://cwiki.apache.org/confluence/display/solr/Function+Queries

What that will give you (in either case) is a *multiplicative* boost by each of those values depending on which of those terms exists in the provider field -- the "prod" function multiplies each weight by "1" if the corresponding provider string is in the field once, or "0" if that provider isn't in the field (hence the assumption of "at most 1 provider") and then the max function just picks one.  Depending on the specifics of your usecase, you could alternatively use sum(...) instead of max if some docs are from multiple providers, etc...

But the details of *why* you are currently getting the results you are getting, and what you consider illogical about them, are a huge factor in giving you good advice to move forward.

-Hoss
http://www.lucidworks.com/
Re: custom function for multivalued fields
Thanks to the SortedSetDocValues this is in fact possible -- in fact I just uploaded a patch for SOLR-2522 that you can take a look at to get an idea of how to make it work.  The main class you're probably going to want to look at is SortedSetSelector: you're going to want a similar "SortedDocValues proxy" class on top of SortedSetDocValues -- but instead of picking a single value, you want to pick your new synthetic value based on your custom function logic.

https://issues.apache.org/jira/browse/SOLR-2522

: I have a requirement where i want to maintain a multivalued field. However,
: at query time, i want to query on only one value we store in multivalued
: field. That one value should be output of a custom function which should
: execute on all values of multivalued field at query time.
: Can we write such function and plug into solr.

-Hoss
http://www.lucidworks.com/
Re: SolrNet and deep pagination
: Has anyone worked with deep pagination using SolrNet? The SolrNet
: version that I am using is v0.4.0.2002. I followed up with this article,
: https://github.com/mausch/SolrNet/blob/master/Documentation/CursorMark.md
: , however the version of SolrNet.dll does not expose a StartOrCursor
: property in the QueryOptions class.

I don't know anything about SolrNet, but I do know that the URL you list above is for the documentation on the master branch.  If I try to look at the same document on the 0.4.x branch, that document doesn't exist -- suggesting the feature isn't supported in the version of SolrNet you are using...

https://github.com/mausch/SolrNet/blob/0.4.x/Documentation/CursorMark.md
https://github.com/mausch/SolrNet/tree/0.4.x/Documentation

In fact, if I search the repo for "StartOrCursor" I see a file named "StartOrCursor.cs" exists on the master branch, but not on the 0.4.x branch...

https://github.com/mausch/SolrNet/blob/master/SolrNet/StartOrCursor.cs
https://github.com/mausch/SolrNet/blob/0.4.x/SolrNet/StartOrCursor.cs

...so it seems unlikely that this class is supported in the release you are using.

Note: according to the docs, there is a SolrNet google group where this question is probably most appropriate:

https://github.com/mausch/SolrNet/blob/master/Documentation/README.md
https://groups.google.com/forum/#!forum/solrnet

-Hoss
http://www.lucidworks.com/
Re: date field in the schema causing a problem
: Most documents have a correctly formatted date string and I would like to keep
: that data available for search on the date field.
	...
: I realize it is complaining because the date string isn't matching the
: data_driven_schema file. How can I coerce it into allowing the non-standard
: date strings while still using the correctly formatted ones?

If you want to preserve all of the data, and don't care about doing Date operations (ie: date range queries, date faceting, etc...) on the field, then you could always just define these fields to use a String based field type.

If you want to only preserve the data that can be cleanly parsed as a Date, then one workaround would probably be to configure something like this *after* the ParseDateFieldUpdateProcessorFactory...

<processor class="solr.RegexReplaceProcessorFactory">
  <str name="typeClass">solr.TrieDateField</str>
  <str name="pattern">.*</str>
  <str name="replacement"></str>
  <bool name="literalReplacement">true</bool>
</processor>
<processor class="solr.RemoveBlankFieldUpdateProcessorFactory">
  <str name="typeClass">solr.TrieDateField</str>
</processor>

...that should work because the RegexReplaceProcessorFactory will only operate on _string_ values in the incoming docs -- if ParseDateFieldUpdateProcessorFactory has already been able to parse the string into a Date object, it will be ignored.

If you want *both* (ie: to do Date specific operations on docs that can be parsed, but also know when docs provide other non-Date values in those fields) you'll need to use more than one field -- CloneFieldUpdateProcessorFactory can handle that for you...

https://lucene.apache.org/solr/5_2_0/solr-core/org/apache/solr/update/processor/RegexReplaceProcessorFactory.html
https://lucene.apache.org/solr/5_2_0/solr-core/org/apache/solr/update/processor/RemoveBlankFieldUpdateProcessorFactory.html
https://lucene.apache.org/solr/5_2_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html
https://lucene.apache.org/solr/5_2_0/solr-core/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html

-Hoss
http://www.lucidworks.com/
Re: Solr MLT with stream.body returns different results on each shard
: I have a fresh install of Solr 5.2.1 with about 3 million docs freshly
: indexed (I can also reproduce this issue on 4.10.0). When I use the Solr
: MoreLikeThisHandler with content stream I'm getting different results per
: shard.

I haven't looked at the code recently but I'm 99% certain that the MLT handler in general doesn't work with distributed (ie: sharded) queries.  (Unlike the MLT component and the recently added MLT qparser.)

I suspect that in the specific case of stream.body, what you are seeing is that the interesting terms are being computed relative to the local tf/idf stats for that shard, and then only local results from that shard are being returned.

: I also looked at using a standard MLT query, but I need to be able to
: stream in a fairly large block of text for comparison that is not in the
: index (different type of document). A standard MLT query

Until/unless the MLT parser supports arbitrary text (there's some mention of this in SOLR-7639 but I'm not sure what the status of that is) you might find that just POSTing all of your text as a regular query (q) using dismax or edismax is suitable for your needs -- that's essentially the equivalent of what MLTHandler does with a stream.body, except it tries to only focus on "interesting terms" based on tf/idf; but if your fields are all configured with stopword files anyway, then the results and performance may be similar.

-Hoss
http://www.lucidworks.com/
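For example, something like this (a sketch -- the collection and field names are placeholders):

curl http://localhost:8983/solr/collection1/select \
  --data-urlencode 'q=paste the large block of comparison text here' \
  -d 'defType=edismax' \
  -d 'qf=title^2 body' \
  -d 'rows=10'

POSTing the params keeps the large text out of the URL, and the ranking then comes from normal tf/idf scoring of all the words instead of just the "interesting" ones.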
Re: Exception while using {!cardinality=1.0}.
: > I am getting following exception for the query :
: > *q=field:query&stats=true&stats.field={!cardinality=1.0}field*. The
: > exception is not seen once the cardinality is set to 0.9 or less.
: > The field is *docValues enabled* and *indexed=false*. The same exception
: > I tried to reproduce on non docValues field but could not. Please help me
: > resolve the issue.

Hmmm... this is a weird error ... but you haven't really given us enough information to really guess what the root cause is:

- What was the datatype of the field(s)?
- Did you have the exact same data in both fields?
- Are these multivalued fields?
- Did your "real" query actually compute stats on the same field you had done your main term query on?

I know we have some tests of this basic situation, and I tried to do some more manual testing to spot check, but I can't reproduce.

If you can please provide a full copy of the data (as csv or xml or whatever) to build your index along with all solr configs and the exact queries to reproduce, that would really help get to the bottom of this -- if you can't provide all the data, then can you at least reproduce with a small set of sample data?

Either way: please file a new jira issue and attach as much detail as you can -- this URL has a lot of great tips on the types of data we need to be able to get to the bottom of bugs...

https://wiki.apache.org/solr/UsingMailingLists

: > ERROR - 2015-08-11 12:24:00.222; [core]
: > org.apache.solr.common.SolrException;
: > null:java.lang.ArrayIndexOutOfBoundsException: 3
: > at
: > net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
: > at
: > net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
: > at net.agkn.hll.HLL.toBytes(HLL.java:917)
: > at net.agkn.hll.HLL.toBytes(HLL.java:869)
: > at
: > org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
: > at
: > org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
: > at
: > org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
: > at
: > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
: > at
: > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
: > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
: > at
: > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
: > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
: > at
: > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
: > at
: > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
: > at
: > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
: > at
: > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
: > at
: > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
: > at
: > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
: > at
: > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
: > at
: > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
: > at
: > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
: > at
: > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
: > at
: > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
:
> at : > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) : > at : > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) : > at : > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) : > at : > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) : > at org.eclipse.jetty.server.Server.handle(Server.java:497) : > at : > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) : > at : > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) : > at : > org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) : > at : > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) : > at : > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) : > at java.lang.Thread.run(Thread.java:745) : > : > Kindly let me know if I need to ask this on any of the related jira issue. : > : > Thanks, : > Modassar : > : -Hoss http://www.lucidworks.com/
Re: Query term matches
https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to an existing message; instead, start a fresh email.  Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is "hidden" in that thread and gets less attention.  It makes following discussions in the mailing list archives particularly difficult.

: Message-ID: <55d0c940.4060...@tnstaafl.net>
: Subject: Query term matches
: References:
: In-Reply-To:

-Hoss
http://www.lucidworks.com/
Re: pre-loaded function-query?
: My current expansion expands from the
:    user-query
: to the
:    +user-query favouring-query-depending-other-params overall-favoring-query
: (where the overall-favoring-query could be computed as a function).
: With the boost parameter, i'd do:
:    (+user-query favouring-query-depending-other-params)^boost-function
:
: Not exactly the same or?

w/o more specifics it's hard to be certain, but nothing you've described so far sounds like you really need custom code.  Just use things like the "boost" QParser in conjunction with other nested parsers.  ie: instead of users sending q=user-query, have the user send qq=user-query and write your main query something like...

?q={!boost b=boost-function v=$x}
&x=(+{!query v=$qq} {!query v=$favor})
&favor=favouring-query-depending-other-params
&qq=user-query

See also...

https://people.apache.org/~hossman/ac2012eu/
http://vimeopro.com/user11514798/apache-lucene-eurocon-2012/video/55822630

-Hoss
http://www.lucidworks.com/
Re: Exception while using {!cardinality=1.0}.
: - Did you have the exact same data in both fields?
: Both the field are string type.

That's not the question I asked.  Is the data *in* these fields (ie: the actual value of each field for each document) the same for both of the fields?

This is important for figuring out whether the root problem is that having docValues (or not having docValues) causes a problem, or whether the root problem is that having certain kinds of *data* in a string field (regardless of docValues) can cause this problem.

Skimming the sample code you posted to SOLR-7954, you are definitely putting different data into "field" than you put into "field1", so it's still not clear what the problem is.

: - Did your "real" query actually compute stats on the same field you had
: done your main term query on?
: I did not get the question but as much I understood and verified in the
: Solr log the stat is computed on the field given with
: stats.field={!cardinality=1.0}field.

The question is specific to the example query you mentioned before and again in your description in SOLR-7954.  They show that the same field name you are computing stats on ("field") is also used in your main query as a constraint on the documents ("q=field:query") which is an odd and very special edge case that may be pertinent to the problem you are seeing.  Depending on what data you index, that might easily only match 1 document (in the case of the test code you put in jira, exactly 0 documents, since you never index the text "query" into field "field" for any document).

I haven't had a chance to review the jira in depth or actually run your code with those configs -- but if you get a chance before I do, please re-review the code & configs you posted and see if you can reproduce using the *exact* same data in two different fields, and if the choice of query makes a difference in the behavior you see.

:
: Regards,
: Modassar
:
: On Wed, Aug 19, 2015 at 10:24 AM, Modassar Ather
: wrote:
:
: > Ahmet/Chris! Thanks for your replies.
: >
: > Ahmet I think "net.agkn.hll.serialization" is used by hll() function
: > implementation of Solr.
: >
: > Chris I will try to create sample data and create a jira ticket with
: > details.
: >
: > Regards,
: > Modassar
: >
: >
: > On Tue, Aug 18, 2015 at 9:58 PM, Chris Hostetter
: > wrote:
: >
: >> : > I am getting following exception for the query :
: >> : > *q=field:query&stats=true&stats.field={!cardinality=1.0}field*. The
: >> : > exception is not seen once the cardinality is set to 0.9 or less.
: >> : > The field is *docValues enabled* and *indexed=false*. The same
: >> exception
: >> : > I tried to reproduce on non docValues field but could not. Please
: >> help me
: >> : > resolve the issue.
: >>
: >> Hmmm... this is a weird error ... but you haven't really given us enough
: >> information to really guess what the root cause is
: >>
: >> - What was the datatype of the field(s)?
: >> - Did you have the exact same data in both fields?
: >> - Are these multivalued fields?
: >> - Did your "real" query actually compute stats on the same field you had
: >> done your main term query on?
: >>
: >> I know we have some tests of this basic situation, and I tried to do some
: >> more manual testing to spot check, but I can't reproduce.
: >>
: >> If you can please provide a full copy of the data (as csv or xml or
: >> whatever) to build your index along with all solr configs and the exact
: >> queries to reproduce that would really help get to the bottom of this --
: >> if you can't provide all the data, then can you at least reproduce with a
: >> small set of sample data?
: >>
: >> either way: please file a new jira issue and attach as much detail as you
: >> can -- this URL has a lot of great tips on the types of data we need to be
: >> able to get to the bottom of bugs...
: >>
: >> https://wiki.apache.org/solr/UsingMailingLists
: >>
: >> : > ERROR - 2015-08-11 12:24:00.222; [core]
: >> : > org.apache.solr.common.SolrException;
: >> : > null:java.lang.ArrayIndexOutOfBoundsException: 3
: >> : > at
: >> net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
: >> : > at
: >> : > net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
: >> : > at net.agkn.hll.HLL.toBytes(HLL.java:917)
: >> : > at net.agkn.hll.HLL.toBytes(HLL.java:869)
: >>
Re: Exception while using {!cardinality=1.0}.
: Can you please explain how having the same field for query and stat can
: cause some issue for my better understanding of this feature?

I don't know if it can; it probably shouldn't.  But in terms of trying to understand the bug and reproduce it, any pertinent facts may be relevant -- particularly the unusual ones.

If no one else has ever seen a bug in X, but you were doing something unusual with X, and you get a bug 100% of the time, then that suggests that your unusual usecase would be a very important place to start looking -- so when you posted an example that looks weird and unusual and unlike any typical usecase of field stats, I wanted to understand what exactly you were doing and how much of that example was "real" and how much was just you munging your "real" query to hide something you didn't want to share.

-Hoss
http://www.lucidworks.com/
Re: Solr relevancy score order
: A follow up question. Is the sub-sorting on the lucene internal doc IDs
: ascending or descending order? That is, do the most recently indexed docs

You cannot make any generic assumptions about the order of the internal Lucene doc IDs -- the secondary sort on the internal IDs is stable (and FWIW: ascending) for static indexes, but as mentioned before: the *actual* order of the IDs changes as the index changes -- if there is an index merge, the ids can be totally different and docs can be re-arranged into a different order...

: > However, internal Lucene Ids can change when index changes. (merges,
: > updates etc).
	...
: show up first in this set of docs that have tied score? If not, how can I
: have the most recent be first? Do I have to sort on lucene's internal doc

Add a "timestamp" or "counter" field when you index your documents that means whatever you want it to mean (order added, order updated, order according to some external sort criteria from some external system) and then do an explicit sort on that.

-Hoss
http://www.lucidworks.com/
Re: Unknown query parser 'terms' with TermsComponent defined
1) The "terms" Query Parser (TermsQParser) has nothing to do with the "TermsComponent" (the first is for quering many distinct terms, the later is for requesting info about low level terms in your index) https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser https://cwiki.apache.org/confluence/display/solr/The+Terms+Component 2) TermsQParser (which is what you are trying to use with the "{!terms..." query syntax) was not added to Solr until 4.10 3) based on your example query, i'm pretty sure what you want is the TermQParser: "term" (singular, no "s") ... https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermQueryParser {!term f=id}ft849m81z : We've encountered a strange situation, I'm hoping someone might be able to : shed some light. We're using Solr 4.9 deployed in Tomcat 7. ... : 'q'=>'_query_:"{!raw f=has_model_ssim}Batch" AND ({!terms f=id}ft849m81z)', ... : 'msg'=>'Unknown query parser \'terms\'', : 'code'=>400}} ... : The terms component is defined in solrconfig.xml: : : -Hoss http://www.lucidworks.com/
Re: how to prevent uuid-field changing in /update query?
: updates? i can't do this because i have delta-import queries which also
: should be able to assign uuid when it is needed

You really need to give us a full and complete picture of what exactly you are currently doing, what's working, what's not working, and when it's not working what is it doing and how is that different from what you expect.

Example: you mentioned you have a "requesthandler with name "/update" which contains uuid update срфшт" (presumably you mean "chain", ie: the update processor chain) but you haven't shown us your configs, or any of your logs, so we can't see how exactly it's configured, or if/how it's being used.

If UUIDUpdateProcessorFactory is in place, then it should only generate a new UUID if the document doesn't already have one -- if you are using DIH to add documents to the index, and the uuid you are using/generating isn't also the uniqueKey field, then the UUIDUpdateProcessorFactory doesn't have any way of magically knowing when a "new" document is actually a replacement for an old document.

(If you are using Atomic Updates, then registering UUIDUpdateProcessorFactory *after* the DistributedUpdateProcessorFactory can help -- but that doesn't sound like it's relevant if you are using DIH delta updates.)

Please review this page and give us *all* the details about your current setup, your goal, and the specific problem you are facing...

https://wiki.apache.org/solr/UsingMailingLists

-Hoss
http://www.lucidworks.com/
Re: find documents based on specific term frequency
: "Is there a way to search for documents that have a word appearing more : than a certain number of times? For example, I want to find documents : that only have more than 10 instances of the word "genetics" …" Try... q=text:genetics&fq={!frange+incl=false+l=10}termfreq('text','genetics') Note: the q=text:genetics isn't neccessary -- you could do any query and then filter on the numeric function range of the termfreq() function, or use that {!frange} as your main query (in which case all matchin docs will have identical scores). i just included that in the example to show how you can search & sort by the "normal" style scoring (which takes into account full TF-IDF and length normalization) while filtering on the TF using a function query. You can also request the termfreq() as a psuedo field for each doc in the the results, and parameterize the details to eliminate redundency in the request params... ...&fq={!frange+incl=false+l=10+v=$tf}&fl=*,$tf&tf=termfreq('text','genetics') Is the same as... ...&fq={!frange+incl=false+l=10}termfreq('text','genetics')&fl=*,termfreq('text','genetics') A big caveat to this however is that the termfreq function operates on the *RAW* underlying term values -- no query time analyzer is used -- so if you do stemming, or lowercasing in your index analyzer, you have to pass the stemmed/lowercased values to the function (Although i just filed SOLR-7981 since it occurs to me we can make this automatic in the future with a new function argument) https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-FunctionRangeQueryParser https://cwiki.apache.org/confluence/display/solr/Function+Queries -Hoss http://www.lucidworks.com/
Re: "no default request handler is registered"
That's... strange.  Looking at the code, it appears to be a totally bogus and misleading warning -- but it also shouldn't affect anything.  You can feel free to ignore it for now...

https://issues.apache.org/jira/browse/SOLR-7984

: Date: Thu, 27 Aug 2015 15:10:18 -0400
: From: Scott Hollenbeck
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: "no default request handler is registered"
:
: I'm doing some experimenting with Solr 5.3 and the 7.x-1.x-dev version of
: the Apache Solr Search module for Drupal. Things seem to be working fine,
: except that this warning message appears in the Solr admin logging window
: and in the server log:
:
: "no default request handler is registered (either '/select' or 'standard')"
:
: Looking at the solrconfig.xml file that comes with the Drupal module I see a
: requestHandler named "standard":
:
: content
: explicit
: true
:
: I also see a handler named pinkPony with a "default" attribute set to
: "true":
:
: edismax
: content
: explicit
: true
: 0.01
: ${solr.pinkPony.timeAllowed:-1}
: *:*
: false
: true
: false
: 1
: spellcheck
: elevator
:
: So it seems like there are both standard and default requestHandlers
: specified. Why is the warning produced? What am I missing?
:
: Thank you,
: Scott

-Hoss
http://www.lucidworks.com/
Re: "no default request handler is registered"
I just want to clarify: all of Shawn's points below are valid and good -- but they still don't explain the warning message you are getting.  It makes no sense as the code is currently written, and doesn't do anything to help encourage people to transition to path based handler names.

: Date: Thu, 27 Aug 2015 13:50:51 -0600
: From: Shawn Heisey
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: "no default request handler is registered"
:
: On 8/27/2015 1:10 PM, Scott Hollenbeck wrote:
: > I'm doing some experimenting with Solr 5.3 and the 7.x-1.x-dev version of
: > the Apache Solr Search module for Drupal. Things seem to be working fine,
: > except that this warning message appears in the Solr admin logging window
: > and in the server log:
: >
: > "no default request handler is registered (either '/select' or 'standard')"
: >
: > Looking at the solrconfig.xml file that comes with the Drupal module I see a
: > requestHandler named "standard":
: >
: > I also see a handler named pinkPony with a "default" attribute set to
: > "true":
:
: > So it seems like there are both standard and default requestHandlers
: > specified. Why is the warning produced? What am I missing?
:
: I think the warning message may be misworded, or logged in incorrect
: circumstances, and might need some attention.
:
: The solrconfig.xml that you are using (which I assume came from the
: Drupal project) is geared towards a 3.x version of Solr prior to 3.6.x
: (the last minor version in the 3.x line).
:
: Starting in the 3.6 version, all request handlers in examples have names
: that start with a forward slash, like "/select", none of them have the
: "default" attribute, and the handleSelect parameter found elsewhere in
: the solrconfig.xml is false.
:
: You should bring this up with the Drupal folks and ask them to upgrade
: their config/schema and their code for modern versions of Solr.  Solr
: 3.6.0 (which deprecated their handler naming convention and the
: "default" attribute) was released over three years ago.
:
: More info than you probably wanted to know: The reason this change was
: made is security-related.  With the old way of naming request handlers
: and handling /select indirectly, you could send a query to /select,
: include a qt=/update parameter, and change the index via a handler
: intended only for queries.
:
: Thanks,
: Shawn

-Hoss
http://www.lucidworks.com/
Re: Sorting by function
: I have a "country" field in my index, with values like 'US', 'FR', 'UK', : etc... : : Then I want our users to be able to define the order of their preferred : countries so that grouped results are sorted according to their preference. ... : Is there any other function that would allow me to map from a predefined : String constant into an Integer that I can sort on ? Because of how they evolved, and most of the common usecases for them, there aren't a lot of functions that operate on "strings". Assuming your "country" field is a single valued (indexed) string field, then what you want can be done fairly simply using the the "termfreq()" function. termfreq(country,US) will return the (raw integer) term frequency for "Term(country,US)" for each doc -- assuming it's single valued (and not tokenized) that means for every doc it will be either a 0 or a 1. so you can either modify your earlier attempt at using "map" on the string values to do a map over the termfreq output, or you can simplify things to just multiply take the max value -- where max is just a short hand for "the non 0 value" ... max(mul(9,termfreq(country,US)), mul(8,termfreq(country,FR)), mul(7,termfreq(country,UK)), ...) Things get more interesting/complicated if the field isn't single valued, or is tokenized -- then individual values (like "US") might have a termfreq that is greater then 1, or a doc might have more then one value, and you have to decide what kind of math operation you want to apply over those... * ignore termfreqs and ony look at if term exists? - wrap each termfreq in map to force value to either 0 or 1 * want to sort by sum of (weights * termfreq) for each term? - change max to sum in above example * ignore all but the "main" term that has hte highest freq for each doc? - not easy at query time - best to figure out the "main" term at index time and put in it's own field. -Hoss http://www.lucidworks.com/
Re: which solrconfig.xml
: various $HOME/solr-5.3.0 subdirectories. The documents/tutorials say to edit
: the solrconfig.xml file for various configuration details, but they never say
: which one of these dozen to edit. Moreover, I cannot determine which version

Can you please give us specific examples (ie: urls, page numbers & version of the ref guide, etc...) of documentation that tells you to edit the solrconfig.xml w/o being explicit about where to find it, so that we can fix the docs?

FWIW: The official "Quick Start" tutorial does not mention editing solrconfig.xml at all...

http://lucene.apache.org/solr/quickstart.html

-Hoss
http://www.lucidworks.com/
Re: Local Params for Stats field
: I'm trying to use localparams for stats component on Solr 4.4, exact query:
: q=*:*&core=hotel_reviews&collection=hotel_reviews&fq=checkout_date:[* TO
: *]&fq={!tag=period1}checkout_date:[2011-12-25T00:00:00.000Z TO
: 2012-01-02T00:00:00.000Z}&fq={!tag=period2}checkout_date:[2011-12-25T00:00:00.000Z
: TO
: 2012-01-02T00:00:00.000Z}&rows=0&stats=true&stats.field={!ex=period2}checkout_date
:
: and it fails with error "unknown field" checkout_date.
: Should localparams for stats field be supported for v. 4.4?
: If I run same query for v.4.8 -- it returns result w/o error

What is the exact error message you get?  Specifically: what shows up in your logs (with stack trace) so we can understand what piece of code is complaining about the "unknown field"?

You are asking here about the stats component, but you are using "checkout_date" in several places in your query, so we have no way of knowing for sure if the problem is coming from stats -- you haven't given us any examples of queries that *do* work, or details about how your checkout_date field is defined...

https://wiki.apache.org/solr/UsingMailingLists

Are you absolutely certain this collection has a checkout_date field in your 4.4 solr instance?

-Hoss
http://www.lucidworks.com/
Re: Position of Document in Listing (Search Result)
: Write a PostFilter which takes in a document id. It lets through all
: documents until it sees that document id. Once it sees it, it stops
: letting them through.
:
: Thus, the total count of documents would be the position of your queried
: car.

Sorry guys, that won't work.  PostFilters can be used to collect & filter the documents returned as the result of a query, after the main query logic (so you can delay expensive filter checks) but they still happen before any sorting -- they have to, in order for the sorting logic to know *which* documents should be added to the priority queue.

- - -

I can only think of two approaches to this general problem:

1) 2 queries with frange filter on score.

This solution is only applicable in situations where:
  a) you are only sorting on scores
  b) the position information can be approximate as far as other docs with identical scores (ie: you can say "X documents have a higher score" instead of "exactly X documents come before this one")

The key is to first do a query where you filter (fq) on the doc id(s) you are interested in so you can get them back along with their scores, then you do another query where you do something like...

?rows=0&q=whatever&fq={!frange incl=false l=THE_SCORE v=$q}

...so that you filter out and ignore any doc that doesn't have a higher score, and look at the total numFound.

If there are multiple docs you need to get info about at one time, instead of filtering you can use facet.query the same way:

rows=0
q=whatever
facet=true
facet.query={!frange key=doc1 incl=false l=DOC1_SCORE v=$q}
facet.query={!frange key=doc2 incl=false l=DOC2_SCORE v=$q}
facet.query={!frange key=doc3 incl=false l=DOC3_SCORE v=$q}
...etc...

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-FunctionRangeQueryParser

2) cursor deep paging

This solution will work regardless of the number of docs you are interested in, and regardless of how complex your sort options are -- just use the cursorMark param to iterate over all the results in your client until you've found all the uniqueKeys you are looking for, counting the docs found as you go.

The various docs on deep paging and using cursors go into some background which may help you understand why what you are asking for is, in general, a hard problem, and why suggestion #1 only works with a simple sort on score -- for anything more complex you really have to go the cursor route...

https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
https://lucidworks.com/blog/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/

-Hoss
http://www.lucidworks.com/
Re: Why is Process Total Time greater than Elapsed Time?
depends on where you are reading "Process Total Time" from. that terminology isn't something i've ever seen used in the context of solr (fairly certain nothing in solr refers to anything that way)

QTime is the amount of time spent processing a request before it starts being written out over the wire to the client, so it is almost guaranteed to be *less* than the total elapsed (wall clock) time witnessed by your SolrJ client ... but i have no idea what "Process Total Time" is if you are seeing it greater than wall clock.

: From what I can tell, each component processes the request sequentially. So
: how can I see an Elapsed Time of 750ms (SolrJ client) and a Process Total
: Time of 1300ms? Does the Process Total Time add up the amount of time each
: leaf reader takes, or some other concurrent things?

-Hoss
http://www.lucidworks.com/
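If the elapsed number comes from SolrJ, note that the client exposes both values directly, which makes the comparison easy to reproduce. A minimal sketch (the URL and query are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TimingDemo {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
             new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
      QueryResponse rsp = client.query(new SolrQuery("*:*"));
      // QTime: server-side processing time, measured before the response is written out
      System.out.println("QTime (ms): " + rsp.getQTime());
      // elapsed: wall clock measured by the SolrJ client; includes network + parsing
      System.out.println("Elapsed (ms): " + rsp.getElapsedTime());
    }
  }
}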
RE: Trouble making tests with BaseDistributedSearchTestCase
: Strange enough, the following code gives different errors:
:
: assertQ(

i'm not sure what exactly assertQ will do in a distributed test like this ... probably nothing good. you'll almost certainly want to stick with the distributed indexDoc() and query* methods and avoid assertU and assertQ

: [TEST-TestComponent.test-seed#[EA2ED1E118114486]] ERROR org.apache.solr.SolrTestCaseJ4 - REQUEST FAILED: xpath=//result/doc[1]/str[@name='id'][.='1']
: xml response was: ...

...i'm guessing that's because assertQ is (probably) querying the "local" core from the TestHarness, not any of the distributed cores set up by BaseDistributedSearchTestCase, and your docs didn't get indexed there.

: And, when i forcefully add distrib=true, i get an NPE in SearchHandler!

which is probably because you (manually) added the distrib=true param but didn't add a list of shards to query, so you triggered some sloppy code in SearchHandler that should be giving you a nice error about shards not being specified. (i bet you can manually reproduce this in a single-node solr setup by adding distrib=true to any query that doesn't have a "shards" param; if so, please file a bug that it should produce a sane error message)

if you use something like BaseDistributedSearchTestCase.query on the other hand, it takes care of adding the correct distrib related request params for the shards it creates under the covers. (although at this point, in general, i would strongly suggest that you instead consider using AbstractFullDistribZkTestBase instead of BaseDistributedSearchTestCase -- assuming of course that your goal is good tests of how distributed queries behave in a modern solr cloud setup. if your goal is to test solr under manual sharding/distributed queries, BaseDistributedSearchTestCase still makes sense.)

As to your first question (which applies to both old school and cloud/zk related tests)...

: > Executing the above test either results in a: IOException occured when talking to server at: https://127.0.0.1:44761//collection1

That might be due to a timing issue of the servers not completely starting up before you start sending requests to them? not really sure ... would need to see the logs.

: > Or it fails with a curious error: .response.maxScore:1.0!=null
: >
: > The score correctly changes according to whatever value i set for parameter q.

that has to do with the way the BaseDistributedSearchTestCase plumbing tries to help ensure that a distributed query returns the same results as a single shard query by "diffing" the responses (note: this is why BaseDistributedSearchTestCase.indexDoc adds your doc to both a random shard *and* to a "control collection"). But there are some legacy quirks about how things like "maxScore" are handled: notably SOLR-6612. (historically, because of the possibility of filter optimizations, solr only kept track of the scores if it needed to: in a single core, that was when you asked for "fl=score,..."; but a distributed query might also compute scores (and maxScore) if you are sorting on scores, which is the default)

the way to indicate that you don't want BaseDistributedSearchTestCase's response diff checking to freak out over the max score is using the (horribly undocumented) "handle" feature...

handle.put("maxScore", SKIPVAL);

...that's not the default in all tests because it could hide errors in situations where tests *are* expecting the maxScore to be the same.
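to make that concrete, here is a minimal sketch of a test using the handle map (the field values are illustrative, and the exact test-method plumbing varies between Solr versions):

import org.apache.solr.BaseDistributedSearchTestCase;
import org.junit.Test;

public class MyDistribTest extends BaseDistributedSearchTestCase {
  @Test
  public void test() throws Exception {
    // tell the response diffing to skip values that legitimately differ
    // between the control collection and the sharded collection
    handle.clear();
    handle.put("QTime", SKIPVAL);
    handle.put("maxScore", SKIPVAL);

    index(id, "1", "name_s", "foo");
    index(id, "2", "name_s", "bar");
    commit();

    // query() runs against both the shards and the control collection, then diffs
    query("q", "name_s:foo", "sort", "id asc");
  }
}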
the same mechanism can be used to ignore things like the _version_ field, or timestamp fields, which are virtually guaranteed not to be the same between two different collections. (see uses of the "handle" Map in existing test cases for examples)

-Hoss
http://www.lucidworks.com/
Re: how to parse json response from Solr Term Vector Component (5.3.0)
: how to parse json response from Solr Term Vector Component?
:
: I got the following json structure in the response when testing Solr 5.3.0
: tvComponent: ...
:
: Is it correct? Why does solr make the json response for term vector
: information so difficult to extract on the client side? Why does it use a list
: to encode rather than a dictionary?

What you're seeing has to do with how the general purpose datastructures used in the response are serialized into JSON. By default, solr's "NamedList" datastructure (which can support the same key associated with multiple values) is modeled in JSON as a list of alternating key value pairs for simplicity, but you can add "json.nl=map" to force these to be a Map (in which case your parsing code has to decide what to do if/when a key is specified multiple times) or "json.nl=arrarr" (for an array of array pairs)

http://wiki.apache.org/solr/SolJSON#JSON_specific_parameters

-Hoss
http://www.lucidworks.com/
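To illustrate, here is how a NamedList holding the keys "a" and "b" (values made up) would render under each option:

json.nl=flat (the default): ["a", 1, "b", 2]
json.nl=map: {"a": 1, "b": 2}
json.nl=arrarr: [["a", 1], ["b", 2]]

so with the default "flat" style, your client walks the list two entries at a time: even indexes are keys, odd indexes are the corresponding values.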
Re: firstSearcher cache warming with own QuerySenderListener
You haven't really provided us enough info to make any meaningful suggestions. You've got at least 2 custom plugins -- but you don't give us any idea what the implementations of those plugins look like, or how you've configured them. Maybe there is a bug in your code? maybe it's misconfigured?

You said that initial queries seem a little faster when you use your custom plugin(s), but not as fast as if you manually warm those queries from a browser first -- what do the queries look like? how fast is fast? ... w/o specifics it's impossible to guess where the added time (or added time savings when using the browser to warm them) may be coming from ... and again: maybe the issue is that the code in your custom plugin is only partially right? maybe it's giving you a slight bit of warming just by executing a query to get some index data structures into ram, but it's actually executing the wrong query?

Show us the details of a single query, and tell us how *exactly* the timing compares between: no warming; warming just that query with your custom plugin; warming just that query from your browser. show us the *logs* from solr in all of those cases as well, so we can see what is actually getting executed under the hood.

As far as caching goes: all of the cache statistics are easily available from the plugins UI / handler...

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=32604180
https://cwiki.apache.org/confluence/display/solr/MBean+Request+Handler

what do you see in terms of insertions/hits/misses on all of the caches in each of the above scenarios?

: Date: Fri, 25 Sep 2015 17:31:30 +0200
: From: Christian Reuschling
: Reply-To: solr-user@lucene.apache.org
: To: "solr-user@lucene.apache.org"
: Subject: firstSearcher cache warming with own QuerySenderListener
:
: Hey all,
:
: we want to avoid cold start performance issues when the caches are cleared after a server restart.
:
: For this, we have written a SearchComponent that saves least recently used queries. These are
: written to a file inside a closeHook of a SolrCoreAware at server shutdown.
:
: The plan is to perform these queries at server startup to warm up the caches. For this, we have
: written a derivative of the QuerySenderListener and configured it as a firstSearcher listener in
: solrconfig.xml. The only difference to the original QuerySenderListener is that it gets its queries
: from the formerly dumped lru queries rather than getting them from the config file.
:
: It seems that everything is called correctly, and we have the impression that the query response
: times for the dumped queries are sometimes slightly better than without this warming.
:
: Nevertheless, there is still a huge difference against the times when we manually perform the same
: queries once, e.g. from a browser. If we do this, the second time we perform these queries they
: respond much faster (up to 10 times) than the response times after the implemented warming.
:
: It seems that not all caches are warmed up during our warming. And because of these huge
: differences, I suspect we missed something.
:
: The index has about 25M documents, and is split into two shards in a cloud configuration; both
: shards are on the same server instance for now, for testing purposes.
:
: Does anybody have an idea? I tried to disable lazy field loading as a potential issue, but with no
: success.
:
: Cheers,
: Christian

-Hoss
http://www.lucidworks.com/
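One way to pull those cache numbers without clicking through the admin UI is the MBean request handler. A sketch, assuming a default localhost install and a core named collection1:

curl "http://localhost:8983/solr/collection1/admin/mbeans?cat=CACHE&stats=true&wt=json"

Comparing the insertions/hits/lookups counters after each warming scenario should show which caches your listener is (or isn't) actually populating.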
Re: How can I get a monotonically increasing field value for docs?
You're basically re-implementing Solr's cursors.

you can change your system of reading docs from the old collection to use...

cursorMark=*&sort=timestamp+asc,id+asc

...and then, instead of keeping track of the last timestamp & id values and constructing a filter, you can just keep track of the nextCursorMark and pass it the next time you want to check for newer documents...

https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results

: Date: Mon, 21 Sep 2015 21:32:33 +0300
: From: Gili Nachum
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: How can I get a monotonically increasing field value for docs?
:
: Thanks for the in-depth explanation!
:
: The secondary sort by uuid would allow me to read a series of docs with
: identical time over multiple batches by specifying filtering
: time>timeOnLastReadDoc or (time=timeOnLastReadDoc and
: uuid>uuidOnLastReadDoc), which essentially creates a unique sorted value to
: track progress over.
: On Sep 21, 2015 19:56, "Shawn Heisey" wrote:
:
: > On 9/21/2015 9:01 AM, Gili Nachum wrote:
: > > TimestampUpdateProcessorFactory takes place only on the leader shard,
: > > or on each shard replica?
: > > if on each replica then I would get different values on each replica.
: > >
: > > My alternative would be to perform secondary sort on a UUID to ensure
: > > order.
: >
: > If the update chain is configured properly, it runs on the leader, so
: > all replicas get the same timestamp.
: >
: > Without SolrCloud, the way to create an "indexed at" time field is in
: > the schema -- specify a default value of NOW on the field definition and
: > don't send the field when indexing. The old master/slave replication
: > copies the actual index contents, so the indexed values in all replicas
: > are the same.
: >
: > The problem with NOW in the schema when running SolrCloud is that each
: > replica indexes the document independently, so each replica can have a
: > different timestamp. This is why the timestamp update processor exists
: > -- to set the timestamp to a specific value before the document is
: > duplicated to each replica, eliminating the problem.
: >
: > FYI, secondary sort parameters affect the order when the primary sort
: > field is identical between two documents. It may not do what you are
: > intending because of that.
: >
: > Thanks,
: > Shawn

-Hoss
http://www.lucidworks.com/
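Concretely, the polling loop looks something like this (a sketch; the collection name is a placeholder, and every cursorMark after the first comes from the previous response's nextCursorMark):

/solr/mycollection/select?q=*:*&rows=100&sort=timestamp+asc,id+asc&cursorMark=*

/solr/mycollection/select?q=*:*&rows=100&sort=timestamp+asc,id+asc&cursorMark=<nextCursorMark from previous response>

When a request comes back with no documents, keep the last cursorMark and retry it later; documents indexed with later timestamps will show up on subsequent requests using that same mark.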