multivalued coordinate for geospatial search

2016-10-12 Thread Chris
Hello solr users!

I am trying to use geospatial search to do some basic distance search in Solr 4.10.

At the moment, I have it working if I have just one coordinate pair
(latitude,longitude) per document.

However, I need to get it to work when I have an unknown number of
coordinate pairs per document: the document should be returned if any of its
coordinates is within the distance threshold of a given coordinate.


Below is how it is working when I have just one coordinate pair per
document.

[schema snippet stripped by the mail archive]
The reason why I am using the copyField is that the latitude and
longitude are provided in separate fields, not in the "lat,lon" format.
So far, all my attempts to use multiValued have failed, and I would greatly
appreciate some help.
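Since the archive ate the snippet above, here is a minimal sketch of the
direction I am exploring for the multivalued case (field and type names are
my own, adapted from the Solr 4.x example schema). My understanding is that
solr.LatLonType does not support multiValued, while the RPT spatial type
does, and it accepts "lat,lon" strings:

<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>
<field name="coords" type="location_rpt" indexed="true" stored="true" multiValued="true"/>

One caveat I am aware of: copyField copies each source value verbatim, so it
cannot merge separate latitude and longitude fields into "lat,lon" pairs;
that combination seems to need to happen client-side or in an update
processor. A filter such as fq={!geofilt sfield=coords pt=45.15,-93.85 d=5}
should then match a document if any one of its points is within d kilometers.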

Thanks!

Chris


Conditional atomic update

2016-05-18 Thread chris



(Resending because DMARC-compliant ESPs bounced the previous version)

I'm looking for a way to do an atomic update, but abort the update if a
certain condition exists on the existing document.


Each document has the fields id, count, and value. The source data has
just id and value.

When the source data is indexed, I use atomic updates to:

- Increment the count value in the existing document

- Add the source value to the existing document's value
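Concretely, each source record turns into an atomic update roughly like this
(a sketch using the stock "inc" and "add" modifiers; the id value is made up):

{"id":"doc1", "count":{"inc":1}, "value":{"add":"value from the source record"}}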


What I'd like to do is abort the update if the existing document has a
count of 5. Is there a way to do this with a custom update processor?
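For reference, the direction I have been sketching for such a processor
(untested, and my own guess at the wiring, so class and field handling are
assumptions; it would be placed in the update chain via solrconfig.xml before
RunUpdateProcessorFactory; note that a plain index search only sees committed
documents, so a real-time-get lookup would be needed to also catch
uncommitted updates):

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

/** Hypothetical processor: drops an add when the indexed doc has count >= 5. */
public class CountCapUpdateProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        String id = cmd.getSolrInputDocument().getFieldValue("id").toString();
        SolrIndexSearcher searcher = cmd.getReq().getSearcher();
        // Look up the committed copy of the document by its unique key.
        TopDocs hits = searcher.search(new TermQuery(new Term("id", id)), 1);
        if (hits.totalHits > 0) {
          Document existing = searcher.doc(hits.scoreDocs[0].doc);
          IndexableField count = existing.getField("count");
          if (count != null && count.numericValue() != null
              && count.numericValue().intValue() >= 5) {
            return; // condition met: swallow the update instead of passing it on
          }
        }
        super.processAdd(cmd);
      }
    };
  }
}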



JSON facet API: exclusive lower bound, inclusive upper bound

2017-06-16 Thread chris



The docs for the JSON facet API tell us that the default ranges are inclusive
of the lower bounds and exclusive of the upper bounds. I'd like to do the
opposite (exclusive lower, inclusive upper), but I can't figure out how to
combine the 'include' parameters to make it work.
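What I have been trying looks roughly like the sketch below (the field name
and bounds are made up). My reading of the range-faceting docs is that
include=upper replaces the default include=lower and should give
(start, start+gap] buckets, but I have not managed to confirm the right
combination:

json.facet={
  prices : {
    type    : range,
    field   : price,
    start   : 0,
    end     : 100,
    gap     : 25,
    include : "upper"
  }
}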

Limiting by range of sum across documents

2017-11-12 Thread chris



I have documents in solr that look like this:
{
  "id": "acme-1",
  "manufacturer": "acme",
  "product_name": "Foo",
  "price": 3.4
}

There are about 150,000 manufacturers, each of which have between 20,000
and 1,000,000 products.

I'd like to return the sum of all prices that are in the range [100, 200],
faceted by manufacturer. In other words, for each manufacturer, sum the
prices of all products for that manufacturer, and return the sum and the
manufacturer name. For example:
[
  {
    "manufacturer": "acme",
    "sum": 150.5
  },
  {
    "manufacturer": "Johnson, Inc.",
    "sum": 167.0
  },
...
]

I tried this:
q=*:*&rows=0&stats=true&stats.field={!tag=piv1 sum=true}price&facet=true&facet.pivot={!stats=piv1}manufacturer
which "works" on a test subset of 1,000 manufacturers. However, there are
two problems:
1) This query returns all the manufacturers, so I have to iterate over the
entire response object to extract the ones I want.
2) The query on the whole data set takes more than 600 seconds to return,
which doesn't fit our target response time.

How can I perform this query?
We're using solr version 5.5.5.

Thanks,
Chris


Re: Limiting by range of sum across documents

2017-11-13 Thread chris



Hi Emir,
I can't apply filters to the original query because I don't know in advance
which filters will meet the criterion I'm looking for. Unless I'm missing
something obvious.

I tried the JSON facet you suggested but received

"response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_heatmaps":{}},
  "facets":{
    "count":0}}


> Hi Chris,
> You mention it returns all manufacturers? Even after you apply filters
> (I don't see a filter in your example)? You can control how many facets
> are returned with facet.limit, and you can use facet.pivot.mincount to
> determine how many facets are returned. If you calculate the sum over all
> manufacturers, it can take a while.
>
> Maybe you can try JSON faceting. Something like (URL style):
>
> …&json.facet={sumByManu:{terms:{field:manufacturer,facet:{sum:"sum(price)"}}}}
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/

>> On 12 Nov 2017, at 19:09, ch...@yeeplusplus.com wrote:
>> [original message quoted in full; trimmed — see the 2017-11-12 post above]


Re: Limiting by range of sum across documents

2017-11-14 Thread chris



I'm not looking for products where the price is in the range [100, 200].
I'm looking for manufacturers for which the sum of the prices of all of their
products is in the range [100, 200].


> Hi Chris,
>
> I assumed that you apply some sort of fq=price:[100 TO 200] to focus on
> wanted products.
>
> Can you share the full JSON faceting request - numFound:0 suggests that
> something is completely wrong.
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/

>> On 13 Nov 2017, at 21:56, ch...@yeeplusplus.com wrote:
>> [previous message quoted in full; trimmed]


Re: Limiting by range of sum across documents

2017-11-15 Thread chris



Emir,
It certainly seems like I'll need to use streaming expressions.
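For the archives, the first sketch I plan to try (untested; the collection
name and /export handler are assumptions, /export needs docValues on the fl
and sort fields, and on 5.5.5 I would still filter the sums to [100, 200]
client-side):

rollup(
  search(products,
         q="*:*",
         fl="manufacturer,price",
         sort="manufacturer asc",
         qt="/export"),
  over="manufacturer",
  sum(price)
)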
Thanks for your help!
Chris


> Hi Chris,
> I misunderstood your requirement. I am not aware of a facet result
> filtering feature. What you could do is sort facet results by sum and load
> page by page, but that does not sound like a good solution. Did you try
> using streaming expressions? I don't have much experience with this feature
> so I would have to play a bit before answering whether it is possible and
> how to do it, but I guess someone will be able to give some pointers.
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/

>> On 14 Nov 2017, at 16:51, ch...@yeeplusplus.com wrote:
>> [previous message quoted in full; trimmed]

joining across sharded collection

2017-12-09 Thread chris


I'm trying to figure out how to structure this query.

I have two types of documents: items and sources.  Previously, they were all in 
the same collection.  I'm now testing a cluster with separate collections.

The items collection has 38,034,895,527 documents, and the sources collection 
has 417,618,443 documents.

I have all of the documents in the same collection in a solr cluster running 
version 6.0.1, with 100 shards and replication factor 1.

The following query works as expected:

q=type:source&fq={!join from=source_id 
to=source_id}item_category:abc&rows=0&stats=true&stats.field={!tag=pv1 
count=true}source_id&facet=true&facet.pivot={!stats=pv1}source_factory&facet.sort=index&facet.limit=-1

In the source documents, the source_id identifies the source.  In the items 
documents, the source_id identifies the unique source document related to it.  
There is a 1:many relationship between sources and items.

The above query gets the sources that are associated with items that have 
item_category "abc", and then facets on the sources' source_factory field.


Now, I'm testing a separate cluster that has the same data, but organized into 
two collections: items and sources.

In order to do the same query, I have to use a cross-collection join, which 
requires the FROM collection to be unsharded.  However, in this case, the FROM 
collection is the items collection, which due to its size cannot be unsharded.

I'm hoping there's an easy way to restructure my data / query to accomplish the 
faceting I need.

The data set is static so can be re-indexed and reconfigured as needed.  It's 
also not under any load yet.



Re: joining across sharded collection

2017-12-10 Thread chris



Hi Erick,
No, we have not yet looked at the streaming functionality. But we've started
to explore it, so we'll look at that.
I briefly considered denormalizing the data, but the sources documents have
~200 fields, so it seems to me that the index size would explode. (The
items documents have 65 fields.)
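In the meantime, here is a rough sketch of the streaming join we plan to
experiment with (untested; it assumes source_id and source_factory have
docValues for /export, and I am not sure the join decorators are available
as early as 6.0.1):

innerJoin(
  search(items,   q="item_category:abc", fl="source_id",
         sort="source_id asc", qt="/export"),
  search(sources, q="*:*", fl="source_id,source_factory",
         sort="source_id asc", qt="/export"),
  on="source_id"
)

The source_factory facet would then come from rolling up the joined tuples
after re-sorting them on source_factory.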
Thank you for your help.
Chris
 Original Message 
Subject: Re: joining across sharded collection

From: "Erick Erickson" 

Date: Sat, December 9, 2017 10:16 pm

To: "solr-user" 

--



> Have you looked at the streaming functionality (StreamingExpressions
> and ParallelSQL in particular)? While it has some restrictions, it
> easily handles cross-collection joins. It's generally intended for
> analytic-type queries, but at your scale that may be what you need.
>
> At that scale denormalizing the data doesn't seem feasible.
>
> Best,
> Erick

> On Sat, Dec 9, 2017 at 6:02 PM,  wrote:
>> [original message quoted in full; trimmed]


Regarding Solr Cloud issue...

2013-10-15 Thread Chris
Hi,

I am using Solr 4.4 as a cloud. While creating shards I see that the last
shard has a range of "null". I am not sure if this is a bug.

I am stuck with having a null value for the range in clusterstate.json
(attached below):

"shard5":{ "range":null, "state":"active", "replicas":{"core_node1":{
"state":"active", "core":"Web_shard5_replica1",
"node_name":"domain-name.com:1981_solr", "base_url":"
http://domain-name.com:1981/solr";, "leader":"true",
"router":"compositeId"},

I tried to use the ZooKeeper CLI to change this, but I was not able to. I
tried to locate this file, but didn't find it anywhere.

Can you please let me know how I can change the range from null to something
meaningful? I have the range that I need, so if I can find the file, maybe
I can change it manually.
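For reference, the kind of zkcli round-trip I was attempting (paths and
zkhost are from my setup; this is what would not stick for me):

cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd getfile /clusterstate.json /tmp/clusterstate.json
# edit the range in /tmp/clusterstate.json, then push it back:
cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd putfile /clusterstate.json /tmp/clusterstate.json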

My next question is - can we have a catch-all for ranges? I mean, if things
don't match any other range then insert into this shard... is this possible?

Kindly advise.

Chris



Re: Regarding Solr Cloud issue...

2013-10-15 Thread Chris
Hi Shalin,

Thank you for your quick reply. I appreciate all the help.

I started the Solr Cloud servers first... with 5 nodes.

Then I issued a command like the one below to create the shards -

http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=5&replicationFactor=1

Please advise.

Regards,
Chris


On Tue, Oct 15, 2013 at 8:07 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> How did you create these shards? Can you tell us how to reproduce the
> issue?
>
> Any shard in a collection with compositeId router should never have null
> ranges.
>
>
> On Tue, Oct 15, 2013 at 7:07 PM, Chris  wrote:
> > [original message quoted in full above; trimmed]
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Regarding Solr Cloud issue...

2013-10-16 Thread Chris
e":"8000-b332",
"state":"active",
"replicas":{
  "core_node4":{
"state":"active",
"core":"web_shard1_replica2",
"node_name":"64.251.14.47:1983_solr",
"base_url":"http://64.251.14.47:1983/solr"},
  "core_node2":{
"state":"active",
"core":"web_shard1_replica1",
"node_name":"64.251.14.47:1981_solr",
"base_url":"http://64.251.14.47:1981/solr";,
"leader":"true"}}},
  "shard2":{
"range":"b333-e665",
"state":"active",
"replicas":{
  "core_node6":{
"state":"active",
"core":"web_shard2_replica1",
"node_name":"64.251.14.47:1982_solr",
"base_url":"http://64.251.14.47:1982/solr"},
  "core_node7":{
"state":"active",
"core":"web_shard2_replica2",
"node_name":"64.251.14.47:1984_solr",
"base_url":"http://64.251.14.47:1984/solr";,
"leader":"true"}}},
  "shard3":{
"range":"e666-1998",
"state":"active",
"replicas":{
  "core_node9":{
"state":"active",
"core":"web_shard3_replica1",
"node_name":"64.251.14.47:1985_solr",
"base_url":"http://64.251.14.47:1985/solr"},
  "core_node1":{
"state":"active",
"core":"web_shard3_replica2",
"node_name":"64.251.14.47:1981_solr",
"base_url":"http://64.251.14.47:1981/solr";,
"leader":"true"}}},
  "shard4":{
"range":"1999-4ccb",
"state":"active",
"replicas":{
  "core_node3":{
"state":"active",
"core":"web_shard4_replica2",
"node_name":"64.251.14.47:1982_solr",
"base_url":"http://64.251.14.47:1982/solr"},
  "core_node5":{
"state":"active",
"core":"web_shard4_replica1",
"node_name":"64.251.14.47:1983_solr",
"base_url":"http://64.251.14.47:1983/solr";,
"leader":"true"}}},
  "shard5":{
"range":"4ccc-7fff",
"state":"active",
"replicas":{
  "core_node8":{
"state":"active",
"core":"web_shard5_replica1",
"node_name":"64.251.14.47:1984_solr",
"base_url":"http://64.251.14.47:1984/solr"},
  "core_node10":{
"state":"active",
"core":"web_shard5_replica2",
"node_name":"64.251.14.47:1985_solr",
"base_url":"http://64.251.14.47:1985/solr";,
"leader":"true",
"router":"compositeId"},
  "News":{
"shards":{
  "shard1":{
"range":"8000-b332",
"state":"active",
"replicas":{"core_node1":{
"state":"active",
"core":"News_shard1_replica1",
"node_name":"64.251.14.47:1984_solr",
"base_url":"http://64.251.14.47:1984/solr";,
"leader":"true"}}},
  "shard2":{
"range":"b333-e665",
"state":"active",
"replicas":{"core_node3":{
"state":"active",
"core":"News_shard2_replica1",
"node_name":"64.251.14.47:1983_solr",
"base_url":"http://64.251.14.47:1983/solr&

Re: Regarding Solr Cloud issue...

2013-10-16 Thread Chris
Oops, the actual URL is - http://64.251.14.47:1981/solr/

Also, another issue that needs to be raised is that the creation of cores
from the "core admin" section of the GUI doesn't really work well; it creates
files but then they do not work (again, I am using 4.4).


On Wed, Oct 16, 2013 at 4:12 PM, Chris  wrote:

> Hi,
>
> Please find the clusterstate.json as below:
>
> I have created a dev environment on one of my servers so that you can see
> the issue live - http://64.251.14.47:1984/solr/
>
> Also, There seems to be something wrong in zookeeper, when we try to add
> documents using solrj, it works fine as long as load of insert is not much,
> but once we start doing many inserts, then it throws a lot of errors...
>
> I am doing something like -
>
> CloudSolrServer solrCoreCloud = new CloudSolrServer(cloudURL);
> solrCoreCloud.setDefaultCollection("Image");
> UpdateResponse up = solrCoreCloud.addBean(resultItem);
> UpdateResponse upr = solrCoreCloud.commit();
>
>
>
> clusterstate.json ---
>
> {
>   "collection1":{
> "shards":{
>   "shard2":{
> "range":"b333-e665",
> "state":"active",
> "replicas":{"core_node4":{
> "state":"active",
> "core":"collection1",
> "node_name":"64.251.14.47:1984_solr",
> "base_url":"http://64.251.14.47:1984/solr";,
> "leader":"true"}}},
>   "shard3":{
> "range":"e666-1998",
> "state":"active",
> "replicas":{"core_node5":{
> "state":"active",
> "core":"collection1",
> "node_name":"64.251.14.47:1985_solr",
> "base_url":"http://64.251.14.47:1985/solr";,
> "leader":"true"}}},
>   "shard4":{
> "range":"1999-4ccb",
> "state":"active",
> "replicas":{
>   "core_node2":{
> "state":"active",
> "core":"collection1",
> "node_name":"64.251.14.47:1982_solr",
> "base_url":"http://64.251.14.47:1982/solr"},
>   "core_node6":{
> "state":"active",
> "core":"collection1",
> "node_name":"64.251.14.47:1981_solr",
> "base_url":"http://64.251.14.47:1981/solr";,
> "leader":"true"}}},
>   "shard5":{
> "range":"4ccc-7fff",
> "state":"active",
> "replicas":{"core_node3":{
> "state":"active",
> "core":"collection1",
> "node_name":"64.251.14.47:1983_solr",
> "base_url":"http://64.251.14.47:1983/solr";,
> "leader":"true",
> "router":"compositeId"},
>   "Web":{
> "shards":{
>   "shard1":{
> "range":"8000-b332",
> "state":"active",
> "replicas":{"core_node2":{
> "state":"active",
> "core":"Web_shard1_replica1",
> "node_name":"64.251.14.47:1983_solr",
> "base_url":"http://64.251.14.47:1983/solr";,
> "leader":"true"}}},
>   "shard2":{
> "range":"b333-e665",
> "state":"active",
> "replicas":{"core_node3":{
> "state":"active",
> "core":"Web_shard2_replica1",
> "node_name":"64.251.14.47:1984_solr",
> "base_url":"http://64.251.14.47:1984/solr";,
> "leader":"true"}}},
>   "shard3":{
> "range":"e666-1998",
> "state"

Re: Regarding Solr Cloud issue...

2013-10-16 Thread Chris
Also, is there an easy way of upgrading to 4.5 without having to change most
of my plugins & configuration files?


On Wed, Oct 16, 2013 at 4:18 PM, Chris  wrote:

> [previous message quoted in full; trimmed]

Re: Regarding Solr Cloud issue...

2013-10-16 Thread Chris
Oh great. Thanks, Primoz.

Is there any simple way to do the upgrade to 4.5 without having to change
my configurations? Update a few jar files, etc.?


On Wed, Oct 16, 2013 at 4:58 PM,  wrote:

> >>> Also, another issue that needs to be raised is the creation of cores
> from
> >>> the "core admin" section of the gui, doesnt really work well, it
> creates
> >>> files but then they do not work (again i am using 4.4)
>
> From my experience "core admin" section of the GUI does not work well in
> SolrCloud domain. If I am not mistaken this was somehow fixed in 4.5.0
> which acts much better.
>
> I would use only HTTP requests ("cores and collections API") with
> SolrCloud and would use GUI only for viewing the state of cluster and
> cores.
>
> Primoz
>
>
>


Re: Regarding Solr Cloud issue...

2013-10-16 Thread Chris
Very well, I will try the same. Maybe an auto-update tool should also be
added to the pipeline... just a thought...


On Wed, Oct 16, 2013 at 6:20 PM,  wrote:

> Hm, good question. I haven't really done any upgrading yet, because I just
> reinstall and reindex everything. I would replace jars with the new ones
> (if needed - check release notes for version 4.4.0 and 4.5.0 where all the
> versions of external tools [tika, maven, etc.] are stated) and deploy the
> updated WAR file to servlet container.
>
> Primoz
>
>
>
>
> From:   Chris 
> To: solr-user 
> Date:   16.10.2013 14:30
> Subject:Re: Regarding Solr Cloud issue...
>
>
>
> [previous messages quoted in full; trimmed]


Re: Regarding Solr Cloud issue...

2013-10-17 Thread Chris
Wow, thanks for all that. I just upgraded and linked my plugins & it seems
fine so far, but I have run into another issue.

While adding a document to the Solr cloud it says -
org.apache.solr.common.SolrException: Unknown document router
'{name=compositeId}'

in the clusterstate.json i can see -

 "shard5":{
"range":"4ccc-7fff",
"state":"active",
"replicas":{"core_node4":{
"state":"active",
"base_url":"http://64.251.14.47:1984/solr";,
"core":"web_shard5_replica1",
"node_name":"64.251.14.47:1984_solr",
"leader":"true",
"maxShardsPerNode":"2",
"router":{"name":"compositeId"},
"replicationFactor":"1"},

I am using this to add -

CloudSolrServer solrCoreCloud = new CloudSolrServer(cloudURL);
solrCoreCloud.setDefaultCollection("web");
UpdateResponse up = solrCoreCloud.addBean(resultItem);
UpdateResponse upr = solrCoreCloud.commit();

Please advise.





On Wed, Oct 16, 2013 at 9:49 PM, Shawn Heisey  wrote:

> On 10/16/2013 4:51 AM, Chris wrote:
> > Also, is there any easy way upgrading to 4.5 without having to change
> most
> > of my plugins & configuration files?
>
> Upgrading is something that should be done carefully.  If you can, it's
> always recommended that you try it out on dev hardware with your real
> index data beforehand, so you can deal with any problems that arise
> without causing problems for your production cluster.  Upgrading
> SolrCloud is particularly tricky, because for a while you will be
> running different versions on different machines in your cluster.
>
> If you're using your own custom software to go with Solr, or you're
> using third-party plugins that aren't included in the Solr download,
> upgrading might take more effort than usual.  Also, if you are doing
> anything in your config/schema that changes the format of the Lucene
> index, you may find that it can't be upgraded without completely
> rebuilding the index.  Examples of this are changing the postings format
> or docValues format.  This is a very nasty complication with SolrCloud,
> because those configurations affect the entire cluster.  In that case,
> the whole index may need to be rebuilt without custom formats before
> upgrading is attempted.
>
> If you don't have any of the complications mentioned in the preceding
> paragraph, upgrading is usually a very simple process:
>
> *) Shut down Solr.
> *) Delete the extracted WAR file directory.
> *) Replace solr.war with the new war from dist/ in the download.
> **) Usually it must actually be named solr.war, which means renaming it.
> *) Delete and replace other jars copied from the download.
> *) Change luceneMatchVersion in all solrconfig.xml files. **
> *) Start Solr back up.
>
> ** With SolrCloud, you can't actually change the luceneMatchVersion
> until all of your servers have been upgraded.
>
> A full reindex is strongly recommended.  With SolrCloud, it normally
> needs to wait until all servers are upgraded.  In situations where it
> won't work at all without a reindex, upgrading SolrCloud can be very
> challenging.
>
> It's strongly recommended that you look over CHANGES.txt and compare the
> new example config/schema with the example from the old version, to see
> if there are any changes that you might want to incorporate into your
> own config.  As with luceneMatchVersion, if you're running SolrCloud,
> those changes might need to wait until you're fully upgraded.
>
> Side note: When upgrading to a new minor version, config changes aren't
> normally required.  They will usually be required when upgrading major
> versions, such as 3.x to 4.x.
>
> If you *do* have custom plugins that aren't included in the Solr
> download, you may have to recompile them for the new version, or wait
> for the vendor to create a new version before you upgrade.
>
> This is only the tip of the iceberg, but a lot of the rest of it depends
> greatly on your configurations.
>
> Thanks,
> Shawn
>
>


Re: Regarding Solr Cloud issue...

2013-10-17 Thread Chris
I am also trying something like -

java -Durl=http://domainname.com:1981/solr/web/update -Dtype=application/json
-jar /solr4RA/example1/exampledocs/post.jar /root/Desktop/web/*.json

but it is giving an error -

19:06:22 ERROR SolrCore org.apache.solr.common.SolrException: Unknown
command: subDomain [12]

org.apache.solr.common.SolrException: Unknown command: subDomain [12]
at 
org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:152)
at 
org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:101)
at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:65)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at 
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)




On Thu, Oct 17, 2013 at 6:31 PM, Chris  wrote:

> [previous message quoted in full; trimmed]

Two easy questions...

2013-10-20 Thread Chris
Hi,

I am new to Solr & have two questions -

1. How do I get an excerpt for a huge content field (I would love to show
Google-like excerpts, where the word searched for is highlighted)?

2. If I have a field A, is it possible to get the top results with only
unique values for this field on a page?
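To make the questions concrete, this is roughly the request I have in mind
(parameter names taken from the highlighting and result-grouping docs; the
field names are placeholders):

q=some words
  &hl=true&hl.fl=content&hl.snippets=2&hl.simple.pre=<b>&hl.simple.post=</b>
  &group=true&group.field=A&group.limit=1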

Thanks,
Chris


Solr Index corrupted...

2013-10-22 Thread Chris
Hi,

I am running Solr 4.4 & one of my collections seems to have a corrupted
index...

I tried doing -
java -cp lucene-core-4.4.0.jar -ea:org.apache.lucene...
org.apache.lucene.index.CheckIndex /solr2/example/solr/w1/data/index/ -fix

But it didn't help... it gives -

ERROR: could not read any segments file in directory
java.io.FileNotFoundException:
/solr2/example/solr/w1/data/index/segments_hid (No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:193)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:318)
at
org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:380)
at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:812)
at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:663)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:376)
at
org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:382)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1854)

Please help.

Chris


character encoding issue...

2013-10-29 Thread Chris
Hi All,

I get characters like -

�� - CTA -

in the Solr index. I am adding Java beans to Solr via the addBean() function.

This seems to be a character encoding issue. Any pointers on how to
resolve this one?

I have seen that this occurs mostly for Japanese and Chinese characters.


Re: character encoding issue...

2013-10-31 Thread Chris
Hi Rajani,

I followed the steps exactly as in
http://zensarteam.wordpress.com/2011/11/25/6-steps-to-configure-solr-on-apache-tomcat-7-0-20/

However, when I send a query to this new instance in Tomcat, I again get
the error -

  Scheduled Groups Maintenance
In preparation for the new release roll-out, Diigo groups won’t be
accessible on Sept 28 (Mon) around midnight 0:00 PST for several
hours.
Stay tuned to say hello to Diigo V4 soon!

location of the text  -
http://blog.diigo.com/2009/09/28/scheduled-groups-maintenance/

same problem at - http://cn.nytimes.com/business/20130926/c26alibaba/

All text in title comes like -

 - � 

   -
� 



Can you please advise?

Chris




On Tue, Oct 29, 2013 at 11:33 PM, Rajani Maski wrote:

> Hi,
>
>If you are using Apache Tomcat Server, hope you are not missing the
> below mentioned configuration:
>
>   <Connector ... connectionTimeout="20000"
> redirectPort="8443" *URIEncoding="UTF-8"*/>
>
> I had faced similar issue with Chinese Characters and had resolved with the
> above config.
>
> Links for reference :
>
> http://zensarteam.wordpress.com/2011/11/25/6-steps-to-configure-solr-on-apache-tomcat-7-0-20/
>
> http://blog.sidu.in/2007/05/tomcat-and-utf-8-encoded-uri-parameters.html#.Um_3P3Cw2X8
>
>
> Thanks
>
>
>
> On Tue, Oct 29, 2013 at 9:20 PM, Chris  wrote:
>
> > [original message quoted in full; trimmed]
>


Re: character encoding issue...

2013-11-04 Thread Chris
Sorry, I was away a bit & hence the delay.

I am inserting Java strings into a Java bean class, and then doing an
addBean() call to insert the POJO into Solr.

When I query using either Tomcat or Jetty, I get these special characters.
But I have noted that if I change the output to "Shift-JIS" encoding, then
those characters appear as what I think are Japanese characters.

But this solution doesn't work for all special characters, as I can still
see some of them... isn't there an encoding that can cover all the
characters, whatever they might be? Any ideas on what I should do?

Regards,
Chris


On Mon, Nov 4, 2013 at 6:27 PM, Erick Erickson wrote:

> The problem is there are about a dozen places where the character
> encoding can be mis-configured. The problem you're seeing above
> actually looks like a problem with the character set configured in
> your browser, it may have nothing to do with what's actually in Solr.
>
> You might write small SolrJ program and see if you can dump the contents
> in binary and examine to see...
>
> Best
> Erick
>
>
> On Sun, Nov 3, 2013 at 6:39 AM, Rajani Maski 
> wrote:
>
> > How are you extracting the text that is there in the website[1] you are
> > referring to? Apache Nutch or any other crawler? If yes, initially check
> > whether that crawler engine is giving you data in correct format before
> you
> > invoke solr index method.
> >
> > [1]http://blog.diigo.com/2009/09/28/scheduled-groups-maintenance/
> >
> > URI encoding should resolve this problem.
> >
> >
> >
> >
> > On Fri, Nov 1, 2013 at 10:50 AM, Chris  wrote:
> >
> > > [previous messages quoted in full; trimmed]


Re: character encoding issue...

2013-11-09 Thread Chris
I tried a lot of things and am almost at my wit's end :(


Here is the code I used to get the strings -

String htmlContent = readPage(page.getWebURL().getURL());

I even tried -
Document doc = Jsoup.parse(new URL(url).openStream(), "UTF-8", url);
String htmlContent = doc.html();

& Document doc = Jsoup.parse(htmlContent, "UTF-8");

No improvement so far. Any advice for me, please?



function that gets the html -

public static String readPage(String urlString) {
    try {
        URL url = new URL(urlString);
        DefaultHttpClient client = new DefaultHttpClient();
        client.getParams().setParameter(ClientPNames.COOKIE_POLICY,
                CookiePolicy.BROWSER_COMPATIBILITY);

        HttpGet request = new HttpGet(url.toURI());
        HttpResponse response = client.execute(request);

        if (response.getStatusLine().getStatusCode() == 200
                && response.getEntity().getContentType().toString().contains("text/html")) {
            Reader reader = null;
            try {
                // NOTE: no charset is passed here, so the platform default
                // encoding is used; if the page is Shift-JIS or GB2312 this
                // is exactly where it would get corrupted.
                reader = new InputStreamReader(response.getEntity().getContent());
                StringBuffer sb = new StringBuffer();
                int read;
                char[] cbuf = new char[1024];
                while ((read = reader.read(cbuf)) != -1)
                    sb.append(cbuf, 0, read);
                return sb.toString();
            } finally {
                if (reader != null) {
                    try {
                        reader.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        } else {
            return "";
        }
    } catch (Exception e) {
        return "";
    }
}

---
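A variant I may try next (a sketch; it assumes HttpClient 4.x EntityUtils
and that falling back to UTF-8 is acceptable when the server declares no
charset):

// Let HttpClient pick up the charset from the Content-Type header,
// falling back to UTF-8 when the server does not declare one:
String html = org.apache.http.util.EntityUtils.toString(response.getEntity(), "UTF-8");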



On Wed, Nov 6, 2013 at 2:53 AM, T. Kuro Kurosaka wrote:

> It sounds like the characters were mishandled at index build time.
> I would use Luke to see if a character that appears correctly
> when you change the output to Shift-JIS is actually
> stored as one Unicode character. I bet it's stored as two characters,
> each holding the value of the high or low byte of the Shift-JIS character.
>
> There are many possible causes of this. If you are indexing
> HTML documents from HTTP servers, the HTTP server may
> be configured to send wrong charset= info in the Content-Type
> header. If the document is directly from a file system,
> and if the document doesn't have a META header declaring
> the charset, then the system assumes a default charset,
> which is typically ISO-8859-1 or UTF-8, and misinterprets
> Shift-JIS encoded characters.
>
> You need to debug to find out where the characters
> get corrupted.
>
>
> On 11/04/2013 11:15 PM, Chris wrote:
>
>> [previous message quoted in full; trimmed]

Query Relevancy tuning...

2013-11-09 Thread Chris
Hi Gurus,

I have a relevancy ranking question -

1. I have the fields title, domain, and domainrank in the index.
2. I am looking to load a txt file of preferred domains at Solr startup &
boost documents from those domains if the keyword matches text in the
title or domain (an exact domain match should rank higher than a partial
match; see the sketch below).
3. Also, I would like at most 2-3 results per domain per page.
4. Also, is it possible to do intersection - if all 4 words (say) match, a
document should rank higher than a 3-word match & so on?

I would like this to be as fast as possible, so kindly suggest an optimal
way of doing this.
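A sketch of what I mean for point 2 (the domain names and weights here are
invented; the real list would be expanded from the preferred-domains file at
request time, on top of the edismax setup below):

bq=domain:(example.com^10 OR example.org^10)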

a few things that were tried -

<str name="defType">edismax</str>
<str name="qf">fulltxt^0.5 title^2.0 domain^3 urlKeywords^1.5 anchorText^2.0 h1Keywords^1.5</str>
<str name="df">text</str>
<str name="mm">100%</str>
<str name="q.alt">*:*</str>
<str name="rows">10</str>
<str name="fl">*,score</str>


Solr ranking query..

2014-02-03 Thread Chris
Hi,

I have a document structure that looks like the one below. I would like to
implement something like -

(urlKeywords:"+keyword+" AND domainRank:[3 TO 1] AND adultFlag:N)^60 " +
 "OR (title:"+keyword+" AND domainRank:[3 TO 1] AND adultFlag:N)^20 " +
  "OR (title:"+keyword+" AND domainRank:[10001 TO *] AND adultFlag:N)^2 " +
  "OR (fulltxt:"+keyword+") ");


In case we have multiple words in the keyword - "A B C D" - then documents
that have all the words should rank highest (Group 1), then 3-word matches
(Group 2), then 2-word matches (Group 3), etc.
AND - within each group (Groups 1, 2, 3) I would want the ones with the
lowest domainRank value to rank higher (but within the group).

How can I do this in a single query? Please advise on the fastest way
possible
(open to implementing fq & other techniques to speed it up).

Please advise.


Document Structure in XML (field tags were stripped by the archive;
structure approximate, long fulltxt trimmed) -

<doc>
  www
  ncoah.com
  /links.html
  http://www.ncoah.com/links.html
  North Carolina Office of Administrative Hearings - Links
  North Carolina Office of Administrative Hearings - Links
  [fulltxt: a long list of link anchor texts - Hearings, Rules, Civil Rights,
   Welcome, General Information, Directions to OAH, Establishment of OAH,
   G.S. 150B, Forms, Links, plus links to many North Carolina state
   departments, offices, and courts... trimmed]
</doc>

Re: Solr ranking query..

2014-02-04 Thread Chris
Dear Varun,

Thank you for your replies, I managed to get point 1 & 2 done, but for the
boost query, I am unable to figure it out. Could you be kind enough to
point me to an example or maybe advise a bit more on that one?

Thanks for your help,
Chris


On Tue, Feb 4, 2014 at 3:14 PM, Varun Thacker wrote:

> Hi Chris,
>
> I think what you are looking for could be solved using the eDismax query
> parser.
>
> https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
>
> 1. Your Query Fields ( qf ) would be -  "urlKeywords^60 title^40 fulltxt^1"
> 2. To check on adultFlag:N you could use  &fq=adultFlag:N
> 3. For Lowest Domain Rank within the same group to rank higher you could
> use the "boost" parameter and use a recip (
> http://wiki.apache.org/solr/FunctionQuery#recip ) function query to
> achieve
> this.
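>
> For example (a hedged sketch with made-up constants; recip(x,m,a,b)
> computes a/(m*x+b), so a lower domainRank yields a larger boost
> multiplier):
>
>   boost=recip(domainRank,1,1000,1000)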
>
> Hope this works for you
>
>
> On Tue, Feb 4, 2014 at 12:19 PM, Chris  wrote:
>
> > Hi,
> >
> > I have a document structure that looks like the below. I would like to
> > implement something like -
> >
> > (urlKeywords:"+keyword+" AND domainRank:[3 TO 10000] AND adultFlag:N)^60 " +
> >  "OR (title:"+keyword+" AND domainRank:[3 TO 10000] AND adultFlag:N)^20 " +
> >   "OR (title:"+keyword+" AND domainRank:[10001 TO *] AND adultFlag:N)^2 " +
> >   "OR (fulltxt:"+keyword+") ");
> >
> >
> > In case we have multiple words in keywords - "A B C D" - the
> > documents that have all the words should rank highest (Group1), then 3
> > words (Group2), then 2 words (Group3), etc
> > AND - Within each group (Group1, 2, 3) I would want the ones with the
> > lowest domain rank value to rank higher (but within the group)
> >
> > How can i do this in a single query? and please advise on the fastest way
> > possible,
> > (open to implementing fq & other techniques to speed it up)
> >
> > Please advise.
> >
> >
> > Document Structure in XML -
> >
> >  
> > www
> > ncoah.com
> > /links.html
> > http://www.ncoah.com/links.html
> > North Carolina Office of Administrative Hearings
> > - Links
> > 
> >   North Carolina Office of Administrative Hearings - Links
> > 
> > [fulltxt: a long, truncated list of links to NC state agency, court,
> > and office sites (Hearings, Rules, Civil Rights, departments, etc.)]

Solr the right thing for me?

2011-02-15 Thread Chris

Hello all,

I'm searching for a possibility to:

- Receive an email when a page changed or was added to a website.
- Only index pages that match a reg exp in their content.
- Receive the search results in a machine-readable way (RSS/SOAP/..)

It should be possible to organize this in sets.
(set A with 40 Websites, set B with 7 websites)

Does it sound possible with SOLR?
Do I have to expect custom development? If so, how much?

Thank you in advance
Bye, Chris


Capabilities of solr

2008-09-19 Thread Chris
Hello,

We currently have a ton of documents that we would like to index and
make searchable. I came across solr and it seems like it offers a lot
of nice features and would suit our needs.

The documents are in similar structure to java code, blocks
representing functions, variables, comment blocks etc.

We would also like to provide our users the ability to "tag" a line,
or multiple lines of the document with comments that would be stored
externally, for future reference or notes for enhancements. These
documents are also updated frequently.

I also noticed in the examples that XML documents are used to import
documents into solr. If we have code-like documents vs., for example,
products, is there any specific way to define the solr schema for these
types of documents?

Currently we maintain these documents as flat files and in MySQL.

Does solr sound like a good option for what we are looking to do? If
so, could anybody provide some starting points for my research?

Thank you


Re: slowdown after 15K queries

2008-06-02 Thread Chris
Maybe your Jetty needs tuning --
how much memory is in your system?
Can you show the process information for the Java processes?

above
Chris

2008/6/2 Bram de Jong <[EMAIL PROTECTED]>:

> Hello all,
>
>
> Still running tests on solr using the example jetty container. I've
> been getting nice performance. However, suddenly between 15400 and
> 15600 queries, I get a very serious drop in performance, and this
> every time I run my test, independent of what I'm searching for. The
> performance STAYS low and doesn't come up again until I restart
> Jetty/Solr.
>
> This is what I'm getting. The number in parentheses is the nr of
> queries done until "now".
>
> 
> average query time this batch ( 2798 ) : 21.7171502113
> average query time this batch ( 2998 ) : 21.556429863
> average query time this batch ( 3197 ) : 20.7244367456
> average query time this batch ( 3397 ) : 20.9529149532
> average query time this batch ( 3597 ) : 21.7199647427
> 
>
> Then suddenly around 15K my average time goes up 3-fold:
>
> 
> average query time this batch ( 15183 ) : 22.5312757732
> average query time this batch ( 15383 ) : 27.6089298725 <-
> average query time this batch ( 15583 ) : 66.8137800694 <-
> average query time this batch ( 15783 ) : 67.5224089622 <-
> average query time this batch ( 15983 ) : 68.210555315 <-
> 
>
> I tried taking another set of searches (I'm replaying searches done on
> our website), but exactly the same pattern occurs. The cumulative
> evictions for all caches is 0 before and after the slowdown, so my
> initial thought (i.e. full cache) was not it. I did some further
> investigating, and it looks like only one of every few searches becomes
> slow. This is the batch of searches for block nr 15583. Every search
> string is mentioned and then the query_time as reported by Solr:
>
> ('electricity', 19), ('killed', 16), ('radio static', 179), ('monster
> killed', 15), ('heavy machinery', 16), ('killed', 179), ('chimes',
> 17), ('video games', 17), ('sword', 16), ('construction machine', 17),
> ('graveyard', 15), ('people', 16), ('yard', 179), ('horn', 15),
> ('bugle', 14), ('trumpet', 17), ('grass walking', 177), ('walking',
> 17), ('horn', 15), ('clunk', 14), ('hydraulic', 178), ('jet landing',
> 16), ('o fortuna', 14), ('large crowd', 180), ('dj', 15), ('hallway',
> 15), ('scrach', 13), ('jet tires screeching', 177), ('tires
> screeching', 14), ('ambient', 16), ('electricity', 180), ('tires',
> 15), ('hospital', 15), ('chimes', 17), ('win chimes', 178), ('wind
> chimes', 15), ('sex', 15), ('SPACE', 17), ('river', 180), ('thunder
> storms', 16), ('crash', 15), ('boss', 16), ('thunder', 16), ('car
> braking', 16), ('vocal', 17), ('vocal dance', 182), ('vocal', 16),
> ('stream', 16), ('whale', 14), ('space ambient', 183), ('animal', 16),
> ('pad', 15), ('body', 16), ('crickets', 180), ('fall', 16), ('camera
> flash bulb', 15), ('arctic', 178), ('flash bulb', 13), ('camera', 15),
> ('drawn', 16), ('next level', 180), ('timbale', 14), ('navigation',
> 14), ('bass', 14), ('blips', 179), 
>
>
> Any hints of things I can try would be superb. Having to wait 15000 *
> 25ms every time I want to try something else to fix this is becoming a
> bit annoying :)
>
>
>  - Bram
>
> PS: if you are curious: the things people search for are sounds :)
> hence the very varied set of search strings.
>
> --
> http://freesound.iua.upf.edu
> http://www.smartelectronix.com
> http://www.musicdsp.org
>



-- 
Chris Lin
[EMAIL PROTECTED]
Taipei , Taiwan.
---


Re: CURL command problem on Solr

2018-05-29 Thread chris
HTTP header names are case-insensitive (per the RFCs), so 'Content-type' vs 'Content-Type' should not matter here.
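
One more thing worth checking (an assumption, since the command is being
run from Windows cmd): cmd.exe does not treat single quotes as quoting
characters, so -H 'Content-type:application/json' reaches curl with the
quote marks embedded, the header is malformed, and curl falls back to its
default Content-Type of application/x-www-form-urlencoded -- which is
exactly what the error reports. A double-quoted variant to try:

  curl -XPUT "http://localhost:8983/solr/techproducts/schema/feature-store" --data-binary "@/path/myFeatures.json" -H "Content-Type: application/json"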





 Original message From: simon  Date: 
5/29/18  12:17 PM  (GMT-05:00) To: solr-user  
Subject: Re: CURL command problem on Solr 
Could it be that the header should be 'Content-Type' (which is what I see
in the relevant RFC) rather than 'Content-type' as shown in your email ? I
don't know if headers are case-sensitive, but it's worth checking.

-Simon

On Tue, May 29, 2018 at 11:02 AM, Roee Tarab  wrote:

> Hi ,
>
> I am having some troubles with pushing a features file to solr while
> building an LTR model. I'm trying to upload a JSON file on windows cmd
> executable from an already installed CURL folder, with the command:
>
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store'
> --data-binary "@/path/myFeatures.json" -H 'Content-type:application/json'.
>
> I am receiving the following error massage:
>
> {
>   "responseHeader":{
> "status":500,
> "QTime":7},
>   "error":{
> "msg":"Bad Request",
> "trace":"Bad Request (400) - Invalid content type
> application/x-www-form-urlencoded; only application/json is
> supported.\r\n\tat org.apache.solr.rest.RestManager$ManagedEndpoint.
> parseJsonFromRequestBody(RestManager.java:407)\r\n\tat
> org.apache.solr.rest.
> RestManager$ManagedEndpoint.put(RestManager.java:340) 
>
> This is definitely a technical issue, and I have not been able to overcome
> it for 2 days.
>
> Is there another option of uploading the file to our core? Is there
> something we are missing in our command?
>
> Thank you in advance for any help,
>


Best practice for saving state of large cluster?

2019-06-28 Thread chris
I have a cluster of 100 shards on 100 nodes, with Solr 7.5, running in AWS. 
The use case is read-dominant, with ingestion performed about once per week. 
There are about 84 billion documents in the cluster. It is unused on weekends 
and only used during normal business hours M-F.

What I do now is, after each round of ingestion, create a new set of AMIs, 
then terminate each instance. The next morning, the cluster is restarted by 
creating a new set of spot requests, using the most recent AMIs. At the end 
of the day, the cluster is turned off by terminating the instances (if no 
data was changed), or by creating a new set of AMIs and then terminating 
the instances.

Is there a better way to do this? I'm not facing any real problems with this 
setup, but I want to make sure I'm not missing something obvious.

Thanks,
Chris

Re: Expected mime type application/octet-stream but got text/html

2015-12-17 Thread Chris Hostetter
: 
: Indeed, it's a doc problem. A long time ago in a Solr far away, there
: was a bunch of effort to use the "default" collection (collection1).
: When that was changed, this documentation didn't get updated.
: 
: We'll update it in a few, thanks for reporting!

Fixed on erick's behalf because he had to run to a meeting...

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests

...i also went ahead and shifted the examples to put more emphasis on using 
shard ids since that's probably safer/cleaner for most people.



-Hoss
http://www.lucidworks.com/


Re: Possible Bug - MDC handling in org.apache.solr.common.util.ExecutorUtil.MDCAwareThreadPoolExecutor.execut e(Runnable)

2016-01-11 Thread Chris Hostetter

: Not sure I'm onboard with the first proposed solution, but yes, I'd open a
: JIRA issue to discuss.

we should standardize the context keys to use fully 
qualified (org.apache.solr.*) java class name prefixes -- just like we do 
with the logger names themselves.

: 
: - Mark
: 
: On Mon, Jan 11, 2016 at 4:01 AM Konstantin Hollerith 
: wrote:
: 
: > Hi,
: >
: > I'm using SLF4J MDC to log additional Information in my WebApp. Some of my
: > MDC-Parameters even include Line-Breaks.
: > It seems, that Solr takes _all_ MDC parameters and puts them into the
: > Thread-Name, see
: >
: > 
org.apache.solr.common.util.ExecutorUtil.MDCAwareThreadPoolExecutor.execute(Runnable).
: >
: > When there is some logging of Solr, the log gets cluttered:
: >
: > [11.01.16 09:14:19:170 CET] 02a3 SystemOut O 09:14:19,169
: > [zkCallback-14-thread-1-processing-My
: > Custom
: > MDC
: > Parameter ROraqiFWaoXqP21gu4uLpMh SANDHO] WARN
: > common.cloud.ConnectionManager [session=ROraqiFWaoXqP21gu4uLpMh]
: > [user=SANDHO]: zkClient received AuthFailed
: >
: > (some of my MDC-Parameters are only active in Email-Logs and are not
: > included in the file-log)
: >
: > I think this is a Bug. Solr should only put its own MDC-Parameter into the
: > Thread-Name.
: >
: > Possible Solution: Since all (as far as i can check) invocations in Solr of
: > MDC.put uses a Prefix like "ConcurrentUpdateSolrClient" or
: > "CloudSolrClient" etc., it would be possible to put a check into
: > MDCAwareThreadPoolExecutor.execute(Runnable) that process only those
: > Prefixes.
: >
: > Should i open a Jira-Issue for this?
: >
: > Thanks,
: >
: > Konstantin
: >
: > Environment: JSF-Based App with WebSphrere 8.5, Solr 5.3.0, slf4j-1.7.12,
: > all jars are in WEB-INF/lib.
: >
: -- 
: - Mark
: about.me/markrmiller
: 

-Hoss
http://www.lucidworks.com/


Re: Boost does not appear in solr debug explain debug

2016-01-12 Thread Chris Hostetter

: ((attr_search:8 attr_search:gb)~2^5.0)
: 
: I hope to be right, but I expect to find a boost in both the values
: matches.

1) "boost" information should show up as a detail of the "queryWeight", 
which is itself a detail of the "weight" of term clauses -- in the output 
you've included below, you've replaced those details with "..." so it's 
not clear if they are missing or not.

2) the query syntax you posted, as far as i know, is not a valid syntax 
for any of the parsers Solr ships with out of the box -- for the default 
parser it produces a syntax error, for parsers like dismax and edismax i'm 
fairly certain what you are getting is a query for the term "2" that has a 
boost of 5.0 on it.  hard to be sure since you didn't give us the full 
details of your request/config...

3) based on the output you provided, you are using a custom 
"EpriceSimilarity" similarity class ... custom similarities are heavily 
involved in both the score, and score explaination generation -- so even 
if the query syntax is valid, and meaningful for whatever query parser you 
are using, it's possible that the EpriceSimilarity is doing something odd 
with that boost info at query time.




: Now, I don't understand why, even if both the terms matches, I don't see
: the boost in the explain.
: 
: 
: true
: 0.10516781
: sum of:
: 
: 
: true
: 0.06489531
: 
: weight(attr_search:8 in 927) [EpriceSimilarity], result of:
: 
: ...
: 
: 
: true
: 0.040272504
: 
: weight(attr_search:gb in 927) [EpriceSimilarity], result of:
: 
: ...
: 
: 
: 
: 
: I suppose to find something like: attr_search:8^5 and attr_search:gb^5 in
: the explain.
: or something that tells I have both the matches so there is a boost
: somewhere.
: What's wrong in my assumption? What's I'm missing?
: 
: 
: -- 
: Vincenzo D'Amore
: email: v.dam...@gmail.com
: skype: free.dev
: mobile: +39 349 8513251
: 

-Hoss
http://www.lucidworks.com/


Re: state.json base_url has internal IP of ec2 instance set instead of 'public DNS' entry in some cases

2016-01-15 Thread Chris Hostetter

: What I’m finding is that now and then base_url for the replica in 
: state.json is set to the internal IP of the AWS node. i.e.:
: 
: "base_url":"http://10.29.XXX.XX:8983/solr”,
: 
: On other attempts it’s set to the public DNS name of the node:
: 
: "base_url":"http://ec2_host:8983/solr”,
: 
: In my /etc/defaults/solr.in.sh I have:
: 
: SOLR_HOST=“ec2_host”
: 
: which I thought is what I needed to get the public DNS name set in base_url. 

i believe you are correct.  the "now and then" part of your question is 
weird -- it seems to indicate that sometimes the "correct" thing is 
happening, and other times it is not.  

/etc/defaults/solr.in.sh isn't the canonical path for solr.in.sh 
according to the docs/install script for running a production solr 
instance...

https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production#TakingSolrtoProduction-ServiceInstallationScript

...how *exactly* are you running Solr on all of your nodes?

because my guess is that you've got some kind of inconsistent setup where 
sometimes when you startup (or restart) a new node it does refer to your 
solr.in.sh file, and other times it does not -- so sometimes solr never 
sees your SOLR_HOST option.  In those cases, when it registers itself with 
ZooKeeper it uses the current IP as a fallback, and then that info gets 
baked into the metadata for the replicas that get created on that node 
at that point in time.

FWIW, you should be able to spot check that SOLR_HOST is being applied 
correctly by looking at the java process command line args (using ps, or 
loading the Solr UI in your browser) and checking for the "-Dhost=..." 
option -- if it's not there, then your solr.in.sh probably wasn't read in 
correctly
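
For example (a quick spot check, assuming a typical Linux install):

  ps aux | grep java | grep -o '\-Dhost=[^ ]*'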



-Hoss
http://www.lucidworks.com/

Re: Boost query vs function query in edismax query

2016-01-19 Thread Chris Hostetter

: Boost Query (bq) accepts lucene queries. E.g. bq=price:[50 TO 100]^100
: boost and bf parameters accept Function queries, e.g. boost=log(popularity) 

while these statements are both true, they don't tell the full story.

for example you can also specify a function as a query using the 
appropriate parser:  bq={!func}log(popularity)

or turn any query into a function that produces values according to the 
query score:  boost=query({!lucene v='price:[50 TO 100]^100'})


The fundemental difference between bq & boost is:

  "bq" causes a an additional 'boost query' clause to be *added* to your 
  original query 

  "boost" causes the scores for each doc from your original to 
  be *multiplied* by the results of the specified function evaluated 
  against the same doc.

(in both cases "original query" refers to your "q" param 
parsed with respect to qf, pf, etc...)

So a query like this...

   q={!edismax}bar & qf=foo & bq=x:y

...is roughly equivalent to:  q={!lucene}+foo:bar x:y 


While a query like this...

   q={!edismax}bar & qf=foo & boost=query({!lucene v='x:y'})

...is roughly equivalent to... 

   q={!func}prod(query({!edismax qf='foo' v='bar'}), query({!lucene v='x:y'}))


because of how they affect final scores, the 'boost' param is almost 
always what you really want and is really nothing more than shorthand for 
wrapping your entire query in a "BoostQParser" ...

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BoostQueryParser
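
For example (a sketch with hypothetical fields/params; two complete
requests that differ only in how the boost is applied):

  /select?q=bar&defType=edismax&qf=foo&bq=inStock:true^5        <- additive
  /select?q=bar&defType=edismax&qf=foo&boost=log(popularity)    <- multiplicative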





-Hoss
http://www.lucidworks.com/


Re: Solr trying to auto-update schema.xml

2016-01-19 Thread Chris Hostetter

: Thanks, very helpful.  I think I'm on the right track now, but when I do a
: post now and my UpdateRequestProcessor extension tries to add a field to a
: document, I get:
: 
: RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=1]
: Error adding field 'myField'='2234543'
: 
: The call I'm making is SolrInputDocument.addField(field name, value).  Is
: that trying to add a field to the schema.xml?  The field (myField) is
: already defined in schema.xml.  By calling SolrInputDocument.addField(), my
: goal is to add the field to the document and give it a value.

what is the full stack trace of that error in your logs?

it's not indicating that it's trying to add a *schema* field named 
"myField", it's saying that it's trying to add a *document* field with the 
name 'myField' and the value '2234543' and some sort of problem is 
occurring -- it may be because the schema doesn't have that field, or 
because the FieldType of myField complained that the value wasn't valid 
for that type, etc...

the stack trace has the answers.


-Hoss
http://www.lucidworks.com/


Re: SearchComponent does not handle negative fq ???

2016-01-22 Thread Chris Hostetter

Concrete details are crucial -- what exactly are you trying, what results 
are you getting, how do those results differ from what you expect?

https://wiki.apache.org/solr/UsingMailingLists

Normally, even when someone only gives a small subset of the crucial 
details needed to answer their question, there are at least some loose 
threads of terms that help other folks make guesses as to what the 
question is about -- but in your case i really can't even begin to 
imagine...   "SearchComponent" is an abstract class implemented by dozens 
of concrete classes that do everything under the sun.

what aspect of SearchComponent do you think causes you problems with 
negative fq clauses?  or are you trying to ask a question about some 
specific SearchComponent that you didn't mention by name?



: Date: Fri, 22 Jan 2016 15:53:20 -0700 (MST)
: From: vitaly bulgakov 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: SearchComponent does not handle negative fq ???
: 
: From my experiments looks like SearchComponent does not handle negative fq
: correctly.
: Does anybody have have such experience ?
: 
: 
: 
: --
: View this message in context: 
http://lucene.472066.n3.nabble.com/SearchComponent-does-not-handle-negative-fq-tp4252688.html
: Sent from the Solr - User mailing list archive at Nabble.com.
: 

-Hoss
http://www.lucidworks.com/


Near Duplicate Documents, "authorization"? tf/idf implications, spamming the index?

2016-02-15 Thread Chris Morley
Hey Solr people:
  
 Suppose that we did not want to break up our document set into separate 
indexes, but had certain cases where many versions of a document were not 
relevant for certain searches.
  
 I guess this could be thought of as an "authorization" class of problem, 
however it is not that for us.  We have a few other fields that determine 
relevancy to the current query, based on what page the query is coming 
from.  It's kind of like authorization, but not really.
  
 Anyway, I think the answer for how you would do it for authorization would 
solve it for our case too.
  
 So suppose you had 99 users and 100 documents: Document 1 everybody could 
see the same, but each of the other 99 documents was slightly different -- 
unique to one of the 99 users, but not "very" unique.  Suppose for instance 
that the only thing different in the text of the 99 documents was that each 
was watermarked with its user's name.  Aren't you spamming your tf/idf at 
that point?  Is there a way around this?  Is there a way to say, hey, group 
these 99 documents together and only count 1 of them for tf/idf purposes?
  
 When doing queries, each user would only ever see 2 documents: Document 1, 
plus whichever other document they specifically owned.
  
 If there are web pages or book chapters I can read or re-read that address 
this class of problem, those references would be great.
  
  
 -Chris.
  
  



Re: Why is my index size going up (or: why it was smaller)?

2016-02-16 Thread Chris Hostetter

: I'm testing this on Windows, so that may be a factor too (the OS is not
: releasing file handles?!)

specifically: Windows won't let Solr delete files on disk that have open 
file handles...

https://wiki.apache.org/solr/FAQ#Why_doesn.27t_my_index_directory_get_smaller_.28immediately.29_when_i_delete_documents.3F_force_a_merge.3F_optimize.3F



-Hoss
http://www.lucidworks.com/


Re: Retrieving 1000 records at a time

2016-02-17 Thread Chris Hostetter

: I have a requirement where I need to retrieve 1000 to 15000 records at a
: time from SOLR.
: With 20 or 100 records everything happens in milliseconds.
: When it goes to 1000, 10000 it is taking more time... like even 30 seconds.

so far all you've really told us about your setup is that some 
queries with "rows=1000" are slow -- but you haven't really told us 
anything else we can help you with -- for example it's not obvious if you 
mean that you are using start=0 in all of those queries and they are slow, 
or if you mean you are paginating through results (ie: increasing start 
param) 1000 at a time and it starts getting slow as you page deeply.

you also haven't told us anything about the fields you are returning -- 
how many are there?, what data types are they? are they large string 
values?

how are you measuring the time? are you sure network lag, or client side 
processing of the data as solr returns it isn't the bulk of the time you 
are measuring?  what does the QTime in the solr responses for these slow 
queries say?

my best guesses are that either: you are doing deep paging and conflating 
the increased response time for deep results with an increase in response 
time for large rows params (because you are getting "deeper" faster with a 
large rows#) or you are seeing an increase in processing time on the 
client due to the large volume of data being returned -- possibly even 
with SolrJ which is designed to parse the entire response into java 
data structures by default before returning to the client.

w/o more concrete information, it's hard to give you advice beyond 
guesses.


potentially helpful links...

https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
https://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/

https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets

https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
https://lucene.apache.org/solr/5_4_0/solr-solrj/org/apache/solr/client/solrj/io/stream/expr/StreamFactory.html
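
A minimal SolrJ cursor sketch (assuming an existing SolrClient named
"client" and that the uniqueKey field is named "id" -- cursors require a
sort that includes the uniqueKey):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.params.CursorMarkParams;

  SolrQuery q = new SolrQuery("*:*");
  q.setRows(1000);
  q.setSort(SolrQuery.SortClause.asc("id"));
  String cursorMark = CursorMarkParams.CURSOR_MARK_START;
  while (true) {
    q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
    QueryResponse rsp = client.query(q);
    // process rsp.getResults() here ...
    String next = rsp.getNextCursorMark();
    if (cursorMark.equals(next)) break;  // same mark twice == done
    cursorMark = next;
  }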



-Hoss
http://www.lucidworks.com/


[ANNOUNCE] Apache Solr 5.5.0 and Reference Guide for 5.5 available

2016-02-23 Thread Chris Hostetter



Solr is the popular, blazing fast, open source NoSQL search platform from 
the Apache Lucene project. Its major features include powerful full-text 
search, hit highlighting, faceted search, dynamic clustering, database 
integration, rich document (e.g., Word, PDF) handling, and geospatial 
search. Solr is highly scalable, providing fault tolerant distributed 
search and indexing, and powers the search and navigation features of many 
of the world's largest internet sites.


Solr 5.5.0 is available for immediate download at:

  http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Please read CHANGES.txt for a full list of new features and changes:

  https://lucene.apache.org/solr/5_5_0/changes/Changes.html

This is expected to be the last 5.x feature release before Solr 6.0.0.

Solr 5.5 Release Highlights:

 * The schema version has been increased to 1.6, and Solr now returns
   non-stored doc values fields along with stored fields

 * The PERSIST CoreAdmin action has been removed

 * The <mergePolicy> element is deprecated in favor of a similar
   <mergePolicyFactory> element, in solrconfig.xml

 * CheckIndex now works on HdfsDirectory

 * RuleBasedAuthorizationPlugin now allows wildcards in the role, and
   accepts an 'all' permission

 * Users can now choose compression mode in SchemaCodecFactory

 * Solr now supports Lucene's XMLQueryParser

 * Collections APIs now have async support

 * Uninverted field faceting is re-enabled, for higher performance on
   rarely changing indices

Further details of changes are available in the change log available at: 
http://lucene.apache.org/solr/5_5_0/changes/Changes.html


Also available is the Solr Reference Guide for Solr 5.5. This PDF serves 
as the definitive user's manual for Solr 5.5. It can be downloaded from 
the Apache mirror network: https://s.apache.org/Solr-Ref-Guide-PDF


Please report any feedback to the mailing lists 
(http://lucene.apache.org/solr/discussion.html)


Note: The Apache Software Foundation uses an extensive mirroring network 
for distributing releases. It is possible that the mirror you are using 
may not have replicated the release yet. If that is the case, please try 
another mirror. This also applies to Maven access.



-Hoss
http://www.lucidworks.com/


RE: Solr debug 'explain' values differ from the Solr score

2016-03-15 Thread Chris Hostetter

Sounds like a mismatch in the way the BooleanQuery explanation generation 
code is handling situations where there is/isn't a coord factor involved 
in computing the score itself.  (the bug is almost certainly in the 
"explain" code, since that is less rigorously tested in most cases, and 
the score itself is probably correct)

I tried to trivially reproduce the symptoms you described using the 
techproducts example and was unable to generate a discrepancy using a 
simple boolean query w/a fuzzy clause...

http://localhost:8983/solr/techproducts/query?q=ipod~%20belkin&fl=id,name,score&debug=query&debug=results&debug.explain.structured=true

...can you distill one of your problematic queries down to a 
shorter/simpler reproducible example, and/or provide us with the field & 
fieldType details for all of the fields used in your example?

(i'm guessing it probably relates to your firstName_phonetic field?)



: Date: Tue, 15 Mar 2016 13:17:04 -0700
: From: Rick Sullivan 
: Reply-To: solr-user@lucene.apache.org
: To: "solr-user@lucene.apache.org" 
: Subject: RE: Solr debug 'explain' values differ from the Solr score
: 
: After some digging and experimentation, here are some more details on the 
issue I'm seeing.
: 
: 
: 1. The adjusted documents' scores are always exactly (debug_score/N), where N 
is the number of OR items in the query. 
: 
: For example, `&q=firstName:gabby~ firstName_phonetic:gabby 
firstName_tokens:(gabby)` will result in some of the documents with 
firstName==GABBY receiving a score 1/3 of the score of other GABBY documents, 
even though the debug explanation shows that they generated the same score.
: 
: 
: 2. This doesn't appear to be a brand new issue, or an issue with SolrCloud.
: 
: I've tested the problem using SolrCloud 5.5.0, Solr 5.5.0 (not cloud), and 
Solr 5.4.1.
: 
: 
: Anyone have any ideas?
: 
: Thanks,
: -Rick
: 
: From: r...@ricksullivan.net
: To: solr-user@lucene.apache.org
: Subject: Solr debug 'explain' values differ from the Solr score
: Date: Thu, 10 Mar 2016 08:34:30 -0800
: 
: Hi,
: 
: I'm seeing behavior in Solr 5.5.0 where the top-level values I see in the 
debug response don't always correspond with the scores Solr assigns to the 
matched documents.
: 
: For example, here is the top-level debug information for two documents 
matched by a query:
: 
: 114628: Object
:   description: "sum of:"
:   details: Array[2]
:   match: true
:   value: 20.542768
: 
: 357547: Object
:   description: "sum of:"
:   details: Array[2]
:   match: true
:   value: 26.517654
: 
: But they have scores
: 
: 114628: 20.542767
: 357547: 13.258826
: 
: I expect the second document to be the most relevant for my query, and the 
debug values seem to agree. However, in the final score I receive, that 
document's score has been adjusted down.
: 
: The relevant debug response information can be found here: 
http://apaste.info/mju
: 
: Does anyone have an idea why the Solr score may differ from the debug value?
: 
: Thanks,
: -Rick   

-Hoss
http://www.lucidworks.com/


Re: Explain style json? Without using wt=json...

2016-03-19 Thread Chris Hostetter

: We are using Solrj to query our solr server, and it works great. 
: However, it uses the binary format wt=javabin, and now when I'm trying 
: to get better debug output, I notice a problem with this. The thing is, 
: I want to include the explain data for each search result, by adding 
: "[explain]" as a field for the fl parameter. And when using [explain 
: style=nl] combined with wt=json, the explain output is proper and valid 
: json. However, when I use something other than wt=json, the explain 
: output is not proper json.

Forget about the wt param and/or the response writer/response parser used by 
SolrJ for a minute.

When you use "fl=[explain style=nl]" what's happening is that *structured* 
named list containing hte explanation metadata is included in each 
document in the response -- as opposed to "style=text" in which a simple 
whitespace indented string representation of the score explanation is 
included.

Now lets think about the "wt" param -- that controls how *structured* data 
is written over the wirte -- so with "fl=[explain style=nl]" the 
structurred score explanation is written over the wire as json with 
wt=json, or xml with wt=xml, or in solr's custom binary protocol with 
wt=javabin.

As a SolrJ user, regardless of what "wt" value is used by SolrJ under the 
covers, SolrJ will use an appropriate response parser to recreate the 
structure of the data in your client application.  So what you get in your 
java application when you access that psuedo-field on each document is 
going to depend on the (effective) "style" value of that transformer -- 
not the "wt" used.

So for "style=text" your client code will find a java.lang.String 
containing the same simple string representation mentioned above (just 
like you see in your browser with wt=json or wt=xml).  

for "style=nl" you're going to get back and 
org.apache.solr.common.NamedList object (with the same structure as what 
you would see in your browser with wt=xml or wt=json) which you can 
traverse using the appropriate java methods to pull out the various keys & 
values that you want.  if you simply call toString() on this object you're 
going to get a basic dump of the data which might look like broken JSON, 
but is relaly just an attempt at returning some useful toString() info for 
debugging.

I suspect that "NamedList.toString()" output is what's confusing 
you...

: And, the reason I want to explain segment in proper json format, is that 
: I want to turn it into a JSONObject, in order to get proper indentation 
: for easier reading. Because the regular output doesn't have proper 
: indentation.

in your java code, you can walk the NamedList structure you get back 
recursively, and call the appropriate methods to get the list of key=>val 
pairs and convert it to a JSONObject.  There is no server side 
option to write the explain data back as a "String containing a JSON 
representation of the structured data" which will then be passed as a raw 
string all the way back to the client.
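
A minimal recursive sketch (assuming the org.json library is on the
classpath; list-valued entries, like the "details" arrays in an
explanation, become JSONArrays):

  import java.util.List;
  import org.apache.solr.common.util.NamedList;
  import org.json.JSONArray;
  import org.json.JSONObject;

  static Object toJson(Object val) {
    if (val instanceof NamedList) {
      NamedList<?> nl = (NamedList<?>) val;
      JSONObject obj = new JSONObject();
      for (int i = 0; i < nl.size(); i++) {
        // accumulate() turns repeated keys into arrays automatically
        obj.accumulate(nl.getName(i), toJson(nl.getVal(i)));
      }
      return obj;
    } else if (val instanceof List) {
      JSONArray arr = new JSONArray();
      for (Object o : (List<?>) val) arr.put(toJson(o));
      return arr;
    }
    return val;  // String, Number, Boolean, etc. pass through
  }

...and then JSONObject.toString(2) gives you the indented output you're
after.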



-Hoss
http://www.lucidworks.com/


Re: BCE dates on solr TrieDateField

2016-03-21 Thread Chris Hostetter

BCE dates have historically been problematic because of ambiguity in both  
the ISO format that we use for representing dates as well as the internal 
java representation, more details...

https://issues.apache.org/jira/browse/SOLR-1899

..the best workaround I can suggest is to use simple numeric fields to 
represent your dates -- either as millis since whatever epoch you want, or 
as distinct year, month, day fields.
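
For example (a sketch with hypothetical field names; BCE years are just
negative ints):

  <field name="year"  type="int" indexed="true" stored="true"/>
  <field name="month" type="int" indexed="true" stored="true"/>
  <field name="day"   type="int" indexed="true" stored="true"/>

...after which a range query like year:[-1600 TO -1500] behaves as
expected.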


: Date: Mon, 21 Mar 2016 12:53:50 -0400
: From: jude mwenda 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: BCE dates on solr TrieDateField
: 
: Hey,
: 
: I hope this email finds you well. I have a solr.TrieDateField and I am
: trying to send -ve dates to this field. Does the TrieDateField allow for
: -ve dates? when I push the date -1600-01-10 to solr i get 1600-01-10 as the
: date registered. Please advise.
: 
: -- 
: Regards,
: 
: Jude Mwenda
: 

-Hoss
http://www.lucidworks.com/


RE: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2016-03-21 Thread Chris Hostetter

: What I'm wondering is, what should one do to fix this issue when it 
: happens. Is there a way to recover? after the WARN appears.

It's just a warning that you have a sub-optimal situation from a 
performance standpoint -- either committing too fast, or warming too much.  
It's not a failure, and Solr will continue to serve queries and process 
updates -- but meanwhile it's detected that the situation it's in involves 
wasted CPU/RAM.
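
A common mitigation (a sketch, not tuned for any particular setup) is to
let frequent hard commits persist the index without opening a new
searcher, and rely on less frequent soft/explicit commits for visibility:

  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>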

: In my observation, this WARN comes when I hit frequent hard commits or 
: hit re-load config. I'm not planning on to hit frequent hard commits, 
: however sometimes accidently it happens. And when it happens the 
: collection crashes without a recovery.

If you're seeing a crash, then that's a distinct problem from the WARN -- 
it might be related to the warning, but it's not identical -- Solr doesn't 
always (or even normally) crash in the "Overlapping onDeckSearchers" 
situation

So if you are seeing crashes, please give us more details about these 
crashes: namely more details about everything you are seeing in your logs 
(on all the nodes, even if only one node is crashing)

https://wiki.apache.org/solr/UsingMailingLists



-Hoss
http://www.lucidworks.com/


Re: Is there any JIRA changed the stored order of multivalued field?

2016-03-28 Thread Chris Hostetter

: We do POST to add data to Solr v4.7 and Solr v5.3.2 respectively. The
: attachmentnames are in 789, 456, 123 sequence:
...  
: And we do GET to select data from solr v4.7 and solr v5.3.2 respectively:
: http://host:port/solr/collection1/select?q=id:1&wt=json&indent=true
...
: Is there any JIRA fixed making this order changed? Thanks!

https://issues.apache.org/jira/browse/SOLR-5777

The bug wasn't in returning stored fields, it was in how the JSON was 
parsed when a field name was specified multiple times (instead of a single 
time with an array of values) when adding a document.
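
For example (two ways of sending the same multivalued field; the second
form was the one affected by the parsing bug):

  {"id":"1", "attachmentnames":["789","456","123"]}

  {"id":"1", "attachmentnames":"789", "attachmentnames":"456", "attachmentnames":"123"}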





-Hoss
http://www.lucidworks.com/


Re: Solr response error 403 when I try to index medium.com articles

2016-03-30 Thread Chris Hostetter

403 means "forbidden" 

Something about the request Solr is sending -- or something about the IP 
address Solr is connecting from when talking to medium.com -- is causing 
the medium.com web server to reject the request.

This is something that servers may choose to do if they detect (via 
headers, or missing headers, or reverse ip lookup, or other 
distinctive nuances of how the connection was made) that the 
client connecting to their server isn't a "human browser" (ie: firefox, 
chrome, safari) and is a Robot that they don't want to cooperate with (ie: 
they might be happy to serve their pages to the google-bot crawler, but not 
to some third-party they've never heard of).

The specifics of how/why you might get a 403 for any given url are hard to 
debug -- it might literally depend on how many requests you've sent to that 
domain in the past X hours.

In general Solr's ContentStream indexing from remote hosts isn't intended 
to be a super robust solution for crawling arbitrary websites on the web 
-- if that's your goal, then i would suggest you look into running a more 
robust crawler (nutch, droids, Lucidworks Fusion, etc...) that has more 
features and debugging options (notably: rate limiting) and use that code 
to fetch the content, then push it to Solr.


: Date: Tue, 29 Mar 2016 20:54:52 -0300
: From: Jeferson dos Anjos 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Solr response error 403 when I try to index medium.com articles
: 
: I'm trying to index some pages of the medium. But I get error 403. I
: believe it is because the medium does not accept the user-agent solr. Has
: anyone ever experienced this? You know how to change?
: 
: I appreciate any help
: 
: 
: 500
: 94
: 
: 
: 
: Server returned HTTP response code: 403 for URL:
: 
https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1
: 
: 
: java.io.IOException: Server returned HTTP response code: 403 for URL:
: 
https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1
: at sun.reflect.GeneratedConstructorAccessor314.newInstance(Unknown
: Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown
: Source) at java.lang.reflect.Constructor.newInstance(Unknown Source)
: at sun.net.www.protocol.http.HttpURLConnection$10.run(Unknown Source)
: at sun.net.www.protocol.http.HttpURLConnection$10.run(Unknown Source)
: at java.security.AccessController.doPrivileged(Native Method) at
: sun.net.www.protocol.http.HttpURLConnection.getChainedException(Unknown
: Source) at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown
: Source) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown
: Source) at 
sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown
: Source) at 
org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:87)
: at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:158)
: at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
: at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
: at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:291)
: at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006) at
: 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
: at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
: at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
: at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
: at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
: at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
: at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
: at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
: at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
: at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
: at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
: at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
: at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
: at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
: at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
: at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
: at org.eclipse.jetty.server.Server.handle(Server.java:368) at
: 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
: at 
org.eclipse.jetty.server.Bl

Re: Load Resource from within Solr Plugin

2016-03-30 Thread Chris Hostetter
: 
: 

1) as a general rule, if you have a <lib/> declaration which includes 
"WEB-INF" you are probably doing something wrong.

Maybe not in this case -- maybe "search-webapp/target" is a completely 
distinct java application and you are just re-using its jars.  But 9 
times out of 10, when people have a WEB-INF path they are trying to load 
jars from, it's because they *first* added their jars to Solr's WEB-INF 
directory, and then when that didn't work they added the path to the 
WEB-INF dir as a <lib/> ... but now you've got those classes being loaded 
twice, and you've multiplied all of your problems.

2) let's ignore the fact that your path has WEB-INF in it, and just 
assume it's some path to somewhere on disk that has nothing to 
do with solr, and you want to load those jars.

great -- solr will do that for you, and all of those classes will be 
available to plugins.

Now if you want to explicitly do something classloader related, you do 
*not* want to be using Thread.currentThread().getContextClassLoader() ... 
because the threads that execute everything in Solr are a pool of worker 
threads that is created before solr ever has a chance to parse your <lib/> directive.

You want to ensure anything you do related to a Classloader uses the 
ClassLoader Solr sets up for plugins -- that's available from the 
SolrResourceLoader.

You can always get the SolrResourceLoader via 
SolrCore.getResourceLoader().  From there you can getClassLoader() if 
you really need some hairy custom stuff -- or if you are just trying to 
load a simple resource file as an InputStream, use openResource(String 
name) ... that will start by checking for it in the conf dir, and will 
fall back to your jar -- so you can have a default resource file shipped 
with your plugin, but allow users to override it in their collection 
configs.
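
A minimal sketch (assuming a plugin that receives the SolrCore, e.g. via
SolrCoreAware.inform(), and a hypothetical resource file name):

  SolrResourceLoader loader = core.getResourceLoader();
  try (InputStream in = loader.openResource("myplugin-rules.txt")) {
    // parse the resource; conf/myplugin-rules.txt wins if present,
    // otherwise the copy bundled in the plugin jar is used
  } catch (IOException e) {
    throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, e);
  }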


-Hoss
http://www.lucidworks.com/


Re: Sort order for *:* query

2016-04-04 Thread Chris Hostetter

1) The hard coded implicit default sort order is "score desc" 

2) Whenever a sort results in ties, the final ordering of tied documents 
is non-deterministic

3) currently the behavior is that tied documents are returned in "index 
order" but that can change as segments are merged

4) if you wish to change the behavior when there is a tie, just add 
additional deterministic sort clauses to your sort param.  This can be 
done at the request level, or as a user specified "default" for the 
request handler...

https://cwiki.apache.org/confluence/display/solr/InitParams+in+SolrConfig
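
For example (a sketch; the tiebreaker here assumes the uniqueKey field is
named "id"):

  sort=score desc, id asc

...or baked in as a handler default in solrconfig.xml:

  <initParams path="/select">
    <lst name="defaults">
      <str name="sort">score desc, id asc</str>
    </lst>
  </initParams>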


: Date: Mon, 4 Apr 2016 13:34:27 -0400
: From: Steven White 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Sort order for *:* query
: 
: Hi everyone,
: 
: When I send Solr the query *:* the result I get back is sorted based on
: Lucene's internal DocID which is oldest to most recent (can someone correct
: me if I get this wrong?)  Given this, the most recently added / updated
: document is at the bottom of the list.  Is there a way to reverse this sort
: order?  If so, how can I make this the default in Solr's solrconfig.xml
: file?
: 
: Thanks
: 
: Steve
: 

-Hoss
http://www.lucidworks.com/


Re: Tutorial example loading of exampledocs for *.xml fails due to bad request

2016-04-04 Thread Chris Hostetter

: When I attempt the second example, of loading the *.xml files, I receive an
: error back.  I tried just one of the XMLs and receive the same error.  

Yeah ... there's a poor assumption here in the tutorial.  note in 
particular this paragraph...

--SNIP--
Solr's install includes a handful of Solr XML formatted files with example 
data (mostly mocked tech product data). NOTE: This tech product data has a 
more domain-specific configuration, including schema and browse UI. The 
bin/solr script includes built-in support for this by running bin/solr 
start -e techproducts which not only starts Solr but also then indexes 
this data too (be sure to bin/solr stop -all before trying it out). 
However, the example below assumes Solr was started with bin/solr start -e 
cloud to stay consistent with all examples on this page, and thus the 
collection used is "gettingstarted", not "techproducts".
--SNIP--

If you use "bin/solr start -e techproducts" (or explicitly create a solr 
collection using the "sample_techproducts" config set) then those 
documents will index just fine -- but the assumption written here in the 
tutorial that you can index those tech product documents to the same 
gettingstarted collection you've been indexing to earlier in the tutorial 
is definitely flawed -- the fieldtype deduction logic that's applied for 
the gettingstarted collection (and the specific type deduced from the 
earlier docs) won't necessarily apply to the sample tech product 
documents.

https://issues.apache.org/jira/browse/SOLR-8943


-Hoss
http://www.lucidworks.com/


Re: Sort order for *:* query

2016-04-04 Thread Chris Hostetter

: You can sort like this (I believe that _version_ is the internal id/index
: number for the document, but you might want to verify)

that is not true, and i strongly advise you not to try to sort on the 
_version_ field ... for some queries/testing it may deceptively *look* 
like it's sorting by the order the documents are added, but it will not 
actually sort in any useful way -- two documents added in sequence A, B 
may have version values that are not in ascending sequence (depending on 
the hash bucket their uniqueKeys fall in for routing purposes) so sorting 
on that field will not give you any sort of meaningful order

If you want to sort by "recency" or "date added" you need to add a 
date based field to capture this.  See for example the 
TimestampUpdateProcessorFactory...

https://lucene.apache.org/solr/5_5_0/solr-core/org/apache/solr/update/processor/TimestampUpdateProcessorFactory.html
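
A minimal chain sketch (assuming a date field named "timestamp" already
exists in the schema):

  <updateRequestProcessorChain name="add-timestamp" default="true">
    <processor class="solr.TimestampUpdateProcessorFactory">
      <str name="fieldName">timestamp</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

...after which sort=timestamp desc gives you most-recently-added first.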



-Hoss
http://www.lucidworks.com/


Re: Sort order for *:* query

2016-04-04 Thread Chris Hostetter
: 
: Not sure I understand... _version_ is time based and hence will give
: roughly the same accuracy as something like
: TimestampUpdateProcessorFactory that you recommend below.  Both

Hmmm... last time i looked, i thought _version_ numbers were allocated & 
incremented on a per-shard basis and "time" was only used for initial 
seeding when the leader started up -- so in a stable system running for 
a long time, if shardA gets significantly more updates than shardB the 
_version_ numbers can get skewed and a new doc in shardB might be updated 
with a _version_ less than the _version_ of a document added to shardA 
well before that.

But maybe I'm remembering wrong?



-Hoss
http://www.lucidworks.com/


Re: Complex Sort

2016-04-04 Thread Chris Hostetter

: I am not sure how to use "Sort By Function" for Case.
: 
: |10#40|14#19|33#17|27#6|15#6|19#5|7#2|6#1|29#1|5#1|30#1|28#1|12#0|20#0|
: 
: Can you tell how to fetch 40 when input is 10.

Something like...

if(termfreq(f,10),40,if(termfreq(f,14),19,if(termfreq(f,33),17,)))
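
...used, for example, as a sort param (the trailing 0 default is an
assumption for docs matching none of the values):

  sort=if(termfreq(f,10),40,if(termfreq(f,14),19,0)) desc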

But i suspect there may be a much better way to achieve your ultimate goal 
if you tell us what it is.  what do these fields represent? what makes 
these numeric valuessignificant? do you know which values are significant 
when indexing, or do they vary for every query?

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341




-Hoss
http://www.lucidworks.com/


Re: How to use TZ parameter in a query

2016-04-06 Thread Chris Hostetter

Please note the exact description of the property on the URL you 
mentioned..

"The TZ parameter can be specified to override the default TimeZone (UTC) 
used for the purposes of adding and rounding in date math"

The newer ref guide docs for this param also explain...

https://cwiki.apache.org/confluence/display/solr/Working+with+Dates

"By default, all date math expressions are evaluated relative to the UTC 
TimeZone, but the TZ parameter can be specified to override this 
behaviour, by forcing all date based addition and rounding to be relative 
to the specified time zone."


The TZ param does not change the *format* of the response in XML or JSON, 
which is an ISO standard format that always uses UTC for rendering as a 
string, because it is unambiguous regardless of the client parsing it.   
Just because you might want "date range faceting by day according to 
localtime in denver" doesn't mean your python or perl or javascript code 
for parsing the response will suddenly realize that the string responses 
are sometimes GMT-7 and sometimes GMT-8 (depending on the local daylight 
savings rules in colorado)
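
For example (a sketch with a hypothetical date field "ts"; TZ shifts where
the /DAY rounding boundaries fall, not how the dates are printed):

  /select?q=*:*&facet=true&facet.range=ts
    &facet.range.start=NOW/DAY-7DAYS&facet.range.end=NOW/DAY
    &facet.range.gap=%2B1DAY&TZ=America/Denver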



-Hoss
http://www.lucidworks.com/


Re: Solr update fails with “Could not initialize class sun.nio.fs.LinuxNativeDispatcher”

2016-04-07 Thread Chris Hostetter

That's a strange error to get.

I can't explain why LinuxFileSystem can't load LinuxNativeDispatcher, but 
you can probably bypass the entire situation by explicitly configuring 
ConcurrentMergeScheduler with defaults so that it doesn't try to determine 
whether you are using an SSD or "spinning" disk...

http://lucene.apache.org/core/5_5_0/core/org/apache/lucene/index/ConcurrentMergeScheduler.html
https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig#IndexConfiginSolrConfig-MergingIndexSegments

Something like this in your indexConfig settings...


<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxMergeCount">42</int>
  <int name="maxThreadCount">7</int>
</mergeScheduler>


...will force those specific settings, instead of trying to guess 
defaults.

I haven't tested this, but in theory you can also use something like the 
following to indicate definitively that you are using a spinning disk (or 
not) but let it pick the appropriate default values for the merge count & 
threads accordingly ...


<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <bool name="spins">true</bool>
</mergeScheduler>




: Date: Thu, 7 Apr 2016 22:56:54 +
: From: David Moles 
: Reply-To: solr-user@lucene.apache.org
: To: "solr-user@lucene.apache.org" 
: Subject: Solr update fails with “Could not initialize class
: sun.nio.fs.LinuxNativeDispatcher”
: 
: Hi folks,
: 
: New Solr user here, attempting to apply the following Solr update command via 
curl
: 
: curl 'my-solr-server:8983/solr/my-core/update?commit=true' \
:   -H 'Content-type:application/json' -d \
:   
'[{"my_id_field":"some-id-value","my_other_field":{"set":"new-field-value"}}]'
: 
: I'm getting an error response with a stack trace that reduces to:
: 
: Caused by: java.lang.NoClassDefFoundError: Could not initialize class 
sun.nio.fs.LinuxNativeDispatcher
: at sun.nio.fs.LinuxFileSystem.getMountEntries(LinuxFileSystem.java:81)
: at sun.nio.fs.LinuxFileStore.findMountEntry(LinuxFileStore.java:86)
: at sun.nio.fs.UnixFileStore.(UnixFileStore.java:65)
: at sun.nio.fs.LinuxFileStore.(LinuxFileStore.java:44)
: at 
sun.nio.fs.LinuxFileSystemProvider.getFileStore(LinuxFileSystemProvider.java:51)
: at 
sun.nio.fs.LinuxFileSystemProvider.getFileStore(LinuxFileSystemProvider.java:39)
: at 
sun.nio.fs.UnixFileSystemProvider.getFileStore(UnixFileSystemProvider.java:368)
: at java.nio.file.Files.getFileStore(Files.java:1461)
: at org.apache.lucene.util.IOUtils.getFileStore(IOUtils.java:528)
: at org.apache.lucene.util.IOUtils.spinsLinux(IOUtils.java:483)
: at org.apache.lucene.util.IOUtils.spins(IOUtils.java:472)
: at org.apache.lucene.util.IOUtils.spins(IOUtils.java:447)
: at 
org.apache.lucene.index.ConcurrentMergeScheduler.initDynamicDefaults(ConcurrentMergeScheduler.java:371)
: at 
org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:457)
: at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1817)
: at 
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2761)
: at 
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2866)
: at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2833)
: at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:586)
: at 
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
: at 
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
: at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1635)
: at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1612)
: at 
org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:161)
: at 
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
: at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:78)
: at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
: at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
: at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
: at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
: ... 22 more
: 
: It looks like sun.nio.fs can't find its own classes, which seems odd. Solr is 
running with OpenJDK 1.8.0_77 on Amazon Linux AMI release 2016.03.
: 
: Does anyone know what might be going on here? Is it an OpenJDK / Amazon Linux 
problem?
: 
: --
: David Moles
: UC Curation Center
: California Digital Library
: 
: 
: 

-Hoss
http://www.lucidworks.com/

Re: Range filters: inclusive?

2016-04-11 Thread Chris Hostetter

: When I perform a range query of ['' TO *] to filter out docs where a
: particular field has a value, this does what I want, but I thought using the
: square brackets was inclusive, so empty-string values should actually be
: included?

I'm not sure i understand your question ... if you are dealing with 
something like a StrField, then the empty string (ie: a 0 byte long string: 
"") is in fact a real term.  You are inclusively including that term in 
what you match on.

That is different from matching docs that do not have any values at all 
-- ie: they do not contain a single term.



-Hoss
http://www.lucidworks.com/


Re: Range filters: inclusive?

2016-04-11 Thread Chris Hostetter
: > When I perform a range query of ['' TO *] to filter out docs where a
: > particular field has a value, this does what I want, but I thought using the
: > square brackets was inclusive, so empty-string values should actually be
: > included?
: 
: They should be.  Are you saying that zero length values are not
: included by the range query above?

Oh ... maybe i misread the question ... are you saying that when you 
add a document you explicitly include the empty string as a field value, 
but later when you search for ['' TO *] those documents do not get 
returned?

what exactly is the field type you are using, and what update processors 
do you have configured?

If you are using a StrField (w/o any special processors) then the literal 
value "" should exist as a term -- but if you are using a TextField w/some 
analyzer then the analyzer may be throwing that input away.  

Likewise there are update processors that do this explicitly: 

https://lucene.apache.org/solr/5_5_0/solr-core/org/apache/solr/update/processor/RemoveBlankFieldUpdateProcessorFactory.html

-Hoss
http://www.lucidworks.com/


Re: How to set multivalued false, using SolrJ

2016-04-11 Thread Chris Hostetter

:  Can you do me a favour? I use SolrJ to index, but all of my 
:  fields end up multivalued. How can I set my field to not be 
:  multivalued -- can you tell me how to set this using SolrJ?

If you are using a "Managed Schema" (which was explicitly configured in 
most Solr 5.x exampleconfigs, and is now the implicit default in Solr 6) 
you can use the Schema API to make these changes.  There is also a 
"SchemaRequest" convinience class for this if you are a SolrJ user...

https://cwiki.apache.org/confluence/display/solr/Schema+API
https://lucene.apache.org/solr/5_5_0/solr-solrj/org/apache/solr/client/solrj/request/schema/SchemaRequest.html

SolrClient client = ...; // e.g. an HttpSolrClient pointed at your core
// the field attributes here are just examples; keep "name" the same
Map<String,Object> attrs = new HashMap<>();
attrs.put("name", "yourfield");
attrs.put("type", "string");
attrs.put("multiValued", false);
SchemaRequest.Update req = new SchemaRequest.ReplaceField(attrs);
req.process(client);




-Hoss
http://www.lucidworks.com/


Re: Solr 6 - AbstractSolrTestCase Error Unable to build KeyStore from file: null

2016-04-11 Thread Chris Hostetter

: I'm upgrading a plugin and use the AbstractSolrTestCase for tests. My tests
: work fine in 5.X but when I upgraded to 6.X the tests sometimes throw an
: error during initialization. Basically it says,
: "org.apache.solr.common.SolrException: Error instantiating
: shardHandlerFactory class
: [org.apache.solr.handler.component.HttpShardHandlerFactory]: Unable to
: build KeyStore from file: null"

Ugh.  and of course there are no other details to troubleshoot that 
because the stupid error handling doesn't wrap the original exception -- 
it just throws it away.

I'm pretty sure the problem you are seeing (unfortunately manifested in 
a really confusing way) is that SolrTestCaseJ4 (and AbstractSolrTestCase 
which subclasses it) has randomized the use of SSL for a while, but at 
some point it also started randomizing the use of client auth -- but this 
randomization happens very infrequently.

(for details, check out the SSLTestConfig and it's usage in 
SolrTestCaseJ4)

The bottom line is, in order for the (randomized) clientAuth stuff to 
work, SolrTestCaseJ4 assumes it can find an 
"../etc/test/solrtest.keystore" relative to ExternalPaths.SERVER_HOME.

If you don't have that in your test setup, bad things happen.

I believe the quickest way for you to resolve this failure in your own 
usage of AbstractSolrTestCase is to just add the @SuppressSSL annotation to 
your tests -- assuming you don't care about randomly testing your plugin 
with SSL authentication (for 99.999% of Solr plugins, whether Solr is being 
used over http or https shouldn't matter for test purposes).

If you do want to include randomized SSL testing, then you need to make 
sure your that when/how you run your tests, ExternalPaths.SERVER_HOME 
resolves to the correct place, and "../etc/test/solrtest.keystore" 
resolves to a real file solr can use as the keystore.

I'll file some Jiras to try and improve the error handling in these 
situations.



-Hoss
http://www.lucidworks.com/


Re: Solr 6 - AbstractSolrTestCase Error Unable to build KeyStore from file: null

2016-04-11 Thread Chris Hostetter

https://issues.apache.org/jira/browse/SOLR-8970
https://issues.apache.org/jira/browse/SOLR-8971

: Date: Mon, 11 Apr 2016 20:35:22 -0400
: From: Joe Lawson 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: Solr 6 - AbstractSolrTestCase Error Unable to build KeyStore from
:  file: null
: 
: Thanks for the insight. I figured that it was something like that and
: perhaps I has thread contention on a resource that wasn't really thread
: safe.
: 
: I'll give your suggestions a shot tomorrow.
: 
: Regards,
: 
: Joe Lawson
: On Apr 11, 2016 8:24 PM, "Chris Hostetter"  wrote:
: 
: >
: > : I'm upgrading a plugin and use the AbstractSolrTestCase for tests. My
: > tests
: > : work fine in 5.X but when I upgraded to 6.X the tests sometimes throw an
: > : error during initialization. Basically it says,
: > : "org.apache.solr.common.SolrException: Error instantiating
: > : shardHandlerFactory class
: > : [org.apache.solr.handler.component.HttpShardHandlerFactory]: Unable to
: > : build KeyStore from file: null"
: >
: > Ugh.  and of course there are no other details to troubleshoot that
: > because the stupid error handling doesn't wrap the original exception --
: > it just throws it away.
: >
: > I'm pretty sure the problem you are seeing (unfortunately manifested in
: > a really confusing way) is that SolrTestCaseJ4 (and AbstractSolrTestCase
: > which subclasses it) has randomized the use of SSL for a while, but at
: > some point it also started randomizing the use of client auth -- but this
: > randomization happens very infrequently.
: >
: > (for details, check out the SSLTestConfig and it's usage in
: > SolrTestCaseJ4)
: >
: > The bottom line is, in order for the (randomized) clientAuth stuff to
: > work, SolrTestCaseJ4 assumes it can find an
: > "../etc/test/solrtest.keystore" relative to ExternalPaths.SERVER_HOME.
: >
: > If you don't have that in your test setup, bad things happen.
: >
: > I believe the quickest way for you to resolve this failure in your own
: > usage of AbstractSolrTestCase is to just add the @SuppressSSL annotation to
: > your tests -- assuming you don't care about randomly testing your plugin
: > with SSL authentication (for 99.999% of Solr plugins, whether Solr is being
: > used over http or https shouldn't matter for test purposes)
: >
: > If you do want to include randomized SSL testing, then you need to make
: > sure your that when/how you run your tests, ExternalPaths.SERVER_HOME
: > resolves to the correct place, and "../etc/test/solrtest.keystore"
: > resolves to a real file solr can use as the keystore.
: >
: > I'll file some Jiras to try and improve the error handling in these
: > situations.
: >
: >
: >
: > -Hoss
: > http://www.lucidworks.com/
: >
: 

-Hoss
http://www.lucidworks.com/


Re: Commiting with no updates

2016-04-13 Thread Chris Hostetter

the autoCommit settings initialize trackers so that they only fire after 
some updates have been made -- don't think of it as a cron that fires 
every X seconds, think of it as an update monitor that triggers timers.  
If an update comes in, and there are no timers currently active, a timer 
is created to do the commit in X seconds.

Independent of autoCommit, there is other intelligence lower down in Solr 
that tries to recognize when a redundant commit is fired and no changes 
would result in a new searcher, to prevent unnecessary object churn and 
cache clearing.
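
For reference, a sketch of the sort of solrconfig.xml settings being 
discussed (the timing values here are just examples):

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- soft commit at most 60s after the first uncommitted update -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>
  <!-- hard commit at most 15s after the first uncommitted update -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>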

: My autoSoftCommit is set to 1 minute.  Does this actually affect things if no
: documents have actually been updated/created?  Will this also affect the
: clearing of any caches?
: 
: Is this also the same for hard commits, either with autoCommit or making an
: explicit http request to commit.


-Hoss
http://www.lucidworks.com/


Re: Solr Support for BM25F

2016-04-14 Thread Chris Hostetter

: a per field basis. I understand BM25 similarity is now supported in Solr

BM25 has been supported for a while, the major change recently is that it 
is now the underlying default in Solr 6.

: but I was hoping to be able to configure k1 and b for different fields such
: as title, description, anchor etc, as they are structured documents.

What you can do in Solr is configure different Similarity instances on a 
per-fieldType basis -- but you can have as many fieldTypes in your schema 
as you want, so you could have one type used just by your title field, and 
a different type used just by your description field, etc...

: Current Solr Version 5.4.1

You can download the Solr reference guide for 5.4 from here...

http://archive.apache.org/dist/lucene/solr/ref-guide/

You'll want to search for Similarity and in particular 
"SchemaSimilarityFactory" which (in 5.4) you'll have to configure 
explicitly in order to use different BM25Similarity instances for each 
fieldType.

In 6.0, SchemaSimilarityFactory is the global default, with BM25 as 
the per-field default...

The current (draft) guide for 6.0 (not yet released) has info on that...
https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements
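
Untested sketch of what that can look like in a 5.4 schema.xml (the 
fieldType name and the k1/b values are just examples):

<similarity class="solr.SchemaSimilarityFactory"/>

<fieldType name="text_title" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <similarity class="solr.BM25SimilarityFactory">
    <float name="k1">1.2</float>
    <float name="b">0.3</float>
  </similarity>
</fieldType>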




-Hoss
http://www.lucidworks.com/


RE: Shard ranges seem incorrect

2016-04-14 Thread Chris Hostetter

: Hi - bumping this issue. Any thoughts to share?

Shawn's response to your email seemed spot-on accurate to me -- is there 
something about his answer that doesn't match up with what you're seeing? 
Can you clarify/elaborate your concerns?

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201604.mbox/%3c570d0a03.5010...@elyograg.org%3E


 :  
: -Original message-
: > From:Markus Jelsma 
: > Sent: Tuesday 12th April 2016 13:49
: > To: solr-user 
: > Subject: Shard ranges seem incorrect
: > 
: > Hi - i've just created a 3 shard 3 replica collection on Solr 6.0.0 and we 
noticed something odd, the hashing ranges don't make sense (full state.json 
below):
: > shard1 Range: 8000-d554
: > shard2 Range: d555-2aa9
: > shard3 Range: 2aaa-7fff
: > 
: > We've also noticed ranges not going from 0 to  for a 5.5-created 
single shard collection. Another collection created on an older (unknown) 
release has correct shard ranges. Any idea what's going on?
: > Thanks,
: > Markus
: > 
: > {"logs":{
: > "replicationFactor":"3",
: > "router":{"name":"compositeId"},
: > "maxShardsPerNode":"9",
: > "autoAddReplicas":"false",
: > "shards":{
: >   "shard1":{
: > "range":"8000-d554",
: > "state":"active",
: > "replicas":{
: >   "core_node3":{
: > "core":"logs_shard1_replica3",
: > "base_url":"http://127.0.1.1:8983/solr";,
: > "node_name":"127.0.1.1:8983_solr",
: > "state":"active"},
: >   "core_node4":{
: > "core":"logs_shard1_replica1",
: > "base_url":"http://127.0.1.1:8983/solr";,
: > "node_name":"127.0.1.1:8983_solr",
: > "state":"active",
: > "leader":"true"},
: >   "core_node8":{
: > "core":"logs_shard1_replica2",
: > "base_url":"http://127.0.1.1:8983/solr";,
: > "node_name":"127.0.1.1:8983_solr",
: > "state":"active"}}},
: >   "shard2":{
: > "range":"d555-2aa9",
: > "state":"active",
: > "replicas":{
: >   "core_node1":{
: > "core":"logs_shard2_replica1",
: > "base_url":"http://127.0.1.1:8983/solr";,
: > "node_name":"127.0.1.1:8983_solr",
: > "state":"active",
: > "leader":"true"},
: >   "core_node2":{
: > "core":"logs_shard2_replica2",
: > "base_url":"http://127.0.1.1:8983/solr";,
: > "node_name":"127.0.1.1:8983_solr",
: > "state":"active"},
: >   "core_node9":{
: > "core":"logs_shard2_replica3",
: > "base_url":"http://127.0.1.1:8983/solr";,
: > "node_name":"127.0.1.1:8983_solr",
: > "state":"active"}}},
: >   "shard3":{
: > "range":"2aaa-7fff",
: > "state":"active",
: > "replicas":{
: >   "core_node5":{
: > "core":"logs_shard3_replica1",
: > "base_url":"http://127.0.1.1:8983/solr";,
: > "node_name":"127.0.1.1:8983_solr",
: > "state":"active",
: > "leader":"true"},
: >   "core_node6":{
: > "core":"logs_shard3_replica2",
: > "base_url":"http://127.0.1.1:8983/solr";,
: > "node_name":"127.0.1.1:8983_solr",
: > "state":"active"},
: >   "core_node7":{
: > "core":"logs_shard3_replica3",
: > "base_url":"http://127.0.1.1:8983/solr";,
: > "node_name":"127.0.1.1:8983_solr",
: > "state":"active"}}
: > 
: > 
: > 
: > 
: > 
: 

-Hoss
http://www.lucidworks.com/


Re: UUID processor handling of empty string

2016-04-14 Thread Chris Hostetter

I'm also confused by what exactly you mean by "doesn't work" but a general 
suggestion you can try is putting the 
RemoveBlankFieldUpdateProcessorFactory before your UUID Processor...

https://lucene.apache.org/solr/6_0_0/solr-core/org/apache/solr/update/processor/RemoveBlankFieldUpdateProcessorFactory.html

If you are also worried about strings that aren't exactly empty, but 
consist only of whitespace, you can put TrimFieldUpdateProcessorFactory 
before RemoveBlankFieldUpdateProcessorFactory ...

https://lucene.apache.org/solr/6_0_0/solr-core/org/apache/solr/update/processor/TrimFieldUpdateProcessorFactory.html
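
In other words, an (untested) chain along these lines -- the chain name 
and the uuid field name are just examples:

<updateRequestProcessorChain name="trim-blank-uuid">
  <processor class="solr.TrimFieldUpdateProcessorFactory"/>
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>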


: Date: Thu, 14 Apr 2016 12:30:24 -0700
: From: Erick Erickson 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user 
: Subject: Re: UUID processor handling of empty string
: 
: What do you mean "doesn't work"? An empty string is
: different than not being present. The UUID update
: processor (I'm pretty sure) only adds a field if it
: is _absent_. Specifying it as an empty string
: fails that test so no value is added.
: 
: At that point, if this uuid field is also the <uniqueKey>,
: then each doc that comes in with an empty field will replace
: the others.
: 
: If it's _not_ the <uniqueKey>, the sorting will be confusing.
: All the empty string fields are equal, so the tiebreaker is
: the internal Lucene doc ID, which may change as merges
: happen. You can specify secondary sort fields to make the
: sort predictable (the <uniqueKey> field is popular for this).
: 
: Best,
: Erick
: 
: On Thu, Apr 14, 2016 at 12:18 PM, Susmit Shukla  
wrote:
: > Hi,
: >
: > I have configured solr schema to generate unique id for a collection using
: > UUIDUpdateProcessorFactory
: >
: > I am seeing a peculiar behavior - if the unique 'id' field is explicitly
: > set as empty string in the SolrInputDocument, the document gets indexed
: > with UUID update processor generating the id.
: > However, sorting does not work if uuid was generated in this way. Also
: > cursor functionality that depends on unique id sort also does not work.
: > I guess the correct behavior would be to fail the indexing if user provides
: > an empty string for a uuid field.
: >
: > The issues do not happen if I omit the id field from the SolrInputDocument .
: >
: > SolrInputDocument
: >
: > solrDoc.addField("id", "");
: >
: > ...
: >
: > I am using schema similar to below-
: >
: > <field name="id" type="string" indexed="true" stored="true" required="true"/>
: >
: > <uniqueKey>id</uniqueKey>
: >
: > <updateRequestProcessorChain name="uuid">
: >   <processor class="solr.UUIDUpdateProcessorFactory">
: >     <str name="fieldName">id</str>
: >   </processor>
: >   <processor class="solr.RunUpdateProcessorFactory"/>
: > </updateRequestProcessorChain>
: >
: > <requestHandler name="/update" class="solr.UpdateRequestHandler">
: >   <lst name="defaults">
: >     <str name="update.chain">uuid</str>
: >   </lst>
: > </requestHandler>
: >
: > Thanks,
: > Susmit
: 

-Hoss
http://www.lucidworks.com/


Re: How to get stats on currency field?

2016-04-14 Thread Chris Hostetter

The thing to remember about currency fields is that even if you tend to 
only put one currency value in it, any question of interpreting the values 
in that field has to be done relative to a specific currency, and the 
exchange rates may change dynamically.

So use the currency function to get a numerical value in some explicit 
currency at the moment you execute the query, and then do stats over that 
function.

Something like this IIRC: stats.field={!func}currency(your_field,EUR)
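
So a full request might look like this (untested; the collection and 
field names here are just examples):

curl 'http://localhost:8983/solr/yourcollection/select?q=*:*&rows=0&stats=true&stats.field={!func}currency(price_c,EUR)'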



-Hoss
http://www.lucidworks.com/


Re: MiniSolrCloudCluster usage in solr 7.0.0

2016-04-15 Thread Chris Hostetter

: At first, I saw the same exception you got ... but after a little while
: I figured out that this is because I was running the program more than
: once without deleting everything in the baseDir -- so the zookeeper
: server was starting with an existing database already containing the
: solr.xml.  When MiniSolrCloudCluster is used in Solr tests, the baseDir
: is newly created for each test class, so this doesn't happen.

Yeah ... this is interesting.  I would definitely suggest that for now you 
*always* start with a clean baseDir.  I've opened an issue to figure out 
whether MiniSolrCloudCluster should fail if you don't, or make it a 
supported use case...

https://issues.apache.org/jira/browse/SOLR-8999



-Hoss
http://www.lucidworks.com/


Re: Getting duplicate output while doing auto suggestion based on multiple filed using copy filed in solr 5.5

2016-04-15 Thread Chris Hostetter

I can't explain the results you are seeing, but you also didn't provide us 
with your schema.xml (ie: how are "text" and "text_auto" defined?) or 
enough details to try and reproduce on a small scale (ie: what does the 
source data look like in the documents where these suggestion values 
are coming from).

If i start up the "bin/solr -e techproducts" example, which is also 
configured to use DocumentDictionaryFactory, I don't see any duplicate 
suggestions...

curl 
'http://localhost:8983/solr/techproducts/suggest?suggest.dictionary=mySuggester&suggest=true&suggest.build=true&wt=json'
{"responseHeader":{"status":0,"QTime":13},"command":"build"}
curl 
'http://localhost:8983/solr/techproducts/suggest?wt=json&indent=true&suggest.dictionary=mySuggester&suggest=true&suggest.q=elec'
{
  "responseHeader":{
"status":0,
"QTime":1},
  "suggest":{"mySuggester":{
  "elec":{
"numFound":3,
"suggestions":[{
"term":"electronics and computer1",
"weight":2199,
"payload":""},
  {
"term":"electronics",
"weight":649,
"payload":""},
  {
"term":"electronics and stuff2",
"weight":279,
"payload":""}]

...can you provide us with some precise (and ideally minimal) steps to 
reproduce the problem you are describing?


For Example...

1) "Add XYZ to the 5.5 sample_techproducts_configs solrconfig.xml"
2) "Add ABC to the 5.5 sample_techproducts_configs managed-schema"
3) run this curl command to index a few sample documents...
4) run this curl command to see some suggest results that have duplicates 
in them based on the sample data from step #3


?


-Hoss
http://www.lucidworks.com/


Re: [Installation] Solr log directory

2016-05-03 Thread Chris Hostetter

: I have a question for installing solr server. Using ' 
: install_solr_service.sh' with option -d , the solr home directory can be 
: set. But the default log directory is under $SOLR_HOME/logs.
: 
: Is it possible to specify the logs directory separately from solr home 
directory during installation? 

install_solr_service.sh doesn't do anything special as far as where logs 
should live -- it just writes out a (default) 
"/etc/default/$SOLR_SERVICE.in.sh" (if 
it doesn't already exist) that specifies a (default) log directory for 
Solr to use once the service starts.

you are absolutely expected to overwrite that "$SOLR_SERVICE.in.sh" file 
with your own specific settings -- in fact you *must* in order to configure 
things like ZooKeeper or SSL -- after the installation script finishes, and 
you are welcome to change the SOLR_LOGS_DIR setting to anything you want.
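
For example, assuming the default service name of "solr", you could add 
something like this to /etc/default/solr.in.sh (the directory is just an 
example -- it needs to be writable by the user Solr runs as):

SOLR_LOGS_DIR=/data/solr/logs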



-Hoss
http://www.lucidworks.com/


Re: OOM script executed

2016-05-04 Thread Chris Hostetter

: You could, but before that I'd try to see what's using your memory and see
: if you can decrease that. Maybe identify why you are running OOM now and
: not with your previous Solr version (assuming you weren't, and that you are
: running with the same JVM settings). A bigger heap usually means more work
: to the GC and less memory available for the OS cache.

FWIW: One of the bugs fixed in 6.0 was regarding the fact that the 
oom_killer wasn't being called properly on OOM -- so the fact that you are 
getting OOMErrors in 6.0 may not actually be a new thing, it may just be 
new that you are being made aware of them by the oom_killer

https://issues.apache.org/jira/browse/SOLR-8145

That doesn't negate Tomás's excellent advice about trying to determine
what is causing the OOM, but i wouldn't get too hung up on "what changed" 
between 5.x and 6.0 -- possibly nothing other than "now you know about 
it."



: 
: Tomás
: 
: On Sun, May 1, 2016 at 11:20 PM, Bastien Latard - MDPI AG <
: lat...@mdpi.com.invalid> wrote:
: 
: > Hi Guys,
: >
: > I got several times the OOM script executed since I upgraded to Solr6.0:
: >
: > $ cat solr_oom_killer-8983-2016-04-29_15_16_51.log
: > Running OOM killer script for process 26044 for Solr on port 8983
: >
: > Does it mean that I need to increase my JAVA Heap?
: > Or should I do anything else?
: >
: > Here are some further logs:
: > $ cat solr_gc_log_20160502_0730:
: > }
: > {Heap before GC invocations=1674 (full 91):
: >  par new generation   total 1747648K, used 1747135K [0x0005c000,
: > 0x00064000, 0x00064000)
: >   eden space 1398144K, 100% used [0x0005c000, 0x00061556,
: > 0x00061556)
: >   from space 349504K,  99% used [0x00061556, 0x00062aa2fc30,
: > 0x00062aab)
: >   to   space 349504K,   0% used [0x00062aab, 0x00062aab,
: > 0x00064000)
: >  concurrent mark-sweep generation total 6291456K, used 6291455K
: > [0x00064000, 0x0007c000, 0x0007c000)
: >  Metaspace   used 39845K, capacity 40346K, committed 40704K, reserved
: > 1085440K
: >   class spaceused 4142K, capacity 4273K, committed 4368K, reserved
: > 1048576K
: > 2016-04-29T21:15:41.970+0200: 20356.359: [Full GC (Allocation Failure)
: > 2016-04-29T21:15:41.970+0200: 20356.359: [CMS:
: > 6291455K->6291456K(6291456K), 12.5694653 secs]
: > 8038591K->8038590K(8039104K), [Metaspace: 39845K->39845K(1085440K)],
: > 12.5695497 secs] [Times: user=12.57 sys=0.00, real=12.57 secs]
: >
: >
: > Kind regards,
: > Bastien
: >
: >
: 

-Hoss
http://www.lucidworks.com/

Re: Solr cloud 6.0.0 with ZooKeeper 3.4.8 Errors

2016-05-04 Thread Chris Hostetter

: Thanks, Nick. Do we know any suggested # for file descriptor limit with
: Solr6?  Also wondering why i haven't seen this problem before with Solr 5.x?

are you running Solr6 on the exact same host OS that you were running 
Solr5 on?

even if you are using the "same OS version" on a diff machine, that could 
explain the discrepancy if you (or someone else) increased the file 
descriptor limit on the "old machine" but that never happened on the "new 
machine".
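
If the limit does turn out to be the culprit, something like this raises 
it for the user running Solr (the exact mechanism varies by distro; the 
user name and values here are just examples):

# check the limit in the shell that launches Solr
ulimit -n

# /etc/security/limits.conf
solr  soft  nofile  65536
solr  hard  nofile  65536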



: On Wed, May 4, 2016 at 4:54 PM, Nick Vasilyev 
: wrote:
: 
: > It looks like you have too many open files, try increasing the file
: > descriptor limit.
: >
: > On Wed, May 4, 2016 at 3:48 PM, Susheel Kumar 
: > wrote:
: >
: > > Hello,
: > >
: > > I am trying to setup 2 node Solr cloud 6 cluster with ZK 3.4.8 and used
: > the
: > > install service to setup solr.
: > >
: > > After launching Solr Admin Panel on server1, it looses connections in few
: > > seconds and then comes back and other node server2 is marked as Down in
: > > cloud graph. After few seconds its loosing the connection and comes back.
: > >
: > > Any idea what may be going wrong? Has anyone used Solr 6 with ZK 3.4.8.
: > > Have never seen this error before with solr 5.x with ZK 3.4.6.
: > >
: > > Below log from server1 & server2.  The ZK has 3 nodes with chroot
: > enabled.
: > >
: > > Thanks,
: > > Susheel
: > >
: > > server1/solr.log
: > >
: > > 
: > >
: > >
: > > 2016-05-04 19:20:53.804 INFO  (qtp1989972246-14) [   ]
: > > o.a.s.c.c.ZkStateReader path=[/collections/collection1]
: > > [configName]=[collection1] specified config exists in ZooKeeper
: > >
: > > 2016-05-04 19:20:53.806 INFO  (qtp1989972246-14) [   ]
: > o.a.s.s.HttpSolrCall
: > > [admin] webapp=null path=/admin/collections
: > > params={action=CLUSTERSTATUS&wt=json&_=1462389588125} status=0 QTime=25
: > >
: > > 2016-05-04 19:20:53.859 INFO  (qtp1989972246-19) [   ]
: > > o.a.s.h.a.CollectionsHandler Invoked Collection Action :list with params
: > > action=LIST&wt=json&_=1462389588125 and sendToOCPQueue=true
: > >
: > > 2016-05-04 19:20:53.861 INFO  (qtp1989972246-19) [   ]
: > o.a.s.s.HttpSolrCall
: > > [admin] webapp=null path=/admin/collections
: > > params={action=LIST&wt=json&_=1462389588125} status=0 QTime=2
: > >
: > > 2016-05-04 19:20:57.520 INFO  (qtp1989972246-13) [   ]
: > o.a.s.s.HttpSolrCall
: > > [admin] webapp=null path=/admin/cores
: > > params={indexInfo=false&wt=json&_=1462389588124} status=0 QTime=0
: > >
: > > 2016-05-04 19:20:57.546 INFO  (qtp1989972246-15) [   ]
: > o.a.s.s.HttpSolrCall
: > > [admin] webapp=null path=/admin/info/system
: > > params={wt=json&_=1462389588126} status=0 QTime=25
: > >
: > > 2016-05-04 19:20:57.610 INFO  (qtp1989972246-13) [   ]
: > > o.a.s.h.a.CollectionsHandler Invoked Collection Action :list with params
: > > action=LIST&wt=json&_=1462389588125 and sendToOCPQueue=true
: > >
: > > 2016-05-04 19:20:57.613 INFO  (qtp1989972246-13) [   ]
: > o.a.s.s.HttpSolrCall
: > > [admin] webapp=null path=/admin/collections
: > > params={action=LIST&wt=json&_=1462389588125} status=0 QTime=3
: > >
: > > 2016-05-04 19:21:29.139 INFO  (qtp1989972246-5980) [   ]
: > > o.a.h.i.c.DefaultHttpClient I/O exception (java.net.SocketException)
: > caught
: > > when connecting to {}->http://server2:8983: Too many open files
: > >
: > > 2016-05-04 19:21:29.139 INFO  (qtp1989972246-5983) [   ]
: > > o.a.h.i.c.DefaultHttpClient I/O exception (java.net.SocketException)
: > caught
: > > when connecting to {}->http://server2:8983: Too many open files
: > >
: > > 2016-05-04 19:21:29.139 INFO  (qtp1989972246-5984) [   ]
: > > o.a.h.i.c.DefaultHttpClient I/O exception (java.net.SocketException)
: > caught
: > > when connecting to {}->http://server2:8983: Too many open files
: > >
: > > 2016-05-04 19:21:29.141 INFO  (qtp1989972246-5984) [   ]
: > > o.a.h.i.c.DefaultHttpClient Retrying connect to {}->http://server2:8983
: > >
: > > 2016-05-04 19:21:29.141 INFO  (qtp1989972246-5984) [   ]
: > > o.a.h.i.c.DefaultHttpClient I/O exception (java.net.SocketException)
: > caught
: > > when connecting to {}->http://server2:8983: Too many open files
: > >
: > > 2016-05-04 19:21:29.142 INFO  (qtp1989972246-5984) [   ]
: > > o.a.h.i.c.DefaultHttpClient Retrying connect to {}->http://server2:8983
: > >
: > > 2016-05-04 19:21:29.142 INFO  (qtp1989972246-5984) [   ]
: > > o.a.h.i.c.DefaultHttpClient I/O exception (java.net.SocketException)
: > caught
: > > when connecting to {}->http://server2:8983: Too many open files
: > >
: > > 2016-05-04 19:21:29.142 INFO  (qtp1989972246-5984) [   ]
: > > o.a.h.i.c.DefaultHttpClient Retrying connect to {}->http://server2:8983
: > >
: > > 2016-05-04 19:21:29.140 INFO  (qtp1989972246-5983) [   ]
: > > o.a.h.i.c.DefaultHttpClient Retrying connect to {}->http://server2:8983
: > >
: > > 2016-05-04 19:21:29.140 INFO  (qtp1989972246-5980) [   ]
: > > o.a.h.i.c.DefaultHttpClient Retrying connect to {}->http://server2:8983
: > >
: > > 2016-05-04 19:21:29.143 INFO  (qtp1989972246-598

Re: shareSchema property unknown in new solr.xml format

2015-07-20 Thread Chris Hostetter

: > I’m getting this error on startup:
: > 
: > <solrcloud> section of solr.xml contains 1 unknown config parameter(s): [shareSchema]

Pretty sure that's because it was never a supported property of the 
<solrcloud> section -- even in the old format of solr.xml.

it's just a top level property -- ie: create a child node for it 
directly under <solr>, outside of <solrcloud>.


Ah ... i see, this page is giving an incorrect example...

https://cwiki.apache.org/confluence/display/solr/Moving+to+the+New+solr.xml+Format

...I'll fix that.




-Hoss
http://www.lucidworks.com/

Re: set the param [facet.offset] for EVERY [facet.pivot]

2015-07-20 Thread Chris Hostetter

: HI All: I need pagination with facet offset.

: There are two or more fields in [facet.pivot], but only one value 
: for [facet.offset], eg: facet.offset=10&facet.pivot=field_1,field_2. 
: In this condition, field_2 is 10's offset and then field_1 is 10's 
: offset. But what I want is field_2 is 1's offset and field_1 is 10's 
: offset. How can I fix this problem or try another way to complete?

As noted in the ref guide...

https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Thefacet.offsetParameter

...facet.offset supports per field overriding, just like most (all?) 
facet options...

   facet.pivot=field_1,field_2
   f.field_2.facet.offset=10

...or using localparams (in case you are using field_2 in another 
facet.pivot param...

   facet.pivot={!key=pivot2}field_0,field_2
   facet.pivot={!key=pivot1 f.field_2.facet.offset=10}field_1,field_2
   

-Hoss
http://www.lucidworks.com/


Re: Multiple boost queries on a specific field

2015-07-20 Thread Chris Hostetter

: /?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:A^2.0/
: My first results have provider A.

: ?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:​B​^​1.5 
: My​ first results have provider B. Good!


: /?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:​(​A^2.0​ B^1.5)​/
: Then my first results have provider B. It's not logical.

Why is that not logical?

If you provide us with the details from your schema about the 
provider field, and the debug=true output from your query showing the 
score explanations for the top doc of that query (and for the first "provider 
A" 
doc so we can compare) then we might be able to help explain why a "B" doc 
sows up before an "A" doc -- but you haven't provided near enough info for 
anyhitng other then a wild guess...

https://wiki.apache.org/solr/UsingMailingLists


...my best wild guess is that it has to do with either the IDF of those 
two terms, or the lengthNorm of the "provider" field for the various docs.


Most likely "bq" isn't even remotely what you want however, since it's an 
*additive* boost, and will be affected by the overall queryNorm of the 
query it's a part of -- so even if you get things dialled in just like you 
want them with a "*:*" query, you might find yourself with totally 
different results once you start using a "real" query.

Assuming every document has at most 1 "provider" then what would probably 
work best for you is to use (edismax with) something like this...

boost=max(prod(2.0, termfreq(provider,'A')),
  prod(1.5, termfreq(provider,'B')),
  prod(..., termfreq(provider,...)),
  ...)

...or if you want to use edismax, then instead wrap the "boost" QParser 
around your dismax query...

  q={!boost b=$boost v=$qq defType=dismax}
  qq=...whatever your normal dismax query is...
  ...

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BoostQueryParser
https://cwiki.apache.org/confluence/display/solr/Function+Queries

What that will give you (in either case) is a *multiplicative* boost by 
each of those values depending on which of those terms exists in the 
provider field -- the "prod" function multiplies each value by "1" if the 
corresponding provider string is in the field once, or "0" if that provider 
isn't in the field (hence the assumption of "at most 1 provider") and then 
the max function just picks the largest.

Depending on the specifics of your usecase, you could alternatively 
use sum(...) instead of max if some docs are from multiple providers, 
etc...


But the details of *why* you are currently getting the results you are 
getting, and what you consider illogical about them, are a huge factor in 
giving you good advice to move forward.



-Hoss
http://www.lucidworks.com/

Re: custom function for multivalued fields

2015-07-29 Thread Chris Hostetter

Thanks to the SortedSetDocValues this is in fact possible -- in fact i 
just uploaded a patch for SOLR-2522 that you can take a look at to get an 
idea of how to make it work.  (The main class you're probably going 
to want to look at is SortedSetSelector: you're going to want a similar 
"SortedDocValues proxy" class on top of SortedSetDocValues -- but instead 
of picking a single value, you want to pick your new synthetic value based 
on your custom function logic.)

https://issues.apache.org/jira/browse/SOLR-2522

: I have a requirement where i want to maintain a multivalued field. However,
: at query time, i want to query on only one value we store in  multivalued
: field. That one value should be output of a custom function which should
: execute on all values of multivalued field at query time.
: Can we write such function and plug into solr.


-Hoss
http://www.lucidworks.com/


Re: SolrNet and deep pagination

2015-08-10 Thread Chris Hostetter

: Has anyone worked with deep pagination using SolrNet? The SolrNet 
: version that I am using is v0.4.0.2002. I followed up with this article, 
: https://github.com/mausch/SolrNet/blob/master/Documentation/CursorMark.md 
: , however the version of SolrNet.dll does not expose the a StartOrCursor 
: property in the QueryOptions class.


I don't know anything about SolrNet, but i do know that the URL you list 
above is for the documentation on the master branch.  If i try to look at 
the same document on the 0.4.x branch, that document doesn't exist -- 
suggesting the feature isn't supported in the version of SolrNet you are 
using...

https://github.com/mausch/SolrNet/blob/0.4.x/Documentation/CursorMark.md
https://github.com/mausch/SolrNet/tree/0.4.x/Documentation

In fact, if i search the repo for "StartOrCursor" i see a file named 
"StartOrCursor.cs" exists on the master branch, but not on the 0.4.x 
branch...

https://github.com/mausch/SolrNet/blob/master/SolrNet/StartOrCursor.cs
https://github.com/mausch/SolrNet/blob/0.4.x/SolrNet/StartOrCursor.cs

...so it seems unlikely that this (class?) is supported in the release you 
are using.

Note: according to the docs, there is a SolrNet google group where this 
question is probably most appropriate: 

https://github.com/mausch/SolrNet/blob/master/Documentation/README.md
https://groups.google.com/forum/#!forum/solrnet




-Hoss
http://www.lucidworks.com/


Re: date field in the schema causing a problem

2015-08-10 Thread Chris Hostetter

: 
: 
: 
: Most documents have a correctly formatted date string and I would like to keep
: that data available for search on the date field.
...
: I realize it is complaining because the date string isn't matching the
: data_driven_schema file. How can I coerce it into allowing the non-standard
: date strings while still using the correctly formatted ones?

If you want to preserve all of the data, and don't care about doing Date 
operations (ie: date range queries, date faceting, etc...) on the field, 
then you could always just define these fields to use a String based field 
type.

If you want to only preserve the data that can be cleanly parsed as a 
Date, then one workaround would probably be to configure something 
like this *after* the ParseDateFieldUpdateProcessorFactory...

  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="typeClass">solr.TrieDateField</str>
    <str name="pattern">.*</str>
    <str name="replacement"></str>
    <bool name="literalReplacement">true</bool>
  </processor>
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory">
    <str name="typeClass">solr.TrieDateField</str>
  </processor>

...that should work because the RegexReplaceProcessorFactory will only 
operate on _string_ values in the incoming docs -- if 
ParseDateFieldUpdateProcessorFactory has already been able to parse the 
string into a Date object, it will be ignored.

If you want *both* (ie: to do Date specific operations on docs that can be 
parsed, but also know when docs provide other non-Date values in those 
fields) you'll need to use more than one field -- 
CloneFieldUpdateProcessorFactory can handle that for you...

https://lucene.apache.org/solr/5_2_0/solr-core/org/apache/solr/update/processor/RegexReplaceProcessorFactory.html
https://lucene.apache.org/solr/5_2_0/solr-core/org/apache/solr/update/processor/RemoveBlankFieldUpdateProcessorFactory.html
https://lucene.apache.org/solr/5_2_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html

https://lucene.apache.org/solr/5_2_0/solr-core/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html
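
e.g. an (untested) sketch with hypothetical field names, placed *before* 
the ParseDateFieldUpdateProcessorFactory so the raw string is preserved in 
a second field:

<processor class="solr.CloneFieldUpdateProcessorFactory">
  <str name="source">mydate</str>
  <str name="dest">mydate_raw_str</str>
</processor>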



-Hoss
http://www.lucidworks.com/


Re: Solr MLT with stream.body returns different results on each shard

2015-08-11 Thread Chris Hostetter

: I have a fresh install of Solr 5.2.1 with about 3 million docs freshly
: indexed (I can also reproduce this issue on 4.10.0). When I use the Solr
: MorelikeThisHandler with content stream I'm getting different results per
: shard.

I haven't looked at the code recently but i'm 99% certain that the MLT 
handler in general doesn't work with distributed (ie: sharded) queries.  
(unlike the MLT component and the recently added MLT qparser)

I suspect that in the specific case of stream.body, what you are seeing is 
that the interesting terms are being computed relative the local tf/idf 
stats for that shard, and then only local results from that shard are 
being returned.

: I also looked at using a standard MLT query, but I need to be able to
: stream in a fairly large block of text for comparison that is not in the
: index (different type of document). A standard MLT  query

Until/unless the MLT parser supports arbitrary text (there's some mention 
of this in SOLR-7639 but i'm not sure what the status of that is) you 
might find that just POSTing all of your text as a regular query (q) using 
dismax or edismax is suitable for your needs -- that's essentially the 
equivalent of what MLTHandler does with a stream.body, except it tries to 
only focus on "interesting terms" based on tf/idf, but if your fields 
are all configured with stopword files anyway, then the results and 
performance may be similar.
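
e.g. something like this (untested; the collection and qf field names are 
just examples):

curl http://localhost:8983/solr/yourcollection/select \
  --data-urlencode 'defType=edismax' \
  --data-urlencode 'qf=title^2 body' \
  --data-urlencode 'q=paste the full text of your external document here'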


-Hoss
http://www.lucidworks.com/


Re: Exception while using {!cardinality=1.0}.

2015-08-18 Thread Chris Hostetter

: > I am getting following exception for the query :
: > *q=field:query&stats=true&stats.field={!cardinality=1.0}field*. The
: > exception is not seen once the cardinality is set to 0.9 or less.
: > The field is *docValues enabled* and *indexed=false*. The same exception
: > I tried to reproduce on non docValues field but could not. Please help me
: > resolve the issue.

Hmmm... this is a weird error ... but you haven't really given us enough 
information to really guess what the root cause is 

- What was the datatype of the field(s)? 
- Did you have the exact same data in both fields?
- Are these multivalued fields?
- Did your "real" query actually compute stats on the same field you had 
  done your main term query on?

I know we have some tests of this basic situation, and i tried to do some 
more manual testing to spot check, but i can't reproduce.

If you can please provide a full copy of the data (as csv or xml or 
whatever) to build your index along with all solr configs and the exact 
queries to reproduce that would really help get to the bottom of this -- 
if you can't provide all the data, then can you at least reproduce with a 
small set of sample data?

either way: please file a new jira issue and attach as much detail as you 
can -- this URL has a lot of great tips on the types of data we need to be 
able to get to the bottom of bugs...

https://wiki.apache.org/solr/UsingMailingLists





: > ERROR - 2015-08-11 12:24:00.222; [core]
: > org.apache.solr.common.SolrException;
: > null:java.lang.ArrayIndexOutOfBoundsException: 3
: > at
: > 
net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
: > at
: > net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
: > at net.agkn.hll.HLL.toBytes(HLL.java:917)
: > at net.agkn.hll.HLL.toBytes(HLL.java:869)
: > at
: > 
org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
: > at
: > 
org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
: > at
: > 
org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
: > at
: > 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
: > at
: > 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
: > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
: > at
: > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
: > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
: > at
: > 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
: > at
: > 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
: > at
: > 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
: > at
: > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
: > at
: > 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
: > at
: > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
: > at
: > 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
: > at
: > 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
: > at
: > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
: > at
: > 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
: > at
: > 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
: > at
: > 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
: > at
: > 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
: > at
: > 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
: > at
: > 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
: > at org.eclipse.jetty.server.Server.handle(Server.java:497)
: > at
: > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
: > at
: > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
: > at
: > org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
: > at
: > 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
: > at
: > 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
: > at java.lang.Thread.run(Thread.java:745)
: >
: > Kindly let me know if I need to ask this on any of the related jira issue.
: >
: > Thanks,
: > Modassar
: >
: 

-Hoss
http://www.lucidworks.com/


Re: Query term matches

2015-08-18 Thread Chris Hostetter

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.


: Message-ID: <55d0c940.4060...@tnstaafl.net>
: Subject: Query term matches
: References:
: 
:  
: In-Reply-To: 


-Hoss
http://www.lucidworks.com/


Re: pre-loaded function-query?

2015-08-18 Thread Chris Hostetter

: My current expansion expands from the
:user-query
: to the
:+user-query favouring-query-depending-other-params overall-favoring-query
: (where the overall-favoring-query could be computed as a function).
: With the boost parameter, i'd do:
:(+user-query favouring-query-depending-other-params)^boost-function
: 
: Not exactly the same or?

w/o more specifics it's hard to be certain, but nothing you've described 
so far sounds like you really need custom code.

just use things like the "boost" QParser in conjunction with other nested 
parsers.

ie: instead of users sending q=user-query have the user send qq=user-query 
and write your main query something like...

?q={!boost b=boost-function v=$x}
&x=(+{!query v=$qq} {!query v=$favor})
&favor=favouring-query-depending-other-params
&qq=user-query

See also...

https://people.apache.org/~hossman/ac2012eu/
http://vimeopro.com/user11514798/apache-lucene-eurocon-2012/video/55822630



-Hoss
http://www.lucidworks.com/


Re: Exception while using {!cardinality=1.0}.

2015-08-21 Thread Chris Hostetter

: - Did you have the exact same data in both fields?
: Both the field are string type.

that's not the question i asked.

is the data *in* these fields (ie: the actual value of each field for each 
document) the same for both of the fields?  This is important to figuring 
out whether the root problem is that having docValues (or not having 
docValues) causes a problem, or that having certain kinds of *data* in a 
string field (regardless of docValues) can cause this problem.

Skimming the sample code you posted to SOLR-7954, you are definitely 
putting different data into "field" than you put into "field1", so it's 
still not clear what the problem is.

: - Did your "real" query actually compute stats on the same field you had
:   done your main term query on?
: I did not get the question but as much I understood and verified in the
: Solr log the stat is computed on the field given with
: stats.field={!cardinality=1.0}field.

the question is specific to the example query you mentioned before and 
again in your description in SOLR-7954.  They show that the same field 
name you are computing stats on ("field") is also used in your main query 
as a constraint on the documents ("q=field:query") which is an odd and 
very special edge case that may be pertinent to the problem you are 
seeing.  Depending on what data you index, that might easily only match 1 
document (in the case of the test code you put in jira, exactly 0 
documents, since you never index the text "query" into field "field" for 
any document).


I haven't had a chance to review the jira in depth or actually run your 
code with those configs -- but if you get a chance before i do, please 
re-review the code & configs you posted and see if you can reproduce using 
the *exact* same data in two different fields, and if the choice of query 
makes a difference in the behavior you see.


: 
: Regards,
: Modassar
: 
: On Wed, Aug 19, 2015 at 10:24 AM, Modassar Ather 
: wrote:
: 
: > Ahmet/Chris! Thanks for your replies.
: >
: > Ahmet I think "net.agkn.hll.serialization" is used by hll() function
: > implementation of Solr.
: >
: > Chris I will try to create sample data and create a jira ticket with
: > details.
: >
: > Regards,
: > Modassar
: >
: >
: > On Tue, Aug 18, 2015 at 9:58 PM, Chris Hostetter  > wrote:
: >
: >>
: >> : > I am getting following exception for the query :
: >> : > *q=field:query&stats=true&stats.field={!cardinality=1.0}field*. The
: >> : > exception is not seen once the cardinality is set to 0.9 or less.
: >> : > The field is *docValues enabled* and *indexed=false*. The same
: >> exception
: >> : > I tried to reproduce on non docValues field but could not. Please
: >> help me
: >> : > resolve the issue.
: >>
: >> Hmmm... this is a weird error ... but you haven't really given us enough
: >> information to really guess what the root cause is
: >>
: >> - What was the datatype of the field(s)?
: >> - Did you have the exact same data in both fields?
: >> - Are these multivalued fields?
: >> - Did your "real" query actually compute stats on the same field you had
: >>   done your main term query on?
: >>
: >> I know we have some tests of this bsaic siuation, and i tried to do ome
: >> more manual testing to spot check, but i can't reproduce.
: >>
: >> If you can please provide a full copy of the data (as csv o xml or
: >> whatever) to build your index along with all solr configs and the exact
: >> queries to reproduce that would really help get to the bottom of this --
: >> if you can't provide all the data, then can you at least reproduce with a
: >> small set of sample data?
: >>
: >> either way: please file a new jira issue and attach as much detail as you
: >> can -- this URL has a lot of great tips on the types of data we need to be
: >> able to get to the bottom of bugs...
: >>
: >> https://wiki.apache.org/solr/UsingMailingLists
: >>
: >>
: >>
: >>
: >>
: >> : > ERROR - 2015-08-11 12:24:00.222; [core]
: >> : > org.apache.solr.common.SolrException;
: >> : > null:java.lang.ArrayIndexOutOfBoundsException: 3
: >> : > at
: >> : >
: >> 
net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
: >> : > at
: >> : > net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
: >> : > at net.agkn.hll.HLL.toBytes(HLL.java:917)
: >> : > at net.agkn.hll.HLL.toBytes(HLL.java:869)
: >>

Re: Exception while using {!cardinality=1.0}.

2015-08-24 Thread Chris Hostetter

: Can you please explain how having the same field for query and stat can
: cause some issue for my better understanding of this feature?

I don't know if it can, it probably shouldn't, but in terms of trying to 
understand the bug and reproduce it, any pertinent facts may be relevant -- 
particularly the unusual ones.

if no one else has ever seen a bug in X, but you were doing something 
unusual with X, and you get a bug 100% of the time, then that suggests 
that your unusual usecase would be a very important place to start looking 
-- so when you posted an example that looks weird and unusual and unlike 
any typical usecase of field stats, i wanted to understand what exactly 
you were doing and how much of that example was "real" and how much was 
just you munging your "real" query to hide something you didn't want to 
share.


-Hoss
http://www.lucidworks.com/


Re: Solr relevancy score order

2015-08-24 Thread Chris Hostetter

: A follow up question.  Is the sub-sorting on the lucene internal doc IDs
: ascending or descending order?  That is, do the most recently index doc

you can not make any generic assumptions about the order of the internal 
lucene doc IDs -- the secondary sort on the internal IDs is stable (and 
FWIW: ascending) for static indexes, but as mentioned before: the *actual* 
order of the IDs changes as the index changes -- if there is an index 
merge, the ids can be totally different and docs can be re-arranged into a 
different order...

: > However, internal Lucene Ids can change when index changes. (merges,
: > updates etc).

...

: show up first in this set of docs that have tied score?  If not, how can I
: have the most recent be first?  Do I have to sort on lucene's internal doc

add a "timestamp" or "counter" field when you index your documents that 
means whatevery you want it to mean (order added, order updated, order 
according to some external sort criteria from some external system) and 
then do an explicit sort on that.
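
e.g. an (untested) sketch: a schema field that defaults to the time the 
doc was indexed, plus a sort that uses it as the tiebreaker (assumes a 
"tdate" fieldType like the one in the example schemas):

<field name="timestamp" type="tdate" indexed="true" stored="true" default="NOW"/>

...&sort=score desc, timestamp desc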


-Hoss
http://www.lucidworks.com/


Re: Unknown query parser 'terms' with TermsComponent defined

2015-08-25 Thread Chris Hostetter

1) The "terms" Query Parser (TermsQParser) has nothing to do with the 
"TermsComponent" (the first is for quering many distinct terms, the 
later is for requesting info about low level terms in your index)

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser
https://cwiki.apache.org/confluence/display/solr/The+Terms+Component

2) TermsQParser (which is what you are trying to use with the "{!terms..." 
query syntax) was not added to Solr until 4.10

3) based on your example query, i'm pretty sure what you want is the 
TermQParser: "term" (singular, no "s") ...

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermQueryParser

{!term f=id}ft849m81z


: We've encountered a strange situation, I'm hoping someone might be able to
: shed some light. We're using Solr 4.9 deployed in Tomcat 7.
...
:   'q'=>'_query_:"{!raw f=has_model_ssim}Batch" AND ({!terms
f=id}ft849m81z)',
...
: 'msg'=>'Unknown query parser \'terms\'',
: 'code'=>400}}

...

: The terms component is defined in solrconfig.xml:
: 
:   

-Hoss
http://www.lucidworks.com/


Re: how to prevent uuid-field changing in /update query?

2015-08-25 Thread Chris Hostetter


: updates?  i can't do this because i have delta-import queries which also
: should be able to assign uuid when it is  needed

You really need to give us a full and complete picture of what exactly you 
are currently doing, what's working, what's not working, and when it's not 
working what is it doing and how is that differnet from what you expect.


example: you mentioned you have a "requesthandler with name "/update" which 
contains uuid update chain" (presumably you mean the update processor 
chain) but you haven't shown us your configs, or any of your logs, so we 
can't see how exactly it's configured, or if/how it's being used.


If UUIDUpdateProcessorFactory is in place, then it should only generate a 
new UUID if the document doesn't already have one -- if you are using DIH 
to add documents to the index, and the uuid you are using/generating 
isn't also the uniqueKey field, then the UUIDUpdateProcessorFactory 
doesn't have any way of magically knowing when a "new" document is 
actually a replacement for an old document.


(If you are using Atomic Updates, then registering 
UUIDUpdateProcessorFactory *after* the DistributedUpdateProcessorFactory 
can help -- but that doesn't sound like it's relevant if you are using DIH 
delta updates)




Please review this page and give us *all* the details about your 
current setup, your goal, and the specific problem you are facing...


https://wiki.apache.org/solr/UsingMailingLists



-Hoss
http://www.lucidworks.com/

Re: find documents based on specific term frequency

2015-08-26 Thread Chris Hostetter

: "Is there a way to search for documents that have a word appearing more 
: than a certain number of times? For example, I want to find documents 
: that only have more than 10 instances of the word "genetics" …"

Try...

q=text:genetics&fq={!frange+incl=false+l=10}termfreq('text','genetics')

Note: the q=text:genetics isn't necessary -- you could do any query and 
then filter on the numeric function range of the termfreq() function, or 
use that {!frange} as your main query (in which case all matching docs will 
have identical scores).  I just included that in the example to show how 
you can search & sort by the "normal" style scoring (which takes into 
account full TF-IDF and length normalization) while filtering on the TF 
using a function query.

You can also request the termfreq() as a pseudo field for each doc in the 
results, and parameterize the details to eliminate redundancy in 
the request params...


...&fq={!frange+incl=false+l=10+v=$tf}&fl=*,$tf&tf=termfreq('text','genetics')

Is the same as...

...&fq={!frange+incl=false+l=10}termfreq('text','genetics')&fl=*,termfreq('text','genetics')


A big caveat to this however is that the termfreq function operates on the 
*RAW* underlying term values -- no query time analyzer is used -- so 
if you do stemming, or lowercasing in your index analyzer, you have to 
pass the stemmed/lowercased values to the function.  (Although i just filed 
SOLR-7981 since it occurs to me we could make this automatic in the future 
with a new function argument.)

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-FunctionRangeQueryParser
https://cwiki.apache.org/confluence/display/solr/Function+Queries



-Hoss
http://www.lucidworks.com/

Re: "no default request handler is registered"

2015-08-27 Thread Chris Hostetter

That's... strange.

Looking at the code it appears to be a totally bogus and misleading 
warning -- but it also shouldn't affect anything.

You can feel free to ignore it for now...

https://issues.apache.org/jira/browse/SOLR-7984



: Date: Thu, 27 Aug 2015 15:10:18 -0400
: From: Scott Hollenbeck 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: "no default request handler is registered"
: 
: I'm doing some experimenting with Solr 5.3 and the 7.x-1.x-dev version of
: the Apache Solr Search module for Drupal. Things seem to be working fine,
: except that this warning message appears in the Solr admin logging window
: and in the server log:
: 
: "no default request handler is registered (either '/select' or 'standard')"
: 
: Looking at the solrconfig.xml file that comes with the Drupal module I see a
: requestHandler named "standard":
: 
:   <requestHandler name="standard" class="solr.SearchHandler">
:     <lst name="defaults">
:       <str name="df">content</str>
:       <str name="echoParams">explicit</str>
:       <bool name="omitHeader">true</bool>
:     </lst>
:   </requestHandler>
: 
: I also see a handler named pinkPony with a "default" attribute set to
: "true":
: 
:   <requestHandler name="pinkPony" class="solr.SearchHandler" default="true">
:     <lst name="defaults">
:       <str name="defType">edismax</str>
:       <str name="qf">content</str>
:       <str name="echoParams">explicit</str>
:       <bool name="omitHeader">true</bool>
:       <float name="tie">0.01</float>
:       <int name="timeAllowed">${solr.pinkPony.timeAllowed:-1}</int>
:       <str name="q.alt">*:*</str>
: 
:       <str name="spellcheck">false</str>
:       <str name="spellcheck.onlyMorePopular">true</str>
:       <str name="spellcheck.extendedResults">false</str>
:       <str name="spellcheck.count">1</str>
:     </lst>
:     <arr name="last-components">
:       <str>spellcheck</str>
:       <str>elevator</str>
:     </arr>
:   </requestHandler>
: 
: So it seems like there are both standard and default requestHandlers
: specified. Why is the warning produced? What am I missing?
: 
: Thank you,
: Scott
: 
: 

-Hoss
http://www.lucidworks.com/


Re: "no default request handler is registered"

2015-08-27 Thread Chris Hostetter

I just want to clarify: all of Shawn's points below are valid and good -- 
but they still don't explain the warning message you are getting.  It makes 
no sense as the code is currently written, and doesn't do anything to help 
encourage people to transition to path-based handler names.



: Date: Thu, 27 Aug 2015 13:50:51 -0600
: From: Shawn Heisey 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: "no default request handler is registered"
: 
: On 8/27/2015 1:10 PM, Scott Hollenbeck wrote:
: > I'm doing some experimenting with Solr 5.3 and the 7.x-1.x-dev version of
: > the Apache Solr Search module for Drupal. Things seem to be working fine,
: > except that this warning message appears in the Solr admin logging window
: > and in the server log:
: > 
: > "no default request handler is registered (either '/select' or 'standard')"
: > 
: > Looking at the solrconfig.xml file that comes with the Drupal module I see a
: > requestHandler named "standard":
: > 
: >   <requestHandler name="standard" class="solr.SearchHandler">
: >     <lst name="defaults">
: >       <str name="df">content</str>
: >       <str name="echoParams">explicit</str>
: >       <bool name="omitHeader">true</bool>
: >     </lst>
: >   </requestHandler>
: > 
: > I also see a handler named pinkPony with a "default" attribute set to
: > "true":
: 
: 
: 
: > So it seems like there are both standard and default requestHandlers
: > specified. Why is the warning produced? What am I missing?
: 
: I think the warning message may be misworded, or logged in incorrect
: circumstances, and might need some attention.
: 
: The solrconfig.xml that you are using (which I assume came from the
: Drupal project) is geared towards a 3.x version of Solr prior to 3.6.x
: (the last minor version in the 3.x line).
: 
: Starting in the 3.6 version, all request handlers in examples have names
: that start with a forward slash, like "/select", none of them have the
: "default" attribute, and the handleSelect parameter found elsewhere in
: the solrconfig.xml is false.
: 
: You should bring this up with the Drupal folks and ask them to upgrade
: their config/schema and their code for modern versions of Solr.  Solr
: 3.6.0 (which deprecated their handler naming convention and the
: "default" attribute) was released over three years ago.
: 
: More info than you probably wanted to know:  The reason this change was
: made is security-related.  With the old way of naming request handlers
: and handling /select indirectly, you could send a query to /select,
: include a qt=/update parameter, and change the index via a handler
: intended only for queries.
: 
: Thanks,
: Shawn
: 
: 

-Hoss
http://www.lucidworks.com/


Re: Sorting by function

2015-08-28 Thread Chris Hostetter

: I have a "country" field in my index, with values like 'US', 'FR', 'UK',
: etc...
: 
: Then I want our users to be able to define the order of their preferred
: countries so that grouped results are sorted according to their preference.
...
: Is there any other function that would allow me to map from a predefined
: String constant into an Integer that I can sort on ?

Because of how they evolved, and most of the common use cases for them, 
there aren't a lot of functions that operate on "strings".

Assuming your "country" field is a single valued (indexed) string field, 
then what you want can be done fairly simply using the the "termfreq()" 
function.

termfreq(country,US) will return the (raw integer) term frequency for 
"Term(country,US)" for each doc -- assuming it's single valued (and not 
tokenized), that means for every doc it will be either a 0 or a 1.

So you can either modify your earlier attempt at using "map" on the string 
values to do a map over the termfreq output, or you can simplify things to 
just multiply and take the max value -- where max is just shorthand for 
"the non-0 value" ...

max(mul(9,termfreq(country,US)),
mul(8,termfreq(country,FR)),
mul(7,termfreq(country,UK)),
...)
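
For example (a sketch -- country codes and weights as above; a function 
like this can be passed directly as a sort param, with no spaces inside 
the function):

  sort=max(mul(9,termfreq(country,US)),mul(8,termfreq(country,FR)),mul(7,termfreq(country,UK))) desc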

Things get more interesting/complicated if the field isn't single valued, 
or is tokenized -- then individual values (like "US") might have a 
termfreq that is greater than 1, or a doc might have more than one value, 
and you have to decide what kind of math operation you want to apply over 
those...

  * ignore termfreqs and only look at whether the term exists? 
- wrap each termfreq in map to force the value to either 0 or 1
  * want to sort by the sum of (weight * termfreq) for each term?
- change max to sum in the above example
  * ignore all but the "main" term that has the highest freq for each doc?
- not easy at query time - best to figure out the "main" term at index 
  time and put it in its own field.


-Hoss
http://www.lucidworks.com/


Re: which solrconfig.xml

2015-09-02 Thread Chris Hostetter
: various $HOME/solr-5.3.0 subdirectories.  The documents/tutorials say to edit
: the solrconfig.xml file for various configuration details, but they never say
: which one of these dozen to edit.  Moreover, I cannot determine which version

Can you please give us specific examples (ie: URLs, page numbers & 
version of the ref guide, etc...) of documentation that tells you to edit 
the solrconfig.xml w/o being explicit about where to find it, so that we 
can fix the docs?

FWIW: The official "Quick Start" tutorial does not mention editing 
solrconfig.xml at all...

http://lucene.apache.org/solr/quickstart.html



-Hoss
http://www.lucidworks.com/


Re: Local Params for Stats field

2015-09-03 Thread Chris Hostetter

: I'm trying to use localparams for stats component on Solr 4.4, exact query:
: q=*:*&core=hotel_reviews&collection=hotel_reviews&fq=checkout_date:[* TO
: *]&fq={!tag=period1}checkout_date:[2011-12-25T00:00:00.000Z TO
: 
2012-01-02T00:00:00.000Z}&fq={!tag=period2}checkout_date:[2011-12-25T00:00:00.000Z
: TO
: 
2012-01-02T00:00:00.000Z}&rows=0&stats=true&stats.field={!ex=period2}checkout_date
: 
: and it fails with error "unknown field" checkout_date.
: Should localparams for stats field be supported for v. 4.4?
: If I run same query for v.4.8 -- it returns result w/o error

What is the exact error message you get? Specifically, what shows up in 
your logs (with stack trace), so we can understand what piece of code is 
complaining about the "unknown field"?  (You are asking here about the stats 
component, but you are using "checkout_date" in several places in your 
query; we have no way of knowing for sure if the problem is coming from 
stats -- you haven't given us any examples of queries that *do* work, or 
details about how your checkout_date field is defined.)

https://wiki.apache.org/solr/UsingMailingLists

Are you absolutely certain this collection has a checkout_date field in 
your 4.4 Solr instance?





-Hoss
http://www.lucidworks.com/


Re: Position of Document in Listing (Search Result)

2015-09-03 Thread Chris Hostetter

: Write a PostFilter which takes in a document id. It lets through all
: documents until it sees that document id. Once it sees it, it stops
: letting them through.
: 
: Thus, the total count of documents would be the position of your queried
: car.

Sorry guys, that won't work.

PostFilters can be used to collect & filter the documents returned as the 
result of a query, after the main query logic (so you can delay expensive 
filter checks), but they still happen before any sorting -- they have to, 
in order for the sorting logic to know *which* documents 
should be added to the priority queue.

- - -

I can only think of two approaches to this general problem: 

1) 2 queries with frange filter on score.

This solution is only applicable in situations where:
  a) you are only sorting on scores
  b) the position information can be approximate as far as other docs with 
identical scores (ie: you can say "X documents have a higher score" 
instead of "exactly X documents come before this one")

The key is to first do a query where you filter (fq) on the doc 
id(s) you are interested in, so you can get them back along with their 
scores; then you do another query where you do something like...

?rows=0&q=whatever&fq={!frange incl=false l=THE_SCORE v=$q}

...so that you post-filter and ignore any doc that doesn't have a higher 
score, and look at the total numFound.

If there are multiple docs you need to get info about at one time, instead 
of filtering you can use facet.query the same way:

  rows=0
  q=whatever
  facet=true
  facet.query={!frange key=doc1 incl=false l=DOC1_SCORE v=$q}
  facet.query={!frange key=doc2 incl=false l=DOC2_SCORE v=$q}
  facet.query={!frange key=doc3 incl=false l=DOC3_SCORE v=$q}
  ...etc...

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-FunctionRangeQueryParser

2) cursor deep paging

This solution will work regardless of the number of docs you are 
interested in, and regardless of how complex your sort options are -- just 
use the cursorMark param to iterate over all the results in your client 
until you've found all the uniqueKeys you are looking for, counting the 
docs found as you go.

The various docs on deep paging and using cursors go into some background 
which may help you understand why what you are asking for is, in general, a 
hard problem, and why suggestion #1 only works with a simple sort on 
score, while for anything more complex you really have to go the cursor 
route...

https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
https://lucidworks.com/blog/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
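
To illustrate #2, a rough SolrJ sketch (the core URL, target id, and the 
score-based sort are all hypothetical placeholders; assumes a 5.x SolrJ 
client):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrQuery.SortClause;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.params.CursorMarkParams;

  public class PositionFinder {
    public static void main(String[] args) throws Exception {
      String targetId = "car-42";  // hypothetical uniqueKey value we want the position of
      HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(500);
      q.setSort(SortClause.desc("score"));
      q.addSort(SortClause.asc("id"));  // cursors require a uniqueKey tiebreaker
      String cursorMark = CursorMarkParams.CURSOR_MARK_START;
      long position = 0;
      boolean found = false;
      while (!found) {
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
        QueryResponse rsp = client.query(q);
        for (SolrDocument doc : rsp.getResults()) {
          if (targetId.equals(doc.getFieldValue("id"))) { found = true; break; }
          position++;  // count every doc that sorts before the target
        }
        String next = rsp.getNextCursorMark();
        if (next.equals(cursorMark)) break;  // walked off the end without finding it
        cursorMark = next;
      }
      System.out.println(found ? "position: " + position : "doc not found");
      client.close();
    }
  }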

-Hoss
http://www.lucidworks.com/


Re: Why is Process Total Time greater than Elapsed Time?

2015-09-03 Thread Chris Hostetter

It depends on where you are reading "Process Total Time" from.  That 
terminology isn't something I've ever seen used in the context of Solr 
(fairly certain nothing in Solr refers to anything that way).

QTime is the amount of time spent processing a request before it starts 
being written out over the wire to the client, so it is almost guaranteed 
to be *less* than the total elapsed (wall clock) time witnessed by your 
SolrJ client ... but I have no idea what "Process Total Time" is if you 
are seeing it greater than wall clock.

: From what I can tell, each component processes the request sequentially. So
: how can I see an Elapsed Time of 750ms (SolrJ client) and a Process Total
: Time of 1300ms? Does the Process Total Time add up the amount of time each
: leaf reader takes, or some other concurrent things?


-Hoss
http://www.lucidworks.com/


RE: Trouble making tests with BaseDistributedSearchTestCase

2015-09-04 Thread Chris Hostetter
: Strange enough, the following code gives different errors:
: 
: assertQ(

I'm not sure what exactly assertQ will do in a distributed test like this 
... probably nothing good.   You'll almost certainly want to stick with 
the distributed indexDoc() and query* methods and avoid assertU and 
assertQ.


: [TEST-TestComponent.test-seed#[EA2ED1E118114486]] ERROR 
org.apache.solr.SolrTestCaseJ4 - REQUEST FAILED: 
xpath=//result/doc[1]/str[@name='id'][.='1']
: xml response was: 
...
: 
: 

...I'm guessing that's because assertQ is (probably) querying the "local" 
core from the TestHarness, not any of the distributed cores set up by 
BaseDistributedSearchTestCase, and your docs didn't get indexed there.

: And, when I forcefully add distrib=true, I get an NPE in SearchHandler!

Which is probably because you (manually) added the distrib=true param but 
didn't add a list of shards to query, so you triggered some sloppy code in 
SearchHandler that should be giving you a nice error about shards not 
being specified.  (I bet you can manually reproduce this in a single-node 
Solr setup by adding distrib=true to any query that doesn't have a 
"shards" param; if so, please file a bug that it should produce a sane 
error message.)

If you use something like BaseDistributedSearchTestCase.query on the other 
hand, it takes care of adding the correct distrib-related request 
params for the shards it creates under the covers.

(Although at this point, in general, I would strongly suggest that you 
consider using AbstractFullDistribZkTestBase instead of 
BaseDistributedSearchTestCase -- assuming of course that your goal is good 
tests of how some distributed queries behave in a modern SolrCloud setup.  
If your goal is to test Solr under manual sharding/distributed queries, 
BaseDistributedSearchTestCase still makes sense.)


As to your first question (which applies to both old-school and 
cloud/zk-related tests)...

: > Executing the above text either results in a: IOException occured when 
talking to server at: https://127.0.0.1:44761//collection1

That might be due to a timing issue of the servers not completely starting 
up before you start sending requests to them? Not really sure ... would 
need to see the logs.

: > Or it fails with a curious error: .response.maxScore:1.0!=null
: > 
: > The score correctly changes according to whatever value i set for parameter 
q.

That has to do with the way the BaseDistributedSearchTestCase plumbing 
tries to help ensure that a distributed query returns the same results as a 
single-shard query by "diffing" the responses (note: this is why 
BaseDistributedSearchTestCase.indexDoc adds your doc to both a random 
shard *and* to a "control collection").  But there are some legacy quirks 
about how things like "maxScore" are handled: notably SOLR-6612. 
(Historically, because of the possibility of filter optimizations, Solr 
only kept track of the scores if it needed to.  In single-core mode, this 
was if you asked for "fl=score,..." but in a distributed query it might 
also compute scores (and maxScore) if you are sorting on scores -- which is 
the default.)

The way to indicate that you don't want BaseDistributedSearchTestCase's 
response diff checking to freak out over the max score is using the 
(horribly undocumented) "handle" feature...

handle.put("maxScore", SKIPVAL);

...that's not the default in all tests because it could hide errors in 
situations where tests *are* expecting the maxScore to be the same.


The same mechanism can be used to ignore things like the _version_ 
field, or timestamp fields, which are virtually guaranteed not to be the 
same between two different collections.  (See uses of the "handle" Map in 
existing test cases for examples.)
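
Putting that together, a rough sketch inside a BaseDistributedSearchTestCase 
subclass (the timestamp field name is illustrative):

  // in your distributed test method, before calling query(...):
  handle.clear();
  handle.put("maxScore", SKIPVAL);    // don't diff maxScore between control & shards
  handle.put("_version_", SKIPVAL);
  handle.put("timestamp", SKIPVAL);   // illustrative: any NOW-defaulted field
  query("q", "*:*", "sort", "id desc");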



-Hoss
http://www.lucidworks.com/


Re: how to parse json response from Solr Term Vector Component (5.3.0)

2015-09-15 Thread Chris Hostetter
: 
: how to parse json response from Solr Term Vector Component?
: 
: I got following json structure from response when testing Solr 5.3.0
: tvComponent:
... 
: Is it correct ? Why solr makes the json response for term vector
: information so difficult to extract from the client side ? why it use list
: to encode rather than dictionary?

What you're seeing has to do with how the general-purpose data structures 
used in the response are serialized into JSON. By default, Solr's 
"NamedList" datastructure (which can support the same key associated with 
multiple values) is modeled in JSON as a list of alternating key/value 
pairs for simplicity, but you can add "json.nl=map" to force these to be 
a Map (in which case your parsing code has to decide what to do if/when a 
key is specified multiple times) or "json.nl=arrarr" (for an array of 
array pairs).

http://wiki.apache.org/solr/SolJSON#JSON_specific_parameters
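
If you stick with the default flat format, converting it client-side is 
straightforward. A sketch in Java (the only assumption is the alternating 
key/value layout described above):

  import java.util.LinkedHashMap;
  import java.util.List;
  import java.util.Map;

  public class NamedListFlattener {
    /** Converts Solr's default "flat" NamedList JSON rendering,
     *  e.g. ["term1", data1, "term2", data2, ...], into a Map.
     *  Later duplicate keys overwrite earlier ones. */
    public static Map<String, Object> toMap(List<Object> pairs) {
      Map<String, Object> map = new LinkedHashMap<>();
      for (int i = 0; i + 1 < pairs.size(); i += 2) {
        map.put((String) pairs.get(i), pairs.get(i + 1));
      }
      return map;
    }
  }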



-Hoss 
http://www.lucidworks.com/




Re: firstSearcher cache warming with own QuerySenderListener

2015-09-29 Thread Chris Hostetter

You haven't really provided us enough info to make any meaningful 
suggestions.

You've got at least 2 custom plugins -- but you don't give us any idea 
what the implementations of those plugins look like, or how you've 
configured them.  Maybe there is a bug in your code?  Maybe it's 
misconfigured?

You said that initial queries seem a little faster when you use your 
custom plugin(s), but not as fast as if you manually warm those queries 
from a browser first -- what do the queries look like? How fast is fast? ... 

W/o specifics it's impossible to guess where the added time (or added time 
savings when using the browser to warm them) may be coming from ... and 
again: maybe the issue is that the code in your custom plugin is only 
partially right? Maybe it's giving you a slight bit of warming just by 
executing a query to get some index data structures into RAM, but it's 
actually executing the wrong query?

Show us the details of a single query, and tell us how *exactly* the 
timing compares between: no warming; warming just that query with your 
custom plugin; warming just that query from your browser.

Show us the *logs* from Solr in all of those cases as well, so we can see 
what is actually getting executed under the hood.


As far as caching goes: all of the cache statistics are easily available 
from the plugin UI / handler...

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=32604180
https://cwiki.apache.org/confluence/display/solr/MBean+Request+Handler

What do you see in terms of insertions/hits/misses on all of the caches in 
each of the above scenarios?
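
For example, a request like this (adjust host and core name to your setup) 
dumps the stats for all caches as JSON:

  http://localhost:8983/solr/yourcore/admin/mbeans?cat=CACHE&stats=true&wt=json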



: Date: Fri, 25 Sep 2015 17:31:30 +0200
: From: Christian Reuschling 
: Reply-To: solr-user@lucene.apache.org
: To: "solr-user@lucene.apache.org" 
: Subject: firstSearcher cache warming with own QuerySenderListener
: 
: Hey all,
: 
: we want to avoid cold start performance issues when the caches are cleared 
after a server restart.
: 
: For this, we have written a SearchComponent that saves least recently used 
queries. These are
: written to a file inside a closeHook of a SolrCoreAware at server shutdown.
: 
: The plan is to perform these queries at server startup to warm up the caches. 
For this, we have
: written a derivative of the QuerySenderListener and configured it as 
firstSearcher listener in
: solrconfig.xml. The only difference to the original QuerySenderListener is that 
it gets its queries
: from the formerly dumped LRU queries rather than getting them from the config 
file.
: 
: It seems that everything is called correctly, and we have the impression that 
the query response
: times for the dumped queries are sometimes slightly better than without this 
warming.
: 
: Nevertheless, there is still a huge difference against the times when we 
manually perform the same
: queries once, e.g. from a browser. If we do this, the second time we perform 
these queries they
: respond much faster (up to 10 times) than the response times after the 
implemented warming.
: 
: It seems that not all caches are warmed up during our warming. And because of 
these huge
: differences, I doubt we missed something.
: 
: The index has about 25M documents, and is split into two shards in a cloud 
configuration; both
: shards are on the same server instance for now, for testing purposes.
: 
: Does anybody have an idea? I tried to disable lazy field loading as a 
potential issue, but with no
: success.
: 
: 
: Cheers,
: 
: Christian
: 
: 

-Hoss
http://www.lucidworks.com/


Re: How can I get a monotonically increasing field value for docs?

2015-09-29 Thread Chris Hostetter


You're basically re-implementing Solr's cursors.

You can change your system of reading docs from the old collection to 
use...

cursorMark=*&sort=timestamp+asc,id+asc

...and then instead of keeping track of the last timestamp & id values and 
constructing a filter, you can just keep track of the nextCursorMark and 
pass it the next time you want to check for newer documents...

https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
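
A sketch of the flow (params only; the rows value is arbitrary):

  first request:  q=*:*&rows=100&sort=timestamp+asc,id+asc&cursorMark=*
  later requests: same params, but with cursorMark=<the nextCursorMark 
                  returned by the previous response>

When a response hands back the same cursorMark you sent, you've seen 
everything indexed so far -- keep that mark and re-issue the query later 
to pick up newer documents.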





: Date: Mon, 21 Sep 2015 21:32:33 +0300
: From: Gili Nachum 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: How can I get a monotonically increasing field value for docs?
: 
: Thanks for the in-depth explanation!
: 
: The secondary sort by uuid would allow me to read a series of docs with
: identical time over multiple batches by specifying filtering
: time>timeOnLastReadDoc or (time=timeOnLastReadDoc and
: uuid>uuidOnLastReadDoc) which essentially creates a unique sorted value to
: track progress over.
: On Sep 21, 2015 19:56, "Shawn Heisey"  wrote:
: 
: > On 9/21/2015 9:01 AM, Gili Nachum wrote:
: > > TimestampUpdateProcessorFactory takes place only on the leader shard, or
: > on
: > > each shard replica?
: > > if on each replica then I would get different values on each replica.
: > >
: > > My alternative would be to perform secondary sort on a UUID to ensure
: > order.
: >
: > If the update chain is configured properly, it runs on the leader, so
: > all replicas get the same timestamp.
: >
: > Without SolrCloud, the way to create an "indexed at" time field is in
: > the schema -- specify a default value of NOW on the field definition and
: > don't send the field when indexing.  The old master/slave replication
: > copies the actual index contents, so the indexed values in all replicas
: > are the same.
: >
: > The problem with NOW in the schema when running SolrCloud is that each
: > replica indexes the document independently, so each replica can have a
: > different timestamp.  This is why the timestamp update processor exists
: > -- to set the timestamp to a specific value before the document is
: > duplicated to each replica, eliminating the problem.
: >
: > FYI, secondary sort parameters affect the order when the primary sort
: > field is identical between two documents.  It may not do what you are
: > intending because of that.
: >
: > Thanks,
: > Shawn
: >
: >
: 

-Hoss
http://www.lucidworks.com/


  1   2   3   4   5   6   7   8   9   10   >