Re: How to use StreamingApi MultiFieldComparator?

2016-10-03 Thread Markko Legonkov
Thanks for the quick response.

Here is what I tried:

complement(
  search(
    products,
    qt="/export",
    q="*:*",
    fq="product_id_i:15940162",
    fl="id, product_id_i, product_name_s,sale_price_d",
    sort="product_id_i asc"
  ),
  select(
    search(
      products,
      qt="/export",
      q="*:*",
      fq="product_id_i:15940162",
      fl="id, product_id_i, product_name_s,sale_price_d",
      sort="product_id_i asc"
    ),
    id as c_id,
    product_id_i as c_product_id_i,
    product_name_s as c_product_name_s,
    sale_price_d as c_sale_price_d
  ),
  on="product_id_i=c_product_id_i,sale_price_d=c_sale_price_d"
)

but I still get:
{
  "result-set": {
    "docs": [
      {
        "EXCEPTION": "org.apache.solr.client.solrj.io.comp.FieldComparator cannot be cast to org.apache.solr.client.solrj.io.comp.MultipleFieldComparator",
        "EOF": true
      }
    ]
  }
}

On Sun, Oct 2, 2016 at 1:26 AM, Joel Bernstein  wrote:

> Also you'll probably need to specify the /export handler in the search
> expressions, so you get the entire result set.
>
> qt="/export"
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sat, Oct 1, 2016 at 6:08 PM, Joel Bernstein  wrote:
>
> > Ok, I took a closer look at the expression. I believe this is not
> > supported:
> >
> > sale_price_d!=c_sale_price_d
> >
> > Possibly the complement expression might accomplish what you're trying to
> > do.
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Sat, Oct 1, 2016 at 5:59 PM, Joel Bernstein 
> wrote:
> >
> >> Hi, can you attach the stack traces from the logs? I'd like to see where
> >> this exception is coming from; this appears to be a bug.
> >>
> >> I'll also need to dig into your expression and see if there is an issue
> >> with the syntax.
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> On Sat, Oct 1, 2016 at 2:29 PM, Markko Legonkov 
> >> wrote:
> >>
> >>> Hi
> >>>
> >>> What I would like to achieve is to filter out all products
> >>> which have different prices on two given dates.
> >>> Here is a sample expression
> >>>
> >>> leftOuterJoin(
> >>>   search(
> >>> products,
> >>> q="*:*",
> >>> fq="product_id_i:1 AND product_name_s:test",
> >>> fl="id, product_id_i, product_name_s,sale_price_d",
> >>> sort="product_id_i asc"
> >>>   ),
> >>>   select(
> >>> search(
> >>>   products,
> >>>   q="product_id_i:1 AND product_name_s:Test",
> >>>   fl="id, product_id_i, product_name_s,sale_price_d",
> >>>   sort="product_id_i asc"
> >>> ),
> >>> id as c_id,
> >>> product_id_i as c_product_id_i,
> >>> product_name_s as c_product_name_s,
> >>> sale_price_d as c_sale_price_d
> >>>   ),
> >>>   on="product_id_i=c_product_id_i, sale_price_d!=c_sale_price_d"
> >>> )
> >>>
> >>> I am using Solr 6.2.0.
> >>> The result I get from Solr is:
> >>> {
> >>>   "result-set": {
> >>> "docs": [
> >>>   {
> >>> "EXCEPTION": "org.apache.solr.client.solrj.
> >>> io.comp.FieldComparator
> >>> cannot be cast to
> >>> org.apache.solr.client.solrj.io.comp.MultipleFieldComparator",
> >>> "EOF": true
> >>>   }
> >>> ]
> >>>   }
> >>> }
> >>>
> >>> Do I have to configure something in Solr so that it knows it has to use
> >>> MultipleFieldComparator?
> >>>
> >>> Regards
> >>> Max
> >>>
> >>
> >>
> >
>


Re: How to use StreamingApi MultiFieldComparator?

2016-10-03 Thread Markko Legonkov
Here is the stack trace:

java.io.IOException: Unable to construct instance of org.apache.solr.client.solrj.io.stream.ComplementStream
    at org.apache.solr.client.solrj.io.stream.expr.StreamFactory.createInstance(StreamFactory.java:323)
    at org.apache.solr.client.solrj.io.stream.expr.StreamFactory.constructStream(StreamFactory.java:185)
    at org.apache.solr.client.solrj.io.stream.expr.StreamFactory.constructStream(StreamFactory.java:178)
    at org.apache.solr.handler.StreamHandler.handleRequestBody(StreamHandler.java:185)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2089)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:518)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.solr.client.solrj.io.stream.expr.StreamFactory.createInstance(StreamFactory.java:316)
    ... 33 more
Caused by: java.lang.ClassCastException: org.apache.solr.client.solrj.io.comp.FieldComparator cannot be cast to org.apache.solr.client.solrj.io.comp.MultipleFieldComparator
    at org.apache.solr.client.solrj.io.eq.MultipleFieldEqualitor.isDerivedFrom(MultipleFieldEqualitor.java:99)
    at org.apache.solr.client.solrj.io.stream.ReducerStream.init(ReducerStream.java:132)
    at org.apache.solr.client.solrj.io.stream.ReducerStream.<init>(ReducerStream.java:69)
    at org.apache.solr.client.solrj.io.stream.UniqueStream.init(UniqueStream.java:83)
    at org.apache.solr.client.solrj.io.stream.UniqueStream.<init>(UniqueStream.java:55)
    at org.apache.solr.client.solrj.io.stream.ComplementStream.init(ComplementStream.java:81)
    at org.apache.solr.client.solrj.io.stream.ComplementStream.<init>(ComplementStream.java:73)
    ... 38 more


On Sun, Oct 2, 2016 at 12:59 AM, Joel Bernstein  wrote:

> Hi, can you attach the stack traces from the logs? I'd like to see where this
> exception is coming from; this appears to be a bug.
>
> I'll also need to dig into your expression and see if there is an issue
> with the syntax.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>

Re: How to use StreamingApi MultiFieldComparator?

2016-10-03 Thread Joel Bernstein
Ok, I'll test this out.

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Oct 3, 2016 at 4:40 AM, Markko Legonkov  wrote:

> here is the stacktrace
>
> java.io.IOException: Unable to construct instance of
> org.apache.solr.client.solrj.io.stream.ComplementStream
> at org.apache.solr.client.solrj.io.stream.expr.StreamFactory.
> createInstance(StreamFactory.java:323)
> at org.apache.solr.client.solrj.io.stream.expr.StreamFactory.
> constructStream(StreamFactory.java:185)
> at org.apache.solr.client.solrj.io.stream.expr.StreamFactory.
> constructStream(StreamFactory.java:178)
> at org.apache.solr.handler.StreamHandler.handleRequestBody(
> StreamHandler.java:185)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(
> RequestHandlerBase.java:154)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2089)
> at org.apache.solr.servlet.HttpSolrCall.execute(
> HttpSolrCall.java:652)
> at org.apache.solr.servlet.HttpSolrCall.call(
> HttpSolrCall.java:459)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:257)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:208)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> doFilter(ServletHandler.java:1668)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(
> ServletHandler.java:581)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(
> SecurityHandler.java:548)
> at org.eclipse.jetty.server.session.SessionHandler.
> doHandle(SessionHandler.java:226)
> at org.eclipse.jetty.server.handler.ContextHandler.
> doHandle(ContextHandler.java:1160)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(
> ServletHandler.java:511)
> at org.eclipse.jetty.server.session.SessionHandler.
> doScope(SessionHandler.java:185)
> at org.eclipse.jetty.server.handler.ContextHandler.
> doScope(ContextHandler.java:1092)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:141)
> at org.eclipse.jetty.server.handler.ContextHandlerCollection.
> handle(ContextHandlerCollection.java:213)
> at org.eclipse.jetty.server.handler.HandlerCollection.
> handle(HandlerCollection.java:119)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> HandlerWrapper.java:134)
> at org.eclipse.jetty.server.Server.handle(Server.java:518)
> at org.eclipse.jetty.server.HttpChannel.handle(
> HttpChannel.java:308)
> at org.eclipse.jetty.server.HttpConnection.onFillable(
> HttpConnection.java:244)
> at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(
> AbstractConnection.java:273)
> at org.eclipse.jetty.io.FillInterest.fillable(
> FillInterest.java:95)
> at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(
> SelectChannelEndPoint.java:93)
> at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> produceAndRun(ExecuteProduceConsume.java:246)
> at org.eclipse.jetty.util.thread.strategy.
> ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
> QueuedThreadPool.java:654)
> at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(
> QueuedThreadPool.java:572)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(
> NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
> DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at org.apache.solr.client.solrj.io.stream.expr.StreamFactory.
> createInstance(StreamFactory.java:316)
> ... 33 more
> Caused by: java.lang.ClassCastException:
> org.apache.solr.client.solrj.io.comp.FieldComparator cannot be cast to
> org.apache.solr.client.solrj.io.comp.MultipleFieldComparator
> at org.apache.solr.client.solrj.io.eq.MultipleFieldEqualitor.
> isDerivedFrom(MultipleFieldEqualitor.java:99)
> at org.apache.solr.client.solrj.io.stream.ReducerStream.init(
> ReducerStream.java:132)
> at org.apache.solr.client.solrj.io.stream.ReducerStream.
> (ReducerStream.java:69)
> at org.apache.solr.client.solrj.io.stream.UniqueStream.init(
> UniqueStream.java:83)
> at org.apache.solr.client.solrj.io.stream.UniqueStream.(
> UniqueStream.java:55)
> at org.apache.solr.client.solrj.io.stream.ComplementStream.
> init(ComplementStream.java:81)
> at org.apache.solr.client.solrj.io.stream.ComplementStream.<
> init>(ComplementStream.java:73)
> ... 38 more

Re: JSON Facet "allBuckets" behavior

2016-10-03 Thread Karthik Ramachandran
So if I cannot use allBuckets, since it does not do the filtering, how can I
achieve this?

On Fri, Sep 30, 2016 at 7:19 PM, Yonik Seeley  wrote:

> On Tue, Sep 27, 2016 at 12:20 PM, Karthik Ramachandran
>  wrote:
> > While performing JSON faceting with "allBuckets" and "mincount", I am not
> > sure if I am expecting the wrong result or there is a bug.
> >
> > By "allBucket" definition the response, representing the union of all of
> the buckets.
> [...]
> > I was wondering why the result is not this, since I have "mincount:2".
>
> allBuckets means all of the buckets before limiting or filtering (i.e.
> mincount filtering).
>
> -Yonik
>
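
For illustration, a minimal JSON Facet request showing the behavior Yonik describes (the field and facet names here are placeholders, not from the thread): mincount trims the bucket list that is returned, while allBuckets still aggregates over every bucket before that trimming.

json.facet={
  categories: {
    type: terms,
    field: cat,
    mincount: 2,
    allBuckets: true
  }
}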


Multi-level nesting query inconsistency

2016-10-03 Thread Juan Botero
I am fairly new to Solr, so it is possible I am writing the query wrong (I have
Solr 4.10).

On this data:
[{
"id": -1666,
"otype": "ao",
"parent_id": -1,
"parent_type": "root",
"name": "JOSHUA N AARON MD PA",
"account_number": "002812300",
"tax_id": "50042772325",
"group_npi": 134630688333,
"taxonomy": "364SP0808AAX",
"start_date": "2001-04-01T00:00:00.00Z",
"end_date": "2139-12-31T00:00:00.00Z",
"_childDocuments_": [{
"otype": "p",
"parent_id": -1666,
"parent_type": "ao",
"id": 271,
"plan_id": "IBC",
"plan_url_identifier": "ibc"
}, {
"otype": "a",
"parent_id": -1666,
"parent_type": "ao",
"id": -88,
"line1": "216 E PULASKI STE 235",
"city": "Elkton",
"state": "MD",
"zip_code": "21921",
"_childDocuments_": [{
"otype": "ph",
"parent_id": -88,
"parent_type": "a",
"id": 1,
"number": "5556201984"
}, {
"otype": "ph",
"parent_id": -88,
"parent_type": "a",
"id": 2,
"number":"5558696114"
}]
}, {
"id": -1988,
"otype": "ap",
"parent_id": -1666,
"parent_type": "ao",
"plan_provider_id": "00283621227",
"is_pcp": false,
"is_specialist": false,
"start_date": "2001-04-01T00:00:00.00Z",
"end_date": "2014-05-01T00:00:00.00Z",
"_childDocuments_": [{
"id": -819,
"otype": "pf",
"parent_id": -1988,
"parent_type": "ap",
"npi": 139670334111,
"_childDocuments_": [{
"otype": "n",
"parent_id": -819,
"parent_type": "pf",
"id": 1,
"prefix": "Dr.",
"first": "Frank",
"middle": "N",
"last": "Aaron"
}],
"organization_name": "Frank N Aaron",
"date_of_birth": "1963-03-18T00:00:00.00Z",
"gender_code": "M",
"is_individual": true
}]
}, {
"id": -1987,
"otype": "ap",
"parent_id": -1666,
"parent_type": "ao",
"plan_provider_id": "00283621007",
"is_pcp": false,
"is_specialist": false,
"start_date": "2001-04-01T00:00:00.00Z",
"end_date": "2014-05-01T00:00:00.00Z",
"_childDocuments_": [{
"id": -815,
"otype": "pf",
"parent_id": -1987,
"parent_type": "ap",
"npi": 139670335001,
"_childDocuments_": [{
"otype": "n",
"parent_id": -815,
"parent_type": "pf",
"id": 1,
"prefix": "Dr.",
"first": "Joshua",
"middle": "N",
"last": "Aaron"
}],
"organization_name": "Joshua N Aaron",
"date_of_birth": "1963-03-18T00:00:00.00Z",
"gender_code": "M",
"is_individual": true
}]
}]
}]

This query:
q=otype:pf&fl=*,[docid],[child parentFilter=otype:pf limit=500]

returns different documents in the _childDocuments_ array for each doc in the
result (2 otype:pf docs in total), even though they have the same structure in
the data source. I expected each to have one child, the otype:n document, but
each has a different set of children, and both have more than intended. Why?



Re: Multi-level nesting query inconsistency

2016-10-03 Thread Mikhail Khludnev
Hello,
You can strip grandchildren with
[child parentFilter=otype:pf limit=500 childFilter='otype:(a p ap)']
If you need three-level nesting you might check [subquery], but I suppose
it's easier to recover the hierarchy from what you have right now.

On Mon, Oct 3, 2016 at 7:38 PM, Juan Botero 
wrote:

> I am fairly new to Solr, so it is possible I am writing the query wrong (I
> have Solr 4.10).
>
> On this data:
> [{
> "id": -1666,
> "otype": "ao",
> "parent_id": -1,
> "parent_type": "root",
> "name": "JOSHUA N AARON MD PA",
> "account_number": "002812300",
> "tax_id": "50042772325",
> "group_npi": 134630688333,
> "taxonomy": "364SP0808AAX",
> "start_date": "2001-04-01T00:00:00.00Z",
> "end_date": "2139-12-31T00:00:00.00Z",
> "_childDocuments_": [{
> "otype": "p",
> "parent_id": -1666,
> "parent_type": "ao",
> "id": 271,
> "plan_id": "IBC",
> "plan_url_identifier": "ibc"
> }, {
> "otype": "a",
> "parent_id": -1666,
> "parent_type": "ao",
> "id": -88,
> "line1": "216 E PULASKI STE 235",
> "city": "Elkton",
> "state": "MD",
> "zip_code": "21921",
> "_childDocuments_": [{
> "otype": "ph",
> "parent_id": -88,
> "parent_type": "a",
> "id": 1,
> "number": "5556201984"
> }, {
> "otype": "ph",
> "parent_id": -88,
> "parent_type": "a",
> "id": 2,
> "number":"5558696114"
> }]
> }, {
> "id": -1988,
> "otype": "ap",
> "parent_id": -1666,
> "parent_type": "ao",
> "plan_provider_id": "00283621227",
> "is_pcp": false,
> "is_specialist": false,
> "start_date": "2001-04-01T00:00:00.00Z",
> "end_date": "2014-05-01T00:00:00.00Z",
> "_childDocuments_": [{
> "id": -819,
> "otype": "pf",
> "parent_id": -1988,
> "parent_type": "ap",
> "npi": 139670334111,
> "_childDocuments_": [{
> "otype": "n",
> "parent_id": -819,
> "parent_type": "pf",
> "id": 1,
> "prefix": "Dr.",
> "first": "Frank",
> "middle": "N",
> "last": "Aaron"
> }],
> "organization_name": "Frank N Aaron",
> "date_of_birth": "1963-03-18T00:00:00.00Z",
> "gender_code": "M",
> "is_individual": true
> }]
> }, {
> "id": -1987,
> "otype": "ap",
> "parent_id": -1666,
> "parent_type": "ao",
> "plan_provider_id": "00283621007",
> "is_pcp": false,
> "is_specialist": false,
> "start_date": "2001-04-01T00:00:00.00Z",
> "end_date": "2014-05-01T00:00:00.00Z",
> "_childDocuments_": [{
> "id": -815,
> "otype": "pf",
> "parent_id": -1987,
> "parent_type": "ap",
> "npi": 139670335001,
> "_childDocuments_": [{
> "otype": "n",
> "parent_id": -815,
> "parent_type": "pf",
> "id": 1,
> "prefix": "Dr.",
> "first": "Joshua",
> "middle": "N",
> "last": "Aaron"
> }],
> "organization_name": "Joshua N Aaron",
> "date_of_birth": "1963-03-18T00:00:00.00Z",
> "gender_code": "M",
> "is_individual": true
> }]
> }]
> }]
>
> This query:
> q=otype:pf&fl=*,[docid],[child parentFilter=otype:pf limit=500]
>
> returns different documents in the _childDocuments_ array for each doc in the
> result (2 otype:pf docs in total), even though they have the same structure in
> the data source. I expected each to have one child, the otype:n document, but
> each has a different set of children, and both have more than intended. Why?
>
>



-- 
Sincerely yours
Mikhail Khludnev


EmbeddedSolrServer and Core dataDir in Solr 6.x

2016-10-03 Thread Bryan Bende
Curious if anyone knows how to create an EmbeddedSolrServer in Solr 6.x,
with a core where the dataDir is located somewhere outside of where the
config is located.

I'd like to do this without system properties, and all through Java code.

In Solr 5.x I was able to do this with the following code:

CoreContainer coreContainer = new CoreContainer(solrHome);
coreContainer.load();

Properties props = new Properties();
props.setProperty("dataDir", dataDir + "/" + coreName);

CoreDescriptor descriptor = new CoreDescriptor(coreContainer, coreName,
new File(coreHome, coreName).getAbsolutePath(), props);

SolrCore solrCore = coreContainer.create(descriptor);
new EmbeddedSolrServer(coreContainer, coreName);


The CoreContainer API changed a bit in 6.x and you can no longer pass in a
descriptor. I've tried a couple of things with the current API, but haven't
been able to get it working.

Any ideas are appreciated.

Thanks,

Bryan


RE: Multi-level nesting query inconsistency

2016-10-03 Thread Juan Botero
Hi, thank you.

1. So why do I get those back? They are not even 'legitimate' grandchildren.
2. If I do
localhost:8983/solr/nested_object_testing/query?debug=query&q=otype:pf&fl=*,[docid],[child parentFilter=otype:pf limit=500 childFilter='otype:(a p ap)']
I get the other children except the name document, but if I do
localhost:8983/solr/nested_object_testing/query?debug=query&q=otype:pf&fl=*,[docid],[child parentFilter=otype:pf limit=500 childFilter=otype:n]
I get just the name documents. Should 'childFilter' just include what I need?
3. I have not been able to write a working [subquery]; could you help me out
and write one for me with this data?

-Original Message-
From: Mikhail Khludnev [mailto:m...@apache.org]
Sent: Monday, October 03, 2016 1:11 PM
To: solr-user
Subject: Re: Multi-level nesting query inconsistency

Hello,
You can strip grandchildren with
[child parentFilter=otype:pf limit=500 childFilter='otype:(a p ap)']
If you need three-level nesting you might check [subquery], but I suppose
it's easier to recover the hierarchy from what you have right now.

On Mon, Oct 3, 2016 at 7:38 PM, Juan Botero 
wrote:

> I am fairly new to Solr, so it is possible I am writing the query wrong
> (I have Solr 4.10).
>
> On this data:
> [{
> "id": -1666,
> "otype": "ao",
> "parent_id": -1,
> "parent_type": "root",
> "name": "JOSHUA N AARON MD PA",
> "account_number": "002812300",
> "tax_id": "50042772325",
> "group_npi": 134630688333,
> "taxonomy": "364SP0808AAX",
> "start_date": "2001-04-01T00:00:00.00Z",
> "end_date": "2139-12-31T00:00:00.00Z",
> "_childDocuments_": [{
> "otype": "p",
> "parent_id": -1666,
> "parent_type": "ao",
> "id": 271,
> "plan_id": "IBC",
> "plan_url_identifier": "ibc"
> }, {
> "otype": "a",
> "parent_id": -1666,
> "parent_type": "ao",
> "id": -88,
> "line1": "216 E PULASKI STE 235",
> "city": "Elkton",
> "state": "MD",
> "zip_code": "21921",
> "_childDocuments_": [{
> "otype": "ph",
> "parent_id": -88,
> "parent_type": "a",
> "id": 1,
> "number": "5556201984"
> }, {
> "otype": "ph",
> "parent_id": -88,
> "parent_type": "a",
> "id": 2,
> "number":"5558696114"
> }]
> }, {
> "id": -1988,
> "otype": "ap",
> "parent_id": -1666,
> "parent_type": "ao",
> "plan_provider_id": "00283621227",
> "is_pcp": false,
> "is_specialist": false,
> "start_date": "2001-04-01T00:00:00.00Z",
> "end_date": "2014-05-01T00:00:00.00Z",
> "_childDocuments_": [{
> "id": -819,
> "otype": "pf",
> "parent_id": -1988,
> "parent_type": "ap",
> "npi": 139670334111,
> "_childDocuments_": [{
> "otype": "n",
> "parent_id": -819,
> "parent_type": "pf",
> "id": 1,
> "prefix": "Dr.",
> "first": "Frank",
> "middle": "N",
> "last": "Aaron"
> }],
> "organization_name": "Frank N Aaron",
> "date_of_birth": "1963-03-18T00:00:00.00Z",
> "gender_code": "M",
> "is_individual": true
> }]
> }, {
> "id": -1987,
> "otype": "ap",
> "parent_id": -1666,
> "parent_type": "ao",
> "plan_provider_id": "00283621007",
> "is_pcp": false,
> "is_specialist": false,
> "start_date": "2001-04-01T00:00:00.00Z",
> "end_date": "2014-05-01T00:00:00.00Z",
> "_childDocuments_": [{
> "id": -815,
> "otype": "pf",
> "parent_id": -1987,
> "parent_type": "ap",
> "npi": 139670335001,
> "_childDocuments_": [{
> "otype": "n",
> "parent_id": -815,
> "parent_type": "pf",
> "id": 1,
> "prefix": "Dr.",
> "first": "Joshua",
> "middle": "N",
> "last": "Aaron"
> }],
> "organization_name": "Joshua N Aaron",
> "date_of_birth": "1963-03-18T00:00:00.00Z",
> "gender_code": "M",
> "is_individual": true
> }]
> }]
> }]
>
> This query:
> q=otype:pf&fl=*,[docid],[child parentFilter=otype:pf limit=500]
>
> returns different documents in the _childDocuments_ array for each doc in the
> result (2 otype:pf docs in total), even though they have the same structure in
> the data source. I expected each to have one child, the otype:n document,
> but each has a different set of children, and both have more than intended. Why?
>
>



--
Sincerely yours
Mikhail Khludnev

Re: EmbeddedSolrServer and Core dataDir in Solr 6.x

2016-10-03 Thread Alan Woodward
This should work:

SolrCore solrCore 
= coreContainer.create(coreName, Paths.get(coreHome).resolve(coreName), 
Collections.emptyMap());


Alan Woodward
www.flax.co.uk


> On 3 Oct 2016, at 18:41, Bryan Bende  wrote:
> 
> Curious if anyone knows how to create an EmbeddedSolrServer in Solr 6.x,
> with a core where the dataDir is located somewhere outside of where the
> config is located.
> 
> I'd like to do this without system properties, and all through Java code.
> 
> In Solr 5.x I was able to do this with the following code:
> 
> CoreContainer coreContainer = new CoreContainer(solrHome);
> coreContainer.load();
> 
> Properties props = new Properties();
> props.setProperty("dataDir", dataDir + "/" + coreName);
> 
> CoreDescriptor descriptor = new CoreDescriptor(coreContainer, coreName,
> new File(coreHome, coreName).getAbsolutePath(), props);
> 
> SolrCore solrCore = coreContainer.create(descriptor);
> new EmbeddedSolrServer(coreContainer, coreName);
> 
> 
> The CoreContainer API changed a bit in 6.x and you can no longer pass in a
> descriptor. I've tried a couple of things with the current API, but haven't
> been able to get it working.
> 
> Any ideas are appreciated.
> 
> Thanks,
> 
> Bryan



SOLR Sizing

2016-10-03 Thread Vasu Y
Hi,
 I am trying to estimate disk space requirements for the documents indexed
to SOLR.
I went through the LucidWorks blog (
https://lucidworks.com/blog/2011/09/14/estimating-memory-and-storage-for-lucenesolr/)
and using this as the template. I have a question regarding estimating
"Avg. Document Size (KB)".

When calculating disk storage requirements, can we use the Java primitive type
sizes (https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html)
to come up with an average document size?

Please let me know if the following assumptions are correct.

 Data Type   Size
 --  --
 long   8 bytes
 tint   4 bytes
 tdate 8 bytes (Stored as long?)
 string 1 byte per char for ASCII chars and 2 bytes per char for
Non-ASCII chars (Double byte chars)
 text   1 byte per char for ASCII chars and 2 bytes per char for
Non-ASCII (Double byte chars) (For both with & without norm?)
 ICUCollationField 2 bytes per char for Non-ASCII (Double byte chars)
 boolean 1 bit?

 Thanks,
 Vasu


CheckHdfsIndex with Kerberos not working

2016-10-03 Thread Rishabh Patel
Hello,

My SolrCloud 5.5 installation has Kerberos enabled. The CheckHdfsIndex test
fails to run. However, without Kerberos, I am able to run the test with no
issues.

I ran the following command:

java -cp
"./server/solr-webapp/webapp/WEB-INF/lib/*:./server/lib/ext/*:/hadoop/hadoop-client/lib/servlet-api-2.5.jar"
-ea:org.apache.lucene... org.apache.solr.index.hdfs.CheckHdfsIndex
hdfs://:8020/apps/solr/data/ExampleCollection/core_node1/data/index

The error is:

ERROR: could not open hdfs directory "
hdfs://:8020/apps/solr/data/ExampleCollection/core_node1/data/index
";
exiting 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
Does this error message imply that the test cannot run with Kerberos
enabled?

For reference, I followed this blog
http://yonik.com/solr-5-5/

-- 
Regards,
*Rishabh Patel*


Re: CheckHdfsIndex with Kerberos not working

2016-10-03 Thread Kevin Risden
You need to have the Hadoop configuration pieces on the classpath, like
core-site.xml and hdfs-site.xml. There is an "hdfs classpath" command that
can help, but it may pull in too many pieces; you may just need core-site.xml
and hdfs-site.xml so you don't get conflicting jars.

Something like this may work for you:

java -cp
"$(hdfs classpath):./server/solr-webapp/webapp/WEB-INF/lib/*:./server/lib/
ext/*:/hadoop/hadoop-client/lib/servlet-api-2.5.jar"
-ea:org.apache.lucene... org.apache.solr.index.hdfs.CheckHdfsIndex
hdfs://:8020/apps/solr/data/ExampleCollection/
core_node1/data/index
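
If the full "hdfs classpath" output does pull in conflicting jars, it may be enough to put just the Hadoop configuration directory (the one holding core-site.xml and hdfs-site.xml) on the classpath. The /etc/hadoop/conf path below is a common default and an assumption, not taken from the thread:

java -cp "/etc/hadoop/conf:./server/solr-webapp/webapp/WEB-INF/lib/*:./server/lib/ext/*:/hadoop/hadoop-client/lib/servlet-api-2.5.jar" -ea:org.apache.lucene... org.apache.solr.index.hdfs.CheckHdfsIndex hdfs://:8020/apps/solr/data/ExampleCollection/core_node1/data/index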

Kevin Risden

On Mon, Oct 3, 2016 at 1:38 PM, Rishabh Patel <
rishabh.mahendra.pa...@gmail.com> wrote:

> Hello,
>
> My SolrCloud 5.5 installation has Kerberos enabled. The CheckHdfsIndex test
> fails to run. However, without Kerberos, I am able to run the test with no
> issues.
>
> I ran the following command:
>
> java -cp
> "./server/solr-webapp/webapp/WEB-INF/lib/*:./server/lib/
> ext/*:/hadoop/hadoop-client/lib/servlet-api-2.5.jar"
> -ea:org.apache.lucene... org.apache.solr.index.hdfs.CheckHdfsIndex
> hdfs://:8020/apps/solr/data/ExampleCollection/
> core_node1/data/index
>
> The error is:
>
> ERROR: could not open hdfs directory "
> hdfs://:8020/apps/solr/data/ExampleCollection/
> core_node1/data/index
> ";
> exiting org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.
> AccessControlException):
> SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
> Does this error message imply that the test cannot run with Kerberos
> enabled?
>
> For reference, I followed this blog
> http://yonik.com/solr-5-5/
>
> --
> Regards,
> *Rishabh Patel*
>


Preceding special characters in ClassicTokenizerFactory

2016-10-03 Thread Whelan, Andy
Hello,
I am guessing that what I am looking for is probably going to require extending 
StandardTokenizerFactory or ClassicTokenizerFactory. But I thought I would ask 
the group here before attempting this. We are indexing documents from an 
eclectic set of sources. There is, however, a heavy interest in computing and 
social media sources. So computer terminology and social media terms (terms 
beginning with hashes (#), @ symbols, etc.) are terms that we would like to 
have searchable.

We are considering the ClassicTokenizerFactory because we like the fact that it 
does not use the Unicode standard annex 
UAX#29 word boundary rules. 
It preserves email addresses, internet domain names, etc. We would also like
to use it as the tokenizer element of index and query analyzers that would
preserve @<rest of token> or #<rest of token> patterns.

I have seen examples where folks are replacing the StandardTokenizerFactory in 
their analyzer with stream combinations made up of charFilters,  
WhitespaceTokenizerFactory, etc. as in the following article 
http://www.prowave.io/indexing-special-terms-using-solr/ to remedy such 
problems.

Example: [the analyzer XML was stripped by the mailing-list archive]


I am just wondering if anyone knew of a smart way (without extending classes) 
to actually preserve most of the ClassicTokenizerFactory functionality without 
getting rid of leading special characters? The "Solr In Action" book (page 179) 
claims that it is hard to extend the StandardTokenizerFactory. I'm assuming 
this is the same for ClassicTokenizerFactory.

Thanks
-Andrew



RE: SOLR Sizing

2016-10-03 Thread Allison, Timothy B.
This doesn't answer your question, but Erick Erickson's blog on this topic is 
invaluable:

https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

-Original Message-
From: Vasu Y [mailto:vya...@gmail.com] 
Sent: Monday, October 3, 2016 2:09 PM
To: solr-user@lucene.apache.org
Subject: SOLR Sizing

Hi,
 I am trying to estimate disk space requirements for the documents indexed to 
SOLR.
I went through the LucidWorks blog (
https://lucidworks.com/blog/2011/09/14/estimating-memory-and-storage-for-lucenesolr/)
and using this as the template. I have a question regarding estimating "Avg. 
Document Size (KB)".

When calculating Disk Storage requirements, can we use the Java Types sizing (
https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html) & 
come up average document size?

Please let know if the following assumptions are correct.

 Data Type   Size
 --  --
 long   8 bytes
 tint   4 bytes
 tdate 8 bytes (Stored as long?)
 string 1 byte per char for ASCII chars and 2 bytes per char for
Non-ASCII chars (Double byte chars)
 text   1 byte per char for ASCII chars and 2 bytes per char for
Non-ASCII (Double byte chars) (For both with & without norm?)
 ICUCollationField 2 bytes per char for Non-ASCII (Double byte chars)
 boolean 1 bit?

 Thanks,
 Vasu


Re: Preceding special characters in ClassicTokenizerFactory

2016-10-03 Thread Ahmet Arslan
Hi Andy,

WordDelimiterFilter has a "types" option. There is an example file named
wdftypes.txt in the source tree that preserves #hashtags and @mentions. If you
follow this path, please use the whitespace tokenizer.

Ahmet
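
For illustration, a field type along the lines Ahmet describes might look like the sketch below (the type name, file name, and filter parameters are illustrative choices, not taken from the thread). The types file maps '#' and '@' to ALPHA so WordDelimiterFilter keeps them as part of the token; '#' is written as the \u0023 escape so the line is not read as a comment.

<fieldType name="text_social" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" types="wdftypes.txt"
            generateWordParts="1" generateNumberParts="1" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

wdftypes.txt:
\u0023 => ALPHA
@ => ALPHA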



On Monday, October 3, 2016 9:52 PM, "Whelan, Andy"  wrote:
Hello,
I am guessing that what I am looking for is probably going to require extending 
StandardTokenizerFactory or ClassicTokenizerFactory. But I thought I would ask 
the group here before attempting this. We are indexing documents from an 
eclectic set of sources. There is, however, a heavy interest in computing and 
social media sources. So computer terminology and social media terms (terms 
beginning with hashes (#), @ symbols, etc.) are terms that we would like to 
have searchable.

We are considering the ClassicTokenizerFactory because we like the fact that it 
does not use the Unicode standard annex 
UAX#29 word boundary rules. 
It preserves email addresses, internet domain names, etc. We would also like
to use it as the tokenizer element of index and query analyzers that would
preserve @<rest of token> or #<rest of token> patterns.

I have seen examples where folks are replacing the StandardTokenizerFactory in 
their analyzer with stream combinations made up of charFilters,  
WhitespaceTokenizerFactory, etc. as in the following article 
http://www.prowave.io/indexing-special-terms-using-solr/ to remedy such 
problems.

Example: [the analyzer XML was stripped by the mailing-list archive]


I am just wondering if anyone knew of a smart way (without extending classes) 
to actually preserve most of the ClassicTokenizerFactory functionality without 
getting rid of leading special characters? The "Solr In Action" book (page 179) 
claims that it is hard to extend the StandardTokenizerFactory. I'm assuming 
this is the same for ClassicTokenizerFactory.

Thanks
-Andrew


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-10-03 Thread Solr User
Below is some further testing.  This was done in an environment that had no
other queries or updates during testing.  We ran through several scenarios
so I pasted this with HTML formatting below so you may view this as a
table.  Sorry if you have to pull this out into a different file for
viewing, but I did not want the formatting to be messed up.  The times are
average times in milliseconds.  Same test methodology as above except there
was a 5 minute warmup and a 15 minute test.

Note that both the segment and deletions were recorded from only 1 out of 2
of the shards so we cannot try to extrapolate a function between them and
the outcome.  In other words, just view them as "non-optimized" versus
"optimized" and "has deletions" versus "no deletions".  The only exceptions
are the 0 deletes were true for both shards and the 1 segment and 8 segment
cases were true for both shards.  A few of the tests were repeated as well.

The only conclusion that I could draw is that the number of segments and
the number of deletes appear to greatly influence the response times, at
least more than any difference in Solr version.  There also appears to be
some external contributor to variance... maybe network, etc.

Thoughts?


Date:              9/29/2016 (x3), 9/30/2016 (x10), 10/3/2016 (x4)
Solr Version:      5.5.2, 5.5.2, 4.8.1, 4.8.1, 4.8.1, then 5.5.2 (x8), then 4.8.1 (x4)
Deleted Docs:      578735787317695859369459369457873578735787357873
Segment Count:     34341827273434343488118811
facet.method=uif:  YES, YES, N/A, N/A, N/A, YES, YES, NO, NO, NO, YES, YES, NO, N/A, N/A, N/A, N/A
Scenario #1:       19821014518619020820921020610914273701601098385
Scenario #2:       9288596258727077746873636166545251
(The per-test values in the Deleted Docs, Segment Count, and Scenario rows ran together when the HTML table was flattened by the list archive and could not be reliably re-split.)
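
For reference, the facet.method=uif runs above correspond to adding a request parameter of roughly this shape (the field name is a placeholder, not from the thread):

facet=true&facet.field=category&facet.method=uif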




On Wed, Sep 28, 2016 at 4:44 PM, Solr User  wrote:

> I plan to re-test this in a separate environment that I have more control
> over and will share the results when I can.
>
> On Wed, Sep 28, 2016 at 3:37 PM, Solr User  wrote:
>
>> Certainly.  And I would of course welcome anyone else to test this for
>> themselves especially with facet.method=uif to see if that has indeed
>> bridged the gap between Solr 4 and Solr 5.  I would be very happy if my
>> testing is invalid due to variance, problem in process, etc.  One thing I
>> was pondering is if I should force merge the index to a certain amount of
>> segments because indexing yields a random number of segments and
>> deletions.  The only thing stopping me short of doing that were
>> observations of longer Solr 4 times even with more deletions and similar
>> number of segments.
>>
>> We use Soasta as our testing tool.  Before testing, load is sent for
>> 10-15 minutes to make sure any Solr caches have stabilized.  Then the test
>> is run for 30 minutes of steady volume with Scenario #1 tested at 15
>> req/sec and Scenario #2 tested at 100 req/sec.  Each request is different
>> with input being pulled from data files.  The requests are repeatable test
>> to test.
>>
>> The numbers posted above are average response times as reported by
>> Soasta.  However, respective time differences are supported by Splunk which
>> indexes the Solr logs and Dynatrace which is instrumented on one of the
>> JVM's.
>>
>> The versions are deployed to the same machines thereby overlaying the
>> previous installation.  Going Solr 4 to Solr 5, full indexing is run with
>> the same input data.  Being in SolrCloud mode, the full indexing comprises
>> of indexing all documents and then deleting any that were not touched.
>> Going Solr 5 back to Solr 4, the snapshot is restored since Solr 4 will not
>> load with a Solr 5 index.  Testing Solr 4 after reverting yields the same
>> results as the previous Solr 4 test.
>>
>>
>> On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen 
>> wrote:
>>
>>> On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote:
>>> > Further testing indicates that any performance difference is not due
>>> > to deletes.  Both Solr 4.8.1 and Solr 5.5.2 benefited from removing
>>> > deletes.
>>>
>>> Sanity check: Could you describe how you test?
>>>
>>> * How many queries do you issue for each test?
>>> * Are each query a new one or do you re-use the same query?
>>> * Do you discard the first X calls?
>>> * Are the numbers averages, medians or something third?
>>> * What do you do about disk cache?
>>> * Are both Solr's on the same machine?
>>> * Do they use the same index?
>>> * Do you alternate between testing 4.8.1 and 5.5.2 first?
>>>
>>> - Toke Eskildsen, State and University Library, Denmark
>>>
>>
>>
>


Re: CheckHdfsIndex with Kerberos not working

2016-10-03 Thread Rishabh Patel
Thanks Kevin, this worked for me.

On Mon, Oct 3, 2016 at 11:48 AM, Kevin Risden 
wrote:

> You need to have the hadoop pieces on the classpath. Like core-site.xml and
> hdfs-site.xml. There is an hdfs classpath command that would help but it
> may have too many pieces. You may just need core-site and hdfs-site so you
> don't get conflicting jars.
>
> Something like this may work for you:
>
> java -cp
> "$(hdfs classpath):./server/solr-webapp/webapp/WEB-INF/lib/*:./server/lib/
> ext/*:/hadoop/hadoop-client/lib/servlet-api-2.5.jar"
> -ea:org.apache.lucene... org.apache.solr.index.hdfs.CheckHdfsIndex
> hdfs://:8020/apps/solr/data/ExampleCollection/
> core_node1/data/index
>
> Kevin Risden
>
> On Mon, Oct 3, 2016 at 1:38 PM, Rishabh Patel <
> rishabh.mahendra.pa...@gmail.com> wrote:
>
> > Hello,
> >
> > My SolrCloud 5.5 installation has Kerberos enabled. The CheckHdfsIndex
> test
> > fails to run. However, without Kerberos, I am able to run the test with
> no
> > issues.
> >
> > I ran the following command:
> >
> > java -cp
> > "./server/solr-webapp/webapp/WEB-INF/lib/*:./server/lib/
> > ext/*:/hadoop/hadoop-client/lib/servlet-api-2.5.jar"
> > -ea:org.apache.lucene... org.apache.solr.index.hdfs.CheckHdfsIndex
> > hdfs://:8020/apps/solr/data/ExampleCollection/
> > core_node1/data/index
> >
> > The error is:
> >
> > ERROR: could not open hdfs directory "
> > hdfs://:8020/apps/solr/data/ExampleCollection/
> > core_node1/data/index
> > ";
> > exiting org.apache.hadoop.ipc.RemoteException(org.apache.
> hadoop.security.
> > AccessControlException):
> > SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
> > Does this error message imply that the test cannot run with Kerberos
> > enabled?
> >
> > For reference, I followed this blog
> > http://yonik.com/solr-5-5/
> >
> > --
> > Regards,
> > *Rishabh Patel*
> >
>



-- 
Sincerely,
*Rishabh Patel*


Scaling data extractor with Solr

2016-10-03 Thread Steven White
Hi everyone,

I'm up to speed on how Solr can be set up to provide high availability (if
one Solr server goes down, the backup one takes over). My question is how do
I make my custom crawler play "nice" with Solr in this environment.

Let us say I set up Solr with 3 servers so that if one fails another takes
over. Let us say I also set up my crawler with 3 servers so that if one goes
down another takes over. But how should my crawlers work? Can each function
unaware of the others and send the same data to Solr, or must my crawlers
synchronize with each other so only 1 is actively sending data to Solr and
the others are in stand-by mode?

I'd like to hear from others how they solved this problem so I don't end up
re-inventing it.

Thanks.

Steve


Facet+Stats+MinCount: How to use mincount filter when use facet+stats

2016-10-03 Thread Jeffery Yuan
We store some events data such as accountId, startTime, endTime, timeSpent,
and some other searchable fields. We want to get all accountIds that spend
more than x hours between startTime and endTime, plus some other criteria
which are not important here. We can use a facet and stats query like below:

stats=true&stats.field={!tag=timeSpent sum=true}timeSpent&facet=true&facet.pivot={!stats=timeSpent}accountId

This will return how many hours each accountId spends between startTime and
endTime, and then I can filter in our code. But this will return a lot of
accountIds which are not wanted. I am wondering whether Solr can do the
filter for us and only return accountIds that spend more than x hours. Is it
possible to use a facet function to get the sum, and then use mincount to do
the filter? http://yonik.com/solr-facet-functions/



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-Stats-MinCount-How-to-use-mincount-filter-when-use-facet-stats-tp4299367.html
Sent from the Solr - User mailing list archive at Nabble.com.
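
For illustration, the facet-functions form of the stats/pivot query above (following the page linked in the message) would be roughly the request below, with the field names taken from the message. It computes the per-accountId sum, but whether the buckets can then be filtered by that sum on the server side is exactly the open question here.

json.facet={
  accounts: {
    type: terms,
    field: accountId,
    limit: -1,
    facet: {
      totalTime: "sum(timeSpent)"
    }
  }
}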

How to implement a custom boost function

2016-10-03 Thread Lucas Cotta
Hello,

I'm new to Solr (4.7.2) and I was given the following requirement:

Given a query such as:

studentId:(875141 OR 873071 OR 875198 OR 108142 OR 918841 OR 870688 OR
107920 OR 870637 OR 870636 OR 870635 OR 918792 OR 107721 OR 875078 OR
875166 OR 875151 OR 918829 OR 918808)

I want the results to be ordered by the same order the elements were
informed in the query. This would be similar to MySQL's ORDER BY FIELD(id,
3,2,5,7,8,1).

I have tried to use term boosting in the query, but that only works when I
use big factors, like this:
875078^10 OR 875166^1 OR 875151^1000 OR 918829^100 OR 918808^10

But that would cause the query to be too big in case I have 200 ids for
instance.

So it seems I need to implement a custom FunctionQuery.
I'm a little lost on how to do that. Could someone please give me an idea?
Which classes should my custom class extend? Where should I place this
class? Should I add it to the Solr project itself and regenerate the JAR?

Thanks
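
A custom function for this is usually packaged as a ValueSourceParser plugin rather than by modifying Solr itself. The sketch below is written against the Solr/Lucene 4.x APIs and is only illustrative: the class name, the "orderby" function name, and the registration are made up for this example, not an existing Solr feature.

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.IntDocValues;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;

// orderby(studentId, 875141, 873071, ...) returns the position of the document's
// studentId in the argument list, so the results can be sorted by that position.
public class OrderByFieldParser extends ValueSourceParser {

  @Override
  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    ValueSource field = fp.parseValueSource();      // first argument: the id field
    Map<String, Integer> positions = new HashMap<>();
    int pos = 0;
    while (fp.hasMoreArguments()) {
      positions.put(fp.parseArg(), pos++);          // remaining arguments: ids, in order
    }
    return new OrderValueSource(field, positions);
  }

  private static class OrderValueSource extends ValueSource {
    private final ValueSource field;
    private final Map<String, Integer> positions;

    OrderValueSource(ValueSource field, Map<String, Integer> positions) {
      this.field = field;
      this.positions = positions;
    }

    @Override
    public FunctionValues getValues(Map context, AtomicReaderContext readerContext) throws IOException {
      final FunctionValues vals = field.getValues(context, readerContext);
      return new IntDocValues(this) {
        @Override
        public int intVal(int doc) {
          Integer p = positions.get(vals.strVal(doc));
          return p == null ? Integer.MAX_VALUE : p;  // ids not in the list sort last
        }
      };
    }

    @Override
    public String description() {
      return "orderby(" + field.description() + ")";
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof OrderValueSource
          && field.equals(((OrderValueSource) o).field)
          && positions.equals(((OrderValueSource) o).positions);
    }

    @Override
    public int hashCode() {
      return 31 * field.hashCode() + positions.hashCode();
    }
  }
}

Registered in solrconfig.xml with something like <valueSourceParser name="orderby" class="com.example.OrderByFieldParser"/> (the jar dropped into a lib directory that solrconfig.xml points at), it could then be used as sort=orderby(studentId,875141,873071,875198) asc, so there is no need to modify the Solr project itself or regenerate its JAR.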


Re: EmbeddedSolrServer and Core dataDir in Solr 6.x

2016-10-03 Thread Bryan Bende
Alan,

Thanks for the response. I will double-check, but I believe that is going
to put the data directory for the core under coreHome/coreName.

What I am trying to setup (and did a poor job of explaining) is something
like the following...

- Solr home in src/test/resources/solr
- Core home in src/test/resources/myCore
- dataDir for the myCore in target/myCore (or something not in the source
tree).

This way the unit tests can use the Solr home and core config that is under
version control, but the data from testing would be written somewhere not
under version control.

in 5.x I was specifying the dataDir through the properties object... I
would calculate the path to the target dir in Java code relative to the
class file, and then pass that as dataDir to the following:

Properties props = new Properties();
props.setProperty("dataDir", dataDir + "/" + coreName);

In 6.x it seems like Properties has been replaced with a Map<String, String>?
I tried putting dataDir in there, but it didn't seem to do anything.

For now I have just been using RAMDirectoryFactory so that no data ever
gets written to disk.

I'll keep trying different things, but if you have any thoughts let me know.

Thanks,

Bryan


On Mon, Oct 3, 2016 at 2:07 PM, Alan Woodward  wrote:

> This should work:
>
> SolrCore solrCore
> = coreContainer.create(coreName, 
> Paths.get(coreHome).resolve(coreName),
> Collections.emptyMap());
>
>
> Alan Woodward
> www.flax.co.uk
>
>
> > On 3 Oct 2016, at 18:41, Bryan Bende  wrote:
> >
> > Curious if anyone knows how to create an EmbeddedSolrServer in Solr 6.x,
> > with a core where the dataDir is located somewhere outside of where the
> > config is located.
> >
> > I'd like to do this without system properties, and all through Java code.
> >
> > In Solr 5.x I was able to do this with the following code:
> >
> > CoreContainer coreContainer = new CoreContainer(solrHome);
> > coreContainer.load();
> >
> > Properties props = new Properties();
> > props.setProperty("dataDir", dataDir + "/" + coreName);
> >
> > CoreDescriptor descriptor = new CoreDescriptor(coreContainer, coreName,
> > new File(coreHome, coreName).getAbsolutePath(), props);
> >
> > SolrCore solrCore = coreContainer.create(descriptor);
> > new EmbeddedSolrServer(coreContainer, coreName);
> >
> >
> > The CoreContainer API changed a bit in 6.x and you can no longer pass in
> a
> > descriptor. I've tried a couple of things with the current API, but
> haven't
> > been able to get it working.
> >
> > Any ideas are appreciated.
> >
> > Thanks,
> >
> > Bryan
>
>


Re: EmbeddedSolrServer and Core dataDir in Solr 6.x

2016-10-03 Thread Alan Woodward
Ah, I see what you mean.  Putting the dataDir property into the Map certainly 
ought to work - can you write a test case that shows what’s happening?

Alan Woodward
www.flax.co.uk


> On 3 Oct 2016, at 23:50, Bryan Bende  wrote:
> 
> Alan,
> 
> Thanks for the response. I will double-check, but I believe that is going
> to put the data directory for the core under coreHome/coreName.
> 
> What I am trying to setup (and did a poor job of explaining) is something
> like the following...
> 
> - Solr home in src/test/resources/solr
> - Core home in src/test/resources/myCore
> - dataDir for the myCore in target/myCore (or something not in the source
> tree).
> 
> This way the unit tests can use the Solr home and core config that is under
> version control, but the data from testing would be written somewhere not
> under version control.
> 
> in 5.x I was specifying the dataDir through the properties object... I
> would calculate the path to the target dir in Java code relative to the
> class file, and then pass that as dataDir to the following:
> 
> Properties props = new Properties();
> props.setProperty("dataDir", dataDir + "/" + coreName);
> 
> In 6.x it seems like Properties has been replaced with the
> Map ? and I tried putting dataDir in there, but didn't seem
> to do anything.
> 
> For now I have just been using RAMDirectoryFactory so that no data ever
> gets written to disk.
> 
> I'll keep trying different things, but if you have any thoughts let me know.
> 
> Thanks,
> 
> Bryan
> 
> 
> On Mon, Oct 3, 2016 at 2:07 PM, Alan Woodward  wrote:
> 
>> This should work:
>> 
>> SolrCore solrCore
>>= coreContainer.create(coreName, 
>> Paths.get(coreHome).resolve(coreName),
>> Collections.emptyMap());
>> 
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
>>> On 3 Oct 2016, at 18:41, Bryan Bende  wrote:
>>> 
>>> Curious if anyone knows how to create an EmbeddedSolrServer in Solr 6.x,
>>> with a core where the dataDir is located somewhere outside of where the
>>> config is located.
>>> 
>>> I'd like to do this without system properties, and all through Java code.
>>> 
>>> In Solr 5.x I was able to do this with the following code:
>>> 
>>> CoreContainer coreContainer = new CoreContainer(solrHome);
>>> coreContainer.load();
>>> 
>>> Properties props = new Properties();
>>> props.setProperty("dataDir", dataDir + "/" + coreName);
>>> 
>>> CoreDescriptor descriptor = new CoreDescriptor(coreContainer, coreName,
>>> new File(coreHome, coreName).getAbsolutePath(), props);
>>> 
>>> SolrCore solrCore = coreContainer.create(descriptor);
>>> new EmbeddedSolrServer(coreContainer, coreName);
>>> 
>>> 
>>> The CoreContainer API changed a bit in 6.x and you can no longer pass in
>> a
>>> descriptor. I've tried a couple of things with the current API, but
>> haven't
>>> been able to get it working.
>>> 
>>> Any ideas are appreciated.
>>> 
>>> Thanks,
>>> 
>>> Bryan
>> 
>> 



Re: EmbeddedSolrServer and Core dataDir in Solr 6.x

2016-10-03 Thread Bryan Bende
Yea I'll try to put something together and report back.

On Mon, Oct 3, 2016 at 6:54 PM, Alan Woodward  wrote:

> Ah, I see what you mean.  Putting the dataDir property into the Map
> certainly ought to work - can you write a test case that shows what’s
> happening?
>
> Alan Woodward
> www.flax.co.uk
>
>
> > On 3 Oct 2016, at 23:50, Bryan Bende  wrote:
> >
> > Alan,
> >
> > Thanks for the response. I will double-check, but I believe that is going
> > to put the data directory for the core under coreHome/coreName.
> >
> > What I am trying to setup (and did a poor job of explaining) is something
> > like the following...
> >
> > - Solr home in src/test/resources/solr
> > - Core home in src/test/resources/myCore
> > - dataDir for the myCore in target/myCore (or something not in the source
> > tree).
> >
> > This way the unit tests can use the Solr home and core config that is
> under
> > version control, but the data from testing would be written somewhere not
> > under version control.
> >
> > in 5.x I was specifying the dataDir through the properties object... I
> > would calculate the path to the target dir in Java code relative to the
> > class file, and then pass that as dataDir to the following:
> >
> > Properties props = new Properties();
> > props.setProperty("dataDir", dataDir + "/" + coreName);
> >
> > In 6.x it seems like Properties has been replaced with the
> > Map ? and I tried putting dataDir in there, but didn't
> seem
> > to do anything.
> >
> > For now I have just been using RAMDirectoryFactory so that no data ever
> > gets written to disk.
> >
> > I'll keep trying different things, but if you have any thoughts let me
> know.
> >
> > Thanks,
> >
> > Bryan
> >
> >
> > On Mon, Oct 3, 2016 at 2:07 PM, Alan Woodward  wrote:
> >
> >> This should work:
> >>
> >> SolrCore solrCore
> >>= coreContainer.create(coreName, Paths.get(coreHome).resolve(
> coreName),
> >> Collections.emptyMap());
> >>
> >>
> >> Alan Woodward
> >> www.flax.co.uk
> >>
> >>
> >>> On 3 Oct 2016, at 18:41, Bryan Bende  wrote:
> >>>
> >>> Curious if anyone knows how to create an EmbeddedSolrServer in Solr
> 6.x,
> >>> with a core where the dataDir is located somewhere outside of where the
> >>> config is located.
> >>>
> >>> I'd like to do this without system properties, and all through Java
> code.
> >>>
> >>> In Solr 5.x I was able to do this with the following code:
> >>>
> >>> CoreContainer coreContainer = new CoreContainer(solrHome);
> >>> coreContainer.load();
> >>>
> >>> Properties props = new Properties();
> >>> props.setProperty("dataDir", dataDir + "/" + coreName);
> >>>
> >>> CoreDescriptor descriptor = new CoreDescriptor(coreContainer, coreName,
> >>> new File(coreHome, coreName).getAbsolutePath(), props);
> >>>
> >>> SolrCore solrCore = coreContainer.create(descriptor);
> >>> new EmbeddedSolrServer(coreContainer, coreName);
> >>>
> >>>
> >>> The CoreContainer API changed a bit in 6.x and you can no longer pass
> in
> >> a
> >>> descriptor. I've tried a couple of things with the current API, but
> >> haven't
> >>> been able to get it working.
> >>>
> >>> Any ideas are appreciated.
> >>>
> >>> Thanks,
> >>>
> >>> Bryan
> >>
> >>
>
>


Re: EmbeddedSolrServer and Core dataDir in Solr 6.x

2016-10-03 Thread Bryan Bende
After some more debugging, I think putting the dataDir in the
Map of properties is actually working, but still running
into a couple of issues with the setup...

I created an example project that demonstrates the scenario:
https://github.com/bbende/solrcore-datdir-test/blob/master/src/test/java/org/apache/solr/EmbeddedSolrServerFactory.java

When calling coreContainer.create(String coreName, Path instancePath,
Map<String, String> parameters)...

If instancePath is relative, then the core is loaded with no errors, but it
ends up writing a core.properties relative to Solr home, so it writes:
src/test/resources/solr/src/test/resources/exampleCollection/core.properties

If instancePath is absolute, then it fails to start up because there is
already a core.properties at
/full/path/to/src/test/resources/exampleCollection, the exception is thrown
from:

at
org.apache.solr.core.CorePropertiesLocator.create(CorePropertiesLocator.java:66)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:778)


Since everything from src/test/resources is already being put under
target/test-classes as part of the build, I'm thinking a better approach
would be to reference those paths for the Solr home and instancePath.

If I remove the core.properties from src/test/resources/exampleCollection,
then it can write a new one to target/test-classes/exampleCollection, and
will even put the dataDir there by default.
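
For reference, a minimal sketch of that working setup, reusing the coreContainer from the snippets above (the target/... paths are examples from a typical Maven layout, not required by the API):

Map<String, String> props = new HashMap<>();
props.put("dataDir", new File("target", coreName).getAbsolutePath());  // keep index data out of the source tree
SolrCore solrCore = coreContainer.create(coreName, Paths.get("target/test-classes").resolve(coreName), props);
new EmbeddedSolrServer(coreContainer, coreName);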



On Mon, Oct 3, 2016 at 7:00 PM, Bryan Bende  wrote:

> Yea I'll try to put something together and report back.
>
> On Mon, Oct 3, 2016 at 6:54 PM, Alan Woodward  wrote:
>
>> Ah, I see what you mean.  Putting the dataDir property into the Map
>> certainly ought to work - can you write a test case that shows what’s
>> happening?
>>
>> Alan Woodward
>> www.flax.co.uk
>>
>>
>> > On 3 Oct 2016, at 23:50, Bryan Bende  wrote:
>> >
>> > Alan,
>> >
>> > Thanks for the response. I will double-check, but I believe that is
>> going
>> > to put the data directory for the core under coreHome/coreName.
>> >
>> > What I am trying to setup (and did a poor job of explaining) is
>> something
>> > like the following...
>> >
>> > - Solr home in src/test/resources/solr
>> > - Core home in src/test/resources/myCore
>> > - dataDir for the myCore in target/myCore (or something not in the
>> source
>> > tree).
>> >
>> > This way the unit tests can use the Solr home and core config that is
>> under
>> > version control, but the data from testing would be written somewhere
>> not
>> > under version control.
>> >
>> > in 5.x I was specifying the dataDir through the properties object... I
>> > would calculate the path to the target dir in Java code relative to the
>> > class file, and then pass that as dataDir to the following:
>> >
>> > Properties props = new Properties();
>> > props.setProperty("dataDir", dataDir + "/" + coreName);
>> >
>> > In 6.x it seems like Properties has been replaced with the
>> > Map<String, String>? and I tried putting dataDir in there, but didn't seem
>> > to do anything.
>> >
>> > For now I have just been using RAMDirectoryFactory so that no data ever
>> > gets written to disk.
>> >
>> > I'll keep trying different things, but if you have any thoughts let me
>> know.
>> >
>> > Thanks,
>> >
>> > Bryan
>> >
>> >
>> > On Mon, Oct 3, 2016 at 2:07 PM, Alan Woodward  wrote:
>> >
>> >> This should work:
>> >>
>> >> SolrCore solrCore
> >> = coreContainer.create(coreName, Paths.get(coreHome).resolve(coreName),
>> >> Collections.emptyMap());
>> >>
>> >>
>> >> Alan Woodward
>> >> www.flax.co.uk
>> >>
>> >>
>> >>> On 3 Oct 2016, at 18:41, Bryan Bende  wrote:
>> >>>
>> >>> Curious if anyone knows how to create an EmbeddedSolrServer in Solr
>> 6.x,
>> >>> with a core where the dataDir is located somewhere outside of where
>> the
>> >>> config is located.
>> >>>
>> >>> I'd like to do this without system properties, and all through Java
>> code.
>> >>>
>> >>> In Solr 5.x I was able to do this with the following code:
>> >>>
>> >>> CoreContainer coreContainer = new CoreContainer(solrHome);
>> >>> coreContainer.load();
>> >>>
>> >>> Properties props = new Properties();
>> >>> props.setProperty("dataDir", dataDir + "/" + coreName);
>> >>>
>> >>> CoreDescriptor descriptor = new CoreDescriptor(coreContainer,
>> coreName,
>> >>> new File(coreHome, coreName).getAbsolutePath(), props);
>> >>>
>> >>> SolrCore solrCore = coreContainer.create(descriptor);
>> >>> new EmbeddedSolrServer(coreContainer, coreName);
>> >>>
>> >>>
>> >>> The CoreContainer API changed a bit in 6.x and you can no longer pass
>> in
>> >> a
>> >>> descriptor. I've tried a couple of things with the current API, but
>> >> haven't
>> >>> been able to get it working.
>> >>>
>> >>> Any ideas are appreciated.
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Bryan
>> >>
>> >>
>>
>>
>


Re: SOLR Sizing

2016-10-03 Thread Susheel Kumar
In short, if you want your estimate to be closer, run an actual
ingestion for say 1-5% of your total docs and extrapolate, since every
search product may have a different schema, different sets of fields, different
indexed vs. stored fields, copy fields, different analysis chains, etc.

If you want just a very quick rough estimate, create a few flat JSON
sample files (below) with field names and key values (actual data gives a better
estimate). Put in all the field names which you are going to index/put into
Solr and check the JSON file size. This gives you the average size of a doc;
multiply by the number of docs to get a rough index size.

{
"id":"product12345",
"name":"productA",
"category":"xyz",
...
...
}
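
For completeness, the extrapolation itself is just back-of-the-envelope
arithmetic along these lines (the file name and doc count below are made-up
placeholders, not real numbers):

import java.nio.file.Files;
import java.nio.file.Paths;

public class RoughIndexSizeEstimate {
    public static void main(String[] args) throws Exception {
        // one sample doc rendered as flat JSON with all the fields you plan to index/store
        long sampleDocBytes = Files.size(Paths.get("sample-doc.json")); // hypothetical sample file
        long totalDocs = 50_000_000L;                                   // made-up corpus size
        double roughGb = sampleDocBytes * (double) totalDocs / (1024L * 1024 * 1024);
        System.out.printf("~%.1f GB as a very rough ballpark%n", roughGb);
    }
}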

Thanks,
Susheel

On Mon, Oct 3, 2016 at 3:19 PM, Allison, Timothy B. 
wrote:

> This doesn't answer your question, but Erick Erickson's blog on this topic
> is invaluable:
>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-
> the-abstract-why-we-dont-have-a-definitive-answer/
>
> -Original Message-
> From: Vasu Y [mailto:vya...@gmail.com]
> Sent: Monday, October 3, 2016 2:09 PM
> To: solr-user@lucene.apache.org
> Subject: SOLR Sizing
>
> Hi,
>  I am trying to estimate disk space requirements for the documents indexed
> to SOLR.
> I went through the LucidWorks blog (
> https://lucidworks.com/blog/2011/09/14/estimating-memory-
> and-storage-for-lucenesolr/)
> and using this as the template. I have a question regarding estimating
> "Avg. Document Size (KB)".
>
> When calculating Disk Storage requirements, can we use the Java Types
> sizing (
> https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html)
> & come up average document size?
>
> Please let know if the following assumptions are correct.
>
>  Data Type   Size
>  --  --
>  long   8 bytes
>  tint   4 bytes
>  tdate 8 bytes (Stored as long?)
>  string 1 byte per char for ASCII chars and 2 bytes per char for
> Non-ASCII chars (Double byte chars)
>  text   1 byte per char for ASCII chars and 2 bytes per char for
> Non-ASCII (Double byte chars) (For both with & without norm?)
> ICUCollationField 2 bytes per char for Non-ASCII (Double byte chars)
> boolean 1 bit?
>
>  Thanks,
>  Vasu
>


Re: SOLR Sizing

2016-10-03 Thread Walter Underwood
That approach doesn’t work very well for estimates.

Some parts of the index size and speed scale with the vocabulary instead of the 
number of documents.
Vocabulary usually grows at about the square root of the total amount of text 
in the index. OCR’ed text
breaks that estimate badly, with huge vocabularies.

Also, it is common to find non-linear jumps in performance. I’m benchmarking a 
change in a 12 million
document index. It improves the 95th percentile response time for one style of 
query from 3.8 seconds
to 2 milliseconds. I’m testing with a log of 200k queries from a production 
host, so I’m pretty sure that
is accurate.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 3, 2016, at 6:02 PM, Susheel Kumar  wrote:
> 
> In short, if you want your estimate to be closer then run some actual
> ingestion for say 1-5% of your total docs and extrapolate since every
> search product may have different schema,different set of fields, different
> index vs. stored fields,  copy fields, different analysis chain etc.
> 
> If you want to just have a very quick rough estimate, create few flat json
> sample files (below) with field names and key values(actual data for better
> estimate). Put all the fields names which you are going to index/put into
> Solr and check the json file size. This will give you average size of a doc
> and then multiply with # docs to get a rough index size.
> 
> {
> "id":"product12345"
> "name":"productA",
> "category":"xyz",
> ...
> ...
> }
> 
> Thanks,
> Susheel
> 
> On Mon, Oct 3, 2016 at 3:19 PM, Allison, Timothy B. 
> wrote:
> 
>> This doesn't answer your question, but Erick Erickson's blog on this topic
>> is invaluable:
>> 
>> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-
>> the-abstract-why-we-dont-have-a-definitive-answer/
>> 
>> -Original Message-
>> From: Vasu Y [mailto:vya...@gmail.com]
>> Sent: Monday, October 3, 2016 2:09 PM
>> To: solr-user@lucene.apache.org
>> Subject: SOLR Sizing
>> 
>> Hi,
>> I am trying to estimate disk space requirements for the documents indexed
>> to SOLR.
>> I went through the LucidWorks blog (
>> https://lucidworks.com/blog/2011/09/14/estimating-memory-
>> and-storage-for-lucenesolr/)
>> and using this as the template. I have a question regarding estimating
>> "Avg. Document Size (KB)".
>> 
>> When calculating Disk Storage requirements, can we use the Java Types
>> sizing (
>> https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html)
>> & come up average document size?
>> 
>> Please let know if the following assumptions are correct.
>> 
>> Data Type   Size
>> --  --
>> long   8 bytes
>> tint   4 bytes
>> tdate 8 bytes (Stored as long?)
>> string 1 byte per char for ASCII chars and 2 bytes per char for
>> Non-ASCII chars (Double byte chars)
>> text   1 byte per char for ASCII chars and 2 bytes per char for
>> Non-ASCII (Double byte chars) (For both with & without norm?)
>> ICUCollationField 2 bytes per char for Non-ASCII (Double byte chars)
>> boolean 1 bit?
>> 
>> Thanks,
>> Vasu
>> 



Re: How to implement a custom boost function

2016-10-03 Thread Lucas Cotta
I actually could also use a custom similarity class that always returns 1.0;
then I could use small boost factors such as ^1, ^2, ^3, etc.
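
For example, something along these lines is what I have in mind for generating
the query, so a long id list does not have to be written by hand (just a
sketch; it assumes that constant-scoring similarity, so the descending boosts
alone decide the order, and the field and class names are placeholders):

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class OrderedIdQuery {

    // Earlier ids get larger boosts, so sorting by score desc keeps the input order.
    public static String build(List<String> ids) {
        return "studentId:(" + IntStream.range(0, ids.size())
                .mapToObj(i -> ids.get(i) + "^" + (ids.size() - i))
                .collect(Collectors.joining(" OR ")) + ")";
    }

    public static void main(String[] args) {
        // prints: studentId:(875141^3 OR 873071^2 OR 875198^1)
        System.out.println(build(Arrays.asList("875141", "873071", "875198")));
    }
}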

But I want to do this only in some specific queries (that may contain other
fields besides studentId)

How could I apply the custom similarity class only to some queries?
Is that possible?

Thanks!

2016-10-03 19:49 GMT-03:00 Lucas Cotta :

> Hello,
>
> I'm new in Solr (4.7.2) and I was given the following requirement:
>
> Given a query such as:
>
> studentId:(875141 OR 873071 OR 875198 OR 108142 OR 918841 OR 870688 OR
> 107920 OR 870637 OR 870636 OR 870635 OR 918792 OR 107721 OR 875078 OR
> 875166 OR 875151 OR 918829 OR 918808)
>
> I want the results to be ordered by the same order the elements were
> informed in the query. This would be similar to MySQL's ORDER BY
> FIELD(id, 3,2,5,7,8,1).
>
> I have tried to use term boosting
> 
> in the query but that only works when I use big factors like this: 
> 875078^10
> OR 875166^1 OR 875151^1000 OR 918829^100 OR 918808^10
>
> But that would cause the query to be too big in case I have 200 ids for
> instance.
>
> So it seems I need to implement a custom FunctionQuery.
> I'm a little lost on how to do that. Could someone please give me an idea?
> Which classes should my custom class extend from? Where should I place this
> class? Should I add to Solr project it self and regenerate the JAR?
>
> Thanks
>


Listing of fields on Block Join Parent Query Parser

2016-10-03 Thread Zheng Lin Edwin Yeo
Hi,

Would like to check: how can we list all the fields that are available
in the index?

I'm using dynamic fields, so the Schema API is not working for me, as it
will only list out patterns like *_s and *_f and not the full field names.

Also, as I'm using the Block Join Parent Query Parser, it would be good if
we could choose to list only the fields in the parent or the child,
instead of listing both at one go.

I'm using Solr 6.2.1

Regards,
Edwin


firstSearcher per SolrCore or JVM?

2016-10-03 Thread Jihwan Kim
I am using external file fields with fairly large external files, and I noticed
that a Solr core reload loads the external files twice: once for the firstSearcher
event and once for the newSearcher event.

Does this mean the core reload triggers both events? What is the
benefit/reason of triggering both events at the same time? I see this on
both V. 4.10.4 and V. 6.2.0.

Thanks,


Upgrading from Solr cloud 4.1 to 6.2

2016-10-03 Thread Neeraj Bhatt
Hello All

We are trying to upgrade our production solr with 10 million documents from
solr cloud (5 shards, 5 nodes, one collection, 3 replica) 4.1 to 6.2

How do we upgrade the Lucene index created by Solr? Should I go into the indexes
created by each shard, upgrade them, and replicate them manually? Also, I tried
using IndexUpgrader on one replica of one shard as a test, but it gives an
error because it is looking for a _4c.si file that is not there.

Any idea what the easiest way is to upgrade Solr Cloud with a 10M-document repository?

Thanks
neeraj


Re: firstSearcher per SolrCore or JVM?

2016-10-03 Thread Erick Erickson
firstSearcher and newSearcher are definitely per core; they have to be, since they
are intended to warm searchers and searchers are per core.

I don't particularly see the benefit of firing them both either. Not sure
which one makes the most sense, though.

Best,
Erick

On Mon, Oct 3, 2016 at 7:10 PM, Jihwan Kim  wrote:
> I am using external file fields with larger external files and I noticed
> Solr Core Reload loads external files twice: firstSearcher and nextSearcher
> event.
>
> Does it mean the Core Reload triggers both events? What is the
> benefit/reason of triggering both events at the same time?   I see this on
> V. 4.10.4 and V. 6.2.0 both.
>
> Thanks,


Re: Upgrading from Solr cloud 4.1 to 6.2

2016-10-03 Thread Erick Erickson
The very easiest way is to re-index; 10M documents shouldn't take
very long, unless they're no longer available...

When you say you tried to use the index upgrader, which one? You'd
have to use the one distributed with 5.x to upgrade from 4.x->5.x, then
use the one distributed with 6.x to go from 5.x->6.x.
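
If you do go the upgrade route rather than re-indexing, the upgrader can be
driven per index directory with a tiny program like this (only a sketch: run it
once with the 5.x lucene-core and lucene-backward-codecs jars on the classpath,
then again with the 6.x ones, against each core's data/index directory):

import java.nio.file.Paths;

import org.apache.lucene.index.IndexUpgrader;
import org.apache.lucene.store.FSDirectory;

public class UpgradeOneIndex {
    public static void main(String[] args) throws Exception {
        // args[0] = path to one core's data/index directory
        try (FSDirectory dir = FSDirectory.open(Paths.get(args[0]))) {
            // rewrites all segments to the current format of the Lucene jars on the classpath
            new IndexUpgrader(dir).upgrade();
        }
    }
}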


Best,
Erick

On Mon, Oct 3, 2016 at 8:01 PM, Neeraj Bhatt  wrote:
> Hello All
>
> We are trying to upgrade our production solr with 10 million documents from
> solr cloud (5 shards, 5 nodes, one collection, 3 replica) 4.1 to 6.2
>
> How to upgrade the lucene index created by solr. Should I go into indexes
> created by each shard and upgrade and  replicate it manually ? Also I tried
> using Index upgrader in one replica of one shard as a test but it gives
> error as it is looking for _4c.si file and it is not there
>
> Any idea what is the easy way to upgrade solr cloud with a 10m repsoitory
>
> Thanks
> neeraj


Re: SOLR Sizing

2016-10-03 Thread Erick Erickson
Walter:

What did you change? I might like to put that in my bag of tricks ;)

Erick

On Mon, Oct 3, 2016 at 6:30 PM, Walter Underwood  wrote:
> That approach doesn’t work very well for estimates.
>
> Some parts of the index size and speed scale with the vocabulary instead of 
> the number of documents.
> Vocabulary usually grows at about the square root of the total amount of text 
> in the index. OCR’ed text
> breaks that estimate badly, with huge vocabularies.
>
> Also, it is common to find non-linear jumps in performance. I’m benchmarking 
> a change in a 12 million
> document index. It improves the 95th percentile response time for one style 
> of query from 3.8 seconds
> to 2 milliseconds. I’m testing with a log of 200k queries from a production 
> host, so I’m pretty sure that
> is accurate.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Oct 3, 2016, at 6:02 PM, Susheel Kumar  wrote:
>>
>> In short, if you want your estimate to be closer then run some actual
>> ingestion for say 1-5% of your total docs and extrapolate since every
>> search product may have different schema,different set of fields, different
>> index vs. stored fields,  copy fields, different analysis chain etc.
>>
>> If you want to just have a very quick rough estimate, create few flat json
>> sample files (below) with field names and key values(actual data for better
>> estimate). Put all the fields names which you are going to index/put into
>> Solr and check the json file size. This will give you average size of a doc
>> and then multiply with # docs to get a rough index size.
>>
>> {
>> "id":"product12345"
>> "name":"productA",
>> "category":"xyz",
>> ...
>> ...
>> }
>>
>> Thanks,
>> Susheel
>>
>> On Mon, Oct 3, 2016 at 3:19 PM, Allison, Timothy B. 
>> wrote:
>>
>>> This doesn't answer your question, but Erick Erickson's blog on this topic
>>> is invaluable:
>>>
>>> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-
>>> the-abstract-why-we-dont-have-a-definitive-answer/
>>>
>>> -Original Message-
>>> From: Vasu Y [mailto:vya...@gmail.com]
>>> Sent: Monday, October 3, 2016 2:09 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: SOLR Sizing
>>>
>>> Hi,
>>> I am trying to estimate disk space requirements for the documents indexed
>>> to SOLR.
>>> I went through the LucidWorks blog (
>>> https://lucidworks.com/blog/2011/09/14/estimating-memory-
>>> and-storage-for-lucenesolr/)
>>> and using this as the template. I have a question regarding estimating
>>> "Avg. Document Size (KB)".
>>>
>>> When calculating Disk Storage requirements, can we use the Java Types
>>> sizing (
>>> https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html)
>>> & come up average document size?
>>>
>>> Please let know if the following assumptions are correct.
>>>
>>> Data Type   Size
>>> --  --
>>> long   8 bytes
>>> tint   4 bytes
>>> tdate 8 bytes (Stored as long?)
>>> string 1 byte per char for ASCII chars and 2 bytes per char for
>>> Non-ASCII chars (Double byte chars)
>>> text   1 byte per char for ASCII chars and 2 bytes per char for
>>> Non-ASCII (Double byte chars) (For both with & without norm?)
>>> ICUCollationField 2 bytes per char for Non-ASCII (Double byte chars)
>>> boolean 1 bit?
>>>
>>> Thanks,
>>> Vasu
>>>
>


Re: Scaling data extractor with Solr

2016-10-03 Thread Erick Erickson
You can have as many clients indexing to Solr (either Cloud or
stand-alone) as you want, limited only by the load you put
on Solr. I.e. if your indexing throughput is so great that it makes
querying too slow then you have to scale back...

I know of setups with 100+ separate clients all indexing to Solr
at the same time.

Best,
Erick

On Mon, Oct 3, 2016 at 3:13 PM, Steven White  wrote:
> Hi everyone,
>
> I'm up to speed about Solr on how it can be setup to provide high
> availability (if one Solr server goes down, the backup one takes over).  My
> question is how do I make my custom crawler to play "nice" with Solr in
> this environment.
>
> Let us say I setup Solr with 3 servers so that if one fails the other one
> takes over.  Let us say I also setup my crawler with 3 servers so if one
> goes down the other takes over.  But how should my crawlers work?  Can each
> function unaware of each other and send the same data to Solr or must my
> crawlers synchronize with each other so only 1 is active sending data to
> Solr and the others are on stand-by mode?
>
> I like to hear from others how they solved this problem so I don't end up
> re-inventing it.
>
> Thanks.
>
> Steve


Re: firstSearcher per SolrCore or JVM?

2016-10-03 Thread Jihwan Kim
Thanks Erick.
The firstSearcher and newSearcher events open two separate searchers. For
the external file field case at least, the cache created by the
firstSearcher is not being used after the newSearcher creates another cache
(with the same values).

I believe the warming is also per searcher, so I am still wondering why
two searchers are running during the core reload, since the cache built by the
firstSearcher is barely used.

On V. 4.10.4, it seems the main (current) searcher is closed after the
newSearcher event, so all caches are generated with the new searcher before
it accepts a query.

On V. 6.2.0, the main (current) searcher is closed first and then the
newSearcher event is triggered, so the first HTTP request that uses an
external file field has to wait until the cache is created by the
newSearcher. In fact, two threads are running and I am not yet sure how
they are synchronized (or not synchronized).

Not sure if I can attach an image here, but this is the response time on V.
6.2.10



On Mon, Oct 3, 2016 at 9:00 PM, Erick Erickson 
wrote:

> firstSearcher and newSeacher are definitely per core, they have to be
> since they
> are intended to warm searchers and searchers are per core.
>
> I don't particularly see the benefit of firing them both either. Not
> sure which one makes
> the most sense though.
>
> Best,
> Erick
>
> On Mon, Oct 3, 2016 at 7:10 PM, Jihwan Kim  wrote:
> > I am using external file fields with larger external files and I noticed
> > Solr Core Reload loads external files twice: firstSearcher and
> nextSearcher
> > event.
> >
> > Does it mean the Core Reload triggers both events? What is the
> > benefit/reason of triggering both events at the same time?   I see this
> on
> > V. 4.10.4 and V. 6.2.0 both.
> >
> > Thanks,
>


Re: How to implement a custom boost function

2016-10-03 Thread Walter Underwood
How about sorting them after you get them back from Solr?
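
Something along these lines, for instance (just a sketch with SolrJ types; it
assumes the studentId field from your example and that the requested id list is
available on the client side):

import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class ClientSideOrdering {

    // Reorder a page of results so it follows the order the ids were given in the query.
    public static void sortByRequestedOrder(SolrDocumentList docs, List<String> requestedIds) {
        Map<String, Integer> position = new HashMap<>();
        for (int i = 0; i < requestedIds.size(); i++) {
            position.put(requestedIds.get(i), i);
        }
        docs.sort(Comparator.comparingInt((SolrDocument d) ->
                position.getOrDefault(String.valueOf(d.getFieldValue("studentId")),
                        Integer.MAX_VALUE)));
    }
}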

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 3, 2016, at 6:45 PM, Lucas Cotta  wrote:
> 
> I actually could also use a custom similarity class that always returns 1.0
> then I could use small boost factors such as ^1, ^2, ^3, etc.
> 
> But I want to do this only in some specific queries (that may contain other
> fields besides studentId)
> 
> How could I do this, use the custom similarity class only for some queries?
> Is it possible?
> 
> Thanks!
> 
> 2016-10-03 19:49 GMT-03:00 Lucas Cotta :
> 
>> Hello,
>> 
>> I'm new in Solr (4.7.2) and I was given the following requirement:
>> 
>> Given a query such as:
>> 
>> studentId:(875141 OR 873071 OR 875198 OR 108142 OR 918841 OR 870688 OR
>> 107920 OR 870637 OR 870636 OR 870635 OR 918792 OR 107721 OR 875078 OR
>> 875166 OR 875151 OR 918829 OR 918808)
>> 
>> I want the results to be ordered by the same order the elements were
>> informed in the query. This would be similar to MySQL's ORDER BY
>> FIELD(id, 3,2,5,7,8,1).
>> 
>> I have tried to use term boosting
>> 
>> in the query but that only works when I use big factors like this: 
>> 875078^10
>> OR 875166^1 OR 875151^1000 OR 918829^100OR 918808^10
>> 
>> But that would cause the query to be too big in case I have 200 ids for
>> instance.
>> 
>> So it seems I need to implement a custom FunctionQuery.
>> I'm a little lost on how to do that. Could someone please give me an idea?
>> Which classes should my custom class extend from? Where should I place this
>> class? Should I add to Solr project it self and regenerate the JAR?
>> 
>> Thanks
>> 



Re: SOLR Sizing

2016-10-03 Thread Walter Underwood
I did not believe the benchmark results the first time, but it seems to hold up.
Nobody gets a speedup of over a thousand (unless you are going from that
Oracle search thing to Solr).

It probably won’t help for most people. We have one service with very, very long
queries, up to 1000 words of free text. We also do as-you-type instant results,
so we have been using edge ngrams. Not using edge ngrams made the huge
speedup.

Query results cache hit rate almost doubled, which is part of the non-linear 
speedup.

We already trim the number of terms passed to Solr to a reasonable amount.
Google cuts off at 32; we use a few more.

We’re running a relevance A/B test for dropping the ngrams. If that doesn’t 
pass,
we’ll try something else, like only ngramming the first few words. Or something.

I wanted to use MLT to extract the best terms out of the long queries.
Unfortunately, you can’t highlight and do MLT at the same time (MLT was never
moved to the new component system), and the MLT handler was really slow. Dang.

I still might do an outboard MLT with a snapshot of high-idf terms.

The queries are for homework help. I’ve only found one other search that had to
deal with this. I was talking with someone who worked on Encarta, and they had
the same challenge.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 3, 2016, at 8:06 PM, Erick Erickson  wrote:
> 
> Walter:
> 
> What did you change? I might like to put that in my bag of tricks ;)
> 
> Erick
> 
> On Mon, Oct 3, 2016 at 6:30 PM, Walter Underwood  
> wrote:
>> That approach doesn’t work very well for estimates.
>> 
>> Some parts of the index size and speed scale with the vocabulary instead of 
>> the number of documents.
>> Vocabulary usually grows at about the square root of the total amount of 
>> text in the index. OCR’ed text
>> breaks that estimate badly, with huge vocabularies.
>> 
>> Also, it is common to find non-linear jumps in performance. I’m benchmarking 
>> a change in a 12 million
>> document index. It improves the 95th percentile response time for one style 
>> of query from 3.8 seconds
>> to 2 milliseconds. I’m testing with a log of 200k queries from a production 
>> host, so I’m pretty sure that
>> is accurate.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Oct 3, 2016, at 6:02 PM, Susheel Kumar  wrote:
>>> 
>>> In short, if you want your estimate to be closer then run some actual
>>> ingestion for say 1-5% of your total docs and extrapolate since every
>>> search product may have different schema,different set of fields, different
>>> index vs. stored fields,  copy fields, different analysis chain etc.
>>> 
>>> If you want to just have a very quick rough estimate, create few flat json
>>> sample files (below) with field names and key values(actual data for better
>>> estimate). Put all the fields names which you are going to index/put into
>>> Solr and check the json file size. This will give you average size of a doc
>>> and then multiply with # docs to get a rough index size.
>>> 
>>> {
>>> "id":"product12345"
>>> "name":"productA",
>>> "category":"xyz",
>>> ...
>>> ...
>>> }
>>> 
>>> Thanks,
>>> Susheel
>>> 
>>> On Mon, Oct 3, 2016 at 3:19 PM, Allison, Timothy B. 
>>> wrote:
>>> 
 This doesn't answer your question, but Erick Erickson's blog on this topic
 is invaluable:
 
 https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-
 the-abstract-why-we-dont-have-a-definitive-answer/
 
 -Original Message-
 From: Vasu Y [mailto:vya...@gmail.com]
 Sent: Monday, October 3, 2016 2:09 PM
 To: solr-user@lucene.apache.org
 Subject: SOLR Sizing
 
 Hi,
 I am trying to estimate disk space requirements for the documents indexed
 to SOLR.
 I went through the LucidWorks blog (
 https://lucidworks.com/blog/2011/09/14/estimating-memory-
 and-storage-for-lucenesolr/)
 and using this as the template. I have a question regarding estimating
 "Avg. Document Size (KB)".
 
 When calculating Disk Storage requirements, can we use the Java Types
 sizing (
 https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html)
 & come up average document size?
 
 Please let know if the following assumptions are correct.
 
 Data Type   Size
 --  --
 long   8 bytes
 tint   4 bytes
 tdate 8 bytes (Stored as long?)
 string 1 byte per char for ASCII chars and 2 bytes per char for
 Non-ASCII chars (Double byte chars)
 text   1 byte per char for ASCII chars and 2 bytes per char for
 Non-ASCII (Double byte chars) (For both with & without norm?)
 ICUCollationField 2 bytes per char for Non-ASCII (Double byte chars)
 boolean 1 bit?
 
 Thanks,
 Vasu
 
>> 



Re: SOLR Sizing

2016-10-03 Thread Walter Underwood
Dropping ngrams also makes the index 5X smaller on disk.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 3, 2016, at 9:02 PM, Walter Underwood  wrote:
> 
> I did not believe the benchmark results the first time, but it seems to hold 
> up.
> Nobody gets a speedup of over a thousand (unless you are going from that
> Oracle search thing to Solr).
> 
> It probably won’t help for most people. We have one service with very, very 
> long
> queries, up to 1000 words of free text. We also do as-you-type instant 
> results,
> so we have been using edge ngrams. Not using edge ngrams made the huge
> speedup.
> 
> Query results cache hit rate almost doubled, which is part of the non-linear 
> speedup.
> 
> We already trim the number of terms passed to Solr to a reasonable amount.
> Google cuts off at 32; we use a few more.
> 
> We’re running a relevance A/B test for dropping the ngrams. If that doesn’t 
> pass,
> we’ll try something else, like only ngramming the first few words. Or 
> something.
> 
> I wanted to use MLT to extract the best terms out of the long queries. 
> Unfortunately,
> you can’t highlight and MLT (MLT was never moved to the new component system)
> and the MLT handler was really slow. Dang.
> 
> I still might do an outboard MLT with a snapshot of high-idf terms.
> 
> The queries are for homework help. I’ve only found one other search that had 
> to
> deal with this. I was talking with someone who worked on Encarta, and they had
> the same challenge.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
>> On Oct 3, 2016, at 8:06 PM, Erick Erickson  wrote:
>> 
>> Walter:
>> 
>> What did you change? I might like to put that in my bag of tricks ;)
>> 
>> Erick
>> 
>> On Mon, Oct 3, 2016 at 6:30 PM, Walter Underwood  
>> wrote:
>>> That approach doesn’t work very well for estimates.
>>> 
>>> Some parts of the index size and speed scale with the vocabulary instead of 
>>> the number of documents.
>>> Vocabulary usually grows at about the square root of the total amount of 
>>> text in the index. OCR’ed text
>>> breaks that estimate badly, with huge vocabularies.
>>> 
>>> Also, it is common to find non-linear jumps in performance. I’m 
>>> benchmarking a change in a 12 million
>>> document index. It improves the 95th percentile response time for one style 
>>> of query from 3.8 seconds
>>> to 2 milliseconds. I’m testing with a log of 200k queries from a production 
>>> host, so I’m pretty sure that
>>> is accurate.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>> 
 On Oct 3, 2016, at 6:02 PM, Susheel Kumar  wrote:
 
 In short, if you want your estimate to be closer then run some actual
 ingestion for say 1-5% of your total docs and extrapolate since every
 search product may have different schema,different set of fields, different
 index vs. stored fields,  copy fields, different analysis chain etc.
 
 If you want to just have a very quick rough estimate, create few flat json
 sample files (below) with field names and key values(actual data for better
 estimate). Put all the fields names which you are going to index/put into
 Solr and check the json file size. This will give you average size of a doc
 and then multiply with # docs to get a rough index size.
 
 {
 "id":"product12345"
 "name":"productA",
 "category":"xyz",
 ...
 ...
 }
 
 Thanks,
 Susheel
 
 On Mon, Oct 3, 2016 at 3:19 PM, Allison, Timothy B. 
 wrote:
 
> This doesn't answer your question, but Erick Erickson's blog on this topic
> is invaluable:
> 
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-
> the-abstract-why-we-dont-have-a-definitive-answer/
> 
> -Original Message-
> From: Vasu Y [mailto:vya...@gmail.com]
> Sent: Monday, October 3, 2016 2:09 PM
> To: solr-user@lucene.apache.org
> Subject: SOLR Sizing
> 
> Hi,
> I am trying to estimate disk space requirements for the documents indexed
> to SOLR.
> I went through the LucidWorks blog (
> https://lucidworks.com/blog/2011/09/14/estimating-memory-
> and-storage-for-lucenesolr/)
> and using this as the template. I have a question regarding estimating
> "Avg. Document Size (KB)".
> 
> When calculating Disk Storage requirements, can we use the Java Types
> sizing (
> https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html)
> & come up average document size?
> 
> Please let know if the following assumptions are correct.
> 
> Data Type   Size
> --  --
> long   8 bytes
> tint   4 bytes
> tdate 8 bytes (Stored as long?)
> string 1 byte per char for ASCII chars and 2 bytes per char for
> Non-AS

Re: How to implement a custom boost function

2016-10-03 Thread Lucas Cotta
Hi Walter, unfortunately I use pagination, so that would not be possible.

Thanks

2016-10-04 0:51 GMT-03:00 Walter Underwood :

> How about sorting them after you get them back from Solr?
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Oct 3, 2016, at 6:45 PM, Lucas Cotta  wrote:
> >
> > I actually could also use a custom similarity class that always returns
> 1.0
> > then I could use small boost factors such as ^1, ^2, ^3, etc.
> >
> > But I want to do this only in some specific queries (that may contain
> other
> > fields besides studentId)
> >
> > How could I do this, use the custom similarity class only for some
> queries?
> > Is it possible?
> >
> > Thanks!
> >
> > 2016-10-03 19:49 GMT-03:00 Lucas Cotta :
> >
> >> Hello,
> >>
> >> I'm new in Solr (4.7.2) and I was given the following requirement:
> >>
> >> Given a query such as:
> >>
> >> studentId:(875141 OR 873071 OR 875198 OR 108142 OR 918841 OR 870688 OR
> >> 107920 OR 870637 OR 870636 OR 870635 OR 918792 OR 107721 OR 875078 OR
> >> 875166 OR 875151 OR 918829 OR 918808)
> >>
> >> I want the results to be ordered by the same order the elements were
> >> informed in the query. This would be similar to MySQL's ORDER BY
> >> FIELD(id, 3,2,5,7,8,1).
> >>
> >> I have tried to use term boosting
> >>  Boosting_Ranking_Terms>
> >> in the query but that only works when I use big factors like this:
> 875078^10
> >> OR 875166^1 OR 875151^1000 OR 918829^100OR 918808^10
> >>
> >> But that would cause the query to be too big in case I have 200 ids for
> >> instance.
> >>
> >> So it seems I need to implement a custom FunctionQuery.
> >> I'm a little lost on how to do that. Could someone please give me an
> idea?
> >> Which classes should my custom class extend from? Where should I place
> this
> >> class? Should I add to Solr project it self and regenerate the JAR?
> >>
> >> Thanks
> >>
>
>


Faceting on both Parent and Child records in Block Join Query Parser

2016-10-03 Thread Zheng Lin Edwin Yeo
Hi,

Is it possible to do nested faceting on both parent and child records in
a single query?

For example, I want to facet both author_s and book_s. Author is indexed as
a parent, whereas Book is indexed as a child.

I tried the following JSON Facet query, which is meant to facet on the list
of authors (in the parent), followed by a facet on the list of books (in the
child) written by each author.

http://localhost:8983/solr/collection1/select?q=*:*
&json.facet={
  items:{
    type:terms,
    field:author_s,
    facet:{
      by1:{
        type:terms,
        field:book_s
      }
    }
  }
}&fl=null&rows=0


However, it only returned the facet for the list of authors; I could not get
any results for the list of books. Is this possible to do, or what could be
wrong with my query?


Regards,
Edwin