Re: solr sorting problem
Were you able to get it to work? If yes, how? I'm having almost the same problem. I used the fieldType name="alphaOnlySort" class="solr.TextField" definition from the sample schema.xml to define a field named "alphaname", and then copied one of the fields, "foodDescUS", to "alphaname". When I try to sort using alphaname I get this error: "The field :foodDesc present in DataConfig does not have a counterpart in Solr Schema". Please help. Thanks, Pratik
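For reference, a sketch of the kind of schema.xml configuration being described. The alphaOnlySort type is the one shipped in the stock Solr example schema; the field, copyField and type names for the user's own fields are assumptions based on the message above, since the actual schema isn't shown:

<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
  </analyzer>
</fieldType>

<field name="foodDescUS" type="text_general" indexed="true" stored="true"/>
<field name="alphaname" type="alphaOnlySort" indexed="true" stored="false"/>
<copyField source="foodDescUS" dest="alphaname"/>

Sorting would then be requested with sort=alphaname asc on the query.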
Re: solr sorting problem
Hello, I got over that problem, but now I am facing a new one. Indexing works but search does not. I used the following line in the schema, and I'm trying to use the default "alphaOnlySort" from the sample schema.xml. The database is MySQL, and there is a column/field named ColXYZ. My data-config looks like this: In which scenarios would Solr index the records/documents but searching would not work? Thanks
Re: solr sorting problem
Hi, Thanks for your reply. I'm using commit=true while indexing, and it does index the records and shows the number of records indexed. The problem is that search yields 0 records (numFound="0"). There are some entries for spell checking in my schema too. The search URL is something like: http://localhost:8983/solr/select/?q=apple&indent=on or http://localhost:8983/solr/select/?q=apple&version=2.2&start=0&rows=10&indent=on Cache could not be the problem, as it did not fetch any records from the very beginning. So, basically, it does not fetch any documents/records even though it does index them. Thanks, Pratik
Re: solr sorting problem
Hi, Were you able to sort the results using alphaOnlySort? If yes, what changes were made to the schema and data-config? Thanks
Should I Use Solr
Hi, I am using Oracle 11gR2, and we have a schema where a few tables have more than 100 million rows (some of the columns are VARCHAR2 of 100 bytes). We frequently have to do LIKE-based searches on those tables, and sometimes we need to join the tables as well. Inserts/updates also happen very frequently on these tables (around 1000 inserts/updates per second) from other applications. So my question is: for my user interface, should I use Apache Solr to let users search these tables instead of SQL queries? I have tried SQL and it is really slow, considering the amount of data in my database. My requirements are that results should come back fast, should be accurate, and should reflect the latest data. Can you suggest whether I should go with Apache Solr, or with another solution for my problem? Regards, Pratik Thaker
Streaming Expressions : rollup function returning results with duplicate tuples
Hi, I have a streaming expression which uses rollup function. My understanding is that rollup takes an incoming stream and aggregates over given buckets. However, with following query the result contains duplicate tuples.

Following is the streaming expression.

rollup(
  fetch(
    collection1,
    gatherNodes(
      collection1,
      gatherNodes(collection1,
        walk="54227b412a1c4e574f88f2bb->eventParticipantID",
        gather="eventID"
      ),
      walk="eventID->conceptid",
      gather="conceptid",
      trackTraversal="true", scatter="branches,leaves"
    ),
    fl="schematype",
    on="node=conceptid"
  ),
  over="schematype",
  count(schematype)
)

The result returned is as follows.

{ "result-set": { "docs": [
  { "count(schematype)": 1, "schematype": "Company" },
  { "count(schematype)": 1, "schematype": "Founding Event" },
  { "count(schematype)": 1, "schematype": "Customer" },
  { "count(schematype)": 1, "schematype": "Founding Event" },  // duplicate
  { "count(schematype)": 1, "schematype": "Employment" },      // duplicate
  { "count(schematype)": 1, "schematype": "Founding Event" },
  { "count(schematype)": 4, "schematype": "Employment" },..
] } }

As you can see, there are more than one tuples for 'Founding Event'/'Employment'. Am I missing something here?

Following is the content of stream which is wrapped by rollup, if it helps.

{ "result-set": { "docs": [
  { "node": "54227b412a1c4e574f88f2bb", "schematype": "Company", "collection": "collection1", "field": "node", "level": 0 },
  { "node": "543004f0c92c0a651166aea5", "schematype": "Founding Event", "collection": "collection1", "field": "eventID", "level": 1 },
  { "node": "543004f0c92c0a651166ae99", "schematype": "Customer", "collection": "collection1", "field": "eventID", "level": 1 },
  { "node": "543004f0c92c0a651166aea1", "schematype": "Founding Event", "collection": "collection1", "field": "eventID", "level": 1 },
  { "node": "543004f0c92c0a651166ae78", "schematype": "Employment", "collection": "collection1", "field": "eventID", "level": 1 },
  { "node": "54ee6178b54c1d65412b5f9f", "schematype": "Founding Event", "collection": "collection1", "field": "eventID", "level": 1 },
  { "node": "543004f0c92c0a651166ae7c", "schematype": "Employment", "collection": "collection1", "field": "eventID", "level": 1 },
  { "node": "543004f0c92c0a651166ae80", "schematype": "Employment", "collection": "collection1", "field": "eventID", "level": 1 },
  { "node": "543004f0c92c0a651166ae8a", "schematype": "Employment", "collection": "collection1", "field": "eventID", "level": 1 },
  { "node": "543004f0c92c0a651166ae94", "schematype": "Employment", "collection": "collection1", "field": "eventID", "level": 1 },
  { "node": "543004f0c92c0a651166ae9d", "schematype": "Customer", "collection": "collection1", "field": "eventID", "level": 1 },
  { "EOF": true, "RESPONSE_TIME": 38 }
] } }

If I rollup on the level field then the results are as expected but not when the field is schematype. Any idea what's going on here? Thanks, Pratik
Re: Streaming Expressions : rollup function returning results with duplicate tuples
Yes, that was the missing piece. Thanks a lot!

On Thu, Jun 22, 2017 at 5:20 PM, Joel Bernstein wrote:
> Here is the pseudo code:
>
> rollup(sort(fetch(gatherNodes(
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Jun 22, 2017 at 5:19 PM, Joel Bernstein wrote:
> > You'll need to use the sort expression to sort the nodes by schemaType
> > first. The rollup expression is doing a MapReduce rollup that requires
> > the records to be sorted by the "over" fields.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
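To make the fix concrete, here is a sketch of the corrected pipeline following Joel's pseudo code. The only change from the original expression is the sort() wrapped around fetch() so the tuples arrive ordered by the rollup's "over" field; treat it as an illustration, not a tested query:

rollup(
  sort(
    fetch(
      collection1,
      gatherNodes(
        collection1,
        gatherNodes(collection1,
          walk="54227b412a1c4e574f88f2bb->eventParticipantID",
          gather="eventID"
        ),
        walk="eventID->conceptid",
        gather="conceptid",
        trackTraversal="true", scatter="branches,leaves"
      ),
      fl="schematype",
      on="node=conceptid"
    ),
    by="schematype asc"
  ),
  over="schematype",
  count(schematype)
)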
Limit for facet function of Streaming Expressions in solr cloud
Hey Everyone, This is about the facet function of Streaming Expressions. Is there any way to set the limit on the number of facet buckets to unlimited? The bucketSizeLimit parameter seems to accept only numbers greater than 0. Thanks, Pratik
Re: Limit for facet function of Streaming Expressions in solr cloud
Thanks Joel. For my use case I can switch to rollup for now which can work with "/export" query type. On Thu, Jun 29, 2017 at 10:11 AM, Joel Bernstein wrote: > Yes, I see this is hardcoded into the parameter checks. We can create a > ticket to allow unlimited. > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Thu, Jun 29, 2017 at 10:06 AM, Pratik Patel > wrote: > > > Hey Everyone, > > > > This is about the facet function of Streaming Expression. Is there any > way > > to set limit for number of facets to infinite? The *bucketSizeLimit > > parameter *seems to accept only those numbers which are greater than 0. > > > > Thanks, > > Pratik > > >
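For anyone with the same need, a sketch of the rollup-over-export style workaround mentioned above. It streams the full result set from the /export handler and aggregates with rollup, so there is no bucket limit; collection and field names here are placeholders, the bucket field must appear in the sort, and /export requires docValues on the exported fields:

rollup(
  search(collection1, q="*:*", qt="/export", fl="category_s", sort="category_s asc"),
  over="category_s",
  count(*)
)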
Streaming expressions and Jetty Host
Hi Everyone, We are running Solr 6.4.1 in cloud mode on a CentOS production server. Currently we are using the embedded ZooKeeper. It is a simple setup with one collection and one shard. By default, the Jetty server binds to all interfaces, which is not safe, so we have changed the bin/solr script. We have added "-Djetty.host=127.0.0.1" to SOLR_START_OPTS so that it looks as follows.

SOLR_START_OPTS=('-server' "${JAVA_MEM_OPTS[@]}" "${GC_TUNE[@]}" "${GC_LOG_OPTS[@]}" \
  "${REMOTE_JMX_OPTS[@]}" "${CLOUD_MODE_OPTS[@]}" $SOLR_LOG_LEVEL_OPT -Dsolr.log.dir="$SOLR_LOGS_DIR" \
  "-Djetty.host=127.0.0.1" "-Djetty.port=$SOLR_PORT" "-DSTOP.PORT=$stop_port" "-DSTOP.KEY=$STOP_KEY" \
  "${SOLR_HOST_ARG[@]}" "-Duser.timezone=$SOLR_TIMEZONE" \
  "-Djetty.home=$SOLR_SERVER_DIR" "-Dsolr.solr.home=$SOLR_HOME" "-Dsolr.install.dir=$SOLR_TIP" \
  "${LOG4J_CONFIG[@]}" "${SOLR_OPTS[@]}")

We just found that with this change everything works fine in cloud mode except streaming expressions. With streaming expressions we get the following response.

org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://:8081/solr/collection1_shard1_replica1

We don't get this error if we let the Jetty server bind to all interfaces. Any idea what the problem is here? Thanks, Pratik
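One thing worth checking, offered as an assumption rather than a confirmed diagnosis: streaming expressions open internal HTTP connections to the replica base URLs registered in ZooKeeper, so if Jetty only listens on 127.0.0.1 while the node registered itself under a different (or empty) host, those worker connections get refused. On a single-node setup the registered host can be pinned explicitly in bin/solr.in.sh, e.g.:

# hypothetical single-node setting: make the address registered in ZooKeeper
# match the loopback interface Jetty is actually bound to
SOLR_HOST=127.0.0.1

Restricting access with a firewall on the external interface, instead of binding Jetty to loopback, is another way to keep inter-node and streaming traffic working.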
Solr not preserving milliseconds precision for zero milliseconds
Hello Everyone, Say I have a document like the one below.

{
  "id":"test",
  "startTime":"2013-02-10T18:36:07.000Z"
}

I add this document to the Solr index using the admin UI and the "update" request handler. It gets added successfully, but when I retrieve the document back by "id" I get the following.

{
  "id":"test",
  "startTime":"2013-02-10T18:36:07Z",
  "_version_":1580456021738913792
}

As you can see, the milliseconds precision in the date field "startTime" is lost. Precision is preserved for non-zero milliseconds, but it is lost for zero values. The field type of the "startTime" field is declared with docValues="true" precisionStep="0". Does anyone know how I can preserve milliseconds even if they are zero? Or is it not possible at all? Thanks, Pratik
Re: Solr not preserving milliseconds precision for zero milliseconds
Thanks for the clarification. I'll change my code to accommodate this behavior. On Thu, Oct 5, 2017 at 6:24 PM, Chris Hostetter wrote: > : > "startTime":"2013-02-10T18:36:07.000Z" > ... > : handler. It gets added successfully but when I retrieve this document > back > : using "id" I get following. > ... > : > "startTime":"2013-02-10T18:36:07Z", > ... > : As you can see, the milliseconds precision in date field "startTime" is > : lost. Precision is preserved for non-zero milliseconds but it's being > lost > : for zero values. The field type of "startTime" field is as follows. > ... > : Does anyone know how I can preserve milliseconds even if its zero? Or is > it > : not possible at all? > > ms precision is being preserved -- but as you mentioned, the fractional > seconds you indexed are "0" therefore they are not needed/preserved when > writing the response to maintain ms precision. > > This is the correct formatting as specified in the specification for the > time format that Solr follows... > > https://lucene.apache.org/solr/guide/working-with-dates.html > https://www.w3.org/TR/xmlschema-2/#dateTime > > >>> 3.2.7.2 Canonical representation > >>> ... > >>> The fractional second string, if present, must not end in '0'; > > > > -Hoss > http://www.lucidworks.com/ >
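For client code that has to treat the two renderings as equivalent, a minimal sketch using plain java.time (nothing Solr-specific; it just shows that both strings parse to the same instant, so comparisons should be done on parsed values rather than raw strings):

import java.time.Instant;

public class SolrDateFormatCheck {
    public static void main(String[] args) {
        // Solr may return "...07Z" for a value that was sent as "...07.000Z";
        // both represent the same point in time.
        Instant sent     = Instant.parse("2013-02-10T18:36:07.000Z");
        Instant returned = Instant.parse("2013-02-10T18:36:07Z");
        System.out.println(sent.equals(returned)); // prints: true
    }
}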
Re: Graph Traversal
For now, you can probably use Cartesian function of Streaming Expressions which Joel implemented to solve the same problem. https://issues.apache.org/jira/browse/SOLR-10292 http://joelsolr.blogspot.com/2017/03/streaming-nlp-is-coming-in-solr-66.html Regards, Pratik On Sat, Oct 28, 2017 at 7:38 PM, Joel Bernstein wrote: > I don't see a jira ticket for this yet. Feel free to create it and reply > back with the link. > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Fri, Oct 27, 2017 at 9:55 AM, Kojo wrote: > > > Hi, I was looking for information on Graph Traversal. More specifically, > > support to search graph on multivalued field. > > > > Searching on the Internet, I found a question exactly the same of mine, > > with an answer that what I need is not implemented yet: > > http://lucene.472066.n3.nabble.com/Using-multi-valued- > > field-in-solr-cloud-Graph-Traversal-Query-td4324379.html > > > > > > Is there a ticket on Jira to follow the implementation of search graph on > > multivalued field? > > > > Thank you, > > >
Re: Graph Traversal
By including the cartesianProduct function in a Streaming Expression pipeline, you can convert a tuple having one multivalued field into multiple tuples, where each tuple holds one value of the field which was originally multivalued.

For example, if you have the following document (fruits is a multivalued field):

{ id: someID, fruits: [apple, orange, banana] }

Applying the cartesianProduct function would give the following tuples.

{ id: someID, fruits: apple }, { id: someID, fruits: orange }, { id: someID, fruits: banana }

Now that fruits holds single values, you can also use any Streaming Expression functions which don't work with multivalued fields. This happens in the Streaming Expression pipeline, so you don't have to flatten your documents in the index.

On Mon, Oct 30, 2017 at 8:39 AM, Kojo wrote:
> Hi,
> just a question, I have no deep background on Solr, Graph etc.
> This solution looks like normalizing data like a m2m table in a sql database, is it?
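A sketch of what that looks like as an actual expression, using the fruits example above (the collection name and the use of the /export handler are assumptions; cartesianProduct takes the incoming stream, the multivalued field to expand, and an optional productSort):

cartesianProduct(
  search(collection1, q="*:*", qt="/export", fl="id,fruits", sort="id asc"),
  fruits,
  productSort="fruits asc"
)

Each emitted tuple carries the original id with a single fruits value, so downstream functions that can't handle multivalued fields (sort, rollup, gatherNodes, etc.) can be applied to the expanded stream.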
Re: Graph Traversal
You use this at query time. Since Streaming Expressions can be pipelined, the next stage/function of the pipeline will work on the new tuples generated.

On Mon, Oct 30, 2017 at 10:09 AM, Kojo wrote:
> Do you store these new tuples, created by Streaming Expressions, in a new Solr cloud collection? Or just use the tuples at query time?
Re: Streaming Expression - cartesianProduct
Rollup needs documents to be sorted by the "over" field. Check this thread for more details: http://lucene.472066.n3.nabble.com/Streaming-Expressions-rollup-function-returning-results-with-duplicate-tuples-td4342398.html

On Wed, Nov 1, 2017 at 3:41 PM, Kojo wrote:
> Wrapping the cartesianProduct function with the fetch function works as expected.
> But the rollup function over cartesianProduct doesn't aggregate on a returned field of the cartesianProduct.
> The field "id_researcher" below is a multivalued field.
>
> This one works:
>
> fetch(reasercher,
>   cartesianProduct(
>     having(
>       cartesianProduct(
>         search(schoolarship, zkHost="localhost:9983", qt="/export", q="*:*",
>                fl="process, area, id_reasercher", sort="process asc"),
>         area
>       ),
>       eq(area, val(Anything))),
>     id_reasercher),
>   fl="name, django_id",
>   on="id_reasercher=django_id"
> )
>
> This one doesn't work:
>
> rollup(
>   cartesianProduct(
>     having(
>       cartesianProduct(
>         search(schoolarship, zkHost="localhost:9983", qt="/export", q="*:*",
>                fl="process, area, id_researcher, status", sort="process asc"),
>         area
>       ),
>       eq(area, val(Anything))),
>     id_researcher),
>   over=id_researcher, count(*)
> )
>
> If I aggregate over a non-multivalued field, it works.
> Is it correct that rollup doesn't work on a cartesianProduct?
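Concretely, the fix being suggested is to sort the expanded tuples on the rollup's "over" field before rolling up. A sketch based on the quoted expression (untested; field and collection names are copied from it):

rollup(
  sort(
    cartesianProduct(
      having(
        cartesianProduct(
          search(schoolarship, zkHost="localhost:9983", qt="/export", q="*:*",
                 fl="process, area, id_researcher, status", sort="process asc"),
          area
        ),
        eq(area, val(Anything))),
      id_researcher),
    by="id_researcher asc"
  ),
  over=id_researcher, count(*)
)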
DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain
Hi All, I am using SOLR Cloud 6.0. I am receiving the below exception very frequently in the Solr logs.

o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: RunUpdateProcessor has received an AddUpdateCommand containing a document that appears to still contain Atomic document update operations, most likely because DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:63)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:335)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:74)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:936)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1091)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:714)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:93)
at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)

Can you please help me with the root cause?

Below is a snapshot of the relevant solrconfig values (field-name mutation pattern, date parsing formats, and field type mappings):

pattern: [^\w-\.]   replacement: _

yyyy-MM-dd'T'HH:mm:ss.SSSZ, yyyy-MM-dd'T'HH:mm:ss,SSSZ, yyyy-MM-dd'T'HH:mm:ss.SSS, yyyy-MM-dd'T'HH:mm:ss,SSS,
yyyy-MM-dd'T'HH:mm:ssZ, yyyy-MM-dd'T'HH:mm:ss, yyyy-MM-dd'T'HH:mmZ, yyyy-MM-dd'T'HH:mm,
yyyy-MM-dd HH:mm:ss.SSSZ, yyyy-MM-dd HH:mm:ss,SSSZ, yyyy-MM-dd HH:mm:ss.SSS, yyyy-MM-dd HH:mm:ss,SSS,
yyyy-MM-dd HH:mm:ssZ, yyyy-MM-dd HH:mm:ss, yyyy-MM-dd HH:mmZ, yyyy-MM-dd HH:mm, yyyy-MM-dd

default: strings; java.lang.Boolean -> booleans; java.util.Date -> tdates; java.lang.Long, java.lang.Integer -> tlongs; java.lang.Number -> tdoubles

Regards, Pratik Thaker
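The error text itself describes the mechanics: an AddUpdateCommand still carrying atomic-update operations (set/add/inc maps) reached RunUpdateProcessor, whereas those operations are normally resolved against the stored document by DistributedUpdateProcessorFactory earlier in the chain. As a reference point only, and not a diagnosis of the exact configuration above (which isn't fully shown), a chain that keeps that step explicitly in place looks roughly like this:

<updateRequestProcessorChain name="my-chain">
  <!-- field mutating / parsing processors go before the distributed processor -->
  <processor class="solr.FieldNameMutatingUpdateProcessorFactory">
    <str name="pattern">[^\w-\.]</str>
    <str name="replacement">_</str>
  </processor>
  <!-- resolves atomic update operations against the existing document -->
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>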
RE: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain
Hi Friends, Can you please try to give me some details about below issue? Regards, Pratik Thaker

From: Pratik Thaker
Sent: 07 February 2017 17:12
To: 'solr-user@lucene.apache.org'
Subject: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain
Fwd: Solr dynamic field blowing up the index size
Here is the same question on StackOverflow, with better formatting: http://stackoverflow.com/questions/42370231/solr-dynamic-field-blowing-up-the-index-size

Recently I upgraded from Solr 5.0 to Solr 6.4.1. I can run my app fine, but the problem is that the index size with Solr 6 is way too large. In Solr 5 the index size was about 15GB, and in Solr 6, for the same data, the index size is 300GB! I am not able to understand what contributes to such a huge difference in Solr 6.

I have been able to identify a field which is blowing up the size of the index. It is as follows. When this field is commented out, the index size reduces to less than 10GB. This field is of type text_general. Following is the definition of this type.

A few things which I did to debug this issue:
- I have ensured that the field type definition is the same as what I was using in Solr 5, and that it is also valid in version 6. This field type applies a list of stopwords to be ignored during indexing. I have supplied the same list of stopwords we were using in Solr 5. I have verified that the path of this file is correct and that it is being loaded fine in the Solr admin UI. When I analyse these fields using the "Analysis" tab of the Solr admin UI, I can see that stopwords are being filtered out. However, when I query with some of these stopwords, I do get results back, which makes me think that the stopwords are probably being indexed.

Any idea what could increase the size of the index by so much in Solr 6?
Re: Fwd: Solr dynamic field blowing up the index size
Thanks for the reply. I can see that in solr 6, more than 50% of the index directory is occupied by ".nvd" file extension. It is something related to norms and doc values. On Tue, Feb 21, 2017 at 10:27 AM, Alexandre Rafalovitch wrote: > Did you look in the data directories to check what index file extensions > contribute most to the difference? That could give a hint. > > Regards, > Alex > > On 21 Feb 2017 9:47 AM, "Pratik Patel" wrote: > > > Here is the same question in stackOverflow for better format. > > > > http://stackoverflow.com/questions/42370231/solr- > > dynamic-field-blowing-up-the-index-size > > > > Recently, I upgraded from solr 5.0 to solr 6.4.1. I can run my app fine > but > > the problem is that index size with solr 6 is way too large. In solr 5, > > index size was about 15GB and in solr 6, for the same data, the index > size > > is 300GB! I am not able to understand what contributes to such huge > > difference in solr 6. > > > > I have been able to identify a field which is blowing up the size of > index. > > It is as follows. > > > > > stored="true" multiValued="true" /> > > > > > stored="false" multiValued="true" /> > > > > > > When this field is commented out, the index size reduces to less than > 10GB. > > > > This field is of type text_general. Following is the definition of this > > type. > > > > > positionIncrementGap="100"> > > > > > > > > > > > pattern="((?m)[a-z]+)'s" replacement="$1s" /> > > > protected="protwords.txt" generateWordParts="1" > > generateNumberParts="1" catenateWords="1" catenateNumbers="1" > > catenateAll="0" splitOnCaseChange="0"/> > > > > > words="C:/Users/pratik/Desktop/solr-6.4.1_playground/ > > solr-6.4.1/server/solr/collection1/conf/stopwords.txt" > > /> > > > > > > > > > > > > > pattern="((?m)[a-z]+)'s" replacement="$1s" /> > > > protected="protwords.txt" generateWordParts="1" > > generateNumberParts="1" catenateWords="1" catenateNumbers="1" > > catenateAll="0" splitOnCaseChange="0"/> > > > > > words="C:/Users/pratik/Desktop/solr-6.4.1_playground/ > > solr-6.4.1/server/solr/collection1/conf/stopwords.txt" > > /> > > > > > > > > Few things which I did to debug this issue: > > > >- I have ensured that field type definition is same as what I was > using > >in solr 5 and it is also valid in version 6. This field type > considers a > >list of "stopwords" to be ignored during indexing. I have supplied the > > same > >list of stopwords which we were using in solr 5. I have verified that > > path > >of this file is correct and it is being loaded fine in solr admin UI. > > When > >I analyse these fields using "Analysis" tab of the solr admin UI, I > can > > see > >that stopwords are being filtered out. However, when I query with some > > of > >these stopwords, I do get the results back which makes me think that > >probably stopwords are being indexed. > > > > Any idea what could increase the size of index by so much in solr 6? > > >
Re: Fwd: Solr dynamic field blowing up the index size
I am using the schema from Solr 5, which does not have any field with docValues enabled. In fact, to ensure that everything is the same as Solr 5 (except the breaking changes), I am also using the solrconfig.xml from Solr 5, with schemaFactory set to the classic schema factory so that schema.xml from Solr 5 is used.

On Tue, Feb 21, 2017 at 11:33 AM, Alexandre Rafalovitch wrote:
> Did you reuse the schema or rebuilt it on top of the latest examples?
> Because the latest example schema enabled docValues for strings on the fieldType level.
>
> I would do a diff of the schemas to see what changed. If they look very different and you are looking for tools to normalize/extract elements from schemas, you may find my latest Revolution presentation useful for that:
> https://www.slideshare.net/arafalov/rebuilding-solr-6-examples-layer-by-layer-lucenesolrrevolution-2016
> (e.g. slide 20). There is also the video there at the end.
>
> Regards,
>    Alex.
>
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
> On 21 February 2017 at 11:18, Mike Thomsen wrote:
> > Correct me if I'm wrong, but heavy use of doc values should actually blow up the size of your index considerably if they are in fields that get sent a lot of data.
Re: Fwd: Solr dynamic field blowing up the index size
I think I have found something concrete. Reading up more on the .nvd file extension, I found that it is used to store length and boost factors for documents and fields; these are the norms files. Whether norms are kept for a field is controlled by the omitNorms attribute: if omitNorms=true, norms are not stored for that field. I explicitly added omitNorms="true" to the field type text_general and re-indexed the data, and now my index size is much smaller. I haven't verified this with the complete data set yet, but I can see that the index size is reduced. We have a large data set and it takes about 5-6 hours to index it completely, so I'll index the whole data set overnight to confirm the fix.

But now I am curious about the omitNorms attribute. What is the default value of omitNorms for the field type "text_general"? The documentation says that omitNorms=true for primitive field types like string, int, etc., but I don't know what the default is for "text_general". I never had omitNorms set explicitly on the text_general field type or on any of the fields having type text_general. Has the default value of omitNorms been changed between Solr 5.0.0 and 6.4.1? Any clarification on this would be really helpful.

I am posting some relevant links here for anyone who might face a similar issue in the future.
http://apprize.info/php/solr_4/2.html
http://stackoverflow.com/questions/18694242/what-is-omitnorms-and-version-field-in-solr-schema
https://lucidworks.com/2009/09/02/scaling-lucene-and-solr/#d0e71

Thanks, Pratik
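For reference, a sketch of the kind of schema change described above. The type name is the one from this thread, but the analyzer shown is just a placeholder: the actual analyzer chain in the schema is longer and isn't what matters for the index size.

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Worth noting: norms feed index-time length normalization into scoring, so turning them off trades that ranking signal for the space savings.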
How to figure out whether stopwords are being indexed or not
I have a field type in the schema which has a stopwords list applied. I have verified that the path of the stopwords file is correct and that it is being loaded fine in the Solr admin UI. When I analyse these fields using the "Analysis" tab of the Solr admin UI, I can see that stopwords are being filtered out. However, when I query with some of these stopwords, I do get results back, which makes me think that the stopwords are probably being indexed.

For example, when I run the following query, I do get back results. I have the word "and" in the stopwords list, so I expect no results for this query.

http://localhost:8081/solr/collection1/select?fq=Description_note:*%20and%20*&indent=on&q=*:*&rows=100&start=0&wt=json

Does this mean that the word "and" is being indexed and the stopwords are not being used?

Following is the field type of the field Description_note:
Re: How to figure out whether stopwords are being indexed or not
Hi Eric, Thanks for the reply! Following is the relevant part of response header with debugQuery on. { "responseHeader":{ "status":0, "QTime":282, "params":{ "q":"Description_note:* and *", "indent":"on", "wt":"json", "debugQuery":"on", "_":"1487773835305"}}, "response":{"numFound":81771,"start":0,"docs":[ { "id":"", . . . },.. ] } } On Tue, Feb 21, 2017 at 8:22 PM, Erick Erickson wrote: > Attach &debug=query to your query and look at the parsed query that's > returned. > That'll tell you what was searched at least. > > You can also use the TermsComponent to examine terms in a field directly. > > Best, > Erick > > On Tue, Feb 21, 2017 at 2:52 PM, Pratik Patel wrote: > > I have a field type in schema which has been applied stopwords list. > > I have verified that path of stopwords file is correct and it is being > > loaded fine in solr admin UI. When I analyse these fields using > "Analysis" tab > > of the solr admin UI, I can see that stopwords are being filtered out. > > However, when I query with some of these stopwords, I do get the results > > back which makes me think that probably stopwords are being indexed. > > > > For example, when I run following query, I do get back results. I have > word > > "and" in the stopwords list so I expect no results for this query. > > > > http://localhost:8081/solr/collection1/select?fq= > Description_note:*%20and%20*&indent=on&q=*:*&rows=100&start=0&wt=json > > > > Does this mean that the "and" word is being indexed and stopwords are not > > being used? > > > > Following is the field type of field Description_note : > > > > > > > positionIncrementGap="100" omitNorms="true"> > > > > > > > > > > > pattern="((?m)[a-z]+)'s" replacement="$1s" /> > > protected="protwords.txt" > > generateWordParts="1" generateNumberParts="1" catenateWords="1" > > catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/> > > > > > words="stopwords.txt" /> > > > > > > > > > > > > > pattern="((?m)[a-z]+)'s" replacement="$1s" /> > > protected="protwords.txt" > > generateWordParts="1" generateNumberParts="1" catenateWords="1" > > catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/> > > > > > words="stopwords.txt" /> > > > > >
Re: How to figure out whether stopwords are being indexed or not
Asterisks were not for formatting; I was trying to use a wildcard operator. Here is another example query and the "parsedquery_toString" entry for it.

Query: http://localhost:8081/solr/collection1/select?debugQuery=on&indent=on&q=Description_note:*their*&wt=json

"parsedquery_toString":"Description_note:*their*"

I have the word "their" in my stopwords list, so I am expecting zero results, but this query returns 20 documents containing the word "their".

Here is more of the debug object of the response.

"debug":{
  "rawquerystring":"Description_note:*their*",
  "querystring":"Description_note:*their*",
  "parsedquery":"Description_note:*their*",
  "parsedquery_toString":"Description_note:*their*",
  "explain":{
    "54227b012a1c4e574f88505556987be57ef1af28d01b6d94":"\n1.0 = Description_note:*their*, product of:\n 1.0 = boost\n 1.0 = queryNorm\n",
  },
  "QParser":"LuceneQParser",
  "timing":{ ... }
}

Thanks, Pratik

On Wed, Feb 22, 2017 at 11:25 AM, Erick Erickson wrote:
> That's not what I'm looking for. Way down near the end there should be an entry like "parsed_query toString"
>
> This line is pretty suspicious: 82, "params":{ "q":"Description_note:* and *"
>
> Are you really searching for asterisks (I'd originally interpreted that as bolding which sometimes happens). Please don't do formatting with asterisks in e-mails as it's very confusing.
>
> Best,
> Erick
Re: How to figure out whether stopwords are being indexed or not
That explains why I was getting back the results. Thanks! I was doing that query only to test whether stopwords are being indexed or not, but apparently the query I had would not serve the purpose. I should rather have a document field with just the stop word and search against it without using a wildcard, to test whether the stopword was indexed or not. Thanks again.

Regards, Pratik

On Wed, Feb 22, 2017 at 12:10 PM, Alexandre Rafalovitch wrote:
> StopFilterFactory (and WordDelimiterFilterFactory and maybe others) are NOT multiterm aware.
>
> Using wildcards triggers the edge-case third type of analyzer chain that is automatically constructed unless you specify it explicitly.
>
> You can see the full list of analyzers and whether they are multiterm aware at http://www.solr-start.com/info/analyzers/ (I mark them with "(multi)").
>
> Solution in your case is probably to go away from these performance-killing double-side wildcards and to switch to the NGrams instead. And you may want to look at ApostropheFilterFactory while you are at it (instead of the regexp you have there).
>
> Regards,
>    Alex.
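As a side note on the original question (how to check whether a stopword actually made it into the index), the TermsComponent Erick mentioned can be queried directly. A hedged example against the setup in this thread, assuming the implicit /terms handler is available on this core:

http://localhost:8081/solr/collection1/terms?terms.fl=Description_note&terms.prefix=their&terms.limit=10

If "their" was filtered out at index time, it will not appear in the returned term list for that field.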
Using multi valued field in solr cloud Graph Traversal Query
I am trying to do a graph traversal query using the gatherNodes function. I am seeding a streaming expression to get some documents and then trying to map their ids (conceptid) to a multivalued field "participantIds" and gather nodes. Here is the query I am using.

gatherNodes(collection1,
  search(collection1, q="*:*", fl="conceptid", sort="conceptid asc",
         fq=storeid:"524efcfd505637004b1f6f24", fq=tags:"Project"),
  walk=conceptid->participantIds,
  gather="conceptid")

The field participantIds is a multivalued field. This is the field which holds the connections between the documents. When I execute this query, I get an exception as below.

{
  "result-set": {
    "docs": [
      {
        "EXCEPTION": "java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: --> http://169.254.40.158:8081/solr/collection1_shard1_replica1/: can not sort on multivalued field: participantIds",
        "EOF": true,
        "RESPONSE_TIME": 15
      }
    ]
  }
}

Does this mean you cannot look into multivalued fields in a graph traversal query? In our Solr index, we have documents with a "conceptid" field, which is the id, and a multivalued field "participantIds" storing the connections of that document to other documents. I believe we need one field in the document which stores its connections so that graph traversal is possible. If not, what is the other way to index graph data and use graph traversal? I am trying to explore graph traversal and am new to it. Any help would be appreciated. Thanks, Pratik
BooleanEvaluator inside 'having' function of a streaming expression
Hi, I am trying to write a streaming expression with a 'having' function in it. Following is my simple query.

having(
  search(collection1, q="*:*", fl="storeid", sort="storeid asc", fq=tags:"Company"),
  eq(storeid, 524efcfd505637004b1f6f24)
)

Here, storeid is a field of type "string" in the schema. But when I execute this query in the admin UI, I get a NumberFormatException. Here is the response in the admin UI.

{ "result-set": { "docs": [ { "EXCEPTION": "For input string: \"524efcfd505637004b1f6f24\"", "EOF": true } ] } }

If I change the storeid value to 123 in the boolean evaluator then it works fine. I tried quoting the original value, eq(storeid,"524efcfd505637004b1f6f24"), but it still fails with the same exception. Here is the relevant part of the stack trace from the log file.

at org.apache.solr.client.solrj.io.stream.expr.StreamFactory.createInstance(StreamFactory.java:358)
at org.apache.solr.client.solrj.io.stream.expr.StreamFactory.constructOperation(StreamFactory.java:339)
at org.apache.solr.client.solrj.io.stream.HavingStream.<init>(HavingStream.java:72)
... 38 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedConstructorAccessor27.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.solr.client.solrj.io.stream.expr.StreamFactory.createInstance(StreamFactory.java:351)
... 40 more
Caused by: java.lang.NumberFormatException: For input string: "524efcfd505637004b1f6f24"
at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source)
at sun.misc.FloatingDecimal.parseDouble(Unknown Source)
at java.lang.Double.parseDouble(Unknown Source)
at org.apache.solr.client.solrj.io.ops.LeafOperation.<init>(LeafOperation.java:48)
at org.apache.solr.client.solrj.io.ops.EqualsOperation.<init>(EqualsOperation.java:42)
... 44 more

I can see that Solr is trying to parse storeid as a double, hence the NumberFormatException, even though this field is of type string in the schema. How can I fix this? Thanks, Pratik
Re: BooleanEvaluator inside 'having' function of a streaming expression
it's not a stable version* On Mon, Mar 13, 2017 at 1:34 PM, Pratik Patel wrote: > Thanks Joel! This is just a simplified sample query that I created to > better demonstrate the issue. I am not sure whether I want to upgrade to > solr 6.5 as only developer version is available yet and it's a stable > version as far as I know. Thanks for the clarification. I will try to find > some other logic for my query. > > On Mon, Mar 13, 2017 at 1:23 PM, Joel Bernstein > wrote: > >> If you're using Solr 6.4 then the expression you're running won't work, >> because on numeric comparisons are supported. >> >> Solr 6.5 will have the expanded Evaluator functionality, which has string >> comparisons. >> >> In the expression you're working with it would be much more performant >> though to filter the query on the storeid. >> >> Joel Bernstein >> http://joelsolr.blogspot.com/ >> >> On Mon, Mar 13, 2017 at 1:06 PM, Pratik Patel >> wrote: >> >> > Hi, >> > >> > I am trying to write a streaming expression with 'having' function in >> it. >> > Following is my simple query. >> > >> > >> > having( >> > >search(collection1,q="*:*",fl="storeid",sort="storeid >> > > asc",fq=tags:"Company"), >> > >eq(storeid,524efcfd505637004b1f6f24) >> > > ) >> > >> > >> > Here, storeid is a field of type "string" in schema. But when I execute >> > this query in admin UI, I am getting a NumberFormatException. >> > >> > Here is the response in admin UI. >> > >> > >> > { "result-set": { "docs": [ { "EXCEPTION": "For input string: >> > \"524efcfd505637004b1f6f24\"", "EOF": true } ] } } >> > >> > If I change storeid value to 123 in the boolean evaluator then it works >> > fine. I tried to quote the original value so that we have >> > eq(storeid,"524efcfd505637004b1f6f24") but still it fails with same >> > exception. >> > >> > Here is the detailed stack trace from log file. >> > >> > >> > ERROR - 2017-03-13 16:56:39.516; [c:collection1 s:shard1 r:core_node1 >> > > x:collection1_shard1_replica1] org.apache.solr.common.SolrException; >> > > java.io.IOException: Unable to construct instance of >> > > org.apache.solr.client.solrj.io.stream.HavingStream >> > > at >> > > org.apache.solr.client.solrj.io.stream.expr.StreamFactory. >> > createInstance(StreamFactory.java:358) >> > > at >> > > org.apache.solr.client.solrj.io.stream.expr.StreamFactory. >> > constructStream(StreamFactory.java:222) >> > > at >> > > org.apache.solr.client.solrj.io.stream.expr.StreamFactory. >> > constructStream(StreamFactory.java:215) >> > > at >> > > org.apache.solr.handler.StreamHandler.handleRequestBody( >> > StreamHandler.java:212) >> > > at >> > > org.apache.solr.handler.RequestHandlerBase.handleRequest( >> > RequestHandlerBase.java:166) >> > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306) >> > > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall. >> java:658) >> > > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464) >> > > at >> > > org.apache.solr.servlet.SolrDispatchFilter.doFilter( >> > SolrDispatchFilter.java:345) >> > > at >> > > org.apache.solr.servlet.SolrDispatchFilter.doFilter( >> > SolrDispatchFilter.java:296) >> > > at >> > > org.eclipse.jetty.servlet.ServletHandler$CachedChain. 
Re: BooleanEvaluator inside 'having' function of a streaming expression
Thanks Joel! This is just a simplified sample query that I created to better demonstrate the issue. I am not sure whether I want to upgrade to solr 6.5 as only developer version is available yet and it's a stable version as far as I know. Thanks for the clarification. I will try to find some other logic for my query. On Mon, Mar 13, 2017 at 1:23 PM, Joel Bernstein wrote: > If you're using Solr 6.4 then the expression you're running won't work, > because on numeric comparisons are supported. > > Solr 6.5 will have the expanded Evaluator functionality, which has string > comparisons. > > In the expression you're working with it would be much more performant > though to filter the query on the storeid. > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Mon, Mar 13, 2017 at 1:06 PM, Pratik Patel wrote: > > > Hi, > > > > I am trying to write a streaming expression with 'having' function in it. > > Following is my simple query. > > > > > > having( > > >search(collection1,q="*:*",fl="storeid",sort="storeid > > > asc",fq=tags:"Company"), > > >eq(storeid,524efcfd505637004b1f6f24) > > > ) > > > > > > Here, storeid is a field of type "string" in schema. But when I execute > > this query in admin UI, I am getting a NumberFormatException. > > > > Here is the response in admin UI. > > > > > > { "result-set": { "docs": [ { "EXCEPTION": "For input string: > > \"524efcfd505637004b1f6f24\"", "EOF": true } ] } } > > > > If I change storeid value to 123 in the boolean evaluator then it works > > fine. I tried to quote the original value so that we have > > eq(storeid,"524efcfd505637004b1f6f24") but still it fails with same > > exception. > > > > Here is the detailed stack trace from log file. > > > > > > ERROR - 2017-03-13 16:56:39.516; [c:collection1 s:shard1 r:core_node1 > > > x:collection1_shard1_replica1] org.apache.solr.common.SolrException; > > > java.io.IOException: Unable to construct instance of > > > org.apache.solr.client.solrj.io.stream.HavingStream > > > at > > > org.apache.solr.client.solrj.io.stream.expr.StreamFactory. > > createInstance(StreamFactory.java:358) > > > at > > > org.apache.solr.client.solrj.io.stream.expr.StreamFactory. > > constructStream(StreamFactory.java:222) > > > at > > > org.apache.solr.client.solrj.io.stream.expr.StreamFactory. > > constructStream(StreamFactory.java:215) > > > at > > > org.apache.solr.handler.StreamHandler.handleRequestBody( > > StreamHandler.java:212) > > > at > > > org.apache.solr.handler.RequestHandlerBase.handleRequest( > > RequestHandlerBase.java:166) > > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306) > > > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658) > > > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464) > > > at > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter( > > SolrDispatchFilter.java:345) > > > at > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter( > > SolrDispatchFilter.java:296) > > > at > > > org.eclipse.jetty.servlet.ServletHandler$CachedChain. > > doFilter(ServletHandler.java:1691) > > > at > > > org.eclipse.jetty.servlet.ServletHandler.doHandle( > > ServletHandler.java:582) > > > at > > > org.eclipse.jetty.server.handler.ScopedHandler.handle( > > ScopedHandler.java:143) > > > at > > > org.eclipse.jetty.security.SecurityHandler.handle( > > SecurityHandler.java:548) > > > at > > > org.eclipse.jetty.server.session.SessionHandler. > > doHandle(SessionHandler.java:226) > > > at > > > org.eclipse.jetty.server.handler.ContextHandler. 
Using fetch function with streaming expression
I have two types of documents in my index. eventLink and concepttData. eventLink { ancestors:[,] } conceptData-{ id:id1, conceptid, concept_name . } Both are in same collection. In my query, I am doing a gatherNodes query wrapped in some other function and ultimately I am getting a bunch of eventLink documents. Now, I am trying to get conceptData document for each id specified in eventLink's ancestors field. I am trying to do that using fetch() function. Here is simplified form of my query. fetch(collection1, > function to get eventLinks, > fl="concept_name", > on="ancestors=conceptid" > ) On executing this query, I am getting back same set of documents which are results of my streaming expression containing gatherNodes() function. No fields are added to the tuples. From documentation, it seems like fetch would fetch additional data and add it to the tuples. However, that is not happening. Resulting tuples does not have concept_name field in them. What am I missing here? I really need to get this additional data from one solr query so that I don't have to iterate over the eventLinks and get additional data by individual queries. That would badly impact performance. Any suggestions? Here is my actual query and the response. fetch(collection1, > having( > gatherNodes(collection1, > search(collection1,q="*:*",fl="conceptid",sort="conceptid > asc",fq=storeid:"524efcfd505637004b1f6f24",fq=tags:"Company",fq=tags:"Prospects2", > qt="/export"), > walk=conceptid->eventParticipantID, > gather="eventID", > trackTraversal="true", scatter="leaves", > count(*) > ), > gt(count(*),1) > ), > fl="concept_name", > on="ancestors=conceptid" > ) Response : { > "result-set": { > "docs": [ > { > "node": "524f03355056c8b53b4ed199", > "field": "eventID", > "level": 1, > "count(*)": 2, > "collection": "collection1", > "ancestors": [ > "524f02845056c8b53b4e9871", > "524f02755056c8b53b4e9269" > ] > }, > . > } Thanks, Pratik
Re: Using fetch function with streaming expression
Hi, Joel. Thanks for the reply. So, I need to do some graph traversal queries for my use case. In my data set, I have concepts and events. concept : {name, address, bio ..}, > event: {name, date, participantIds:[concept1, concept2...] .} Events connects two or more concepts. So, this is a graph data where concepts are connected to each other via events. Each event store links to the concepts that it connects. So the field which stores those links is multivalued. This is a natural structure for my data on which I wanted to do some advanced graph traversal queries with some streaming expression. However, gatherNodes() function does not support multivalued fields yet. So, I changed my index structure to be something like this. concept : {conceptId, name, address, bio ..}, > event: {eventId, name, date, participantIds:[concept1, concept2...] .} > *create eventLink documents for each participantId in each > event > eventLink:{eventid, conceptid, id} I created eventLink documents from each event so that I can traverse the data using gatherNodes() function. With this change, I was able to do graph query and get Ids of concepts which I wanted. However, I only have ids of concepts. Now, using these ids, I want additional data from concept documents like concept_name or address or bio. This is what I was trying to achieve with fetch() function but it seems I hit the multivalued limitation again :) The reason why I am storing only the ids in eventLink documents is because I don't want to duplicate data unnecessarily. It will complicate maintenance of consistency in index when delete/update happens. Is there any way I can achieve this? Thanks! Pratik On Tue, Mar 14, 2017 at 11:24 AM, Joel Bernstein wrote: > Wow that's an interesting expression! > > The problem is that you are trying to fetch using the ancestors field, > which is multi-valued. fetch doesn't support multi-value join keys. I never > thought someone might try to do that. > > So , your attempting to get the concept names for ancestors? > > Can you explain a little more about the use case? > > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Tue, Mar 14, 2017 at 11:08 AM, Pratik Patel > wrote: > > > I have two types of documents in my index. eventLink and concepttData. > > > > eventLink { ancestors:[,] } > > conceptData-{ id:id1, conceptid, concept_name . } > > > > Both are in same collection. > > In my query, I am doing a gatherNodes query wrapped in some other > function > > and ultimately I am getting a bunch of eventLink documents. Now, I am > > trying to get conceptData document for each id specified in eventLink's > > ancestors field. I am trying to do that using fetch() function. Here is > > simplified form of my query. > > > > fetch(collection1, > > > function to get eventLinks, > > > fl="concept_name", > > > on="ancestors=conceptid" > > > ) > > > > > > On executing this query, I am getting back same set of documents which > are > > results of my streaming expression containing gatherNodes() function. No > > fields are added to the tuples. From documentation, it seems like fetch > > would fetch additional data and add it to the tuples. However, that is > not > > happening. Resulting tuples does not have concept_name field in them. > What > > am I missing here? I really need to get this additional data from one > solr > > query so that I don't have to iterate over the eventLinks and get > > additional data by individual queries. That would badly impact > performance. > > Any suggestions? 
> > > > Here is my actual query and the response. > > > > > > fetch(collection1, > > > having( > > > gatherNodes(collection1, > > > search(collection1,q="*:*",fl="conceptid",sort="conceptid > > > asc",fq=storeid:"524efcfd505637004b1f6f24",fq=tags:"Company",fq=tags:" > > Prospects2", > > > qt="/export"), > > > walk=conceptid->eventParticipantID, > > > gather="eventID", > > > trackTraversal="true", scatter="leaves", > > > count(*) > > > ), > > > gt(count(*),1) > > > ), > > > fl="concept_name", > > > on="ancestors=conceptid" > > > ) > > > > > > > > Response : > > > > { > > > "result-set": { > > > "docs": [ > > > { > > > "node": "524f03355056c8b53b4ed199", > > > "field": "eventID", > > > "level": 1, > > > "count(*)": 2, > > > "collection": "collection1", > > > "ancestors": [ > > > "524f02845056c8b53b4e9871", > > > "524f02755056c8b53b4e9269" > > > ] > > > }, > > > . > > > } > > > > > > Thanks, > > Pratik > > >
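To make the eventLink normalisation described above concrete, a small illustration with invented ids (the field names are the ones from this thread, the values are made up):

    event:     { "id": "e1", "name": "board meeting", "participantIds": ["c1", "c2"] }

    becomes one eventLink document per participant, each with single-valued keys:
    eventLink: { "id": "e1_c1", "eventid": "e1", "conceptid": "c1" }
    eventLink: { "id": "e1_c2", "eventid": "e1", "conceptid": "c2" }

With that shape, gatherNodes only ever has to walk single-valued fields; the multi-valued ancestors field on the gathered tuples is the part that still needs fetch (or the cartesianProduct decorator discussed below) to pull in the conceptData details.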
Re: Using fetch function with streaming expression
Wow, this is interesting! Is it going to be a new addition to solr or is it already available cause I can not find it in documentation? I am using solr version 6.4.1. On Tue, Mar 14, 2017 at 7:41 PM, Joel Bernstein wrote: > I'm going to add a "cartesian" function that create a cartesian product > from a multi-value field. This will turn a single tuple with a multi-value > into multiple tuples with a single value field. This will allow the fetch > operation to work on ancestors. It also has many other use cases. Sample > syntax: > > fetch(collection1, > cartesian(field=ancestors, > having(gatherNodes(collection1, > > search(collection1, > > q="*:*", > > fl="conceptid", > > sort="conceptid asc", > > fq=storeid:"524efcfd505637004b1f6f24", > > fq=tags:"Company", > > fq=tags:"Prospects2", > > qt="/export"), > > walk=conceptid->eventParticipantID, > > gather="eventID", > t > rackTraversal="true", > > scatter="leaves", > count(*)), > gt(count(*),1))), > fl="concept_name", > on="ancestors=conceptid") > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Tue, Mar 14, 2017 at 11:51 AM, Pratik Patel > wrote: > > > Hi, Joel. Thanks for the reply. > > > > So, I need to do some graph traversal queries for my use case. In my data > > set, I have concepts and events. > > > > concept : {name, address, bio ..}, > > > event: {name, date, participantIds:[concept1, concept2...] .} > > > > > > Events connects two or more concepts. So, this is a graph data where > > concepts are connected to each other via events. Each event store links > to > > the concepts that it connects. So the field which stores those links is > > multivalued. This is a natural structure for my data on which I wanted to > > do some advanced graph traversal queries with some streaming expression. > > However, gatherNodes() function does not support multivalued fields yet. > > So, I changed my index structure to be something like this. > > > > concept : {conceptId, name, address, bio ..}, > > > event: {eventId, name, date, participantIds:[concept1, concept2...] > > .} > > > *create eventLink documents for each participantId in each > > > event > > > eventLink:{eventid, conceptid, id} > > > > > > > > I created eventLink documents from each event so that I can traverse the > > data using gatherNodes() function. With this change, I was able to do > graph > > query and get Ids of concepts which I wanted. However, I only have ids of > > concepts. Now, using these ids, I want additional data from concept > > documents like concept_name or address or bio. This is what I was trying > > to achieve with fetch() function but it seems I hit the multivalued > > limitation again :) The reason why I am storing only the ids in eventLink > > documents is because I don't want to duplicate data unnecessarily. It > will > > complicate maintenance of consistency in index when delete/update > happens. > > Is there any way I can achieve this? > > > > Thanks! > > Pratik > > > > > > > > > > > > On Tue, Mar 14, 2017 at 11:24 AM, Joel Bernstein > > wrote: > > > > > Wow that's an interesting expression! > > > > > > The problem is that you are trying to fetch using the ancestors field, > > > which is multi-valued. fetch doesn't support multi-value join keys. I > > never > > > thought someone might try to do that. > > > > > > So , your attempting to get the concept names for ancestors? > > > > > > Can you explain a little more about the use case? 
How to implement nested streaming expressions in Java using solrj
I am trying to write a streaming expression in SolrJ. Following is the query that I want to implement in Java. having( > gatherNodes(collection1, > search(collection1,q="*:*",fl="conceptid",sort="conceptid > asc",fq=storeid:"524efcfd505637004b1f6f24",fq=tags:"Company",fq=tags:"Prospects2", > qt="/export"), > walk=conceptid->eventParticipantID, > gather="eventID", > trackTraversal="true", scatter="leaves", > count(*) > ), > gt(count(*),1) > ) Using this article ( http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html) I could implement and run a single streaming expression, search(collection1,q="*:*",fl="conceptid",sort="conceptid asc",fq=storeid:"524efcfd505637004b1f6f24",fq=tags:"Company",fq=tags:"Prospects2", qt="/export") But I cannot find a way to create a nested query. How can I do that? Thanks, Pratik
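One way to do this with SolrJ, sketched rather than tested: register the functions the expression uses with a StreamFactory and let constructStream() parse the whole nested expression (the same entry point Solr uses server-side). The zkHost and the exact operation/metric classes registered below are assumptions for this Solr version and should be adjusted to the real environment:

    import java.io.IOException;
    import org.apache.solr.client.solrj.io.SolrClientCache;
    import org.apache.solr.client.solrj.io.Tuple;
    import org.apache.solr.client.solrj.io.graph.GatherNodesStream;
    import org.apache.solr.client.solrj.io.ops.GreaterThanOperation;
    import org.apache.solr.client.solrj.io.stream.CloudSolrStream;
    import org.apache.solr.client.solrj.io.stream.HavingStream;
    import org.apache.solr.client.solrj.io.stream.StreamContext;
    import org.apache.solr.client.solrj.io.stream.TupleStream;
    import org.apache.solr.client.solrj.io.stream.expr.StreamFactory;
    import org.apache.solr.client.solrj.io.stream.metrics.CountMetric;

    public class NestedExpressionSketch {
      public static void main(String[] args) throws IOException {
        // the nested expression, exactly as it would be sent to the /stream handler
        String expr = "having("
            + "gatherNodes(collection1,"
            + "search(collection1,q=\"*:*\",fl=\"conceptid\",sort=\"conceptid asc\","
            + "fq=storeid:\"524efcfd505637004b1f6f24\",fq=tags:\"Company\",fq=tags:\"Prospects2\",qt=\"/export\"),"
            + "walk=conceptid->eventParticipantID,"
            + "gather=\"eventID\","
            + "trackTraversal=\"true\",scatter=\"leaves\","
            + "count(*)),"
            + "gt(count(*),1))";

        // register every function name appearing in the expression (assumed class mappings)
        StreamFactory factory = new StreamFactory()
            .withCollectionZkHost("collection1", "localhost:9983")   // zkHost is an assumption
            .withFunctionName("search", CloudSolrStream.class)
            .withFunctionName("gatherNodes", GatherNodesStream.class)
            .withFunctionName("having", HavingStream.class)
            .withFunctionName("gt", GreaterThanOperation.class)
            .withFunctionName("count", CountMetric.class);

        SolrClientCache cache = new SolrClientCache();
        StreamContext context = new StreamContext();
        context.setSolrClientCache(cache);

        TupleStream stream = factory.constructStream(expr);   // parses the whole nested expression
        stream.setStreamContext(context);
        try {
          stream.open();
          for (Tuple tuple = stream.read(); !tuple.EOF; tuple = stream.read()) {
            System.out.println(tuple.getFields());
          }
        } finally {
          stream.close();
          cache.close();
        }
      }
    }

An alternative is to compose CloudSolrStream/GatherNodesStream/HavingStream objects directly in Java, but letting the factory parse the expression string keeps the code in sync with whatever already works in the admin UI.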
Re: Using fetch function with streaming expression
Great, I think I can achieve what I want by combining "select" and "cartersian" functions in my expression. Thanks a lot for help! Regards, Pratik On Wed, Mar 15, 2017 at 10:21 AM, Joel Bernstein wrote: > I haven't created the jira ticket for this yet. It's fairly quick to > implement but the Solr 6.5 release is just around the corner. So most > likely it would be in the Solr 6.6. It will be committed fairly soon > though so if you want to use master, or branch_6x you can experiment with > it earlier. > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Tue, Mar 14, 2017 at 7:53 PM, Pratik Patel wrote: > > > Wow, this is interesting! Is it going to be a new addition to solr or is > it > > already available cause I can not find it in documentation? I am using > solr > > version 6.4.1. > > > > On Tue, Mar 14, 2017 at 7:41 PM, Joel Bernstein > > wrote: > > > > > I'm going to add a "cartesian" function that create a cartesian product > > > from a multi-value field. This will turn a single tuple with a > > multi-value > > > into multiple tuples with a single value field. This will allow the > fetch > > > operation to work on ancestors. It also has many other use cases. > Sample > > > syntax: > > > > > > fetch(collection1, > > > cartesian(field=ancestors, > > > having(gatherNodes(collection1, > > > > > > search(collection1, > > > > > > q="*:*", > > > > > > fl="conceptid", > > > > > > sort="conceptid asc", > > > > > > fq=storeid:"524efcfd505637004b1f6f24", > > > > > > fq=tags:"Company", > > > > > > fq=tags:"Prospects2", > > > > > > qt="/export"), > > > > > > walk=conceptid->eventParticipantID, > > > > > > gather="eventID", > > > t > > > rackTraversal="true", > > > > > > scatter="leaves", > > > count(*)), > > > gt(count(*),1))), > > > fl="concept_name", > > > on="ancestors=conceptid") > > > > > > Joel Bernstein > > > http://joelsolr.blogspot.com/ > > > > > > On Tue, Mar 14, 2017 at 11:51 AM, Pratik Patel > > > wrote: > > > > > > > Hi, Joel. Thanks for the reply. > > > > > > > > So, I need to do some graph traversal queries for my use case. In my > > data > > > > set, I have concepts and events. > > > > > > > > concept : {name, address, bio ..}, > > > > > event: {name, date, participantIds:[concept1, concept2...] .} > > > > > > > > > > > > Events connects two or more concepts. So, this is a graph data where > > > > concepts are connected to each other via events. Each event store > links > > > to > > > > the concepts that it connects. So the field which stores those links > is > > > > multivalued. This is a natural structure for my data on which I > wanted > > to > > > > do some advanced graph traversal queries with some streaming > > expression. > > > > However, gatherNodes() function does not support multivalued fields > > yet. > > > > So, I changed my index structure to be something like this. > > > > > > > > concept : {conceptId, name, address, bio ..}, > > > > > event: {eventId, name, date, participantIds:[concept1, concept2...] > > > > .} > > > > > *create eventLink documents for each participantId in each > > > > > event > > > > > eventLink:{eventid, conceptid, id} > > > > > > > > > > > > > > > > I created eventLink documents from each event so that I can traverse > > the > > > > data using gatherNodes() function. With this change, I was able to do > > > graph > > > > query and get Ids of concepts which I wanted. However, I only have > ids > > of > > > > concepts. Now, using these ids, I want additional data from concept > > > > documents like concept_name or address or bio. 
RE: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain
Hi All, I am facing this issue since very long, can you please provide your suggestion on it ? Regards, Pratik Thaker -Original Message- From: Pratik Thaker [mailto:pratik.tha...@smartstreamrdu.com] Sent: 09 February 2017 21:24 To: 'solr-user@lucene.apache.org' Subject: RE: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain Hi Friends, Can you please try to give me some details about below issue ? Regards, Pratik Thaker From: Pratik Thaker Sent: 07 February 2017 17:12 To: 'solr-user@lucene.apache.org' Subject: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain Hi All, I am using SOLR Cloud 6.0 I am receiving below exception very frequently in solr logs, o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: RunUpdateProcessor has received an AddUpdateCommand containing a document that appears to still contain Atomic document update operations, most likely because DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:63) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:335) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) at org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:74) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:936) at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1091) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:714) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) at org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:93) at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97) 
Can you please help me with the root cause ? Below is the snapshot of solrconfig (values only):

  field name mutating pattern: [^\w-\.]  replacement: _
  date parse formats: yyyy-MM-dd'T'HH:mm:ss.SSSZ, yyyy-MM-dd'T'HH:mm:ss,SSSZ, yyyy-MM-dd'T'HH:mm:ss.SSS, yyyy-MM-dd'T'HH:mm:ss,SSS, yyyy-MM-dd'T'HH:mm:ssZ, yyyy-MM-dd'T'HH:mm:ss, yyyy-MM-dd'T'HH:mmZ, yyyy-MM-dd'T'HH:mm, yyyy-MM-dd HH:mm:ss.SSSZ, yyyy-MM-dd HH:mm:ss,SSSZ, yyyy-MM-dd HH:mm:ss.SSS, yyyy-MM-dd HH:mm:ss,SSS, yyyy-MM-dd HH:mm:ssZ, yyyy-MM-dd HH:mm:ss, yyyy-MM-dd HH:mmZ, yyyy-MM-dd HH:mm, yyyy-MM-dd
  add-schema-fields defaults: default field type strings; type mappings java.lang.Boolean -> booleans, java.util.Date -> tdates, java.lang.Long / java.lang.Integer -> tlongs, java.lang.Number -> tdoubles

Regards, Pratik Thaker
RE: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain
Hi Ishan, After making suggested changes to solrconfig.xml, I did upconfig on all 3 SOLR VMs and restarted SOLR engines. But still I am facing same issue. Is it something I am missing ? Regards, Pratik Thaker -Original Message- From: Ishan Chattopadhyaya [mailto:ichattopadhy...@gmail.com] Sent: 14 April 2017 02:12 To: solr-user@lucene.apache.org Subject: Re: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain Why are you adding these update processors (esp. the AddSchemaFieldsUpdateProcessor) after DistributedUpdateProcessor? Try adding them before DUP, and it has a better chance to work. On Wed, Apr 12, 2017 at 3:44 PM, Pratik Thaker < pratik.tha...@smartstreamrdu.com> wrote: > Hi All, > > I am facing this issue since very long, can you please provide your > suggestion on it ? > > Regards, > Pratik Thaker > > -----Original Message- > From: Pratik Thaker [mailto:pratik.tha...@smartstreamrdu.com] > Sent: 09 February 2017 21:24 > To: 'solr-user@lucene.apache.org' > Subject: RE: DistributedUpdateProcessorFactory was explicitly disabled > from this updateRequestProcessorChain > > Hi Friends, > > Can you please try to give me some details about below issue ? > > Regards, > Pratik Thaker > > From: Pratik Thaker > Sent: 07 February 2017 17:12 > To: 'solr-user@lucene.apache.org' > Subject: DistributedUpdateProcessorFactory was explicitly disabled > from this updateRequestProcessorChain > > Hi All, > > I am using SOLR Cloud 6.0 > > I am receiving below exception very frequently in solr logs, > > o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: > RunUpdateProcessor has received an AddUpdateCommand containing a > document that appears to still contain Atomic document update > operations, most likely because DistributedUpdateProcessorFactory was > explicitly disabled from this updateRequestProcessorChain > at > org.apache.solr.update.processor.RunUpdateProcessor.processAdd( > RunUpdateProcessorFactory.java:63) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at > org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessor > Factory$AddSchemaFieldsUpdateProcessor.processAdd( > AddSchemaFieldsUpdateProcessorFactory.java:335) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at org.apache.solr.update.processor.FieldMutatingUpdateProcessor. > processAdd(FieldMutatingUpdateProcessor.java:117) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at org.apache.solr.update.processor.FieldMutatingUpdateProcessor. > processAdd(FieldMutatingUpdateProcessor.java:117) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at org.apache.solr.update.processor.FieldMutatingUpdateProcessor. > processAdd(FieldMutatingUpdateProcessor.java:117) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at org.apache.solr.update.processor.FieldMutatingUpdateProcessor. > processAdd(FieldMutatingUpdateProcessor.java:117) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at > org.apache.solr.update.processor.FieldNameMutatingUpdateProcess > orFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:74) > at org.apache.solr.update.processor.UpdateRequestProcessor. 
RE: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain
Hi Alessandro, Can you please suggest what should be the correct order of adding processors ? I am having 5 collections, 6 shards, replication factor 2, 3 nodes on 3 separate VMs. Regards, Pratik Thaker -Original Message- From: alessandro.benedetti [mailto:a.benede...@sease.io] Sent: 21 April 2017 13:38 To: solr-user@lucene.apache.org Subject: RE: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain Let's make a quick differentiation between PRE and POST processors in a Solr Cloud atchitecture : "In a single node, stand-alone Solr, each update is run through all the update processors in a chain exactly once. But the behavior of update request processors in SolrCloud deserves special consideration. " cit. wiki *PRE PROCESSORS* All the processors defined BEFORE the distributedUpdateProcessor happen ONLY on the first node that receive the update ( regardless if it is a leader or a replica ). *POST PROCESSORS* The distributedUpdateProcessor will forward the update request to the the correct leader ( or multiple leaders if the request involves more shards), the leader will then forward to the replicas. The leaders and replicas at this point will execute all the update request processors defined AFTER the distributedUpdateProcessor. " Pre-processors and Atomic Updates Because DistributedUpdateProcessor is responsible for processing Atomic Updates into full documents on the leader node, this means that pre-processors which are executed only on the forwarding nodes can only operate on the partial document. If you have a processor which must process a full document then the only choice is to specify it as a post-processor." wiki In your example, your chain is definitely messed up, the order is important and you want your heavy processing to happen only on the first node. For better info and clarification: https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode ( you can find here a working alternative to your chain) https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors - --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io -- View this message in context: http://lucene.472066.n3.nabble.com/DistributedUpdateProcessorFactory-was-explicitly-disabled-from-this-updateRequestProcessorChain-tp4319154p4331215.html Sent from the Solr - User mailing list archive at Nabble.com. The information in this email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this email by anyone else is unauthorised. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful.
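A sketch of what the reordering can look like in solrconfig.xml. The processor list below is the stock schemaless set, not the actual chain in use here, so treat every class and parameter as an assumption to be adapted; the point is only the ordering, with the field-mutating and AddSchemaFields processors ahead of DistributedUpdateProcessorFactory and RunUpdateProcessorFactory last:

    <updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
      <!-- pre-processors: run once, on whichever node first receives the update -->
      <processor class="solr.UUIDUpdateProcessorFactory"/>
      <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
      <processor class="solr.FieldNameMutatingUpdateProcessorFactory">
        <str name="pattern">[^\w-\.]</str>
        <str name="replacement">_</str>
      </processor>
      <processor class="solr.ParseDateFieldUpdateProcessorFactory">
        <arr name="format">
          <str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
          <str>yyyy-MM-dd</str>
        </arr>
      </processor>
      <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
        <str name="defaultFieldType">strings</str>
        <lst name="typeMapping">
          <str name="valueClass">java.lang.Boolean</str>
          <str name="fieldType">booleans</str>
        </lst>
      </processor>
      <!-- the distributed update; anything after this point runs on every leader and replica -->
      <processor class="solr.DistributedUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>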
Solr Carrot Clustering query with specific label in it
Hi, When we do a Carrot Clustering query on a set of solr documents we get back the following type of response:

  label "DDR"  (score 3.9599865057283354): TWINX2048-3200PRO, VS1GB400C3, VDBDB1A16
  label "iPod" (score 11.959228467119022): F8V7067-APL-KIT, IW-02, MA147LL/A

Each label (cluster) has a corresponding set of documents. The question is: is it possible to make another Carrot Clustering query with a specific label in it, so as to only get back the documents corresponding to that label? In my use case, I am trying to write a streaming expression where one of the streams is the set of documents corresponding to a label (carrot cluster) selected by the user. Hence, I can not use the data present in the original response object. I have been exploring the Carrot2 documentation but I can't seem to find any option which lets you specify a label in the query. I am using solr 6.4.1 in cloud mode and the clustering algorithm is "org.carrot2.clustering.lingo.LingoClusteringAlgorithm" Thanks, Pratik
Semantic Knowledge Graph query using SolrJ
I am trying to use Semantic Knowledge Graph in my java based application. I have a Semantic Knowledge Graph query which works fine if I trigger it through browser using restlet client. Following is the query. { "queries": [ "foo:\"5a6127a7234e76473a816f1c\"" ], "compare": [ { "type": "bar", "limit": 30 } ]} Now, I want to trigger the same query through SolrJ client. I have tried following code but it gives me an error {"error":{"msg":"KnowledgeGraphHandler requires POST data","code":400}} The code in java is SolrQuery request = new SolrQuery(); request.setRequestHandler("/skg"); request.setShowDebugInfo(true); request.setParam("wt", "json"); request.setParam("json", "{\"queries\":[\"foo:\\\"5a6127a7234e76473a816f1c\\\"\"],\"compare\":[{\"type\":\"bar\",\"limit\":30}]}"); request.set("rows", 10); request.setParam("qf", "conceptname^10 tags^3 textproperty^2 file_text^4"); try { QueryResponse response = getStore().getEnvironment().getSolr().query(request, SolrRequest.METHOD.POST); NamedList rsp = response.getResponse(); ArrayList> skg_resp = (ArrayList>) rsp.get("clusters"); if (skg_resp != null) { } } Any idea what is wrong here? Any pointer to documentation on how to construct request for Semantic Knowledge Graph through solrJ would be very helpful. Thanks Pratik
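A sketch of one way to get the JSON into the POST body from SolrJ. It assumes a SolrJ version in which SolrRequest.getContentStreams() is still honoured for POST requests, and it assumes the custom /skg handler reads the body the same way it does for the restlet client; neither is verified here:

    import java.util.Collection;
    import java.util.Collections;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.common.util.ContentStream;
    import org.apache.solr.common.util.ContentStreamBase;

    public class SkgPostSketch {
      public static QueryResponse querySkg(SolrClient solr, String collection, String jsonBody) throws Exception {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("qf", "conceptname^10 tags^3 textproperty^2 file_text^4");

        QueryRequest req = new QueryRequest(params, SolrRequest.METHOD.POST) {
          @Override
          public Collection<ContentStream> getContentStreams() {
            // send the SKG JSON as the POST body instead of a "json" URL parameter
            ContentStreamBase.StringStream body = new ContentStreamBase.StringStream(jsonBody);
            body.setContentType("application/json");
            return Collections.singletonList(body);
          }
        };
        req.setPath("/skg");   // the custom handler path from the question above
        return req.process(solr, collection);
      }
    }

If that still does not satisfy the handler, posting the JSON to /solr/<collection>/skg with a plain HTTP client is the fallback.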
Question on query time boosting
Hello All, I am trying to understand how exactly query time boosting works in solr. Primarily, I want to understand if absolute boost values matter or is it just the relative difference between various boost values which decides scoring. Let's take following two queries for example. // case1: q parameter > concept_name:(*semantic*)^200 OR > concept_name:(*machine*)^400 OR > Abstract_note:(*semantic*)^20 OR > Abstract_note:(*machine*)^40 //case2: q parameter > concept_name:(*semantic*)^20 OR > concept_name:(*machine*)^40 OR > Abstract_note:(*semantic*)^2 OR > Abstract_note:(*machine*)^4 Are these two queries any different? Relative boosting is same in both of them. I can see that they produce same results and ordering. Only difference is that the score in case1 is 10 times the score in case2. Thanks, Pratik
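A back-of-the-envelope illustration of why that happens, assuming the usual behaviour of recent Solr/Lucene versions where a clause boost simply multiplies that clause's score and the clause scores of an OR query are summed: take a document whose unboosted clause scores are 1.2 on concept_name:(*machine*) and 0.5 on Abstract_note:(*machine*). Under case 1 it scores 400*1.2 + 40*0.5 = 500; under case 2 it scores 40*1.2 + 4*0.5 = 50. Every matching document gets scaled by the same factor of 10, so the ordering cannot change; only the ratios between the boosts affect ranking. (Older TF-IDF scoring had a queryNorm that partially cancelled absolute boosts; it is gone in recent releases, which is consistent with the clean 10x score difference observed here.)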
Named entity extraction/correlation using Semantic Knowledge Graph
Hi Everyone, I have been using Semantic Knowledge Graph for document summarization, term correlation and document similarity. It has produced very good results after appropriate tuning. I was wondering if there is any way the Semantic Knowledge Graph can be used to for Named Entity Extraction like person names, companies etc. Related cases could be like below. 1. Extracting top named entities given a specific document or set of documents. 2. Given a named entity (let's say person name), return top N entities which are conceptually related to that entity. Does anyone have an idea as to how this can be achieved? Any direction would be a great help! Thanks And Regards, Pratik
Re: Named entity extraction/correlation using Semantic Knowledge Graph
I am on look out for ideas too but I was thinking of using some NER technique to index named entities in a specific field and then use Semantic Knowledge Graph on that specific field i.e. limit SKG queries to that field only. I am not sure however if this would produce desired results. I don't have a training corpus yet. Essentially what I want is something like a Solr Filter for entities or a request handler which can extract entities at query time. On Wed, Oct 17, 2018 at 4:45 PM Alexandre Rafalovitch wrote: > Solr does have: > 1) OpenNLP that does NER specifically > 2) TextTagger that does gazeteer NER based on existing list but with > Solr analysis power > > I would be curious to know how Semantic Knowledge Graph could be used > from NER (or even for other things you already have used it for), but > I am not sure it is clear what specifically you invisage. As in, is > there training corpus, are you looking at NGram techniques, etc. > > Regards, > Alex. > On Wed, 17 Oct 2018 at 13:40, Pratik Patel wrote: > > > > Hi Everyone, > > > > I have been using Semantic Knowledge Graph for document summarization, > term > > correlation and document similarity. It has produced very good results > > after appropriate tuning. > > > > I was wondering if there is any way the Semantic Knowledge Graph can be > > used to for Named Entity Extraction like person names, companies etc. > > Related cases could be like below. > > > > 1. Extracting top named entities given a specific document or set of > > documents. > > 2. Given a named entity (let's say person name), return top N entities > > which are conceptually related to that entity. > > > > Does anyone have an idea as to how this can be achieved? Any direction > > would be a great help! > > > > Thanks And Regards, > > Pratik >
Extracting important multi term phrases from the text
Hello Everyone, The standard way of tokenizing in Solr divides the text by white space. Is there a way by which we can index multi-term phrases like "Machine Learning" instead of "Machine", "Learning"? Is it possible to create a specific field type for such phrases which has its own indexing pipeline? I am open to storing n-grams, but these n-grams would be across terms and not just one term. In other words, I don't want to store n-grams of the term "machine", I want to store n-grams for a sentence like below. "I like machine learning" --> "I like", "like machine", "machine learning" and so on. It seems like Shingle Filter ( https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-ShingleFilter) may be used for this. Is there a better alternative? I want to use this field as an input to Semantic Knowledge Graph. The plugin works great for words. But now I want to use it for phrases. Any idea around this would be really helpful. Thanks a lot! - Pratik
Re: Extracting important multi term phrases from the text
Hi Markus, Thanks for the reply. I tried using ShingleFilter and it seems to be working. However, I am hitting an issue when it is used with StopWordFilter. StopWordFilter leaves an underscore "_" for removed words and it kind of screws up the data in index. I tried setting enablePositionIncrements="false" for stop word filter but that parameter only works for lucene version 4.3 or earlier. Looks like it's an open issue in lucene https://issues.apache.org/jira/browse/LUCENE-4065 For now, I am trying to find a workaround using PatternReplaceFilterFactory. Regards, Pratik On Thu, Nov 15, 2018 at 4:15 PM Markus Jelsma wrote: > Hello Pratik, > > We would use ShingleFilter for this indeed. If you only want > bigrams/shingles, don't forget to disable outputUnigrams and set both > shinle size limits to 2. > > Regards, > Markus > > -Original message- > > From:Pratik Patel > > Sent: Thursday 15th November 2018 17:00 > > To: solr-user@lucene.apache.org > > Subject: Extracting important multi term phrases from the text > > > > Hello Everyone, > > > > Standard way of tokenizing in solr would divide the text by white space > in > > solr. > > > > Is there a way by which we can index multi-term phrases like "Machine > > Learning" instead of "Machine", "Learning"? > > Is it possible to create a specific field type for such phrases which has > > its own indexing pipeline? I am open to storing n-grams but these n-grams > > would be across terms and not just one term? In other words, I don't want > > to store n-grams of the term "machine", I want to store n-grams for a > > sentence like below. > > > > "I like machine learning" --> "I like", "like machine", "machine > learning" > > and so on. > > > > It seems like Shingle Filter ( > > > https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-ShingleFilter > ) > > may be used for this. Is there a better alternative? > > > > I want to use this field as an input to Semantic Knowledge Graph. The > > plugin works great for words. But now I want to use it for phrases. Any > > idea around this would be really helpful. > > > > Thanks a lot! > > > > - Pratik > > >
Re: Extracting important multi term phrases from the text
@Markus @Walter, @Alexandre is right. The culprit was not StopWord Filter, it was ShingleFilter. I could not find parameter filterToken in documentation, is it a new addition? BTW, I tried that and it works. Thanks! I still ended up using pattern replacement filter because I did not want any single word string in that field. @David I am using SKG through the plugin. So it is a POST request with query in body. I haven't yet upgraded to version 7.5. Thank you all for the help! Regards, Pratik On Fri, Nov 16, 2018 at 8:36 AM David Hastings wrote: > Which function of the SKG are you using? significantTerms? > > On Thu, Nov 15, 2018 at 7:09 PM Alexandre Rafalovitch > wrote: > > > I think the underscore actually comes from the Shingles (parameter > > fillerToken). Have you tried setting it to empty string? > > > > Regards, > >Alex. > > On Thu, 15 Nov 2018 at 17:16, Pratik Patel wrote: > > > > > > Hi Markus, > > > > > > Thanks for the reply. I tried using ShingleFilter and it seems to > > > be working. However, I am hitting an issue when it is used with > > > StopWordFilter. StopWordFilter leaves an underscore "_" for removed > words > > > and it kind of screws up the data in index. > > > > > > I tried setting enablePositionIncrements="false" for stop word filter > but > > > that parameter only works for lucene version 4.3 or earlier. Looks like > > > it's an open issue in lucene > > > https://issues.apache.org/jira/browse/LUCENE-4065 > > > > > > For now, I am trying to find a workaround using > > PatternReplaceFilterFactory. > > > > > > Regards, > > > Pratik > > > > > > On Thu, Nov 15, 2018 at 4:15 PM Markus Jelsma < > > markus.jel...@openindex.io> > > > wrote: > > > > > > > Hello Pratik, > > > > > > > > We would use ShingleFilter for this indeed. If you only want > > > > bigrams/shingles, don't forget to disable outputUnigrams and set both > > > > shinle size limits to 2. > > > > > > > > Regards, > > > > Markus > > > > > > > > -Original message- > > > > > From:Pratik Patel > > > > > Sent: Thursday 15th November 2018 17:00 > > > > > To: solr-user@lucene.apache.org > > > > > Subject: Extracting important multi term phrases from the text > > > > > > > > > > Hello Everyone, > > > > > > > > > > Standard way of tokenizing in solr would divide the text by white > > space > > > > in > > > > > solr. > > > > > > > > > > Is there a way by which we can index multi-term phrases like > "Machine > > > > > Learning" instead of "Machine", "Learning"? > > > > > Is it possible to create a specific field type for such phrases > > which has > > > > > its own indexing pipeline? I am open to storing n-grams but these > > n-grams > > > > > would be across terms and not just one term? In other words, I > don't > > want > > > > > to store n-grams of the term "machine", I want to store n-grams > for a > > > > > sentence like below. > > > > > > > > > > "I like machine learning" --> "I like", "like machine", "machine > > > > learning" > > > > > and so on. > > > > > > > > > > It seems like Shingle Filter ( > > > > > > > > > > > > https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-ShingleFilter > > > > ) > > > > > may be used for this. Is there a better alternative? > > > > > > > > > > I want to use this field as an input to Semantic Knowledge Graph. > The > > > > > plugin works great for words. But now I want to use it for phrases. > > Any > > > > > idea around this would be really helpful. > > > > > > > > > > Thanks a lot! > > > > > > > > > > - Pratik > > > > > > > > > > > >
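For later readers of this thread, a minimal sketch of the kind of field type being discussed (the type name, tokenizer choice and shingle sizes are assumptions, not the actual schema used here); the fillerToken="" attribute is the setting that stops removed stopwords from showing up as "_" inside the shingles:

    <fieldType name="text_shingles" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <!-- bigrams only, no single terms, and no "_" filler for removed stopwords -->
        <filter class="solr.ShingleFilterFactory"
                minShingleSize="2" maxShingleSize="2"
                outputUnigrams="false" fillerToken=""/>
      </analyzer>
    </fieldType>

A copyField from the free-text field into a field of this type then gives the Semantic Knowledge Graph a phrase-level field to query against.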
Re: Extracting important multi term phrases from the text
@David Sorry for late reply. The SKG query that I am using is actually fairly basic in itself. For example, { > "queries":[ > "dataStoreId:\"123\"", > "text:\"foo\"" > ], > "compare":[ > { > "type":"text_shingles", > "limit":30, > "discover_values":true > } > ] > } What I am expecting is that SKG will return words/phrases that are related to the term "foo". I am filtering the text through StopWordFilter before that. I have also found that specifying a good foreground can drastically improve the results. Good luck! - Pratik On Fri, Nov 16, 2018 at 11:15 AM Alexandre Rafalovitch wrote: > Good catch Pratik. > > It is in Javadoc, but not in the reference guide: > > https://lucene.apache.org/core/6_3_0/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilterFactory.html > . I'll try to fix that later (SOLR-12996). > > Regards, >Alex. > On Fri, 16 Nov 2018 at 10:44, Pratik Patel wrote: > > > > @Markus @Walter, @Alexandre is right. The culprit was not StopWord > Filter, > > it was ShingleFilter. I could not find parameter filterToken in > > documentation, is it a new addition? BTW, I tried that and it works. > Thanks! > > I still ended up using pattern replacement filter because I did not want > > any single word string in that field. > > > > @David I am using SKG through the plugin. So it is a POST request with > > query in body. I haven't yet upgraded to version 7.5. > > > > Thank you all for the help! > > > > Regards, > > Pratik > > > > On Fri, Nov 16, 2018 at 8:36 AM David Hastings < > hastings.recurs...@gmail.com> > > wrote: > > > > > Which function of the SKG are you using? significantTerms? > > > > > > On Thu, Nov 15, 2018 at 7:09 PM Alexandre Rafalovitch < > arafa...@gmail.com> > > > wrote: > > > > > > > I think the underscore actually comes from the Shingles (parameter > > > > fillerToken). Have you tried setting it to empty string? > > > > > > > > Regards, > > > >Alex. > > > > On Thu, 15 Nov 2018 at 17:16, Pratik Patel > wrote: > > > > > > > > > > Hi Markus, > > > > > > > > > > Thanks for the reply. I tried using ShingleFilter and it seems to > > > > > be working. However, I am hitting an issue when it is used with > > > > > StopWordFilter. StopWordFilter leaves an underscore "_" for removed > > > words > > > > > and it kind of screws up the data in index. > > > > > > > > > > I tried setting enablePositionIncrements="false" for stop word > filter > > > but > > > > > that parameter only works for lucene version 4.3 or earlier. Looks > like > > > > > it's an open issue in lucene > > > > > https://issues.apache.org/jira/browse/LUCENE-4065 > > > > > > > > > > For now, I am trying to find a workaround using > > > > PatternReplaceFilterFactory. > > > > > > > > > > Regards, > > > > > Pratik > > > > > > > > > > On Thu, Nov 15, 2018 at 4:15 PM Markus Jelsma < > > > > markus.jel...@openindex.io> > > > > > wrote: > > > > > > > > > > > Hello Pratik, > > > > > > > > > > > > We would use ShingleFilter for this indeed. If you only want > > > > > > bigrams/shingles, don't forget to disable outputUnigrams and set > both > > > > > > shinle size limits to 2. 
Get MLT Interesting Terms for a set of documents corresponding to the query specified
Hi Everyone! I am trying to use MLT request handler. My query matches more than one documents but the response always seems to pick up the first document and interestingTerms also seems to be corresponding to that single document only. What I am expecting is that if my query matches multiple documents then the InterestingTerms handler result also corresponds to that set of documents and not the first document. Following is my query, http://localhost:8081/solr/collection1/mlt?debugQuery=on&fq=tags:test&mlt.boost=true&mlt.fl=mlt.fl=textpropertymlt&mlt.interestingTerms=details&mlt.mindf=1&mlt.mintf=2&mlt.minwl=3&q=*:*&rows=100&rows=2&start=0 Ultimately, my goal is to get interesting terms corresponding to this whole set of documents. I don't need similar documents as such. If not with mlt, is there any other way I can achieve this? that is, given a query matching set of documents, find interestingTerms for that set of documents based on tf-idf? Thanks! Pratik
Re: Get MLT Interesting Terms for a set of documents corresponding to the query specified
Aman, Thanks for the reply! I have tried with corrected query but it doesn't solve the problem. also, my tags filter matches multiple documents, however the interestingTerms seems to correspond to just the first document. Here is an example of a query which matches 1900 documents. http://localhost:8081/solr/collection1/mlt?debugQuery=on&q=tags:voltage&mlt.boost=true&mlt.fl=my_field&mlt.interestingTerms=details&mlt.mindf=1&mlt.mintf=2&mlt.minwl=3&q=*:*&rows=100&start=0 Thanks, Pratik On Mon, Jan 21, 2019 at 2:52 PM Aman Tandon wrote: > I see two rows params, looks like which will be overwritten by rows=2, and > then your tags filter is resulting only one document. Please remove extra > rows and try. > > On Mon, Jan 21, 2019, 08:44 Pratik Patel > > Hi Everyone! > > > > I am trying to use MLT request handler. My query matches more than one > > documents but the response always seems to pick up the first document and > > interestingTerms also seems to be corresponding to that single document > > only. > > > > What I am expecting is that if my query matches multiple documents then > the > > InterestingTerms handler result also corresponds to that set of documents > > and not the first document. > > > > Following is my query, > > > > > > > http://localhost:8081/solr/collection1/mlt?debugQuery=on&fq=tags:test&mlt.boost=true&mlt.fl=mlt.fl=textpropertymlt&mlt.interestingTerms=details&mlt.mindf=1&mlt.mintf=2&mlt.minwl=3&q=*:*&rows=100&rows=2&start=0 > > > > Ultimately, my goal is to get interesting terms corresponding to this > whole > > set of documents. I don't need similar documents as such. If not with > mlt, > > is there any other way I can achieve this? that is, given a query > matching > > set of documents, find interestingTerms for that set of documents based > on > > tf-idf? > > > > Thanks! > > Pratik > > >
Re: Get MLT Interesting Terms for a set of documents corresponding to the query specified
I will certainly try it out. Thanks! On Mon, Jan 21, 2019 at 8:48 PM Joel Bernstein wrote: > You find the significantTerms streaming expressions useful: > > > https://lucene.apache.org/solr/guide/7_6/stream-source-reference.html#significantterms > > > Joel Bernstein > http://joelsolr.blogspot.com/ > > > On Mon, Jan 21, 2019 at 3:02 PM Pratik Patel wrote: > > > Aman, > > > > Thanks for the reply! > > > > I have tried with corrected query but it doesn't solve the problem. also, > > my tags filter matches multiple documents, however the interestingTerms > > seems to correspond to just the first document. > > Here is an example of a query which matches 1900 documents. > > > > > > > http://localhost:8081/solr/collection1/mlt?debugQuery=on&q=tags:voltage&mlt.boost=true&mlt.fl=my_field&mlt.interestingTerms=details&mlt.mindf=1&mlt.mintf=2&mlt.minwl=3&q=*:*&rows=100&start=0 > > > > > > Thanks, > > Pratik > > > > > > On Mon, Jan 21, 2019 at 2:52 PM Aman Tandon > > wrote: > > > > > I see two rows params, looks like which will be overwritten by rows=2, > > and > > > then your tags filter is resulting only one document. Please remove > extra > > > rows and try. > > > > > > On Mon, Jan 21, 2019, 08:44 Pratik Patel > > > > > > Hi Everyone! > > > > > > > > I am trying to use MLT request handler. My query matches more than > one > > > > documents but the response always seems to pick up the first document > > and > > > > interestingTerms also seems to be corresponding to that single > document > > > > only. > > > > > > > > What I am expecting is that if my query matches multiple documents > then > > > the > > > > InterestingTerms handler result also corresponds to that set of > > documents > > > > and not the first document. > > > > > > > > Following is my query, > > > > > > > > > > > > > > > > > > http://localhost:8081/solr/collection1/mlt?debugQuery=on&fq=tags:test&mlt.boost=true&mlt.fl=mlt.fl=textpropertymlt&mlt.interestingTerms=details&mlt.mindf=1&mlt.mintf=2&mlt.minwl=3&q=*:*&rows=100&rows=2&start=0 > > > > > > > > Ultimately, my goal is to get interesting terms corresponding to this > > > whole > > > > set of documents. I don't need similar documents as such. If not with > > > mlt, > > > > is there any other way I can achieve this? that is, given a query > > > matching > > > > set of documents, find interestingTerms for that set of documents > based > > > on > > > > tf-idf? > > > > > > > > Thanks! > > > > Pratik > > > > > > > > > >
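For completeness, a sketch of what that expression could look like with the field and filter from the earlier messages in this thread; the numeric thresholds are illustrative, not recommendations:

    significantTerms(collection1,
                     q="tags:voltage",
                     field="my_field",
                     limit=20,
                     minDocFreq=10,
                     maxDocFreq=0.3,
                     minTermLength=4)

Unlike the /mlt handler, the terms come back scored against the whole result set of q, which is the set-level behaviour asked for at the top of this thread.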
Re: Using solr graph to traverse N relationships
Problem #1 can probably be solved by using "fetch" function. ( https://lucene.apache.org/solr/guide/6_6/stream-decorators.html#fetch) Problem #2 and #3 can be solved by normalizing the graph connections and by applying cartesianProduct on multi valued field, as described here. http://lucene.472066.n3.nabble.com/Using-fetch-function-with-streaming-expression-td4324896.html On Wed, Mar 13, 2019 at 11:20 AM Nightingale, Jonathan A (US) < jonathan.nighting...@baesystems.com> wrote: > Hi, > I posted this question originally on stack overflow and it was suggested I > use this mailing list instead so I'm sending it out here also. Here's my > original link if you want to maybe answer there also. But I also copied the > question into the body of the email. > > > https://stackoverflow.com/questions/55130208/using-solr-graph-to-traverse-n-relationships > > I'm investigating if I can use an existing solr store to do graph > traversal. It would be ideal to not have to duplicate the data in a graph > store. I was playing with the solr streaming capabilities and the nodes > (gatherNodes) source. I have three problems with it and I'm wondering if > people have found solutions: > 1) getting the original documents that the nodes references with all of > their fields. I did eventually solve this by doing an innerJoin on the > nodes returned by gatherNodes and a query against "*:*" but this seem less > than ideal. Is there a better way to do this? Even better would be if I > could do it as an "export" and not a "select" to better handle large > amounts of data. This problem is small compared to the other two which seem > like major bugs in Solr > 2) I can't traverse to nodes from a field that has more than one value. In > the nodes stream source definition there is a walk parameter. > nodes(collection, > search(some search params) > walk="ref->id", > gather="vals") > > in this example its walking the from the search results, taking the field > "ref" on those docs and finding all nodes that match that as an id. This > works until ref becomes a list of values. Has anyone had success making > this work? A simple example would be a tree structure where you have a > folder document and it has a multiValue field representing its subfolders > and files. How would I walk that relationship? > 3) in that example the gather is returning the nodes that are represented > by the "vals" field on all the nodes that result from the walk. This also > does not work if that field is multiValued. Has anyone had any success with > this also? Again going back to the files and folders example, I want to > return all the files in the subfolders of the selected folder. > nodes(collection, > search(collection, q="path:currentFolder", qt="/select", sort="fileId > ASC"), > walk="contents->fileId", > gather="contents", > fq="type:file") > > I made this up so there may be some typos but the premise is that contents > are a multiValued string field and every document, either of type "file" or > "folder" has a fileId, which is what the contents field references. How > would I accomplish this? Do these fields need to be indexed in a special > way? > Something that interesting is I see in the solr documentation it does > support a multi valued walk but only if its a hard coded value > > nodes(emails, walk="john...@apache.org, janesm...@apache.org->from", > gather="to") > > but when using a different stream as the input of the nodes function it > can't resolve fields that are multivalues. 
It can't even properly resolve > text fields that mimic the example above. If I store a field called refs > with a string value of "ref-1, ref-2, ref-3", the only match will be on an > id of "ref-1" when walk="refs->id" > > Thanks, I'd appreciate any help > >
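To make those suggestions concrete, a sketch for the folder/file example might look like the following (collection and field names are taken from the question above; this is an assumption, not a tested expression):

    fetch(collection,
          nodes(collection,
                cartesianProduct(
                    search(collection, q="path:currentFolder", fl="fileId,contents", sort="fileId asc", qt="/export"),
                    contents,
                    productSort="contents asc"),
                walk="contents->fileId",
                gather="fileId",
                fq="type:file"),
          on="node=fileId",
          fl="fileId,type,path")

cartesianProduct() flattens the multi-valued contents field into one tuple per value so that walk has a single-valued field to work with, and the outer fetch() pulls the full documents back for the gathered node values, which avoids the innerJoin against *:* mentioned in problem #1.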
Best Practice about solr cloud schema
Hello all, I have added some fields to the default managed-schema file. I was wondering if it is safe to take the default managed-schema file as is and add your own fields to it in production. What is the best practice for this? As I understand it, it should be safe to use the default schema as a base if the documents that are going to be indexed in Solr will only have the newly defined fields in them. In fact, it helps because the common field types are already defined in the default schema and can be re-used. I looked through the documentation but couldn't find the answer, and more clarity on this would be helpful. Is it safe to use the default managed-schema file as a base and add your own fields to it? Thanks, Pratik
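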
Re: Best Practice about solr cloud schema
Hey Eric, thanks for the clarification! What about solrConfig.xml file? Sure, it should be customized to suit one's needs but can it be used as a base or is it best to create one from scratch ? Thanks, Pratik On Wed, Feb 7, 2018 at 5:29 PM, Erick Erickson wrote: > That's really the point of the default managed-schema, to be a base > you use for your customizations. In fact, I often _remove_ most of the > fields (and especially fieldTypes) that I don't need. This includes > dynamic fields, copyFields and the like. > > Sometimes it's actually easier, though, to just start all over. > > BTW, do not delete any field that begins and ends with an underscore, > e.g. _version_ unless you know exactly what the consequences are > > Best, > Erick > > On Wed, Feb 7, 2018 at 2:59 PM, Pratik Patel wrote: > > Hello all, > > > > I have added some fields to default managed-schema file. I was wondering > if > > it is safe to take default managed-schema file as is and add your own > > fields to it in production. What is the best practice for this? As I > > understand, it should be safe to use default schema as base if documents > > that are going to be indexed in solr will only have newly defined fields > in > > it. In fact, it helps because the common field types are already defined > in > > default schema which can be re-used. I looked through the documentation > but > > couldn't find the answer and more clarity on this would be helpful. > > > > Is it safe to use default managed-schema file as base add your own fields > > to it? > > > > Thanks, > > Pratik >
Re: Best Practice about solr cloud schema
That makes it clear. Thanks a lot for your help. Pratik On Feb 7, 2018 10:33 PM, "Erick Erickson" wrote: > It can pretty much be used as-is, _except_ > > you'll find one or more entries in your request handlers like: > _text_ > > Change "_text_" to something in your schema, that's the default search > field if you don't field-qualify your search terms. > > Note that if you take out, for instance, all of your non-english > fieldTypes, you can also remove most of the stuff under the /lang > folder. > > I essentially always test this out on a local, stand-alone instance > until I can index a few documents and query them, it's faster than > always having to remember to move them to ZooKeeper > > Best, > Erick > > On Wed, Feb 7, 2018 at 7:14 PM, Pratik Patel wrote: > > Hey Eric, thanks for the clarification! What about solrConfig.xml file? > > Sure, it should be customized to suit one's needs but can it be used as a > > base or is it best to create one from scratch ? > > > > Thanks, > > Pratik > > > > On Wed, Feb 7, 2018 at 5:29 PM, Erick Erickson > > wrote: > > > >> That's really the point of the default managed-schema, to be a base > >> you use for your customizations. In fact, I often _remove_ most of the > >> fields (and especially fieldTypes) that I don't need. This includes > >> dynamic fields, copyFields and the like. > >> > >> Sometimes it's actually easier, though, to just start all over. > >> > >> BTW, do not delete any field that begins and ends with an underscore, > >> e.g. _version_ unless you know exactly what the consequences are > >> > >> Best, > >> Erick > >> > >> On Wed, Feb 7, 2018 at 2:59 PM, Pratik Patel > wrote: > >> > Hello all, > >> > > >> > I have added some fields to default managed-schema file. I was > wondering > >> if > >> > it is safe to take default managed-schema file as is and add your own > >> > fields to it in production. What is the best practice for this? As I > >> > understand, it should be safe to use default schema as base if > documents > >> > that are going to be indexed in solr will only have newly defined > fields > >> in > >> > it. In fact, it helps because the common field types are already > defined > >> in > >> > default schema which can be re-used. I looked through the > documentation > >> but > >> > couldn't find the answer and more clarity on this would be helpful. > >> > > >> > Is it safe to use default managed-schema file as base add your own > fields > >> > to it? > >> > > >> > Thanks, > >> > Pratik > >> >
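For concreteness, the _text_ reference Erick mentions usually lives in the request handler defaults in solrconfig.xml, roughly like this (which field you point df at depends on your own schema):

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="rows">10</str>
        <!-- default search field; point this at a field that actually exists in your schema -->
        <str name="df">_text_</str>
      </lst>
    </requestHandler>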
Re: Index size increases disproportionately to size of added field when indexed=false
I had a similar issue with index size after upgrading to version 6.4.1 from 5.x. The issue for me was that the field which caused index size to be increased disproportionately had a field type("text_general") for which default value of omitNorms was not true. Turning it on explicitly on field fixed the problem. Following is the link to my related question. You can verify value of omitNorms for your fields to check whether this is applicable in your case or not. http://search-lucene.com/m/Solr/eHNlagIB7209f1w1?subj=Fwd+Solr+dynamic+field+blowing+up+the+index+size On Tue, Feb 13, 2018 at 8:48 PM, Howe, David wrote: > > I have set docValues=false on all of the string fields in our index that > have indexed=false and stored=true. This gave a small improvement in the > index size from 13.3GB to 12.82GB. > > I have also tried running an optimize, which then reduced the index to > 12.6GB. > > Next step is to dump the sizes of the Solr index files for the index > version that is the correct size and the version that has the large size. > > Regards, > > David > > > David Howe > Java Domain Architect > Postal Systems > Level 16, 111 Bourke Street Melbourne VIC 3000 > > T 0391067904 > > M 0424036591 > > E david.h...@auspost.com.au > > W auspost.com.au > W startrack.com.au > > -Original Message- > From: Howe, David [mailto:david.h...@auspost.com.au] > Sent: Wednesday, 14 February 2018 7:26 AM > To: solr-user@lucene.apache.org > Subject: RE: Index size increases disproportionately to size of added > field when indexed=false > > > Thanks Hoss. I will try setting docValues to false, as we only ever want > to be able to retrieve the value of this field. > > Regards, > > David > > David Howe > Java Domain Architect > Postal Systems > Level 16, 111 Bourke Street Melbourne VIC 3000 > > T 0391067904 > > M 0424036591 > > E david.h...@auspost.com.au > > W auspost.com.au > W startrack.com.au > > Australia Post is committed to providing our customers with excellent > service. If we can assist you in any way please telephone 13 13 18 or visit > our website. > > The information contained in this email communication may be proprietary, > confidential or legally professionally privileged. It is intended > exclusively for the individual or entity to which it is addressed. You > should only read, disclose, re-transmit, copy, distribute, act in reliance > on or commercialise the information if you are authorised to do so. > Australia Post does not represent, warrant or guarantee that the integrity > of this email communication has been maintained nor that the communication > is free of errors, virus or interference. > > If you are not the addressee or intended recipient please notify us by > replying direct to the sender and then destroy any electronic or paper copy > of this message. Any views expressed in this email communication are taken > to be those of the individual sender, except where the sender specifically > attributes those views to Australia Post and is authorised to do so. > > Please consider the environment before printing this email. > Australia Post is committed to providing our customers with excellent > service. If we can assist you in any way please telephone 13 13 18 or visit > our website. > > The information contained in this email communication may be proprietary, > confidential or legally professionally privileged. It is intended > exclusively for the individual or entity to which it is addressed. 
You > should only read, disclose, re-transmit, copy, distribute, act in reliance > on or commercialise the information if you are authorised to do so. > Australia Post does not represent, warrant or guarantee that the integrity > of this email communication has been maintained nor that the communication > is free of errors, virus or interference. > > If you are not the addressee or intended recipient please notify us by > replying direct to the sender and then destroy any electronic or paper copy > of this message. Any views expressed in this email communication are taken > to be those of the individual sender, except where the sender specifically > attributes those views to Australia Post and is authorised to do so. > > Please consider the environment before printing this email. >
Re: Index size increases disproportionately to size of added field when indexed=false
You are right, in my case this field type was applied to many text fields. These includes many copy fields and dynamic fields as well. In my case, only specifying omitNorms=true for field type "text_general" fixed the issue. I didn't do anything else or had any other bug. On Wed, Feb 14, 2018 at 1:01 PM, Alessandro Benedetti wrote: > Hi pratik, > how is it possible that just the norms for a single field were causing such > a massive index size increment in your case ? > > In your case I think it was for a field type used by multiple fields, but > it's still suspicious in my opinions, > norms should be that big. > If I remember correctly in old versions of Solr before the drop of index > time boost, norms were containing both an approximation of the length of > the > field + index time boost. > From your mailing list problem you moved from 10 Gb to 300 Gb. > It can't be just the norms, are you sure you didn't face some bug ? > > Regards > > > > - > --- > Alessandro Benedetti > Search Consultant, R&D Software Engineer, Director > Sease Ltd. - www.sease.io > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
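For concreteness, the fix described above amounts to setting omitNorms on the type (or on the individual fields), something like the following (the analyzer chain shown is only a typical one, not necessarily the one from that schema):

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>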
Re: Index size increases disproportionately to size of added field when indexed=false
@Alessandro I will see if I can reproduce the same issue just by turning off omitNorms on field type. I'll open another mail thread if required. Thanks. On Thu, Feb 15, 2018 at 6:12 AM, Howe, David wrote: > > Hi Alessandro, > > Some interesting testing today that seems to have gotten me closer to what > the issue is. When I run the version of the index that is working > correctly against my database table that has the extra field in it, the > index suddenly increases in size. This is even though the data importer is > running the same SELECT as before (which doesn't include the extra column) > and loads the same number of rows. > > After scratching my head for a bit and browsing through both versions of > the table I am loading from (with and without the extra field), I noticed > that the natural ordering of the tables is different. These tables are > "staging" tables that I populate with another set of queries and inserts to > get the data into a format that is easy to ingest into Solr. When I add > the extra field to these queries, it changes the Oracle query plan as the > field is contained in a different table that I need to join to. As I don't > specify an "ORDER BY" on the query (as I didn't think it would make a > difference and would slow the query down), Oracle is free to chose how it > orders the result set. Adding the extra field changes that natural > ordering, which affects the order things go into my staging table. As I > don't specify an "ORDER BY" when I select things out of the staging table, > my data in the scenario that is working is being loaded in a different > order to the scenario which doesn't work. > > I am currently running full loads to verify this under each scenario, as I > have now forced the data in the scenario that doesn't work to be in the > same order as the scenario that does. Will see how this load goes > overnight. > > This leads to the question of what difference does it make to Solr what > order I load the data in? > > I also noticed that the .cfs file is quite large in the second scenario, > even though this is supposed to be disabled by default in Solr. I checked > my Solr config and there is no override of the default. > > In answer to your questions: > > 1) same number of documents - YES ~14,000,000 documents > 2) identical documents ( + 1 new field each not indexed) - YES, the second > scenario has one extra field that is stored but not indexed > 3) same number of deleted documents - YES, there are zero deleted > documents in both scenarios > 4) they both were born from scratch ( an empty index) - YES, both start > from a brand new virtual server with a brand new installation of Solr > > I am using the default auto commit, which I think is 15000. > > Thanks again for your assistance. > > Regards, > > David > > David Howe > Java Domain Architect > Postal Systems > Level 16, 111 Bourke Street Melbourne VIC 3000 > > T 0391067904 > > M 0424036591 > > E david.h...@auspost.com.au > > W auspost.com.au > W startrack.com.au > > Australia Post is committed to providing our customers with excellent > service. If we can assist you in any way please telephone 13 13 18 or visit > our website. > > The information contained in this email communication may be proprietary, > confidential or legally professionally privileged. It is intended > exclusively for the individual or entity to which it is addressed. You > should only read, disclose, re-transmit, copy, distribute, act in reliance > on or commercialise the information if you are authorised to do so. 
> Australia Post does not represent, warrant or guarantee that the integrity > of this email communication has been maintained nor that the communication > is free of errors, virus or interference. > > If you are not the addressee or intended recipient please notify us by > replying direct to the sender and then destroy any electronic or paper copy > of this message. Any views expressed in this email communication are taken > to be those of the individual sender, except where the sender specifically > attributes those views to Australia Post and is authorised to do so. > > Please consider the environment before printing this email. >
Re: Getting more documents from resultsSet
Using cursor marker might help as explained in this documentation https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html On Fri, May 18, 2018 at 4:13 PM, Deepak Goel wrote: > I wonder if in-memory-filesystem would help... > > On Sat, 19 May 2018, 01:03 Erick Erickson, > wrote: > > > If you only return fields that are docValue=true that'll largely > > eliminate the disk seeks. 30 seconds does seem kind of excessive even > > with disk seeks though. > > > > Here'r a reference: > > https://lucene.apache.org/solr/guide/6_6/docvalues.html > > > > Whenever I see anything like "...our business requirement is...", I > > cringe. _Why_ is that a requirement? What is being done _for the user_ > > that requires 2000 documents? There may be legitimate reasons, but > > there also may be better ways to get what you need. This may very well > > be an XY problem. > > > > For instance, if you want to take the top 2,000 docs from query X and > > score just those, see: > > https://lucene.apache.org/solr/guide/6_6/query-re-ranking.html, > > specifically: ReRankQParserPlugin. > > > > Best, > > Erick > > > > On Fri, May 18, 2018 at 11:09 AM, root23 wrote: > > > Hi all, > > > I am working on Solr 6. Our business requirement is that we need to > > return > > > 2000 docs for every query we execute. > > > Now normally if i execute the same set to query with start=0 to > rows=10. > > It > > > returns very fast(event for our most complex queries in like less then > 3 > > > seconds). > > > however the moment i add start=0 to rows =2000, the response time is > > like 30 > > > seconds or so. > > > > > > I understand that solr has to do probably disk seek to get the > documents > > > which might be the bottle neck in this case. > > > > > > Is there a way i can optimize around this knowingly that i might have > to > > get > > > 2000 results in one go and then might have to paginate also further and > > > showing 2000 results on each page. We could go to as much as 50 page. > > > > > > > > > > > > -- > > > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html > > >
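A minimal SolrJ sketch of that cursorMark loop, assuming "id" is the uniqueKey field and the collection name is a placeholder:

    import java.io.IOException;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorPaging {
      // Walks the full result set 2000 docs at a time using cursorMark.
      public static void fetchAll(SolrClient client, String collection)
          throws SolrServerException, IOException {
        SolrQuery query = new SolrQuery("*:*");
        query.setRows(2000);                               // page size
        query.setSort(SolrQuery.SortClause.asc("id"));     // sort must end on the uniqueKey field
        String cursorMark = CursorMarkParams.CURSOR_MARK_START;
        boolean done = false;
        while (!done) {
          query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
          QueryResponse rsp = client.query(collection, query);
          // process rsp.getResults() here ...
          String nextCursorMark = rsp.getNextCursorMark();
          done = cursorMark.equals(nextCursorMark);        // unchanged mark means no more results
          cursorMark = nextCursorMark;
        }
      }
    }

Unlike deep start/rows paging, each request only has to pick up where the previous one left off, so response time stays flat as you page through.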
Applying streaming expression as a filter in graph traversal expression (gatherNodes)
We can limit the scope of graph traversal by applying some filter along the way, as follows. gatherNodes(emails, walk="john...@apache.org->from", fq="body:(solr rocks)", gather="to") Is it possible to replace "body:(solr rocks)" with some streaming expression, like the "search" function for example? As follows: gatherNodes(emails, walk="john...@apache.org->from", fq="search(...)", // use streaming expression as filter gather="to") In my case, it would improve performance significantly if one could do that. Another approach I can think of is to save the results of the "search" streaming expression in some variable in the pipeline and then use it at multiple places, including the "fq" clause of "gatherNodes". Is it possible to do something like this?
Re: Applying streaming expression as a filter in graph traversal expression (gatherNodes)
Hi Joel, Thanks for the reply! I have indexed graph data in solr where an "event" can have one or more "participants". Thus, it's a graph of "participants" connected to each other via "events". Because participants are multiple, I am indexing the graph as follows. event--event_participant_child--participant Now my end goal is this, I have a list of "events" and for that list I want to plot a graph of "participants" by connecting them via events (which have to be from the original list). I get this list of "events" from a search() function which I use as my seed expression for gatherNodes(). I am doing a two hop graph traversal as follows. having( having( gatherNodes( collection1, having( gatherNodes( collection1, search(.), // gets list of events with each node having "eventId" walk=eventId->eventId, // walk to event_participant_child document which has both "eventId" and "participantId" gather="participantId", trackTraversal="true", scatter="leaves", count(*) ), gt(count(*),0) ), walk=node->participantId, gather="eventId", fq=(), // limit traversal to original list of events by using search() here?? trackTraversal="true", scatter="branches", count(*) ), eq(level,0) ), gt(count(*),1) ) I am able to get the graph I want from ancestors fields of nodes which are at level 0. Essentially, these are the events from my original list. Using "having()" function, I am able to limit the response so that it only includes original events. But it would be a great improvement if I can also limit the traversal so that only events from original list are visited at second hop. That is why, I want to apply original search() function as a filter in outer gatherNodes() function. I know it's a long shot but considering the potential improvement in performance, I was curious. Please let me know if you feel there is a better approach. Thanks - Pratik On Thu, Jun 21, 2018 at 7:05 PM, Joel Bernstein wrote: > Currently the gatherNodes expression can only be filtered by a traditional > filter query. I'm curious about the type of expression you are thinking of > filtering by? > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Wed, Jun 20, 2018 at 1:54 PM, Pratik Patel wrote: > > > We can limit the scope of graph traversal by applying some filter along > the > > way as follows. > > > > gatherNodes(emails, > > walk="john...@apache.org->from", > > fq="body:(solr rocks)", > > gather="to") > > > > > > Is it possible to replace "body:(solr rocks)" by some streaming > expression > > like "search" function for example? Like as follows.. > > > > gatherNodes(emails, > > walk="john...@apache.org->from", > > fq="search(...)", // use streaming expression as filter > > gather="to") > > > > > > > > In my case, it would improve performance significantly if one can do > that. > > Other approach I can think of is to save results of "search" streaming > > expression in some variable in pipeline and then use it at multiple > places > > including "fq" clause of "gatherNodes". Is it possible to do something > like > > this? > > >
Java library for building Streaming Expressions
Hello Everyone, Is there any java library for building Streaming Expressions? Currently, I am using solr's java client and building Streaming Expressions as follows. StreamFactory factory = new StreamFactory().withCollectionZkHost( collName, zkHost ) .withFunctionName("gatherNodes", GatherNodesStream.class) .withFunctionName("search", CloudSolrStream.class) .withFunctionName("count", CountMetric.class) .withFunctionName("having", HavingStream.class) .withFunctionName("gt", GreaterThanOperation.class) .withFunctionName("eq", EqualsOperation.class); HavingStream cs = (HavingStream) factory.constructStream( ); In this approach, I still have to build streaming_expression_str in code. Is there any better approach for this or is there any java library to do this? My search for it didn't yield anything so I was wondering if anyone here has an idea. Thanks, Pratik
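One option, rather than a separate builder library, is to assemble the string with the expression classes SolrJ already ships in org.apache.solr.client.solrj.io.stream.expr. A rough sketch (the collection, fields and the having/gt condition are placeholders):

    import org.apache.solr.client.solrj.io.stream.expr.StreamExpression;
    import org.apache.solr.client.solrj.io.stream.expr.StreamExpressionNamedParameter;

    StreamExpression search = new StreamExpression("search");
    search.addParameter("collection1");
    search.addParameter(new StreamExpressionNamedParameter("q", "*:*"));
    search.addParameter(new StreamExpressionNamedParameter("fl", "conceptid"));
    search.addParameter(new StreamExpressionNamedParameter("sort", "conceptid asc"));
    search.addParameter(new StreamExpressionNamedParameter("qt", "/export"));

    StreamExpression gt = new StreamExpression("gt");
    gt.addParameter("count(*)");
    gt.addParameter("0");

    StreamExpression having = new StreamExpression("having");
    having.addParameter(search);
    having.addParameter(gt);

    // serialize back to an expression string for factory.constructStream(...)
    String exprStr = having.toString();

This is still string-based under the hood, but it avoids hand-concatenating the expression and keeps the nesting explicit.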
String concatenation in Streaming Expressions
Hello, Is there a function which can be used in Streaming Expressions to concatenate two strings? I want to use it just like add(1,2) in a Streaming Expression. Essentially, I want to achieve something like the following. select( search(..), conceptid as foo, storeid as bar, concat(foo,bar) as id ) I could use the merge() function, but my streaming expression is already quite complex and that would be a roundabout way of doing it, making it even more complex. Any idea how this can be achieved? Thanks, Pratik
Re: String concatenation in Streaming Expressions
Thanks a lot for help! Looks like this is a recent addition? It doesn't work for me in version 6.6.4 On Wed, Jun 27, 2018 at 4:18 PM, Aroop Ganguly wrote: > So it will become: > select( > search(..), > conceptid as foo, >storeid as bar > append(conceptid, storeid) as id > ) > > Or > select > select( > search(..), > conceptid as foo, >storeid as bar > ), > foo, > bar, > append(foo,bar) as id > ) > > > On Jun 27, 2018, at 1:12 PM, Aroop Ganguly > wrote: > > > > this test case here will help in understanding the usage: > > https://github.com/apache/lucene-solr/blob/branch_7_2/ > solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/eval/ > AppendEvaluatorTest.java <https://github.com/apache/ > lucene-solr/blob/branch_7_2/solr/solrj/src/test/org/ > apache/solr/client/solrj/io/stream/eval/AppendEvaluatorTest.java> > > > >> On Jun 27, 2018, at 1:07 PM, Aroop Ganguly > wrote: > >> > >> I think u can use the append evaluator > >> https://github.com/apache/lucene-solr/blob/master/solr/ > solrj/src/java/org/apache/solr/client/solrj/io/eval/AppendEvaluator.java < > https://github.com/apache/lucene-solr/blob/master/solr/ > solrj/src/java/org/apache/solr/client/solrj/io/eval/AppendEvaluator.java> > >> > >> > >>> On Jun 27, 2018, at 12:58 PM, Pratik Patel > wrote: > >>> > >>> Hello, > >>> > >>> Is there a function which can be used in Streaming Expressions to > >>> concatenate two strings? I want to use it just like add(1,2) in a > Streaming > >>> Expression. Essentially, I want to achieve something as follows. > >>> > >>> select( > >>> search(..), > >>> conceptid as foo, > >>> storeid as bar > >>> concat(foo,bar) as id > >>> ) > >>> > >>> I can use merge() function but my streaming expression is quite > complex and > >>> that will make it even more complex as that would be a round about way > of > >>> doing it. Any idea how this can be achieved? > >>> > >>> Thanks, > >>> Pratik > >> > > > >
Re: String concatenation in Streaming Expressions
Thanks Aroop, I tired following Streaming Expression but it doesn't work for me. select( search(collection1,q="*:*",fl="conceptid",sort="conceptid asc",fq=storeid:"59c03d21d997b97bf47b3eeb",fq=schematype:"Article",fq=tags:"genetics", qt="/export"), conceptid as conceptid, storeid as "test_", concat([conceptid,storeid], conceptid, "-") ) It generates an exception, "Invalid expression concat([conceptid,storeid],conceptid,\"-\") - unknown operands found" Is this correct syntax? On Wed, Jun 27, 2018 at 4:30 PM, Aroop Ganguly wrote: > It seems like append is not available on 6.4, but concat is … > Check this out on the 6.4 branch: > https://github.com/apache/lucene-solr/blob/branch_6_4/ > solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/ops/ > ConcatOperationTest.java <https://github.com/apache/ > lucene-solr/blob/branch_6_4/solr/solrj/src/test/org/ > apache/solr/client/solrj/io/stream/ops/ConcatOperationTest.java> > > > > On Jun 27, 2018, at 1:27 PM, Aroop Ganguly > wrote: > > > > It should, but 6.6.* has some issues of things not working per > documentation. > > Try using 7+. > > > >> On Jun 27, 2018, at 1:24 PM, Pratik Patel wrote: > >> > >> Thanks a lot for help! > >> > >> Looks like this is a recent addition? It doesn't work for me in version > >> 6.6.4 > >> > >> > >> > >> On Wed, Jun 27, 2018 at 4:18 PM, Aroop Ganguly > > >> wrote: > >> > >>> So it will become: > >>> select( > >>> search(..), > >>> conceptid as foo, > >>> storeid as bar > >>> append(conceptid, storeid) as id > >>> ) > >>> > >>> Or > >>> select > >>> select( > >>> search(..), > >>> conceptid as foo, > >>> storeid as bar > >>> ), > >>> foo, > >>> bar, > >>> append(foo,bar) as id > >>> ) > >>> > >>>> On Jun 27, 2018, at 1:12 PM, Aroop Ganguly > >>> wrote: > >>>> > >>>> this test case here will help in understanding the usage: > >>>> https://github.com/apache/lucene-solr/blob/branch_7_2/ > >>> solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/eval/ > >>> AppendEvaluatorTest.java <https://github.com/apache/ > >>> lucene-solr/blob/branch_7_2/solr/solrj/src/test/org/ > >>> apache/solr/client/solrj/io/stream/eval/AppendEvaluatorTest.java> > >>>> > >>>>> On Jun 27, 2018, at 1:07 PM, Aroop Ganguly > >>> wrote: > >>>>> > >>>>> I think u can use the append evaluator > >>>>> https://github.com/apache/lucene-solr/blob/master/solr/ > >>> solrj/src/java/org/apache/solr/client/solrj/io/eval/AppendEvaluator.java > < > >>> https://github.com/apache/lucene-solr/blob/master/solr/ > >>> solrj/src/java/org/apache/solr/client/solrj/io/eval/ > AppendEvaluator.java> > >>>>> > >>>>> > >>>>>> On Jun 27, 2018, at 12:58 PM, Pratik Patel > >>> wrote: > >>>>>> > >>>>>> Hello, > >>>>>> > >>>>>> Is there a function which can be used in Streaming Expressions to > >>>>>> concatenate two strings? I want to use it just like add(1,2) in a > >>> Streaming > >>>>>> Expression. Essentially, I want to achieve something as follows. > >>>>>> > >>>>>> select( > >>>>>> search(..), > >>>>>> conceptid as foo, > >>>>>>storeid as bar > >>>>>>concat(foo,bar) as id > >>>>>> ) > >>>>>> > >>>>>> I can use merge() function but my streaming expression is quite > >>> complex and > >>>>>> that will make it even more complex as that would be a round about > way > >>> of > >>>>>> doing it. Any idea how this can be achieved? > >>>>>> > >>>>>> Thanks, > >>>>>> Pratik > >>>>> > >>>> > >>> > >>> > > > >
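Judging by the ConcatOperationTest linked above, the 6.x concat operation appears to take named parameters rather than the positional form tried above, so the select() would look more like this (an untested sketch, reusing the field names from the earlier example):

    select(
        search(collection1,
               q="*:*",
               fl="conceptid,storeid",
               sort="conceptid asc",
               fq=storeid:"59c03d21d997b97bf47b3eeb",
               qt="/export"),
        conceptid,
        storeid,
        concat(fields="conceptid,storeid", delim="-", as="id")
    )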
Re: Bug in scoreNodes function of streaming expressions?
Joel Bernstein wrote > Ok, that sounds like a bug. I can create a ticket for this. > > On Mon, Jul 1, 2019 at 5:57 PM Pratik Patel < > pratik@ > > wrote: > >> I think the problem was that my streaming expression was always returning >> just one node. When I added more data so that I can have more than one >> node, I started seeing the result. >> >> On Mon, Jul 1, 2019 at 11:21 AM Pratik Patel < > pratik@ > > wrote: >> >>> Hello Everyone, >>> >>> I am trying to execute following streaming expression with "scoreNodes" >>> function in it. This is taken from the documentation. >>> >>> scoreNodes(top(n="50", >>>sort="count(*) desc", >>>nodes(baskets, >>> random(baskets, q="productID:ABC", >>> fl="basketID", rows="500"), >>> walk="basketID->basketID", >>> fq="-productID:ABC", >>> gather="productID", >>> count(* >>> >>> I have ensured that I have the collection and data present for it. >>> Upon executing this, I am getting an error message as follows. >>> >>> "No collection param specified on request and no default collection has >>> been set: []" >>> >>> Upon digging into the source code I found that there is a possible bug >>> in >>> ScoreNodesStream.java >>> >>> StringBuilder instance is never appended any string and the block which >>> initializes collection, needs the length of that instance to be more >>> than >>> zero. This condition will always be false and hence the collection will >>> never be set. >>> >>> I checked this file in solr version 8.1 and that also has the same >>> issue. >>> Is there any JIRA open for this or any patch available? >>> >>> [image: image.png] >>> >>> Thanks, >>> Pratik >>> >> Hi Joel, You mentioned creating a ticket for this bug, I can't find any, was it created? If not then I can create one. Currently, ScoreNodes has two issues. 1. It fails when result has only one node. 2. It triggers a GET request instead of POST. GET fails if number of nodes is large. I have been using a custom class as workaround for #2, it would be good to use the original SolrJ class. Thanks, Pratik -- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Bug in scoreNodes function of streaming expressions?
Thanks a lot. I will update the ticket with more details if appropriate. Pratik On Wed, Jan 29, 2020 at 10:07 AM Joel Bernstein wrote: > Here is the ticket: > https://issues.apache.org/jira/browse/SOLR-14231 > > > Joel Bernstein > http://joelsolr.blogspot.com/ > > > On Wed, Jan 29, 2020 at 10:03 AM Joel Bernstein > wrote: > > > Hi Pratik, > > > > I'll create the ticket now and report back. If you've got a fix please > > post it to the ticket and I'll try to get this in for the next release. > > > > Joel Bernstein > > http://joelsolr.blogspot.com/ > > > > > > On Tue, Jan 28, 2020 at 11:52 AM pratik@semandex > > wrote: > > > >> Joel Bernstein wrote > >> > Ok, that sounds like a bug. I can create a ticket for this. > >> > > >> > On Mon, Jul 1, 2019 at 5:57 PM Pratik Patel < > >> > >> > pratik@ > >> > >> > > wrote: > >> > > >> >> I think the problem was that my streaming expression was always > >> returning > >> >> just one node. When I added more data so that I can have more than > one > >> >> node, I started seeing the result. > >> >> > >> >> On Mon, Jul 1, 2019 at 11:21 AM Pratik Patel < > >> > >> > pratik@ > >> > >> > > wrote: > >> >> > >> >>> Hello Everyone, > >> >>> > >> >>> I am trying to execute following streaming expression with > >> "scoreNodes" > >> >>> function in it. This is taken from the documentation. > >> >>> > >> >>> scoreNodes(top(n="50", > >> >>>sort="count(*) desc", > >> >>>nodes(baskets, > >> >>> random(baskets, q="productID:ABC", > >> >>> fl="basketID", rows="500"), > >> >>> walk="basketID->basketID", > >> >>> fq="-productID:ABC", > >> >>> gather="productID", > >> >>> count(* > >> >>> > >> >>> I have ensured that I have the collection and data present for it. > >> >>> Upon executing this, I am getting an error message as follows. > >> >>> > >> >>> "No collection param specified on request and no default collection > >> has > >> >>> been set: []" > >> >>> > >> >>> Upon digging into the source code I found that there is a possible > bug > >> >>> in > >> >>> ScoreNodesStream.java > >> >>> > >> >>> StringBuilder instance is never appended any string and the block > >> which > >> >>> initializes collection, needs the length of that instance to be more > >> >>> than > >> >>> zero. This condition will always be false and hence the collection > >> will > >> >>> never be set. > >> >>> > >> >>> I checked this file in solr version 8.1 and that also has the same > >> >>> issue. > >> >>> Is there any JIRA open for this or any patch available? > >> >>> > >> >>> [image: image.png] > >> >>> > >> >>> Thanks, > >> >>> Pratik > >> >>> > >> >> > >> > >> > >> Hi Joel, > >> > >> You mentioned creating a ticket for this bug, I can't find any, was it > >> created? If not then I can create one. Currently, ScoreNodes has two > >> issues. > >> > >> 1. It fails when result has only one node. > >> 2. It triggers a GET request instead of POST. GET fails if number of > nodes > >> is large. > >> > >> I have been using a custom class as workaround for #2, it would be good > to > >> use the original SolrJ class. > >> > >> Thanks, > >> Pratik > >> > >> > >> > >> -- > >> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html > >> > > >
Solr Analyzer : Filter to drop tokens based on some logic which needs access to adjacent tokens
Hello Everyone, Let's say I have an analyzer which has the following token stream as output: [], a, ab, [], c, [], d, de, def. Now let's say I want to add another filter which will drop certain tokens based on whether the adjacent token on the right side is [] or some string: for a given token, drop it (or replace it with an empty string) if there is a non-empty token on its right, and keep it if there is an empty token on its right. Based on this, the resulting token stream would be: [], [a], ab, [], c, [], d, de, def. Is there any filter available in Solr with which this can be achieved? If writing a custom filter is the only possible option, then I want to know whether it's possible to access adjacent tokens in the custom filter. Any idea about this would be really helpful. Thanks, Pratik
Re: Solr Analyzer : Filter to drop tokens based on some logic which needs access to adjacent tokens
Thanks for the reply Emir. I will be exploring the option of creating a custom filter. It's good to know that we can consume more than one tokens from previous filter and emit different number of tokens. Do you know of any existing filter in Solr which does something similar? It would be greatly helpful to see how more than one tokens can be consumed. I can implement my custom logic once I have access to multiple tokens from previous filter. Thanks Pratik On Mon, Feb 10, 2020 at 2:47 AM Emir Arnautović < emir.arnauto...@sematext.com> wrote: > Hi Pratik, > You might be able to do some of required things using > PatternReplaceChartFilter, but as you can see it does not operate on tokens > level but input string. Your best bet is custom token filter. Not sure how > familiar you are with how token filters work, but you have access to tokens > from previous filter and you can implement any logic you want: you consume > three tokens and emit tokens based on adjacent tokens. > > HTH, > Emir > -- > Monitoring - Log Management - Alerting - Anomaly Detection > Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > > > > > On 7 Feb 2020, at 19:27, Pratik Patel wrote: > > > > Hello Everyone, > > > > Let's say I have an analyzer which has following token stream as an > output. > > > > *token stream : [], a, ab, [], c, [], d, de, def .* > > > > Now let's say I want to add another filter which will drop a certain > tokens > > based on whether adjacent token on the right side is [] or some string. > > > > for a given token, > > drop/replace it by empty string it if there is a non-empty string > > token on its right and > > keep it if there is an empty token string on its right. > > > > based on this, the resulting token stream would be like this. > > > > *desired output stream : [], [a], ab, [], c, [], d, > > de, def * > > > > > > *Is there any Filter available in solr with which this can be achieved?* > > *If writing a custom filter is the only possible option then I want to > know > > whether its possible to access adjacent tokens in the custom filter?* > > > > *Any idea about this would be really helpful.* > > > > Thanks, > > Pratik > >
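A minimal sketch of such a look-ahead filter, using Lucene's TokenFilter API with a one-token buffer (the class name and the rule in the middle are just illustrations of where the custom logic would go):

    import java.io.IOException;

    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.util.AttributeSource;

    public final class LookAheadTokenFilter extends TokenFilter {

      private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

      private AttributeSource.State bufferedState; // token read from the input but not yet emitted
      private String bufferedTerm;                 // its term text, kept for the look-ahead decision
      private boolean inputExhausted = false;

      public LookAheadTokenFilter(TokenStream input) {
        super(input);
      }

      @Override
      public boolean incrementToken() throws IOException {
        if (bufferedState == null && !inputExhausted) {
          readAhead();                             // prime the one-token buffer on the first call
        }
        if (bufferedState == null) {
          return false;                            // nothing left to emit
        }

        AttributeSource.State current = bufferedState;
        readAhead();                               // peek at the right-hand neighbour
        String rightNeighbour = bufferedTerm;      // null when the current token is the last one

        restoreState(current);                     // emit the current token's attributes
        if (rightNeighbour != null && !rightNeighbour.isEmpty()) {
          termAtt.setEmpty();                      // example rule: blank the term when a non-empty token follows
        }
        return true;
      }

      private void readAhead() throws IOException {
        if (!inputExhausted && input.incrementToken()) {
          bufferedState = captureState();
          bufferedTerm = termAtt.toString();
        } else {
          inputExhausted = true;
          bufferedState = null;
          bufferedTerm = null;
        }
      }

      @Override
      public void reset() throws IOException {
        super.reset();
        bufferedState = null;
        bufferedTerm = null;
        inputExhausted = false;
      }
    }

This variant rewrites the current token in place; if you want to drop it entirely instead, skip emitting that token and loop on to the next buffered one.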
NPE Issue with atomic update to nested document or child document through SolrJ
Hello Everyone, I am trying to update a field of a child document using atomic updates feature. I am using solr and solrJ version 8.5.0 I have ensured that my schema satisfies the conditions for atomic updates and I am able to do atomic updates on normal documents but with nested child documents, I am getting a Null Pointer Exception. Following is the simple test which I am trying. TestPojo pojo1 = new TestPojo().cId( "abcd" ) > .conceptid( "c1" ) > .storeid( storeId ) > .testChildPojos( > Collections.list( testChildPOJO, testChildPOJO2, > testChildPOJO3 ) > ); > TestChildPOJOtestChildPOJO = new TestChildPOJO().cId( > "c1_child1" ) > .conceptid( "c1" ) > .storeid( storeId ) > .fieldName( > "c1_child1_field_value1" ) > .startTime( > Date.from( now.minus( 10, ChronoUnit.DAYS ) ) ) > .integerField_iDF( > 10 ) > > .booleanField_bDF(true); > // index pojo1 with child testChildPOJO > SolrInputDocument sdoc = new SolrInputDocument(); > sdoc.addField( "_route_", pojo1.cId() ); > sdoc.addField( "id", testChildPOJO.cId() ); > sdoc.addField( "conceptid", testChildPOJO.conceptid() ); > sdoc.addField( "storeid", testChildPOJO.cId() ); > sdoc.setField( "fieldName", java.util.Collections.singletonMap("set", > Collections.list(testChildPOJO.fieldName() + postfix) ) ); // modify field > "fieldName" > collection.client.add( sdoc ); // results in NPE! Stack Trace: ERROR org.apache.solr.client.solrj.impl.BaseCloudSolrClient - Request to > collection [collectionTest2] failed due to (500) > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > from server at > http://172.15.1.100:8081/solr/collectionTest2_shard1_replica_n1: > java.lang.NullPointerException > at > org.apache.solr.update.processor.AtomicUpdateDocumentMerger.getFieldFromHierarchy(AtomicUpdateDocumentMerger.java:308) > at > org.apache.solr.update.processor.AtomicUpdateDocumentMerger.mergeChildDoc(AtomicUpdateDocumentMerger.java:405) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.getUpdatedDocument(DistributedUpdateProcessor.java:711) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:374) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:339) > at org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:339) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:225) > at > org.apache.solr.update.processor.DistributedZkUpdateProcessor.processAdd(DistributedZkUpdateProcessor.java:245) > at > org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103) > at > org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:110) > at > org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:332) > at > org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readIterator(JavaBinUpdateRequestCodec.java:281) > at > org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:338) > at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283) > at > org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readNamedList(JavaBinUpdateRequestCodec.java:236) > at > 
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:303) > at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283) > at > org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:196) > at > org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:127) > at > org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:122) > at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:70) > at > org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97) > at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596) > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802) > at org.apache.solr.s
Re: NPE Issue with atomic update to nested document or child document through SolrJ
Looking at some other unit tests in repo, I tried an approach using UpdateRequest as follows. SolrInputDocument sdoc = new SolrInputDocument( ); > sdoc.addField( "id", testChildPOJO.id() ); > sdoc.setField( "fieldName", > java.util.Collections.singletonMap("set", testChildPOJO.fieldName() + > postfix) ); > final UpdateRequest req = new UpdateRequest(); > req.withRoute( pojo1.id() ); > req.add(sdoc); > > collection.client.request( req, collection.getCollectionName() > ); > req.commit( collection.client, collection.getCollectionName()); But this also results in the SAME Null Pointer Exception. Looking at the source code, it looks like "fieldPath" is null below. > AtomicUpdateDocumentMerger.getFieldFromHierarchy(SolrInputDocument > completeHierarchy, String fieldPath) { > final List docPaths = > StrUtils.splitSmart(fieldPath.substring(1), '/'); > .. >} Any idea what's wrong here? Thanks On Wed, Sep 16, 2020 at 1:27 PM Pratik Patel wrote: > Hello Everyone, > > I am trying to update a field of a child document using atomic updates > feature. I am using solr and solrJ version 8.5.0 > > I have ensured that my schema satisfies the conditions for atomic updates > and I am able to do atomic updates on normal documents but with nested > child documents, I am getting a Null Pointer Exception. Following is the > simple test which I am trying. > > TestPojo pojo1 = new TestPojo().cId( "abcd" ) >> .conceptid( "c1" ) >> .storeid( storeId ) >> .testChildPojos( >> Collections.list( testChildPOJO, testChildPOJO2, >> testChildPOJO3 ) >> ); >> TestChildPOJOtestChildPOJO = new TestChildPOJO().cId( >> "c1_child1" ) >> .conceptid( "c1" ) >> .storeid( storeId ) >> .fieldName( >> "c1_child1_field_value1" ) >> .startTime( >> Date.from( now.minus( 10, ChronoUnit.DAYS ) ) ) >> .integerField_iDF( >> 10 ) >> >> .booleanField_bDF(true); >> // index pojo1 with child testChildPOJO >> SolrInputDocument sdoc = new SolrInputDocument(); >> sdoc.addField( "_route_", pojo1.cId() ); >> sdoc.addField( "id", testChildPOJO.cId() ); >> sdoc.addField( "conceptid", testChildPOJO.conceptid() ); >> sdoc.addField( "storeid", testChildPOJO.cId() ); >> sdoc.setField( "fieldName", java.util.Collections.singletonMap("set", >> Collections.list(testChildPOJO.fieldName() + postfix) ) ); // modify field >> "fieldName" >> collection.client.add( sdoc ); // results in NPE! 
> > > Stack Trace: > > ERROR org.apache.solr.client.solrj.impl.BaseCloudSolrClient - Request to >> collection [collectionTest2] failed due to (500) >> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error >> from server at >> http://172.15.1.100:8081/solr/collectionTest2_shard1_replica_n1: >> java.lang.NullPointerException >> at >> org.apache.solr.update.processor.AtomicUpdateDocumentMerger.getFieldFromHierarchy(AtomicUpdateDocumentMerger.java:308) >> at >> org.apache.solr.update.processor.AtomicUpdateDocumentMerger.mergeChildDoc(AtomicUpdateDocumentMerger.java:405) >> at >> org.apache.solr.update.processor.DistributedUpdateProcessor.getUpdatedDocument(DistributedUpdateProcessor.java:711) >> at >> org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:374) >> at >> org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:339) >> at org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50) >> at >> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:339) >> at >> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:225) >> at >> org.apache.solr.update.processor.DistributedZkUp
Re: NPE Issue with atomic update to nested document or child document through SolrJ
Following are the approaches I have tried so far and both results in NPE. *approach 1 TestChildPOJO testChildPOJO = new TestChildPOJO().cId( "c1_child1" ) .conceptid( "c1" ) .storeid( storeId ) .fieldName( "c1_child1_field_value1" ) .startTime( Date.from( now.minus( 10, ChronoUnit.DAYS ) ) ) .integerField_iDF( 10 ) .booleanField_bDF(true); TestPojo pojo1 = new TestPojo().cId( "abcd" ) .conceptid( "c1" ) .storeid( storeId ) .testChildPojos( Collections.list( testChildPOJO, testChildPOJO2, testChildPOJO3 ) ); // index pojo1 with child testChildPOJO SolrInputDocument sdoc = new SolrInputDocument(); sdoc.addField( "_route_", pojo1.cId() ); sdoc.addField( "id", testChildPOJO.cId() ); sdoc.addField( "conceptid", testChildPOJO.conceptid() ); sdoc.addField( "storeid", testChildPOJO.cId() ); sdoc.setField( "fieldName", java.util.Collections.singletonMap("set", Collections.list(testChildPOJO.fieldName() + postfix) ) ); // modify field "fieldName" collection.client.add( sdoc ); // results in NPE! *approach 1 *approach 2 SolrInputDocument sdoc = new SolrInputDocument( ); sdoc.addField( "id", testChildPOJO.id() ); sdoc.setField( "fieldName", java.util.Collections.singletonMap("set", testChildPOJO.fieldName() + postfix) ); final UpdateRequest req = new UpdateRequest(); req.withRoute( pojo1.id() ); req.add(sdoc); collection.client.request( req, collection.getCollectionName() ); req.commit( collection.client, collection.getCollectionName()); *approach 2 -- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: NPE Issue with atomic update to nested document or child document through SolrJ
Thanks for your reply Alexandre. I have "_root_" and "_nest_path_" fields in my schema but not "_nest_parent_". I ran my test after adding the "_nest_parent_" field and I am not getting NPE any more which is good. Thanks! But looking at the documents in the index, I see that after the atomic update, now there are two children documents with the same id. One document has old values and another one has new values. Shouldn't they be merged based on the "id"? Do we need to specify anything else in the request to ensure that documents are merged/updated and not duplicated? For your reference, below is the test I am running now. // update field of one child doc SolrInputDocument sdoc = new SolrInputDocument( ); sdoc.addField( "id", testChildPOJO.id() ); sdoc.addField( "conceptid", testChildPOJO.conceptid() ); sdoc.addField( "storeid", "foo" ); sdoc.setField( "fieldName", java.util.Collections.singletonMap("set", Collections.list("bar" ) )); final UpdateRequest req = new UpdateRequest(); req.withRoute( pojo1.id() );// parent id req.add(sdoc); collection.client.request( req, collection.getCollectionName() ); collection.client.commit(); Resulting documents : {id=c1_child1, conceptid=c1, storeid=s1, fieldName=c1_child1_field_value1, startTime=Mon Sep 07 12:40:37 EDT 2020, integerField_iDF=10, booleanField_bDF=true, _root_=abcd, _version_=1678099970090074112} {id=c1_child1, conceptid=c1, storeid=foo, fieldName=bar, startTime=Mon Sep 07 12:40:37 EDT 2020, integerField_iDF=10, booleanField_bDF=true, _root_=abcd, _version_=1678099970405695488} On Thu, Sep 17, 2020 at 12:01 PM Alexandre Rafalovitch wrote: > Can you double-check your schema to see if you have all the fields > required to support nested documents. You are supposed to get away > with just _root_, but really you should also include _nest_path and > _nest_parent_. Your particular exception seems to be triggering > something (maybe a bug) related to - possibly - missing _nest_path_ > field. > > See: > https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#indexing-nested-documents > > Regards, >Alex. > > On Wed, 16 Sep 2020 at 13:28, Pratik Patel wrote: > > > > Hello Everyone, > > > > I am trying to update a field of a child document using atomic updates > > feature. I am using solr and solrJ version 8.5.0 > > > > I have ensured that my schema satisfies the conditions for atomic updates > > and I am able to do atomic updates on normal documents but with nested > > child documents, I am getting a Null Pointer Exception. Following is the > > simple test which I am trying. 
> > > > TestPojo pojo1 = new TestPojo().cId( "abcd" ) > > > .conceptid( "c1" ) > > > .storeid( storeId ) > > > .testChildPojos( > > > Collections.list( testChildPOJO, testChildPOJO2, > > > > testChildPOJO3 ) > > > ); > > > TestChildPOJOtestChildPOJO = new TestChildPOJO().cId( > > > "c1_child1" ) > > > .conceptid( "c1" > ) > > > .storeid( > storeId ) > > > .fieldName( > > > "c1_child1_field_value1" ) > > > .startTime( > > > Date.from( now.minus( 10, ChronoUnit.DAYS ) ) ) > > > > .integerField_iDF( > > > 10 ) > > > > > > .booleanField_bDF(true); > > > // index pojo1 with child testChildPOJO > > > SolrInputDocument sdoc = new SolrInputDocument(); > > > sdoc.addField( "_route_", pojo1.cId() ); > > > sdoc.addField( "id", testChildPOJO.cId() ); > > > sdoc.addField( "conceptid", testChildPOJO.conceptid() ); > > > sdoc.addField( "storeid", testChildPOJO.cId() ); > > > sdoc.setField( "fieldName", java.util.Collections.singletonMap("set", > > > Collections.list(testChildPOJO.fieldName() + postfix) ) ); // modify > field > > > "fieldName" > > > collection.client.add( sdoc ); // results in NPE! > > > > > > Stack Trace: > > > > ERROR org.apache.solr.client.solrj.impl.Base
Re: NPE Issue with atomic update to nested document or child document through SolrJ
I am running this in a unit test which deletes the collection after the test is over. So every new test run gets a fresh collection. It is a very simple test where I am first indexing a couple of parent documents with few children and then testing an atomic update on one parent as I have posted in my previous message. (using UpdateRequest) I am not sure if I am triggering the atomic update correctly, do you see any potential issue in that code? I noticed something in the documentation here. https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#indexing-nested-documents field_type is declared with name *"_nest_path_"* whereas field is declared with type *"nest_path". * Is this intentional? or should it be as follows? Also, should we explicitly set index=true and store=true on _nest_path_ and _nest_parent_ fields? On Thu, Sep 17, 2020 at 1:17 PM Alexandre Rafalovitch wrote: > Did you reindex the original document after you added a new field? If > not, then the previously indexed content is missing it and your code > paths will get out of sync. > > Regards, >Alex. > P.s. I haven't done what you are doing before, so there may be > something I am missing myself. > > > On Thu, 17 Sep 2020 at 12:46, Pratik Patel wrote: > > > > Thanks for your reply Alexandre. > > > > I have "_root_" and "_nest_path_" fields in my schema but not > > "_nest_parent_". > > > > > > > > > > > docValues="false" /> > > > > > name="_nest_path_" class="solr.NestPathField" /> > > > > I ran my test after adding the "_nest_parent_" field and I am not getting > > NPE any more which is good. Thanks! > > > > But looking at the documents in the index, I see that after the atomic > > update, now there are two children documents with the same id. One > document > > has old values and another one has new values. Shouldn't they be merged > > based on the "id"? Do we need to specify anything else in the request to > > ensure that documents are merged/updated and not duplicated? > > > > For your reference, below is the test I am running now. > > > > // update field of one child doc > > SolrInputDocument sdoc = new SolrInputDocument( ); > > sdoc.addField( "id", testChildPOJO.id() ); > > sdoc.addField( "conceptid", testChildPOJO.conceptid() ); > > sdoc.addField( "storeid", "foo" ); > > sdoc.setField( "fieldName", > > java.util.Collections.singletonMap("set", Collections.list("bar" ) )); > > > > final UpdateRequest req = new UpdateRequest(); > > req.withRoute( pojo1.id() );// parent id > > req.add(sdoc); > > > > collection.client.request( req, > collection.getCollectionName() > > ); > > collection.client.commit(); > > > > > > Resulting documents : > > > > {id=c1_child1, conceptid=c1, storeid=s1, > fieldName=c1_child1_field_value1, > > startTime=Mon Sep 07 12:40:37 EDT 2020, integerField_iDF=10, > > booleanField_bDF=true, _root_=abcd, _version_=1678099970090074112} > > {id=c1_child1, conceptid=c1, storeid=foo, fieldName=bar, startTime=Mon > Sep > > 07 12:40:37 EDT 2020, integerField_iDF=10, booleanField_bDF=true, > > _root_=abcd, _version_=1678099970405695488} > > > > > > > > > > > > > > On Thu, Sep 17, 2020 at 12:01 PM Alexandre Rafalovitch < > arafa...@gmail.com> > > wrote: > > > > > Can you double-check your schema to see if you have all the fields > > > required to support nested documents. You are supposed to get away > > > with just _root_, but really you should also include _nest_path and > > > _nest_parent_. 
Your particular exception seems to be triggering > > > something (maybe a bug) related to - possibly - missing _nest_path_ > > > field. > > > > > > See: > > > > https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#indexing-nested-documents > > > > > > Regards, > > >Alex. > > > > > > On Wed, 16 Sep 2020 at 13:28, Pratik Patel > wrote: > > > > > > > > Hello Everyone, > > > > > > > > I am trying to update a field of a child document using atomic > updates > > > > feature. I am using solr and solrJ version 8.5.0 > > > > >
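For reference, a schema that supports these nested-document features typically declares the bookkeeping fields along the following lines (a sketch of the documented defaults, not this particular schema):

    <fieldType name="_nest_path_" class="solr.NestPathField" />

    <field name="_root_"        type="string"       indexed="true" stored="false" docValues="false" />
    <field name="_nest_path_"   type="_nest_path_" />
    <field name="_nest_parent_" type="string"       indexed="true" stored="true" />

In other words, the field's type attribute points at the fieldType's name, so both read "_nest_path_", which is presumably what the reference-guide example quoted above intends.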
SolrJ : Inserting Bean object containing different types of Child documents
Hello Everyone, I have a Bean object which can have child documents of classes Child_type1 and Child_type2. When I try to index this document, I get an error message "Doc cannot have more than one Field with child=true". I looked at the mailing list but couldn't find any solution for this. Any suggestions on how such documents should be indexed? I am using SolrJ version 7.7.1 and Solr 7.4.0 Thanks! Pratik
Pagination with streaming expressions
Hello Everyone, Is there a way to paginate the results of a Streaming Expression? Let's say I have a simple gatherNodes function with a count operation at the end of it. I can sort by the count fine, but now I would like to be able to select a specific subset of the results based on pagination parameters. Is there any way to do that? Thanks! Pratik
Writing unit tests to test complex solr queries
Hello Everyone, I want to write unit tests for some Solr queries which are being triggered through Java code. These queries include complex streaming expressions and faceting queries which require a large number of documents to be present in the Solr index. I cannot create and push that many documents programmatically through my tests. I am trying to find a way to test these queries without depending on an externally running Solr instance. I found the following approach, which uses classes like EmbeddedSolrServer and CoreContainer; we can put index files and Solr configuration on the classpath and run the tests against them. https://dzone.com/articles/junit-testing-for-solr-6 However, this seems to be an old approach and I am trying to find a way to do it using the latest solr-test-framework. I also cannot use the old approach because I want to test Streaming Expressions as well and I need SolrCloudClient for that. In solr-test-framework, I found the MiniSolrCloudCluster class but I don't know how to use pre-created index files and configuration with it. Does anyone know how we can use pre-created index files and configuration with the latest test framework? What is the recommended way to do this kind of testing? Any direction with this would be really helpful. Thanks! Pratik
Re: Writing unit tests to test complex solr queries
Thanks a lot for the response Mikhail and Angie! I did go through most of the test classes in solr before posting here but couldn't find anything which is close to what I want to do which is to load pre-created index files and configuration or at least index files. However, the class HelloWorldSolrCloudTestCase.java class pointed out by Angie put together with his code that he has shared seems to be completing the picture and looks spot on! Thanks a lot. I will try to re-write my unit tests with this approach and will post an update soon. @Angie, can you please share the format of data in your "testdata/test-data.json" file? I want to be sure about using the correct format. Thanks! Pratik On Tue, May 14, 2019 at 1:14 PM Angie Rabelero wrote: > Hi, I’ll advised you to extend the class SolrCloudTestCase, which extends > the MiniSolrCloudCluster. Theres a hello world example in the solr source > at > https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/HelloWorldSolrCloudTestCase.java > . > > Here’s how I setup a cluster, create a collection with my ConfigSet, and > index a file. > > @BeforeClass > public static void setupCluster() throws Exception { > > // Create and configure cluster > configureCluster(nodeCount) > .addConfig(CONFIG_NAME, getFile(CONFIG_DIR).toPath()) > .configure(); > > // Create an empty collection > Create.createCollection(COLLECTION, CONFIG_NAME, numShards, > numReplicas) > .setMaxShardsPerNode(maxShardsPerNode) > .process(cluster.getSolrClient(), COLLECTION); > AbstractDistribZkTestBase > .waitForRecoveriesToFinish(COLLECTION, > cluster.getSolrClient().getZkStateReader(), true, true, 120); > > // Set default collection > cluster.getSolrClient().setDefaultCollection(COLLECTION); > > // Add documents to collection > ContentStreamUpdateRequest up = new > ContentStreamUpdateRequest("/update"); > up.addFile(getFile("testdata/test-data.json"), "application/json"); > up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); > NamedList result = cluster.getSolrClient().request(up); > > // Print cluster status > System.out.println("Default Collection: " + > cluster.getSolrClient().getDefaultCollection()); > System.out.println("Cluster State: " + > cluster.getSolrClient().getZkStateReader().getClusterState()); > System.out.println("Update Result: " + result); > > } > > I copy the configset to the resources dir in the pom using a mauven > plugin. And the test file is already in the resources dir. > > > > > > On May 14, 2019, at 04:01, Mikhail Khludnev wrote: > > > > Hello, Pratick. > > Welcome to mysterious world of Solr testing. The best way is to find > > existing test closest to your problem field, copy in and amend > necessarily. > > What about > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_lucene-2Dsolr_blob_master_solr_solrj_src_test_org_apache_solr_client_solrj_io_stream_StreamExpressionTest.java&d=DwIBaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=lUsTzFRk0CX38HvagQ0wd52D67dA0fx_D6M6F3LHzAU&m=9tFliF4KA1tiG2lGmDJWO34hyq9-Sz1inAxRPVKkz78&s=KjveDzxzQAKRmvzPYk2y1FQ-w6yAGWuwfTVGHMQP2ZA&e= > > ? > > > > On Fri, May 10, 2019 at 11:36 PM Pratik Patel > wrote: > > > >> Hello Everyone, > >> > >> I want to write unit tests for some solr queries which are being > triggered > >> through java code. These queries includes complex streaming expressions > and > >> faceting queries which requires large number of documents to be present > in > >> solr index. 
> >> I can not create and push so many documents programmatically
> >> through my tests.
> >>
> >> I am trying to find a way to test these queries without depending on
> >> externally running solr instance. I found following approach which is using
> >> classes like EmbeddedSolrServer and CoreContainer. We can put index files
> >> and solr configuration on classpath and run the tests against them.
> >>
> >> https://dzone.com/articles/junit-testing-for-solr-6
> >>
> >> However, this seems to be an old approach and
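To connect this back to the streaming-expression goal, here is a rough sketch of how such an expression might be asserted against the cluster configured in the setup above (the collection constant, the field names, and the expression itself are placeholders, not from the original thread):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.io.Tuple;
    import org.apache.solr.client.solrj.io.stream.SolrStream;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.junit.Test;

    // Inside the SolrCloudTestCase subclass configured in setupCluster() above.
    @Test
    public void testStreamingExpression() throws Exception {
        // Point the stream at one of the cluster's Jetty nodes and the test collection.
        String baseUrl = cluster.getJettySolrRunners().get(0).getBaseUrl().toString() + "/" + COLLECTION;

        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("expr", "search(" + COLLECTION + ", q=\"*:*\", fl=\"id\", sort=\"id asc\", rows=\"10\")");
        params.set("qt", "/stream");

        SolrStream stream = new SolrStream(baseUrl, params);
        List<Tuple> tuples = new ArrayList<>();
        try {
            stream.open();
            for (Tuple t = stream.read(); !t.EOF; t = stream.read()) {
                tuples.add(t);
            }
        } finally {
            stream.close();
        }
        // assertFalse comes from the test base class.
        assertFalse("expected at least one tuple from the test data", tuples.isEmpty());
    }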
Solr test framework not able to upload configuration to zk and fails with KeeperException
okeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x1003ec815f30007 type:create cxid:0x16 zxid:0x48 txntype:-1 reqpath:n/a Error Path:/solr/configs Error:KeeperErrorCode = NodeExists for /solr/configs
2019-06-04T15:07:01,163 [ProcessThread(sid:0 cport:50192):] INFO org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x1003ec815f30007 type:create cxid:0x17 zxid:0x49 txntype:-1 reqpath:n/a Error Path:/solr/configs/collection2 Error:KeeperErrorCode = NodeExists for /solr/configs/collection2
2019-06-04T15:07:01,163 [ProcessThread(sid:0 cport:50192):] INFO org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x1003ec815f30007 type:create cxid:0x18 zxid:0x4a txntype:-1 reqpath:n/a Error Path:/solr/configs/collection2/conf Error:KeeperErrorCode = NodeExists for /solr/configs/collection2/conf
2019-06-04T15:07:01,165 [ProcessThread(sid:0 cport:50192):] INFO org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x1003ec815f30007 type:create cxid:0x1b zxid:0x4d txntype:-1 reqpath:n/a Error Path:/solr/configs Error:KeeperErrorCode = NodeExists for /solr/configs
2019-06-04T15:07:01,166 [ProcessThread(sid:0 cport:50192):] INFO org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x1003ec815f30007 type:create cxid:0x1c zxid:0x4e txntype:-1 reqpath:n/a Error Path:/solr/configs/collection2 Error:KeeperErrorCode = NodeExists for /solr/configs/collection2
**
I have searched through the mailing list and related areas. Also, I have tried various ways of creating MiniSolrCloudCluster, but I get the same exception. I have made sure that a new directory is always used as BASE_DIR for MiniSolrCloudCluster. Can anyone please throw some light on what's wrong here? Am I hitting a solr test framework issue? I am using solr test framework version 7.7.1. Thanks a lot, Pratik
Loading pre created index files into MiniSolrCloudCluster of test framework
Hello Everyone, I am trying to write some unit tests for solr queries which require some data to be in a specific state. There is a way to load this data through JSON files, but the problem is that the required data needs to have parent-child blocks present. Because of this, I would prefer it if there were a way to load pre-created index files into the cluster. I checked the solr test framework and related examples but couldn't find any example of index files being loaded in cloud mode. Is there a way to load index files into Solr running in cloud mode? Thanks! Pratik
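For illustration, a minimal SolrJ sketch of what indexing one such parent-child block programmatically looks like (zkHost, collection and field names are placeholders); the difficulty described above is that producing enough of this data by hand for realistic queries is not practical:

    import java.util.Collections;
    import java.util.Optional;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class ParentChildIndexSketch {
        public static void main(String[] args) throws Exception {
            // zkHost and collection name are placeholders.
            try (SolrClient client = new CloudSolrClient.Builder(
                    Collections.singletonList("localhost:9983"), Optional.empty()).build()) {

                SolrInputDocument parent = new SolrInputDocument();
                parent.addField("id", "parent-1");
                parent.addField("type_s", "parent");

                SolrInputDocument child = new SolrInputDocument();
                child.addField("id", "child-1");
                child.addField("type_s", "child");

                // addChildDocument creates the block-join structure at index time.
                parent.addChildDocument(child);

                client.add("collection1", parent);
                client.commit("collection1");
            }
        }
    }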
Re: Loading pre created index files into MiniSolrCloudCluster of test framework
Thanks for the reply Alexandre. The only special thing about JSON/XML is that in order to export the data in that form, I need to have "docValues" enabled for all the fields which are to be retrieved. I need to retrieve all the fields and I cannot enable docValues on all of them. If there were a way to export data in JSON format without having to change the schema and the index, then I would have no issues with JSON. I cannot use the "select" handler as it does not include parent/child relationships.

The options I have are the following, I guess. I am not sure if they are all real possibilities though.

1. Find a way to load pre-created index files either through CloudSolrClient or directly to ZK.
2. Find a way to export the data in JSON format without having to make all fields docValues enabled.
3. Use the Merge Index tool with an empty index and a real index. I don't know if it is possible to do this through SolrJ though (a rough sketch of what this might look like appears after the quoted thread below).

Please let me know if there is a better way available, it would really help. Just so you know, I am trying to do this for unit tests related to solr queries. Ultimately I want to load some pre-created data into MiniSolrCloudCluster.

Thanks a lot,
Pratik

On Wed, Jun 5, 2019 at 6:56 PM Alexandre Rafalovitch wrote: > Is there something special about parent/child blocks you cannot do through > JSON? Or XML? > > Both Solr XML and Solr JSON support it. > > New style parent/child mapping is also supported in latest Solr but I think > it is done differently. > > Regards, > Alex > > On Wed, Jun 5, 2019, 6:29 PM Pratik Patel, wrote: > > > Hello Everyone, > > > > I am trying to write some unit tests for solr queries which requires some > > data in specific state. There is a way to load this data through json > files > > but the problem is that the required data needs to have parent-child > blocks > > to be present. > > Because of this, I would prefer if there is a way to load pre-created > index > > files into the cluster. > > I checked the solr test framework and related examples but couldn't find > > any example of index files being loaded in cloud mode. > > > > Is there a way to load index files into solr running in cloud mode? > > > > Thanks! > > Pratik > > >
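Regarding option 3 above, a rough, untested sketch of what a merge might look like through SolrJ's CoreAdmin API, assuming the SolrJ version in use exposes CoreAdminRequest.mergeIndexes with this shape (base URL, core name and paths are placeholders):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;
    import org.apache.solr.client.solrj.response.CoreAdminResponse;

    public class MergeIndexSketch {
        public static void main(String[] args) throws Exception {
            // In SolrCloud the request has to target the concrete core backing a shard replica;
            // the core name and index directory here are placeholders.
            try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
                CoreAdminResponse rsp = CoreAdminRequest.mergeIndexes(
                        "collection1_shard1_replica_n1",                        // target core
                        new String[] {"/path/to/precreated/index/data/index"},  // source index dirs
                        new String[0],                                          // or source core names
                        client);
                System.out.println("merge status: " + rsp.getStatus());
            }
        }
    }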
Re: Solr test framework not able to upload configuration to zk and fails with KeeperException
Thanks guys, I found that the issue I had was because of some binary files (NLP models) in my configuration. Once I fixed that, I was able to set up a cluster. These exceptions are still logged but they are logged as INFO and were not the real issue. Thanks Again Pratik On Tue, Jun 4, 2019 at 4:15 PM Angie Rabelero wrote: > For what I know the configuration files need to be already in the > test/resource directory before runnin. I copy them to the directory using a > maven maven-antrun-plugin in the generate-test-sources phase. And the > framework can "create a collection” without the configfiles, but it will > obviously fail when try to use it. > > > On the surface, this znode already exists: > > /solr/configs/collection2 > > So it looks like somehow you're > > > On Jun 4, 2019, at 12:29 PM, Pratik Patel pra...@semandex.net>> wrote: > > > > /solr/configs/collection2 > > > On Jun 4, 2019, at 14:29, Pratik Patel wrote: > > > > Hello Everyone, > > > > I am trying to run a simple unit test using solr test framework. At this > > point, all I am trying to achieve is to be able to upload some > > configuration and create a collection using solr test framework. > > > > Following is the simple code which I am trying to run. > > > > private static final String COLLECTION = "collection2" ; > > > > private static final int numShards = 1; > > private static final int numReplicas = 1; > > private static final int maxShardsPerNode = 1; > > private static final int nodeCount = (numShards*numReplicas + > > (maxShardsPerNode-1))/maxShardsPerNode; > > > > private static final String id = "id"; > > private static final String CONFIG_DIR = > > "src/test/resources/testdata/solr/collection2"; > > > > @BeforeClass > > public static void setupCluster() throws Exception { > > > >// create and configure cluster > >configureCluster(nodeCount) > >.addConfig("collection2", getFile(CONFIG_DIR).toPath()) > >.configure(); > > > >// create an empty collection > >CollectionAdminRequest.createCollection(COLLECTION, "collection2", > > numShards, numReplicas) > >.setMaxShardsPerNode(maxShardsPerNode) > >.process(cluster.getSolrClient()); > > > >// add further document(s) here > >// TODO > > } > > > > > > However, I see that solr fails to upload the configuration to zk. > > Following method of ZooKeeper class fails with the "KeeperException" > > > > public String create(final String path, byte data[], List acl, > >CreateMode createMode) > >throws KeeperException, InterruptedException > > { > >final String clientPath = path; > >PathUtils.validatePath(clientPath, createMode.isSequential()); > > > >final String serverPath = prependChroot(clientPath); > > > >RequestHeader h = new RequestHeader(); > >h.setType(ZooDefs.OpCode.create); > >CreateRequest request = new CreateRequest(); > >CreateResponse response = new CreateResponse(); > >request.setData(data); > >request.setFlags(createMode.toFlag()); > >request.setPath(serverPath); > >if (acl != null && acl.size() == 0) { > >throw new KeeperException.InvalidACLException(); > >} > >request.setAcl(acl); > >ReplyHeader r = cnxn.submitRequest(h, request, response, null); > >if (r.getErr() != 0) { > >throw KeeperException.create(KeeperException.Code.get(r.getErr()), > >clientPath); > >} > >if (cnxn.chrootPath == null) { > >return response.getPath(); > >} else { > >return response.getPath().substring(cnxn.chrootPath.length()); > >} > > } > > > > > > And following are the Keeper exceptions thrown for each file of the > > configuration. 
> > > > Basically, it says > > Got user-level KeeperException when processing sessionid: Error > > Path:/solr/configs Error:KeeperErrorCode = NodeExists for /solr/configs > > > > > ** > > 2019-06-04T15:07:01,157 [ProcessThread(sid:0 cport:50192):] INFO > > org.apache.zookeeper.server.PrepRequestProcessor - Got user-level > > KeeperException when processing sessionid:0x1003ec815f30007 type:create > > cxid:0xe zxid:0x40 txntype:-1 reqpath:n/a Error Path:/solr/configs > > Error:KeeperE
Re: Streaming expression function which can give parent document along with its child documents ?
If your child documents have a link to their parent documents (like a parent id or something), then you can use graph traversal to do this. On Mon, Jun 10, 2019 at 8:01 AM Jai Jamba wrote: > Can anyone help me in this ? > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
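For example, a hedged sketch of such a traversal with the nodes() expression, starting from a parent document and walking to children through a hypothetical parentId_s field (collection and field names are invented for illustration; scatter="branches, leaves" emits both the root and the gathered nodes, i.e. the parent together with its children):

    nodes(collection1,
          search(collection1, q="id:parent-1", fl="id", sort="id asc"),
          walk="id->parentId_s",
          gather="id",
          scatter="branches, leaves")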
Re: Loading pre created index files into MiniSolrCloudCluster of test framework
So, I found a way to programmatically restore a collection from a backup. I thought that I could create a backup of a collection, put it on the classpath, restore it during unit test setup, and run the queries against the newly restored collection. Theoretically, it sounded like it would work. I have the following code doing the restore (a sketch of the corresponding backup step appears after the quoted thread at the end of this message).

CollectionAdminRequest.Restore restore =
    CollectionAdminRequest.restoreCollection( newCollectionName, backupName )
        .setLocation( pathToBackup );
CollectionAdminResponse resp = restore.process( cluster.getSolrClient() );

AbstractDistribZkTestBase.waitForRecoveriesToFinish( newCollectionName,
    cluster.getSolrClient().getZkStateReader(), true, true, 30);

However, any query I run against this new collection returns zero documents. I have tried queries which should match many documents, but they all return zero documents. It seems like the data is not really loaded during the restore operation. I stepped through the "doRestore()" method of the class RestoreCore.java, which is internally doing the restore, and I see that it has no errors or exceptions and the restore operation status is successful, but in reality there is no data in the new collection. The new collection is created but it seems to be without any data.

Am I missing something here? Any idea what could be the cause of this?

Thanks!
Pratik

On Thu, Jun 6, 2019 at 11:18 AM Pratik Patel wrote: > Thanks for the reply Alexandre, only special thing about JSON/XML is that > in order to export the data in that form, I need to have "docValues" > enabled for all the fields which are to be retrieved. I need to retrieve > all the fields and I can not enable docValues on all fields. > If there was a way to export data in JSON format without having to change > schema and index then I would have no issues with JSON. > I can not use "select" handler as it does not include parent/child > relationships. > > The options I have are following I guess. I am not sure if they are real > possibilities though. > > 1. Find a way to load pre-created index files either through > SolrCloudClient or directly to ZK > 2. Find a way to export the data in JSON format without having to make all > fields docValues enabled. > 3. Use Merge Index tool with an empty index and a real index. I am don't > know if it is possible to do this through solrJ though. > > Please let me know if there is better way available, it would really help. > Just so you know, I am trying to do this for unit tests related to solr > queries. Ultimately I want to load some pre-created data into > MiniSolrCloudCluster. > > Thanks a lot, > Pratik > > > On Wed, Jun 5, 2019 at 6:56 PM Alexandre Rafalovitch > wrote: > >> Is there something special about parent/child blocks you cannot do through >> JSON? Or XML? >> >> Both Solr XML and Solr JSON support it. >> >> New style parent/child mapping is also supported in latest Solr but I >> think >> it is done differently. >> >> Regards, >> Alex >> >> On Wed, Jun 5, 2019, 6:29 PM Pratik Patel, wrote: >> >> > Hello Everyone, >> > >> > I am trying to write some unit tests for solr queries which requires >> some >> > data in specific state. There is a way to load this data through json >> files >> > but the problem is that the required data needs to have parent-child >> blocks >> > to be present. >> > Because of this, I would prefer if there is a way to load pre-created >> index >> > files into the cluster. 
>> > I checked the solr test framework and related examples but couldn't find >> > any example of index files being loaded in cloud mode. >> > >> > Is there a way to load index files into solr running in cloud mode? >> > >> > Thanks! >> > Pratik >> > >> >
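For completeness, a rough sketch of the backup step referenced at the top of this message, taken against the source Solr that holds the real data (the URL, collection name, backup name and location are placeholders; the location has to be visible to every node):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    // ...
    try (SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
        // Commit first so the backup contains the latest documents.
        solr.commit("sourceCollection");

        CollectionAdminRequest.backupCollection("sourceCollection", "test-backup")
            .setLocation("/path/visible/to/all/nodes")
            .process(solr);
    }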
How to increase maximum size of files allowed in configuration for MiniSolrCloudCluster
Hi, I am trying to upload a configuration to "MiniSolrCloudCluster" in my unit test. This configuration has some binary files for NLP-related functionality. Some of these binary files are bigger than 5 MB. If I try to upload the configuration with these files, then it doesn't work. I can set up the cluster fine if I remove all binary files bigger than 5 MB. I have noticed the same issue when I try to restore a backup whose configuration contains files bigger than 5 MB. Does Jetty have some limit on the size of configuration files? Is there a way to override it? Thanks, Pratik
Re: How to increase maximum size of files allowed in configuration for MiniSolrCloudCluster
That was spot on. Thanks a lot for your help! On Tue, Jun 11, 2019 at 2:14 AM Jörn Franke wrote: > It is probably a Zookeeper limit. You have to set jute.maxbuffer in the > Java System properties of all (!) zookeeper Servers and clients to the same > value (in your case it should be a little bit larger than your largest > file). > If possible you can try to avoid storing the NLP / ML models in Solr but > provide them on a share or similar where all Solr nodes have access to. > > > Am 11.06.2019 um 00:32 schrieb Pratik Patel : > > > > Hi, > > > > I am trying to upload a configuration to "MiniSolrCloudCluster" in my > unit > > test. This configuration has some binary files for NLP related > > functionality. Some of these binary files are bigger than 5 MB. If I try > to > > upload configuration with these files then it doesn't work. I can set up > > the cluster fine if I remove all binary files bigger than 5 MB. > > > > I have noticed the same issue when I try to restore a backup having > > configuration files bigger than 5 MB. > > > > Does jetty have some limit on the size of configuration files? Is there a > > way to override this? > > > > Thanks, > > Pratik >
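A minimal sketch of applying that in a test: since MiniSolrCloudCluster runs the ZooKeeper server and the clients in the same JVM, setting the system property before the cluster is configured should cover both (the 10 MB value and the paths are placeholders):

    @BeforeClass
    public static void setupCluster() throws Exception {
        // Must be set before any ZooKeeper server/client is started in this JVM.
        System.setProperty("jute.maxbuffer", Integer.toString(10 * 1024 * 1024));

        configureCluster(1)
            .addConfig("conf", getFile("src/test/resources/testdata/solr/collection2").toPath())
            .configure();
    }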
Bug in scoreNodes function of streaming expressions?
Hello Everyone, I am trying to execute the following streaming expression with the "scoreNodes" function in it. This is taken from the documentation.

scoreNodes(top(n="50",
               sort="count(*) desc",
               nodes(baskets,
                     random(baskets, q="productID:ABC", fl="basketID", rows="500"),
                     walk="basketID->basketID",
                     fq="-productID:ABC",
                     gather="productID",
                     count(*

I have ensured that I have the collection and data present for it. Upon executing this, I am getting an error message as follows.

"No collection param specified on request and no default collection has been set: []"

Upon digging into the source code, I found that there is a possible bug in ScoreNodesStream.java: a StringBuilder instance is never appended to, and the block which initializes the collection requires the length of that instance to be greater than zero. This condition will always be false, and hence the collection will never be set.

I checked this file in solr version 8.1 and that also has the same issue. Is there any JIRA open for this or any patch available?

[image: image.png]

Thanks, Pratik
Re: Bug in scoreNodes function of streaming expressions?
I think the problem was that my streaming expression was always returning just one node. When I added more data so that I can have more than one node, I started seeing the result. On Mon, Jul 1, 2019 at 11:21 AM Pratik Patel wrote: > Hello Everyone, > > I am trying to execute following streaming expression with "scoreNodes" > function in it. This is taken from the documentation. > > scoreNodes(top(n="50", >sort="count(*) desc", >nodes(baskets, > random(baskets, q="productID:ABC", fl="basketID", > rows="500"), > walk="basketID->basketID", > fq="-productID:ABC", > gather="productID", > count(* > > I have ensured that I have the collection and data present for it. > Upon executing this, I am getting an error message as follows. > > "No collection param specified on request and no default collection has > been set: []" > > Upon digging into the source code I found that there is a possible bug in > ScoreNodesStream.java > > StringBuilder instance is never appended any string and the block which > initializes collection, needs the length of that instance to be more than > zero. This condition will always be false and hence the collection will > never be set. > > I checked this file in solr version 8.1 and that also has the same issue. > Is there any JIRA open for this or any patch available? > > [image: image.png] > > Thanks, > Pratik >
Re: Bug in scoreNodes function of streaming expressions?
Great, thanks! On Tue, Jul 2, 2019 at 6:37 AM Joel Bernstein wrote: > Ok, that sounds like a bug. I can create a ticket for this. > > On Mon, Jul 1, 2019 at 5:57 PM Pratik Patel wrote: > > > I think the problem was that my streaming expression was always returning > > just one node. When I added more data so that I can have more than one > > node, I started seeing the result. > > > > On Mon, Jul 1, 2019 at 11:21 AM Pratik Patel > wrote: > > > >> Hello Everyone, > >> > >> I am trying to execute following streaming expression with "scoreNodes" > >> function in it. This is taken from the documentation. > >> > >> scoreNodes(top(n="50", > >>sort="count(*) desc", > >>nodes(baskets, > >> random(baskets, q="productID:ABC", > >> fl="basketID", rows="500"), > >> walk="basketID->basketID", > >> fq="-productID:ABC", > >> gather="productID", > >> count(* > >> > >> I have ensured that I have the collection and data present for it. > >> Upon executing this, I am getting an error message as follows. > >> > >> "No collection param specified on request and no default collection has > >> been set: []" > >> > >> Upon digging into the source code I found that there is a possible bug > in > >> ScoreNodesStream.java > >> > >> StringBuilder instance is never appended any string and the block which > >> initializes collection, needs the length of that instance to be more > than > >> zero. This condition will always be false and hence the collection will > >> never be set. > >> > >> I checked this file in solr version 8.1 and that also has the same > issue. > >> Is there any JIRA open for this or any patch available? > >> > >> [image: image.png] > >> > >> Thanks, > >> Pratik > >> > > >
Re: Bug in scoreNodes function of streaming expressions?
Hi Joel, There also seems to be an issue related to how QueryRequest instance is created in scoreNodes implementation. It seems to be using GET method instead of POST. As a result, when underlying stream is big, scoreNodes function fails with an exception "URI is too large" I found a related is issue mentioned here, http://lucene.472066.n3.nabble.com/Streaming-Expressions-GET-vs-POST-td4415044.html ScoreNodesStream.java initializes QueryRequest as follows. QueryRequest request = new QueryRequest(params); vs TimeSeriesStream.java which does it like this. QueryRequest request = new QueryRequest(paramsLoc, SolrRequest.METHOD.POST); Is this also a bug? On Tue, Jul 2, 2019 at 10:17 AM Pratik Patel wrote: > Great, thanks! > > On Tue, Jul 2, 2019 at 6:37 AM Joel Bernstein wrote: > >> Ok, that sounds like a bug. I can create a ticket for this. >> >> On Mon, Jul 1, 2019 at 5:57 PM Pratik Patel wrote: >> >> > I think the problem was that my streaming expression was always >> returning >> > just one node. When I added more data so that I can have more than one >> > node, I started seeing the result. >> > >> > On Mon, Jul 1, 2019 at 11:21 AM Pratik Patel >> wrote: >> > >> >> Hello Everyone, >> >> >> >> I am trying to execute following streaming expression with "scoreNodes" >> >> function in it. This is taken from the documentation. >> >> >> >> scoreNodes(top(n="50", >> >>sort="count(*) desc", >> >>nodes(baskets, >> >> random(baskets, q="productID:ABC", >> >> fl="basketID", rows="500"), >> >> walk="basketID->basketID", >> >> fq="-productID:ABC", >> >> gather="productID", >> >> count(* >> >> >> >> I have ensured that I have the collection and data present for it. >> >> Upon executing this, I am getting an error message as follows. >> >> >> >> "No collection param specified on request and no default collection has >> >> been set: []" >> >> >> >> Upon digging into the source code I found that there is a possible bug >> in >> >> ScoreNodesStream.java >> >> >> >> StringBuilder instance is never appended any string and the block which >> >> initializes collection, needs the length of that instance to be more >> than >> >> zero. This condition will always be false and hence the collection will >> >> never be set. >> >> >> >> I checked this file in solr version 8.1 and that also has the same >> issue. >> >> Is there any JIRA open for this or any patch available? >> >> >> >> [image: image.png] >> >> >> >> Thanks, >> >> Pratik >> >> >> > >> >
Best way to retrieve parent documents with children using getBeans method?
Hello Everyone, We use SolrJ with POJOs to index documents into Solr. If a POJO has a field annotated with @child, then SolrJ automatically adds those objects as children of the POJO. This works fine and indexing is done properly. However, when I retrieve the same document through the same POJO using the "getBeans" method of the DocumentObjectBinder class, the field annotated with the @child annotation is always null, i.e. the children are not populated in the POJO. What is the best way to get the children in the same POJO along with the other fields? I read about child transformers, but I am not sure whether that is the prescribed and recommended way to get children with the parent. What is the best practice to achieve this? Thanks! Pratik
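In case it is useful, a hedged sketch of the child transformer approach, assuming parents can be identified with a filter such as type_s:parent (collection, field names and the bean class are placeholders; whether getBeans then populates the child-annotated field may depend on the Solr version and schema):

    import java.util.List;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.response.QueryResponse;

    // ...
    // client is an existing SolrClient pointing at the cluster.
    SolrQuery query = new SolrQuery("type_s:parent");
    // The [child] transformer asks Solr to attach matching children to each returned parent.
    query.setFields("*", "[child parentFilter=type_s:parent limit=100]");

    QueryResponse rsp = client.query("collection1", query);
    List<ParentBean> parents = rsp.getBeans(ParentBean.class);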
Re: The Visual Guide to Streaming Expressions and Math Expressions
Hi Joel, Looks like this is going to be very helpful, thank you! I am wondering whether the visualizations are generated through third party library or is it something which would be part of solr distribution? https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/visualization.adoc#visualization Thanks, Pratik On Wed, Oct 16, 2019 at 10:54 AM Joel Bernstein wrote: > Hi, > > The Visual Guide to Streaming Expressions and Math Expressions is now > complete. It's been published to Github at the following location: > > > https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/math-expressions.adoc#streaming-expressions-and-math-expressions > > The guide will eventually be part of Solr's release when the RefGuide is > ready to accommodate it. In the meantime its been designed to be easily > read directly from Github. > > The guide contains close to 200 visualizations and examples showing how to > use Streaming Expressions and Math Expressions for data analysis and > visualization. The visual guide is also designed to guide users that are > not experts in math in how to apply the functions to analysis and visualize > data. > > The new visual data loading feature in Solr 8.3 is also covered in the > guide. This feature should cut down on the time it takes to load CSV files > so that more time can be spent on analysis and visualization. > > > https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/loading.adoc#loading-data > > Joel Bernstein >
Re: Issues with the handling of NULLs in Streaming Expressions
I am facing exactly the same issue right now. There is no way to check whether a particular field is not present in a tuple or is null. Was there any development related to this issue? Is there a workaround? In my case, I have an incoming stream of tuples and I want to filter out all the tuples which do not have a certain field set, so I was thinking of using the "having" function like this.

having( seed_expr, not(eq(fieldA, null)) )

This would result in a stream of tuples which definitely have fieldA set, and I can do some operation on it. The problem is that the "eq" evaluator fails with a null value. Is there a related JIRA that I can track? @Joel is there any way/workaround to achieve this? i.e. to know whether a certain field is null or not? Thanks and Regards, Pratik -- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
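One possible workaround, if the Solr version in use already ships the null-check stream evaluators (newer releases register notNull() and isNull()), might look like this; the collection and field names are placeholders:

    having(search(collection1, q="*:*", fl="id,fieldA", sort="id asc"),
           notNull(fieldA))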
How to change config set for some collection
Hello Everyone, Let's say I have a collection called "collection1" which uses the config set "config_set_1". Now, using the "upconfig" command, I upload a new configuration called "config_set_2". How can I make "collection1" use "config_set_2" instead of "config_set_1"? I know that if I upload a new configuration with the same name "config_set_1" and reload the collection, then it will have the new configuration, but I want to keep the old config set, add a new one, and make changes so that collection1 starts using the new config set. Is it possible? Thanks and Regards Pratik
Re: How to change config set for some collection
Thanks Shawn! This is what I needed. On Wed, Nov 20, 2019 at 3:59 PM Shawn Heisey wrote: > On 11/20/2019 1:34 PM, Pratik Patel wrote: > > Let's say I have a collection called "collection1" which uses config set > > "config_set_1". > > Now, using "upconfig" command, I upload a new configuration called > > "config_set_2". How can I make "collection1" use "config_set_2" instead > of > > "config_set_1"? > > > > I know that if I upload new configuration with the same name > "config_set_1" > > and reload the collection then it will have new configuration but I want > to > > keep the old config set, add a new one and make changes so that > collection1 > > starts using new config set. > > > > Is it possible? > > There is an action, available in the zkcli script and possibly > elsewhere, called "linkconfig". > > It looks like the config can also be changed with the collections API, > using the MODIFYCOLLECTION action. > > > https://lucene.apache.org/solr/guide/8_2/collection-management.html#modifycollection > > To make the change effective after linking to a new config, you'll need > to reload the collection. > > Thanks, > Shawn >
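For reference, a rough sketch of those two calls with the example names above (host and port are placeholders); the modifiable attribute is collection.configName, and the collection is reloaded afterwards so the change takes effect:

    http://localhost:8983/solr/admin/collections?action=MODIFYCOLLECTION&collection=collection1&collection.configName=config_set_2
    http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection1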