Re: keyword-in-context for PDF document

2017-04-13 Thread ankur
Apologies, I meant "keyword-in-context".





keyword-in-content for PDF document

2017-04-13 Thread ankur
If I search for the word "growth" in a PDF, I want to output all the
sentences containing the word "growth".

How can that be done?





Re: keyword-in-content for PDF document

2017-04-13 Thread ankur
Thanks Alex. Yes, I am using Tika, so to some extent it preserves the text
flow.

There is something interesting in your reply: "Or you could try using
highlighter to return only the sentence."

I didn't understand that bit. How do we use the highlighter to return the
sentence?

To be clear, I want to return all sentences where the word "growth"
appears.
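
For reference, a minimal highlighting setup that does this (a sketch, not a
tested config: it assumes Solr 6.4+ for the unified highlighter and a stored
text field named "content" holding the Tika-extracted body):

<requestHandler name="/sentences" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">content</str>
    <!-- the unified highlighter (Solr 6.4+) can break fragments on
         sentence boundaries, so each snippet is a whole sentence -->
    <bool name="hl">true</bool>
    <str name="hl.fl">content</str>
    <str name="hl.method">unified</str>
    <str name="hl.bs.type">SENTENCE</str>
    <!-- return up to 100 matching sentences per document, not just one -->
    <int name="hl.snippets">100</int>
  </lst>
</requestHandler>

A query like q=content:growth against this handler then returns each matching
sentence in the highlighting section of the response.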





looking for a way to get structured nested response from solr

2017-03-01 Thread ankur bansal
Hi,

I am a new Solr user. I am trying to use Solr in an application where we
have one core with one config file containing multiple queries. We have root
entities and many sub-entities as well. Currently I am getting a response
like this:

"response":{"numFound":1,"start":0,"docs":[
{
"unique_key":"4493234234",
"_version_":1560479076226957312,
"_childDocuments_":[
{
"a" : "value_a_1",
"b" : "value_b_1",
},
{
"a" : "value_a_2",
"b" : "value_b_2",
}]
}]}

whereas the response I am looking for is of this kind:

"_childDocuments_":[
{"table_temp_response" :[
{
"a" : "value_a_1",
"b" : "value_b_1",
},
{
"a" : "value_a_2",
"b" : "value_b_2",
}]
}]

or this will also do
(we should be able to do this using the [subquery] transformer, but I am not
able to get child data with that approach):

"response":{"numFound":1,"start":0,"docs":[
{
"_version_":1560657028091740160,
"childA":{"numFound":0,"start":0,"docs":[]
},
"childB":{"numFound":0,"start":0,"docs":[]
}}]

Can someone guide me on how to achieve this?
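
One pattern that produces the last shape shown above, sketched under a loud
assumption: the [subquery] transformer joins on field values, so the children
must be indexed as standalone documents carrying a parent-reference field (a
hypothetical parent_id_s here). Anonymous _childDocuments_ store no such
field, which may be why that approach returned no child data.

<requestHandler name="/withchildren" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- childA:[subquery] attaches a named, nested result section to each parent row -->
    <str name="fl">unique_key,childA:[subquery]</str>
    <!-- $row.unique_key is re-evaluated for every parent document returned -->
    <str name="childA.q">{!terms f=parent_id_s v=$row.unique_key}</str>
    <str name="childA.fl">a,b</str>
    <str name="childA.rows">100</str>
  </lst>
</requestHandler>

Each additional child entity gets its own named [subquery] in fl with its own
.q parameter.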


Getting error while executing full import

2017-04-10 Thread ankur.168
Hi All,

I am trying to use Solr with 2 cores interacting with 2 different databases.
One core is executing full-import successfully, whereas the 2nd one is
throwing a "table or view not found" exception. If I run the query directly,
it works fine. Below is the error message I am getting. Kindly help me; I am
not able to understand what the issue could be. I am using Solr 6.4.1.

2017-04-10 09:17:23.167 INFO  (Thread-14) [   x:aggr_content] o.a.s.h.d.DataImporter Starting Full Import
2017-04-10 09:17:23.183 WARN  (Thread-14) [   x:aggr_content] o.a.s.h.d.SimplePropertiesWriter Unable to read: dataimport.properties
2017-04-10 09:17:23.304 INFO  (Thread-14) [   x:aggr_content] o.a.s.h.d.JdbcDataSource Creating a connection for entity aggrPropertiesList with URL: jdbc:oracle:thin:@hostname:1521/serviceId
2017-04-10 09:17:23.465 INFO  (qtp1348949648-19) [   x:aggr_content] o.a.s.c.S.Request [aggr_content] webapp=/solr path=/dataimport params={indent=on&wt=json&command=status&_=1491815835958} status=0 QTime=0
2017-04-10 09:17:23.569 INFO  (Thread-14) [   x:aggr_content] o.a.s.h.d.JdbcDataSource Time taken for getConnection(): 263
2017-04-10 09:17:23.630 ERROR (Thread-14) [   x:aggr_content] o.a.s.h.d.DocBuilder Exception while processing: aggrList document : SolrInputDocument(fields: []):org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT GLBL_ID FROM AGGR_OWNER2.GLBL_DETAILS Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:327)
    at org.apache.solr.handler.dataimport.JdbcDataSource.createResultSetIterator(JdbcDataSource.java:288)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:283)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:52)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:244)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:475)
    at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:458)
    at org.apache.solr.handler.dataimport.DataImporter$$Lambda$93/239004134.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:447)
    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
    at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:951)
    at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:513)
    at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:227)
    at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
    at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:195)
    at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:876)
    at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1175)
    at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1296)
    at oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1916)
    at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1878)
    at oracle.jdbc.driver.OracleStatementWrapper.execute(OracleStatementWrapper.java:318)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.executeStatement(JdbcDataSource.java:349)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:321)
    ... 15 more




Re: Getting error while executing full import

2017-04-17 Thread ankur.168
Thanks for replying, Shawn.

There was an issue with the DB connection URL; silly mistake.

I am facing another problem; I don't know whether to post it in the same
thread or as a new post. Anyway, I am posting it here; let me know if it
needs to be posted as a new one.

I am using DIH, as you know. I have property_id as the unique key, and I have
1 parent and 14-15 child entities (I am trying to improve performance for a
pretty old system, hence I can't avoid/reduce so many children).
We have around 2.5 lakh (250,000) ids in the DB, so a full import is becoming
nearly impossible here. I tried to split this into multiple document files
within the same core and added a new data import handler as well, but when I
run the import on both URLs, the latest data import overrides the previous
one, hence I am not able to get complete data.

So I have 2 questions here.

1. Is there a better way of doing indexing and import than the way I am
doing it right now?
2. If not, how can I make the full import faster here? (See the sketch below.)

--Ankur
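
On question 2, one commonly suggested approach for DIH with many sub-entities
(a sketch; the table, column, and entity names here are made up): cache each
child entity so DIH runs one query per child table instead of one query per
parent row. With 14-15 child entities and 2.5 lakh parents, per-row lookups
mean millions of queries, which is usually what makes a full import crawl.

<entity name="property" query="SELECT PROPERTY_ID, NAME FROM PROPERTIES">
  <!-- the whole child table is read once into an in-memory cache,
       then joined to each parent by key lookup -->
  <entity name="rooms"
          query="SELECT PROPERTY_ID, ROOM_TYPE FROM PROPERTY_ROOMS"
          cacheImpl="SortedMapBackedCache"
          cacheKey="PROPERTY_ID"
          cacheLookup="property.PROPERTY_ID"/>
  <!-- ...and likewise for the other child entities... -->
</entity>

The trade-off is heap: every cached child table has to fit in memory for the
duration of the import.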





Re: Getting error while executing full import

2017-04-17 Thread ankur.168
Hi Erick,

Thanks for replying. As you suggest, I can use SolrJ to map RDBMS-fetched
data and index/search it later on, but DIH gives multi-DB connections for
full import, among other benefits.
Does SolrJ support this, or do we need to put in effort to build a
multithreaded connection pool similar to DIH's?





Re: Getting error while executing full import

2017-04-18 Thread ankur.168
Hi Mikhail,

Thanks for replying.

I am currently trying to use the zipper join, but I am getting a
NullPointerException, as in the stack trace below:

2017-04-18 09:11:51.154 INFO  (qtp1348949648-13) [   x:sample_content] o.a.s.u.p.LogUpdateProcessorFactory [sample_content] webapp=/solr path=/dataimport params={debug=true&indent=on&commit=true&start=0&clean=true&rows=10&command=full-import&verbose=false&core=sample_content&optimize=false&name=dataimport&wt=json&_=1492506703156}{deleteByQuery=*:* (-1565006716610805760)} 0 615
2017-04-18 09:11:51.173 ERROR (qtp1348949648-13) [   x:sample_content] o.a.s.h.d.DataImporter Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:475)
    at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:180)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:534)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
    ... 34 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:247)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:516)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)

Re: Getting error while executing full import

2017-04-18 Thread ankur.168
Hi Mikhail,

I tried with the simplest zipper entity. Here are the config details:
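(The config XML was stripped by the list archive. A minimal zipper
configuration of the shape this thread describes would look like the sketch
below; the parent entity name and key are taken from the stack traces, while
the child table and column names are assumed. Zipper imposes two constraints:
both SELECTs must ORDER BY the join key, and the where clause names plain
columns, child_key=parentEntity.key, rather than a ${...} placeholder.)

<document>
  <entity name="propertiesList"
          query="SELECT PROPERTY_ID, NAME FROM PROPERTIES ORDER BY PROPERTY_ID">
    <!-- join="zipper" merges two streams sorted on the same key, like a merge join -->
    <entity name="rooms" join="zipper"
            where="PROPERTY_ID=propertiesList.PROPERTY_ID"
            query="SELECT PROPERTY_ID, ROOM_TYPE FROM PROPERTY_ROOMS ORDER BY PROPERTY_ID"/>
  </entity>
</document>

Zipper does cope with several child rows per parent, as long as the ordering
holds on both sides.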
Here the child entity has multiple records for a given property id, hence I
believe the full import is failing. I have added new logs below. Is there a
way Zipper supports merging multiple records?

Caused by: java.lang.IllegalArgumentException: expect strictly increasing primary keys for Relation PROPERTY_ID='${propertiesList.PROPERTY_ID}' got: ,
    at org.apache.solr.handler.dataimport.Zipper.onNewParent(Zipper.java:108)





Re: Getting error while executing full import

2017-04-18 Thread ankur.168
Yes, both column names are the same. But if we just use property_id=property_id
in the child entity, how does zipper get to know which child document to merge
with which parent?

Anyhow, I just tried your suggested where condition, which results in an
ArrayIndexOutOfBoundsException; here are the logs:

Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:561)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
    ... 36 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.solr.handler.dataimport.VariableResolver.resolve(VariableResolver.java:110)
    at org.apache.solr.handler.dataimport.ContextImpl.resolve(ContextImpl.java:250)
    at org.apache.solr.handler.dataimport.Zipper.onNewParent(Zipper.java:106)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.init(EntityProcessorBase.java:63)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.init(SqlEntityProcessor.java:52)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:75)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:433)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:516)
    ... 37 more

Thanks,
--Ankur 





Re: Getting error while executing full import

2017-04-18 Thread ankur.168
Thanks for enlightening, Shawn :)

I thought DIH made parallel DB requests for all the entities defined in a
document.

I do believe that DIH is easier to use; that's why I am trying to find a way
to use it in my current system. But as I explained above, since I have so
many sub-entities, each returning a list of rows that is joined into the
parent, a full import over more than 2 lakh documents takes forever.

What I am looking for is a way to speed up my full import using DIH only. To
achieve this I tried to split the document in two and run both full imports
in parallel, but with this approach the latest import overrides the other
document's indexed data, since the unique key (property_id) is the same for
both documents.

One way I can think of is to keep the documents in different cores, which
would maintain separate index files, and merge the search results from both
cores when searching. But is this a good approach?
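
On the last idea: merging results across cores at query time is what the
shards parameter does (core names below are assumed), but with a caveat that
matters here: distributed search expects each uniqueKey to live in exactly
one shard. With the same property_id present in both cores it returns one
copy rather than a field-merged document, so it does not really solve the
split-import problem.

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- fan each query out to both cores and merge the result lists -->
    <str name="shards">localhost:8983/solr/props_part1,localhost:8983/solr/props_part2</str>
  </lst>
</requestHandler>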






Re: Running Solr6 on Tomcat7

2017-04-21 Thread ankur.168
As Shawn said, it is not recommended. Still, if you want to do this, you can
follow these steps (picked from the following post:
http://lucene.472066.n3.nabble.com/Running-Solr-6-3-on-Tomcat-Help-Please-td4320874.html)

The following instructions work with Solr 6.2 + Tomcat 8.5: 
1. Copy solr-6.2.0/server/solr-webapp/webapp directory to tomcat/webapps 
and rename it to 'solr'. 
2. Copy solr-6.2.0/server/lib/ext/*.jar and 
solr-6.2.0/dist/solr-dataimporthandler-*.jar to solr/WEB-INF/lib 
3. Uncomment env-entry for solr/home in web.xml and set the value to 
solr-6.2.0/server/solr 
4. Copy solr-6.2.0/server/WEB-INF/resources/log4j.properties to 
solr/WEB-INF/classes 

I have tried this and was able to use basic admin UI functions; I haven't
tried any versions beyond 6.2 with this.
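
For step 3, the env-entry in question ships commented out in the webapp's
WEB-INF/web.xml; uncommented and pointed at the value from these steps, it
looks like this (adjust the path to your install):

<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>/opt/solr-6.2.0/server/solr</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>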





change data dir on fly in solr 4.6

2017-05-25 Thread ankur.168
Hi All,

I am using Solr 4.6, and I am running a Quartz job to trigger Solr indexing. I
have a requirement to maintain indexes in different locations for different
jobs. For example, I have a daily indexing job and a monthly indexing job, and
I want to maintain 2 different index locations for them. Is there a way to
change the data dir on the fly in Solr 4.6?

--Ankur
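
Not on the fly for a loaded core, but dataDir is resolved from property
substitution when a core loads, so switching it per job amounts to changing a
property and reloading. A sketch (the property name my.data.dir is made up):

<!-- in solrconfig.xml: resolved at core load, so changing the property
     and issuing a core RELOAD points the core at a different index -->
<dataDir>${my.data.dir:/var/solr/data/daily}</dataDir>

The property can be set in the core's core.properties
(my.data.dir=/var/solr/data/monthly) or passed as property.my.data.dir=... on
a CoreAdmin CREATE call; in 4.6 the CoreAdmin CREATE command also accepts a
dataDir parameter directly.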





Leading and trailing wildcard with phrase query and positional ordering

2013-11-19 Thread GOYAL, ANKUR
Hi,

I am using Solr 4.2.1. I have a couple of questions regarding using leading and 
trailing wildcards with phrase queries and doing positional ordering.

*   I have a field called text which is defined with the text_general field 
type. I downloaded the ComplexPhraseQuery plugin 
(https://issues.apache.org/jira/browse/SOLR-1604) and it works perfectly for 
trailing wildcards and wildcards within the phrase. However, if we use a 
leading wildcard, it fails with an error saying that wildcard queries do not 
permit a leading wildcard. So, is there any other way we can use leading and 
trailing wildcards along with a phrase? (One option is sketched below.)
*   I am using boosting (the qf parameter on the requestHandler in 
solrconfig.xml) to order the results returned from Solr. However, the order is 
not correct. The fields I am boosting on are text_general fields. So, is it 
possible that boosting does not occur when wildcards are used?

-Ankur
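
On the first bullet, one option (a schema sketch; whether the SOLR-1604
plugin honors it should be verified against that patch) is
ReversedWildcardFilterFactory: it indexes a reversed copy of each token, so
an aware query parser can rewrite a leading-wildcard query into a fast
trailing-wildcard query on the reversed form instead of rejecting it.

<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- withOriginal="true" keeps the plain token alongside the reversed one -->
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The index grows, since many tokens are stored twice, and the field needs
re-indexing after the change.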



Solr Docvalues grouping

2013-11-20 Thread GOYAL, ANKUR
Hi,

I am using Solr 4.5.1 and I am planning to use the docValues attribute on a 
string type. The values in that field change only once a day, and I would like 
to group on that field only. At the following link:

http://searchhub.org/2013/04/02/fun-with-docvalues-in-solr-4-2/

it is mentioned that "For repeated access to the same field, the inverted index 
performs better due to internal Lucene caching". However, that link discusses 
faceting. So, does docValues also perform slower than the inverted index when 
grouping?

-Ankur





TermVectorComponent NullPointerException

2013-11-26 Thread GOYAL, ANKUR
Hi,

I am working on using the term vector component with Solr 4.2.1. If I use Solr 
in a multicore environment, I get a NullPointerException. However, if I use a 
single core as described at:

http://wiki.apache.org/solr/TermVectorComponent

then I do not get any exception, but the response I get does not contain any 
term information. Has anybody else faced this issue?

With Regards,
Ankur
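
For reference, an empty term-vector response usually means the field was
indexed without term vectors, and in a multicore setup the component has to
be wired into each core's own configs. A sketch of both pieces (field and
handler names assumed):

In schema.xml:

<field name="text" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

In solrconfig.xml:

<searchComponent name="tvComponent" class="solr.TermVectorComponent"/>
<requestHandler name="/tvrh" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

Documents indexed before termVectors was enabled have to be re-indexed before
any term information shows up.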





OpenNLP integration with Solr

2014-09-09 Thread Ankur Dulwani
I am using Solr 4.9 and want to integrate OpenNLP with it. I applied the
LUCENE-2899 patch (https://issues.apache.org/jira/browse/LUCENE-2899)
successfully and made the corresponding changes in schema.xml (the XML did
not come through in the archive). But no proper outcomes can be seen: it is
not recognizing named entities like person, organization, etc.; instead it
puts all the text in the person field. What am I doing wrong? Please help.


