Tika-DIH (zip to xml)

2017-07-21 Thread txlap786
# I am trying to extract zip files (which contain XML files) using DIH

# I can get data and index them from xml like this



















# How can I add TikaEntityProcessor? I tried it this way





















# And this is the result I got

"--- row #1-",
  "file",
  "YYY.zip",
  "fileSize",
  1124851,
  "fileLastModified",
  "2017-07-21T08:18:23.085Z",
  "fileDir",
  "C:\\Users\\USER\\Desktop\\solr-6.6.0\\example\\exampledocs\\myFiles",
  "fileAbsolutePath",
 
"C:\\Users\\USER\\Desktop\\solr-6.6.0\\example\\exampledocs\\myFiles\\YYY.zip",
  null,
  "-",
  "entity:ext",
  [
"query",
   
"C:\\Users\\USER\\Desktop\\solr-6.6.0\\example\\exampledocs\\myFiles\\YYY.zip",
"time-taken",
"0:0:0.0",
null,
"--- row #1-",
"text",
"http://www.w3.org/1999/xhtml\";>\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n
9860029035-201601-Y-00.xml
\r\n\r\n",
null,
"-",
"entity:xml",
[
  "document#1",
  [
"query",
   
"C:\\Users\\USER\\Desktop\\solr-6.6.0\\example\\exampledocs\\myFiles\\YYY.zip",
"time-taken",
"0:0:0.0"

# Please explain how it works



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tika-DIH-zip-to-xml-tp4347122.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: finds all documents without a value for field

2017-07-21 Thread Shawn Heisey
On 7/20/2017 3:27 PM, Hendrik Haddorp wrote:
> If the range query is so much better shouldn't the Solr query parser
> create a range query for a token query that only contains the
> wildcard? For the *:* case it does already contain a special path. 

The *:* query is a special string.  Although it *looks* like it has a
wildcard for the field and a wildcard for the value, this is not how the
query parser treats that string.  It is a special "all documents" query
that is *highly* optimized to execute very quickly.

Although it would probably be possible to optimize "field:*" queries to
a range query, there are certain situations in which the wildcard query
*is* the best option ... so if Solr were to optimize it, it might in
fact be *slower*.  Instead of applying this optimization, Solr lets you do
whatever you want with the available syntax, even if it's not the best
option.

I can't think of any downside to the optimization for "*:*", which is
very likely why that string is treated specially.

Something to note: You cannot specify a wildcard for the fieldname.  So
"*:searchterm" queries do not work.

Thanks,
Shawn
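
For reference, the query forms discussed in this thread can be written out
as a small Java sketch. The field name "myfield" and the class and method
names are placeholders chosen for illustration, not anything from Solr
itself; only the query strings follow standard Solr syntax. Which form is
faster depends on the field, as noted above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; builds plain Solr query strings, nothing more.
final class MissingValueQueries {

    // Wildcard form: matches documents that have any value in the field.
    static String hasValueWildcard(String field) {
        return field + ":*";
    }

    // Open-ended range form: also matches documents that have a value.
    static String hasValueRange(String field) {
        return field + ":[* TO *]";
    }

    // All documents minus those with a value, i.e. documents without one.
    static String missingValue(String field) {
        return "*:* -" + field + ":[* TO *]";
    }

    // URL-encodes a query for use as the q parameter (needs Java 10+).
    static String asQueryParam(String q) {
        return "q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(hasValueWildcard("myfield"));
        System.out.println(hasValueRange("myfield"));
        System.out.println(asQueryParam(missingValue("myfield")));
    }
}
```

The "*:*" prefix in the last form makes the "all documents" starting set
explicit before the negative clause removes documents that have a value.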



[POLL] Solr Plugin Improvements - request for feedback

2017-07-21 Thread Jan Høydahl
Hi Solr users and developers,

I am currently working on improving Solr’s plugin/contrib system.
My goal is to make it a breeze to discover and install plugins and to create
a much more vibrant 3rd party plugin ecosystem for Apache Solr.

If this interests you then I would love to hear your opinion on the matter.
The easiest way to do so is to answer this online poll (5 minutes):

 https://goo.gl/forms/BsNbD4QrlixGBnqL2



If you have more time and would like to dive into the technicalities, you
can also read the design document[1] and provide feedback directly there
or in the corresponding JIRA issue [2].

I’ll share the result of my work, including the results from this poll,
in my Lucene Revolution 2017 talk titled "Solr’s Missing Plugin Ecosystem" [3],
where I hope to see many of you too!

[1] https://s.apache.org/solr-plugin
[2] https://issues.apache.org/jira/browse/SOLR-10665
[3] https://lucenesolrrevolution2017.sched.com/event/BAwj

DISCLAIMER: This poll is conducted by Jan Høydahl, a Lucene/Solr committer and
PMC member, for my LuceneRev talk, and not officially on behalf of the Lucene 
PMC.

- Jan

Re: CDCR - how to deal with the transaction log files

2017-07-21 Thread Amrit Sarkar
Patrick,

Yes! You created the default UpdateLog, which got written to disk, and then
you changed it to CdcrUpdateLog in the configs. I see no reason why it would
create a proper COLLECTIONCHECKPOINT on the target tlog.

One thing you can try before starting from scratch is restarting the source
cluster nodes. The shard leaders will then try to create the same
COLLECTIONCHECKPOINT, which may or may not succeed.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Jul 21, 2017 at 11:09 AM, Patrick Hoeffel <
patrick.hoef...@polarisalpha.com> wrote:

> I'm working on my first setup of CDCR, and I'm seeing the same "The log
> reader for target collection {collection name} is not initialised" as you
> saw.
>
> It looks like you're creating collections on a regular basis, but for me,
> I create it one time and never again. I've been creating the collection
> first from defaults and then applying the CDCR-aware solrconfig changes
> afterward. It sounds like maybe I need to create the configset in ZK first,
> then create the collections, first on the Target and then on the Source,
> and I should be good?
>
> Thanks,
>
> Patrick Hoeffel
> Senior Software Engineer
> (Direct)  719-452-7371
> (Mobile) 719-210-3706
> patrick.hoef...@polarisalpha.com
> PolarisAlpha.com
>
>
> -Original Message-
> From: jmyatt [mailto:jmy...@wayfair.com]
> Sent: Wednesday, July 12, 2017 4:49 PM
> To: solr-user@lucene.apache.org
> Subject: Re: CDCR - how to deal with the transaction log files
>
> glad to hear you found your solution!  I have been combing over this post
> and others on this discussion board many times and have tried so many
> tweaks to configuration, order of steps, etc, all with absolutely no
> success in getting the Source cluster tlogs to delete.  So incredibly
> frustrating.  If anyone has other pearls of wisdom I'd love some advice.
> Quick hits on what I've tried:
>
> - solrconfig exactly like Sean's (target and source respectively) except
> no autoSoftCommit
> - I am also calling cdcr?action=DISABLEBUFFER (on source as well as on
> target) explicitly before starting since the config setting of
> defaultState=disabled doesn't seem to work
> - when I create the collection on source first, I get the warning "The log
> reader for target collection {collection name} is not initialised".  When I
> reverse the order (create the collection on target first), no such warning
> - tlogs replicate as expected, hard commits on both target and source
> cause tlogs to rollover, etc - all of that works as expected
> - action=QUEUES on source reflects the queueSize accurately.  Also
> *always* shows updateLogSynchronizer state as "stopped"
> - action=LASTPROCESSEDVERSION on both source and target always seems
> correct (I don't see the -1 that Sean mentioned).
> - I'm creating new collections every time and running full data imports
> that take 5-10 minutes. Again, all data replication, log rollover, and
> autocommit activity seems to work as expected, and logs on target are
> deleted.  It's just those pesky source tlogs I can't get to delete.


Re: Getting IO Exception while Indexing

2017-07-21 Thread Susheel Kumar
You may want to dig deeper to see what's going on.  It shouldn't be the
case.  Most likely your code is producing the SolrInputDocument in a
different way which is making it fail. You can write SolrInputDocument or
print it in json to compare...

On Fri, Jul 21, 2017 at 1:31 AM, mesenthil1 <
senthilkumar.arumu...@viacomcontractor.com> wrote:

> While debugging following are the findings.
>
> When we send the same document as json, it is getting indexed without an
> issue. When the same document is converted as SolrInputDocument and sent to
> solr using SolrServer, it fails.


Re: Solr Issue While indexing Data

2017-07-21 Thread Susheel Kumar
By the way you shouldn't be running solr as root.

On Fri, Jul 21, 2017 at 12:06 AM, rajat rastogi <
rajat.rast...@hindustantimes.com> wrote:

> Hi Shawn ,
> I have two instances of Solr running, and my indexing process is in Java as
> well.
> PID 15958 is my indexing process.
> PID 4499 is my Solr instance which has stuck commits.
> PID 9299 is another Solr instance which is working fine.
>
> regards
>
> Rajat
>
> On 20-Jul-2017, at 16:40, Shawn Heisey-2 [via Lucene]
> <ml+s472066n4346953...@n3.nabble.com> wrote:
>
> On 7/20/2017 12:29 AM, rajat rastogi wrote:
> > I shared The code base, config , schema with you . Were they of any help
> , or can You point what I am doing wrong in them .
>
> I did not see any schema or config.
>
> The top output shows that you have three large Java processes, all
> running as root.  Which of these is Solr?  Are they all instances of Solr?
>
> Thanks,
> Shawn


atomic updates in conjunction with optimistic concurrency

2017-07-21 Thread Hendrik Haddorp

Hi,

when I try to use an atomic update in conjunction with optimistic
concurrency, Solr sometimes complains that the version I passed in does
not match. The version in my request, however, matches what is stored, and
the version that the exception reports as the actual version does not
exist in the collection at all. Strangely, this only happens sometimes,
but once it happens for a collection it seems to stay that way. Any idea
why that might happen?


I'm using Solr 6.3 in Cloud mode with SolrJ.

regards,
Hendrik


Re: atomic updates in conjunction with optimistic concurrency

2017-07-21 Thread Amrit Sarkar
Hendrik,

Can you post the error snippet so that we can find where exactly in the
code that is happening?


Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Jul 21, 2017 at 9:50 PM, Hendrik Haddorp 
wrote:

> Hi,
>
> when I try to use an atomic update in conjunction with optimistic
> concurrency Solr sometimes complains that the version I passed in does not
> match. The version in my request however match to what is stored and what
> the exception states as the actual version does not exist in the collection
> at all. Strangely this does only happen sometimes but once it happens for a
> collection it seems to stay like that. Any idea why that might happen?
>
> I'm using Solr 6.3 in Cloud mode with SolrJ.
>
> regards,
> Hendrik
>


Re: atomic updates in conjunction with optimistic concurrency

2017-07-21 Thread Hendrik Haddorp

Hi,

I can't find anything about this in the Solr logs. On the caller side I 
have this:
Error from server at http://x_shard1_replica2: version conflict for x expected=1573538179623944192 actual=1573546159565176832
org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error from server at http://x_shard1_replica2: version conflict for x expected=1573538179623944192 actual=1573546159565176832
    at org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:765) ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:52:43]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1173) ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:52:43]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1062) ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:52:43]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1004) ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:52:43]
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149) ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:52:43]
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:106) ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:52:43]
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:71) ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:52:43]
    ...
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://x_shard1_replica2: version conflict for x expected=1573538179623944192 actual=1573546159565176832
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:593) ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:52:43]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:262) ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:52:43]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251) ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:52:43]
    at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:435) ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:52:43]
    at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:387) ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:52:43]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.lambda$directUpdate$0(CloudSolrClient.java:742) ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:52:43]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_131]
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229) ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:52:43]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]

The version "1573546159565176832" does not exist. It looks a bit as if
the update first created a new version and then checked against it.


regards,
Hendrik
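
To make the expected/actual wording in the exception concrete, here is a
minimal in-memory sketch of the version rules that Solr documents for
optimistic concurrency (a _version_ greater than 1 must match exactly, 1
means the document must exist, a negative value means it must not exist,
0 disables the check). The class, its counter, and the message format are
assumptions made for illustration; this is not Solr's code and not a
diagnosis of the problem reported above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of per-document version checks; all names are hypothetical.
final class OptimisticConcurrencySketch {
    private final Map<String, Long> versions = new HashMap<>();
    private long clock = 1000; // stand-in for a version generator, always > 1

    // Applies an update and returns the new version, or throws on conflict.
    long update(String id, long requestVersion) {
        Long actual = versions.get(id);
        boolean conflict =
                (requestVersion > 1 && (actual == null || actual != requestVersion))
                || (requestVersion == 1 && actual == null)
                || (requestVersion < 0 && actual != null);
        if (conflict) {
            throw new IllegalStateException("version conflict for " + id
                    + " expected=" + requestVersion + " actual=" + actual);
        }
        long next = ++clock; // every successful update gets a fresh version
        versions.put(id, next);
        return next;
    }

    public static void main(String[] args) {
        OptimisticConcurrencySketch store = new OptimisticConcurrencySketch();
        long v1 = store.update("x", 0);  // no check, creates the document
        store.update("x", v1);           // matches, document moves to a new version
        try {
            store.update("x", v1);       // stale: same version sent a second time
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, a second update that carries an already used version fails,
and the "actual" value it reports is one the caller never read. Whether
anything comparable (for example the same id being sent twice) plays a role
in the case above is not established in this thread.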






Re: mm = 1 and multi-field searches (update)

2017-07-21 Thread Susheel Kumar
Interesting. If it's working for you then that's good, but as to your
original question, qf seems to be working.

Adding to mailing list for the benefit of others.

On Fri, Jul 21, 2017 at 9:41 AM, Michael Joyner  wrote:

> Thanks,
>
> We finally figured out that setting mm=100% doesn't seem to provide the
> desired results across multiple fields.
>
> We switched to using q.op=AND and it seems to work as desired at first
> glance.
>
> We discovered additionally that when mm=100% and we try using an explicit
> OR operator in the queries that the OR operator seems to get ignored and
> that we need to set mm=0 and set q.op=AND for the OR operator to work.
>
> -Mike/NewsRx
>
> On 07/10/2017 05:50 PM, Susheel Kumar wrote:
>
> How are you specifying multiple fields. Use qf parameter to specify
> multiple fields e.g.
> http://localhost:8983/solr/techproducts/select?indent=on&q=Samsung%20Maxtor%20hard&wt=json&defType=edismax&qf=name%20manu&debugQuery=on&mm=1
>
>
> On Mon, Jul 10, 2017 at 4:51 PM, Michael Joyner  
>  wrote:
>
>
> Hello all,
>
> How does setting mm = 1 for edismax impact multi-field searches?
>
> We set mm to 1 and get zero results back when specifying multiple fields
> to search across.
>
> Is there a way to set mm = 1 for each field, but to OR the individual
> field searches together?
>
> -Mike/NewsRx
>
>
>
>
>


solr.core metric not being reported

2017-07-21 Thread Ramsés Morales
Hi all,

I am creating a metric under the solr.core group, for arbitrary core names,
and I can verify that it does exist programmatically. However, it does not
get reported under http://localhost:8983/solr/admin/metrics

I do not have a per core xml reporting configuration restricting reporting.
What am I missing?

Cheers,

Ramsés Morales


Wildcard query difference

2017-07-21 Thread Saurabh Sethi
I have a question in terms of how solr/lucene will lookup terms from
postings list for the below two queries:

1. a*
2. a*gh

My understanding is that for first, it will get all terms starting with 'a'
and issue query on those terms.
For the second, it will again get all terms starting with 'a', then remove
those that do not end with 'gh' and issue the query on the remaining terms.

Please let me know if my understanding is correct. And if not, what am I
missing?

I am trying to do some optimization based on above assumption, that both
these queries will behave differently.

Thanks,
Saurabh


Re: Wildcard query difference

2017-07-21 Thread Erick Erickson
It's the same in both cases:
enumerate all terms that start with "a" and collect them into
(conceptually) a huge OR query and execute it. There's been some work
lately to avoid the TooManyBooleanClauses exception, but it's still
the case that every term starting with "a" has to be examined and
either added to the terms to be searched or not.

Best,
Erick
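
The point above can be illustrated with a conceptual Java sketch that uses
a sorted set as a stand-in for the term dictionary. This is a toy model
under that assumption, not Lucene's actual implementation; the class and
method names are invented for the example. It only shows that both "a*"
and "a*gh" walk the same range of terms starting with "a", and that the
second pattern merely keeps fewer of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model of wildcard expansion over a sorted term dictionary.
final class WildcardExpansionSketch {

    // Every term that starts with the prefix (a contiguous sorted range).
    static SortedSet<String> termsWithPrefix(TreeSet<String> dict, String prefix) {
        return dict.subSet(prefix, prefix + Character.MAX_VALUE);
    }

    // Expands prefix*suffix: each term in the prefix range is examined and
    // either kept or dropped. An empty suffix models a plain prefix query.
    static List<String> expand(TreeSet<String> dict, String prefix, String suffix) {
        List<String> matches = new ArrayList<>();
        for (String term : termsWithPrefix(dict, prefix)) {
            boolean longEnough = term.length() >= prefix.length() + suffix.length();
            if (longEnough && term.endsWith(suffix)) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<String> dict = new TreeSet<>(
                List.of("abc", "aleph", "although", "argh", "bough", "cat"));
        System.out.println(expand(dict, "a", ""));   // models a*
        System.out.println(expand(dict, "a", "gh")); // models a*gh
    }
}
```

The matching terms would then be combined into what is conceptually one
large OR query, as described above.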

On Fri, Jul 21, 2017 at 11:34 AM, Saurabh Sethi
 wrote:
> I have a question in terms of how solr/lucene will lookup terms from
> postings list for the below two queries:
>
> 1. a*
> 2. a*gh
>
> My understanding is that for first, it will get all terms starting with 'a'
> and issue query on those terms.
> For second, it will again get all terms starting with 'a', then remove
> those then do not end with 'gh' and issue query on remaining terms.
>
> Please let me know if my understanding is correct. And if not, what am I
> missing?
>
> I am trying to do some optimization based on above assumption, that both
> these queries will behave differently.
>
> Thanks,
> Saurabh


Re: atomic updates in conjunction with optimistic concurrency

2017-07-21 Thread Amrit Sarkar
Hendrik,

I ran a small test on 6.3 that performs atomic updates with optimistic
concurrency in an endless loop, and I cannot *reproduce* it:

    // `client` is the SolrJ client and `collection` the collection name.
    // createSentance(...) is a helper from the test that is not shown here.
    List<SolrInputDocument> docs = new ArrayList<>();
    SolrInputDocument document = new SolrInputDocument();
    document.addField("id", String.valueOf(1));
    document.addField("external_version_field_s", System.currentTimeMillis()); // normal update
    docs.add(document);
    UpdateRequest updateRequest = new UpdateRequest();
    updateRequest.add(docs);
    client.request(updateRequest, collection);
    updateRequest = new UpdateRequest();
    updateRequest.commit(client, collection);

    while (true) {
        // Read the current _version_ of the document.
        QueryResponse response = client.query(new ModifiableSolrParams().add("q", "id:1"));
        System.out.println(response.getResults().get(0).get("_version_"));
        docs = new ArrayList<>();
        document = new SolrInputDocument();
        document.addField("id", String.valueOf(1));
        Map<String, Object> map = new HashMap<>();
        map.put("set", createSentance(1)); // atomic map value
        document.addField("external_version_field_s", map);
        // Pass the version just read back in for optimistic concurrency.
        document.addField("_version_", response.getResults().get(0).get("_version_"));
        docs.add(document);
        updateRequest = new UpdateRequest();
        updateRequest.add(docs);
        client.request(updateRequest, collection);
        updateRequest = new UpdateRequest();
        updateRequest.commit(client, collection);
    }

Maybe you can give us more details about how the update is being made?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Jul 21, 2017 at 10:36 PM, Hendrik Haddorp 
wrote:

> Hi,
>
> I can't find anything about this in the Solr logs. On the caller side I
> have this:
> Error from server at http://x_shard1_replica2: version conflict for
> x expected=1573538179623944192 actual=1573546159565176832
> org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error
> from server at http://x_shard1_replica2: version conflict for x
> expected=1573538179623944192 actual=1573546159565176832
> at 
> org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:765)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at 
> org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1173)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWit
> hRetryOnStaleState(CloudSolrClient.java:1062)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at 
> org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1004)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:106)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:71)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> ...
> Caused by: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> Error from server at http://x_shard1_replica2: version conflict for
> x expected=1573538179623944192 actual=1573546159565176832
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:593)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:262)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at 
> org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:435)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at 
> org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:387)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at org.apache.solr.client.solrj.impl.CloudSolrClient.lambda$dir
> ectUpdate$0(CloudSolrClient.java:742) ~[solr-solrj-6.3.0.jar:6.3.0
> a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:52:43]
> at java.util.concurrent.FutureTask.run(

Re: atomic updates in conjunction with optimistic concurrency

2017-07-21 Thread Hendrik Haddorp

Thanks for trying to reproduce my issue.

I'm using SolrCloud. My collection was quite small, only 50-500
documents, with one shard and a replication factor of 3. I updated all
of the documents in one request. Besides that, the flow is pretty much
like yours.

The goal of my code was to add a field to all documents that did not
already contain it, which should actually have been every document in
the collection. So the code does a query and gets a few fields for each
document. Then I use an atomic update to add two fields. In the end I
send all documents to Solr in one request. The whole thing then looped
over a few hundred collections. In most cases it worked just fine, but
in some it failed. When it failed, I could reproduce the issue on that
collection consistently. In an earlier version I accidentally tried to
add a field value instead of setting it, and that caused an exception
because the field only allows a single value. In at least one case the
version conflict later on happened on a collection that had hit this
issue before. I don't really think that causes it, so I'm just stating it
for the sake of completeness. I was not using a document-centric version
field but the normal _version_ field.

Even though I'm running on Solr 6.3, my config still contains "6.2.1".
Not sure if that might have an effect.


The main code is this:

    SolrQuery query = new SolrQuery();
    query.setQuery(documentsMatchingQueryQ);
    query.setRows(Integer.MAX_VALUE);
    query.setFields(documentsMatchingQueryFL);

    QueryResponse queryResponse = solr.query(collectionName, query);
    if (queryResponse.getStatus() != 0) {
        LOGGER.error("request failed, status: {}", queryResponse);
        throw new IllegalStateException("request failed");
    }
    LOGGER.info("{} | documents to process: {}", collectionName,
            queryResponse.getResults().getNumFound());
    if (queryResponse.getResults().getNumFound() > 0) {
        LOGGER.trace("{} | start processing", collectionName);
        // Build one atomic-update document per matching document.
        List<SolrInputDocument> docs = queryResponse.getResults().stream()
                .map(documentGenerator::apply)
                .peek(doc -> LOGGER.trace("{} doc: {}", collectionName, doc))
                .collect(Collectors.toList());
        if (!reportOnly) {
            LOGGER.trace("{} | updating solr", collectionName);
            // All documents of the collection go to Solr in one request.
            solr.add(collectionName, docs);
            solr.commit(collectionName, false, false, true);
        }
        LOGGER.trace("{} | processing done", collectionName);
    }

The documentGenerator looks like this:

    originalDoc -> {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", originalDoc.getFieldValue("id"));
        // Atomic "set" operations for the two new fields.
        doc.addField("__docId__", Collections.singletonMap("set",
                originalDoc.getFieldValue("classification") + ":"
                        + originalDoc.getFieldValue("id")));
        doc.addField("systemModified", Collections.singletonMap("set",
                originalDoc.getFieldValue("lastModified")));
        // Version read by the query, used for optimistic concurrency.
        doc.addField("_version_", originalDoc.getFieldValue("_version_"));
        return doc;
    }


how to generate code from QueryParser.jj file

2017-07-21 Thread Nawab Zada Asad Iqbal
Hi,

I know that we can make changes in the language by editing QueryParser.jj,
however, how does it get generated into java code? Is there any ant target?
'compile' doesn't seem to generate java code for my changes (e.g., adding
lower case logical operators).


Regards
Nawab


Re: how to generate code from QueryParser.jj file

2017-07-21 Thread Nawab Zada Asad Iqbal
ok,  I see there is an `ant javacc` target in some folders, e.g.

1) lucene-solr/solr/build/solr/src-export/solr/core
2) lucene-solr/lucene/queryparser

Both of them use different parser files. I am interested in the QueryParser
at path:
lucene-solr/solr/core/src/java/org/apache/solr/parser/QueryParser.jj

this apparently is getting dropped at:
lucene-solr/solr/build/solr/src-export/solr/core/src/java/org/apache/solr/parser/QueryParser.jj

However, I am not sure what target drops it!


Nawab




On Fri, Jul 21, 2017 at 7:12 PM, Nawab Zada Asad Iqbal 
wrote:

> Hi,
>
> I know that we can make changes in the language by editing QueryParser.jj,
> however, how does it get generated into java code? Is there any ant target?
> 'compile' doesn't seem to generate java code for my changes (e.g., adding
> lower case logical operators).
>
>
> Regards
> Nawab
>


Graph Visualizing tool

2017-07-21 Thread mganeshs
Hello Solr Experts,

Has anyone used a tool or plugin to visualize graph data based on
node_ids and edge_ids?

Please suggest one.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Graph-Visualizing-tool-tp4347240.html


Re: Graph Visualizing tool

2017-07-21 Thread mganeshs
Tried this, but it's not working as expected.

http://solr.pl/en/2016/04/25/graph-visualization-using-solr-6/


Have any of you used this or any other tool?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Graph-Visualizing-tool-tp4347240p4347241.html