How are you ingesting documents? ExtractingRequestHandler? That puts all the work on the Solr node(s); you might want to consider using SolrJ instead, as that gives you much more control as well as the ability to farm out the work to N clients.
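If it helps, here's a minimal client-side sketch of that approach, assuming SolrJ 4.x (CloudSolrServer) and Tika's AutoDetectParser running in the client; the ZooKeeper hosts, collection name, and field names below are placeholders you'd adjust for your own setup:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class TikaSolrJIndexer {
  public static void main(String[] args) throws Exception {
    // Point CloudSolrServer at ZooKeeper so updates are routed to the shard leaders.
    CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    server.setDefaultCollection("serviceorder");

    AutoDetectParser parser = new AutoDetectParser();
    List<SolrInputDocument> buffer = new ArrayList<SolrInputDocument>();

    File[] files = new File(args[0]).listFiles();
    if (files == null) return;

    for (File f : files) {
      // Parse on the client; -1 removes Tika's default character limit.
      BodyContentHandler handler = new BodyContentHandler(-1);
      Metadata metadata = new Metadata();
      InputStream in = new FileInputStream(f);
      try {
        parser.parse(in, handler, metadata);
      } finally {
        in.close();
      }

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", f.getAbsolutePath());
      doc.addField("content", handler.toString()); // Java Strings are Unicode; SolrJ sends UTF-8 on the wire
      buffer.add(doc);

      // Send in batches and let the server-side autoCommit settings handle commits.
      if (buffer.size() >= 1000) {
        server.add(buffer);
        buffer.clear();
      }
    }
    if (!buffer.isEmpty()) {
      server.add(buffer);
    }
    server.shutdown();
  }
}

Since the parsing happens in the client, you can run as many copies of this as your cluster can keep up with, and the Solr nodes only ever see already-extracted text.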
Another blog: https://lucidworks.com/blog/indexing-with-solrj/

Best,
Erick

P.S. Glad you found the problem, but it's a little weird. Solr already talks UTF-8, so this should "just work", though I'm not familiar with all the details of your setup.

On Sun, Jul 12, 2015 at 10:11 AM, Tarala, Magesh <mtar...@bh.com> wrote:
> I narrowed down the cause. And it is a character issue!
>
> The .msg file content I'm extracting with the Tika parser has this text (daños).
> If I remove the character n with the tilde, it works.
>
> Should I explicitly convert to UTF-8 before sending it to Solr?
>
> Erick - I'm in the QA phase. I'll be ingesting around 800K documents total
> (Word, PDF, Excel, .msg, txt, etc.). For now I'm considering daily updates
> when we first go to prod at the end of the month, i.e., capture all the new
> and modified documents on a daily basis and update Solr. Once we get a grasp
> of things, we want to go near real time. Thanks for the link to your post.
> It is very helpful.
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Sunday, July 12, 2015 11:24 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr cloud error during document ingestion
>
> Probably not related to your problem, but if you're sending lots of docs at
> Solr, committing every 100 is very aggressive.
> I'm assuming you're committing from the client, which, while OK, doesn't
> scale very well if you ever decide to have more than one client sending docs.
>
> I'd recommend setting your hard commit interval to a minute or so and just
> leaving it at that if possible, with soft commits to make the docs visible.
>
> Here's more than you ever wanted to know about soft commits, hard commits,
> and such:
> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Best,
> Erick
>
> On Sun, Jul 12, 2015 at 8:40 AM, Mikhail Khludnev
> <mkhlud...@griddynamics.com> wrote:
>> I suggest checking the
>> http://10.222.238.35:8983/solr/serviceorder_shard1_replica2
>> <http://10.222.238.35:8983/solr/serviceorder_shard1_replica2/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.222.238.36%3A8983%2Fsolr%2Fserviceorder_shard2_replica1%2F&wt=javabin&version=2>
>> logs to find the root cause.
>>
>> On Sun, Jul 12, 2015 at 6:33 AM, Tarala, Magesh <mtar...@bh.com> wrote:
>>
>>> I'm using 4.10.2 in a 3-node Solr Cloud setup. I have a collection
>>> with 3 shards and 2 replicas each.
>>> I'm ingesting Solr documents via SolrJ.
>>>
>>> While ingesting the documents, I get the following error:
>>>
>>> 264147944 [updateExecutor-1-thread-268] ERROR
>>> org.apache.solr.update.StreamingSolrServers ? error
>>> org.apache.solr.common.SolrException: Bad Request
>>>
>>> request:
>>> http://10.222.238.35:8983/solr/serviceorder_shard1_replica2/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.222.238.36%3A8983%2Fsolr%2Fserviceorder_shard2_replica1%2F&wt=javabin&version=2
>>>         at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         at java.lang.Thread.run(Thread.java:745)
>>>
>>> I commit after every 100 documents in SolrJ.
>>> And I also have the following solrconfig.xml setting:
>>>
>>> <autoCommit>
>>>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>>>   <openSearcher>false</openSearcher>
>>> </autoCommit>
>>>
>>> IMO, the tlogs (for serviceorder_shard1_replica2) are not too big:
>>>
>>> -rw-r--r-- 1 solr users  8338 Jul 11 21:40 tlog.0000000000000000364
>>> -rw-r--r-- 1 solr users  6385 Jul 11 21:40 tlog.0000000000000000365
>>> -rw-r--r-- 1 solr users 10221 Jul 11 21:41 tlog.0000000000000000366
>>> -rw-r--r-- 1 solr users  5981 Jul 11 21:41 tlog.0000000000000000367
>>> -rw-r--r-- 1 solr users  2682 Jul 11 21:41 tlog.0000000000000000368
>>> -rw-r--r-- 1 solr users  8515 Jul 11 21:42 tlog.0000000000000000369
>>> -rw-r--r-- 1 solr users  7373 Jul 11 21:42 tlog.0000000000000000370
>>> -rw-r--r-- 1 solr users  6907 Jul 11 21:42 tlog.0000000000000000371
>>> -rw-r--r-- 1 solr users  5524 Jul 11 21:42 tlog.0000000000000000372
>>> -rw-r--r-- 1 solr users  5600 Jul 11 21:43 tlog.0000000000000000373
>>>
>>> So far I've not been able to resolve this issue. Any ideas / pointers
>>> would be greatly appreciated!
>>>
>>> Thanks,
>>> Magesh
>>>
>>
>> --
>> Sincerely yours,
>> Mikhail Khludnev
>> Principal Engineer,
>> Grid Dynamics
>>
>> <http://www.griddynamics.com>
>> <mkhlud...@griddynamics.com>