Re: Using solr(cloud) as source-of-truth for data (with no backing external db)

2016-11-18 Thread Dorian Hoxha
@Alex That makes sense, but it can be ~fixed by just storing every field that you need. @Walter Many of those things are missing from many nosql dbs, yet they're used as sources of data. As long as the backup is "point in time", meaning a consistent timestamp across all shards, it ~should be ok for man

Combined Dismax and Block Join Scoring on nested documents

2016-11-18 Thread Mike Allen
Apologies if I'm doing something incredibly stupid as I'm new to Solr. I am having an issue with scoring child documents in a block join query when including a dismax query. I'm actually a little unclear on whether or not that's a complete oxymoron, combining dismax and block join. Problem stat

Re: Using solr(cloud) as source-of-truth for data (with no backing external db)

2016-11-18 Thread Alexandre Rafalovitch
Sure. And the people do it. Especially for their first deployment. I have some prototypes/proof-of-concepts like that myself. Just later don't say you didn't ask and we didn't tell :-) Regards, Alex. Solr Example reading group is starting November 2016, join us at http://j.mp/SolrERG New

json facet api and facet.threads

2016-11-18 Thread Michael Aleythe, Sternwald
Hi Everybody, can anyone point me in the right direction for using "facet.threads" with the JSON Facet API? Does it only work if terms facets are exclusively used in the query? Best regards Michael Aleythe Java Entwickler | STERNWALD SYSTEMS GMBH
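For readers unfamiliar with the JSON Facet API, a request using a terms facet looks roughly like the sketch below (collection and field names are made-up examples). Note that `facet.threads` is documented as a parameter of the traditional `facet.field` faceting; whether the JSON Facet API honors it at all is exactly the open question in this thread:

```json
{
  "query": "*:*",
  "facet": {
    "categories": { "type": "terms", "field": "cat", "limit": 10 }
  }
}
```

This body would be sent to the `/query` endpoint with `Content-Type: application/json`.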

Re: Combined Dismax and Block Join Scoring on nested documents

2016-11-18 Thread Mikhail Khludnev
Hello Mike, Structured queries in Solr are way cumbersome. Start from: q=+{!dismax v="skirt" qf="name"} +{!parent which=content_type:product score=min v=childq}&childq=+in_stock:true^=0 {!func}list_price_gbp&... besides "explain" there is a parsed query entry in debug that's more useful for trou
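Laid out one parameter per line, Mikhail's suggestion reads roughly as the sketch below (field names are taken from the thread). One caveat: Solr's parameter dereferencing syntax normally spells the reference with a dollar sign, i.e. `v=$childq`, so the plain `v=childq` above may be a typo in the preview:

```
q      = +{!dismax v="skirt" qf="name"}
         +{!parent which=content_type:product score=min v=$childq}
childq = +in_stock:true^=0 {!func}list_price_gbp
```

The `^=0` constant-score boost keeps the `in_stock` clause from contributing to the score, so ranking of child documents is driven by the `{!func}list_price_gbp` function clause alone.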

SolrJ bulk indexing documents - HttpSolrClient vs. ConcurrentUpdateSolrClient

2016-11-18 Thread Sebastian Riemer
Hi all, I am looking to improve indexing speed when loading many documents as part of an import. I am using the SolrJ-Client and currently I add the documents one-by-one using HttpSolrClient and its method add(SolrInputDocument doc, int commitWithinMs). My first step would be to change that t

Re: SolrJ bulk indexing documents - HttpSolrClient vs. ConcurrentUpdateSolrClient

2016-11-18 Thread Shawn Heisey
On 11/18/2016 6:00 AM, Sebastian Riemer wrote: > I am looking to improve indexing speed when loading many documents as part of > an import. I am using the SolrJ-Client and currently I add the documents > one-by-one using HttpSolrClient and its method add(SolrInputDocument doc, > int commitWithi
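The batching pattern being recommended here, sketched in self-contained Java: instead of one HTTP request per document, group documents and send each group with a single `add(Collection<SolrInputDocument>)` call. Plain strings stand in for `SolrInputDocument` below so the partitioning logic can be shown without a running Solr; the batch size of 1000 is an illustrative assumption, not a recommendation from the thread.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {

    // Split a list into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            batches.add(docs.subList(i, Math.min(i + batchSize, docs.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in for Sebastian's ~40K import documents.
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 40000; i++) docs.add("doc-" + i);

        for (List<String> batch : partition(docs, 1000)) {
            // In real SolrJ code this loop body would be:
            //   client.add(batchOfSolrInputDocuments, commitWithinMs);
        }
        System.out.println(partition(docs, 1000).size()); // prints 40
    }
}
```

With batches like this, 40K documents become 40 requests instead of 40,000, which is usually where most of the speedup comes from before any client-side concurrency is needed.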

Re: Bkd tree numbers/geo on solr 6.3 ?

2016-11-18 Thread Dorian Hoxha
Looks like it needs https://issues.apache.org/jira/browse/SOLR-8396 . On Thu, Nov 17, 2016 at 2:41 PM, Dorian Hoxha wrote: > Hi, > > I've read that lucene 6 has fancy bkd-tree implementation for numbers. But > on latest cwiki I only see TrieNumbers. Aren't they implemented or did I > miss someth

Data Import Request Handler isolated into its own project - any suggestions?

2016-11-18 Thread Marek Ščevlík
Hello. My name is Marek Scevlik. Currently I am working for a small company where we are interested in implementing your Solr 6.3 search engine. We are hoping to extract the Data Import Request Handler from the original source package into its own project and create a usable .jar file out of

RE: Data Import Request Handler isolated into its own project - any suggestions?

2016-11-18 Thread Davis, Daniel (NIH/NLM) [C]
Marek, I've wanted to do something like this in the past as well. However, a rewrite that supports the same XML syntax might be better. There are several problems with the design of the Data Import Handler that make it not quite suitable: - Not designed for Multi-threading - Bad implementati

Re: field set up help

2016-11-18 Thread Comcast
Perfect. Just had to wrap the PHP curl request URL with urlencode and it worked Sent from my iPhone > On Nov 17, 2016, at 5:56 PM, Kris Musshorn wrote: > > This q={!prefix f=metatag.date}2016-10 returns zero records > > -Original Message- > From: KRIS MUSSHORN [mailto:mussho...@comcast
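The same fix sketched in Java rather than PHP: local-params syntax such as `{!prefix f=metatag.date}2016-10` contains characters (`{` `}` `!` `=`) that must be percent-encoded before being placed in a request URL's query string, which is why the unencoded query returned zero records.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeQuery {
    public static void main(String[] args) {
        String q = "{!prefix f=metatag.date}2016-10";
        // URLEncoder.encode(String, Charset) is available since Java 10;
        // spaces become '+', and unsafe characters are percent-encoded.
        String encoded = URLEncoder.encode(q, StandardCharsets.UTF_8);
        System.out.println(encoded); // prints %7B%21prefix+f%3Dmetatag.date%7D2016-10
    }
}
```

The encoded value is what should appear after `q=` in the URL; Solr decodes it back to the local-params query on the server side.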

Re: Detecting schema errors while adding documents

2016-11-18 Thread Shawn Heisey
On 11/16/2016 11:02 AM, Mike Thomsen wrote: > We're stuck on Solr 4.10.3 (Cloudera bundle). Is there any way to detect > with SolrJ when a document added to the index violated the schema? All we > see when we look at the stacktrace for the SolrException that comes back is > that it contains message

Re: SolrJ bulk indexing documents - HttpSolrClient vs. ConcurrentUpdateSolrClient

2016-11-18 Thread Erick Erickson
Here's some numbers for batching improvements: https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/ And I totally agree with Shawn that for 40K documents anything more complex is probably overkill. Best, Erick On Fri, Nov 18, 2016 at 6:02 AM, Shawn Heisey wrote: > On 11/18/2016

Best Way to Read A Nested Structure from Solr?

2016-11-18 Thread Jennifer Coston
Hello, I am sure there have been many discussions on the best way to do this, but I am lost and need your advice. I have a nested Solr Document containing multiple levels of sub-documents. Here is a JSON example so you can see the full structure: { "id": "Test Library", "description": "exa
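One common way to get the whole nested tree back in a single response, rather than flat child matches, is the ChildDocTransformer in the `fl` list. A sketch, where the `id` value comes from Jennifer's example but the `content_type:library` parent filter is an assumption about how her levels are distinguished:

```
q=id:"Test Library"&fl=*,[child parentFilter=content_type:library limit=100]
```

The `parentFilter` must match all (and only) the root-level documents in the block, otherwise children get attached to the wrong parents, so the exact filter depends on how the levels in the schema are flagged.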

Index and search on PDF text using Solr

2016-11-18 Thread vascaino90
Hello, I'm new to Solr and I have a big problem. I have many text documents in PDF format (more than 1) and I need to create a site with these PDFs. On this site, I have to provide search by any terms in these PDFs. I have no idea how to start. Can anyone help me? Thank you so much. -- Vi

Re: Index and search on PDF text using Solr

2016-11-18 Thread Erick Erickson
see the section in the Solr Reference Guide: "Uploading Data with Solr Cell using Apache Tika" here: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika to get a start. The basic idea is to use Apache Tika to parse the PDF file and then stuff the data
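As a concrete starting point, the extracting handler described in that guide can be invoked like the sketch below. The collection name, document id, and file name are placeholders, and this assumes a stock Solr install with the `/update/extract` handler and the Solr Cell jars enabled in solrconfig.xml:

```
curl "http://localhost:8983/solr/mycollection/update/extract?literal.id=doc1&commit=true" \
  -F "myfile=@manual.pdf"
```

Tika extracts the PDF's text and metadata server-side and indexes them; the `literal.id` parameter supplies the uniqueKey value, since a raw PDF has none of its own.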

Re: Data Import Request Handler isolated into its own project - any suggestions?

2016-11-18 Thread Alexandre Rafalovitch
Is your goal to still index into Solr? It was not clear. If yes, then it has been discussed quite a bit. The challenge is that DIH is integrated into AdminUI, which makes it easier to see the progress and set some flags. Plus the required jars are loaded via solrconfig.xml, just like all other ext

CloudSolrClient$RouteException: Cannot talk to ZooKeeper - Updates are disabled.

2016-11-18 Thread Chetas Joshi
Hi, I have a SolrCloud (on HDFS) of 50 nodes and a ZK quorum of 5 nodes. The SolrCloud is having difficulties talking to ZK when I am ingesting data into the collections. At that time I am also running queries (that return millions of docs). The ingest job is crying with the following exceptio

Re: CloudSolrClient$RouteException: Cannot talk to ZooKeeper - Updates are disabled.

2016-11-18 Thread Erick Erickson
The clusterstate on Zookeeper shouldn't be changing very often, only when nodes come and go. bq: At that time I am also running queries (that return millions of docs). As in rows=milions? This is an anti-pattern, if that's true then you're probably network saturated and the like. If you mean your
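For completeness, the usual alternative to huge `rows` values in Solr 5.x is cursor-based deep paging: page through with a stable sort that ends on the uniqueKey field and feed each response's `nextCursorMark` back into the next request. A parameter sketch (field names are illustrative):

```
q=*:*&rows=500&sort=timestamp asc,id asc&cursorMark=*
```

The first request uses `cursorMark=*`; every response carries a `nextCursorMark` to pass as `cursorMark` on the following request, which avoids the memory and network cost of asking for a million rows at once.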

Re: CloudSolrClient$RouteException: Cannot talk to ZooKeeper - Updates are disabled.

2016-11-18 Thread Chetas Joshi
Thanks Erick. The numFound is millions but I was also trying with rows=1 million. I will reduce it to 500K. I am sorry. It is state.json. I am using Solr 5.5.0 One of the things I am not able to understand is why my ingestion job is complaining about "Cannot talk to ZooKeeper - Updates are disa

Re: CloudSolrClient$RouteException: Cannot talk to ZooKeeper - Updates are disabled.

2016-11-18 Thread Shawn Heisey
On 11/18/2016 6:50 PM, Chetas Joshi wrote: > The numFound is millions but I was also trying with rows= 1 Million. I will > reduce it to 500K. > > I am sorry. It is state.json. I am using Solr 5.5.0 > > One of the things I am not able to understand is why my ingestion job is > complaining about "Ca