Best practice to setup schemas for documents having different structures

2014-11-04 Thread Vishal Sharma
This is something I have been thinking about for a long time now. What is the best practice for setting up schemas for documents having different fields? Should we just create one schema with a lot of fields, or multiple schemas for different data structures? Here is an example: I have two objects st

CentOS 7 and solr init script

2014-11-04 Thread Shawn Heisey
This is only partially on-topic for this list ... I'm trying to start Solr. :) I have a handful of programs, one of which is Solr using the example jetty, for which I've written init scripts. They work very well on CentOS 6. I'm trying to set up a new dev solr server with CentOS 7, and none of m

indexing errors when storeOffsetsWithPositions="true" in solr 4.9.1

2014-11-04 Thread Min L
Hi All: I am using Solr 4.9.1 and trying to use PostingsSolrHighlighter, but I got errors during indexing. I thought LUCENE-5111 had fixed the issues with WordDelimiterFilter. The error is as below: Caused by: java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must b

Re: A bad idea to store core data directory over NAS?

2014-11-04 Thread David Santamauro
Interestingly enough, one of our installations has a 16-node cluster using 4 NAS devices (xen as virtualization backbone). The data drive for the individual node that holds the index is a stripe of 2x 500GB disks. Each disk of the stripe is on a different NAS device (scattered pattern). With

Re: A bad idea to store core data directory over NAS?

2014-11-04 Thread Jack Krupansky
Think of Solr/SolrCloud itself as a SAN - smart networked machines that intensely manage local storage. Having two levels of "SAN" is counterproductive. -- Jack Krupansky -Original Message- From: Gili Nachum Sent: Tuesday, November 4, 2014 4:57 PM To: solr-user@lucene.apache.org Subjec

Re: A bad idea to store core data directory over NAS?

2014-11-04 Thread Walter Underwood
I did that once by accident. It was 100X slower. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Nov 4, 2014, at 1:57 PM, Gili Nachum wrote: > My data center is out of SAN or local disk storage - is it a big no-no to > store Solr core data folder over NAS? > Tha

A bad idea to store core data directory over NAS?

2014-11-04 Thread Gili Nachum
My data center is out of SAN or local disk storage - is it a big no-no to store Solr core data folder over NAS? That means 1. Lucene index 2. Transaction log. The NAS mount would be accessed by a single machine. I do care about performance. If I do go with NAS, should I expect index corruption an

Re: Best Practices for open source pipeline/connectors

2014-11-04 Thread Jürgen Wagner (DVT)
Hello Dan, ManifoldCF is a connector framework, not a processing framework. Therefore, you may try your own lightweight connectors (which usually are not really rocket science and may take less time to write than time to configure a super-generic connector of some sort), any connector out there (

Re: Best Practices for open source pipeline/connectors

2014-11-04 Thread Dan Davis
We are looking at LucidWorks, but also want to see what we can do on our own so we can evaluate the value-add of Lucid Works among other products. On Tue, Nov 4, 2014 at 4:13 PM, Alexandre Rafalovitch wrote: > And, just to get the stupid question out of the way, you prefer to pay > in developer

Re: Best Practices for open source pipeline/connectors

2014-11-04 Thread Alexandre Rafalovitch
And, just to get the stupid question out of the way, you prefer to pay in developer integration time rather than in purchase/maintenance fees? Because, otherwise, I would look at LucidWorks commercial offering first, even to just have a comparison. Regards, Alex. Personal: http://www.outerthou

Re: import 2 mysql tables into Solr 4

2014-11-04 Thread Tim Dunphy
> > Looks right. Just remember to double check the preImportDeleteQuery > documentation to avoid surprises later Sure! Sounds good. Thanks again. On Tue, Nov 4, 2014 at 4:10 PM, Alexandre Rafalovitch wrote: > Looks right. Just remember to double check the preImportDeleteQuery > documentation t

Re: import 2 mysql tables into Solr 4

2014-11-04 Thread Alexandre Rafalovitch
Looks right. Just remember to double check the preImportDeleteQuery documentation to avoid surprises later Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linke

Re: Tika Integration problem with DIH and JDBC

2014-11-04 Thread Alexandre Rafalovitch
Yes, DIH (and used to be Solr schema parser too) is great at ignoring the things it does not know about and just using defaults instead. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularize

Best Practices for open source pipeline/connectors

2014-11-04 Thread Dan Davis
I'm trying to do research for my organization on the best practices for open source pipeline/connectors. Since we need Web Crawls, File System crawls, and Databases, it seems to me that ManifoldCF might be the best case. Has anyone combined ManifoldCF with Solr UpdateRequestProcessors or DataIm

Re: import 2 mysql tables into Solr 4

2014-11-04 Thread Tim Dunphy
Hey Alexandre, Thanks for the example! This is what worked for me:

Re: Tika Integration problem with DIH and JDBC

2014-11-04 Thread Dan Davis
All, The problem here was that I gave driver="BinURLDataSource" rather than type="BinURLDataSource". Of course, saying driver="BinURLDataSource" caused it not to be able to find it.
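The mix-up described above is easy to miss because DIH does not complain about attributes it does not expect. In a DIH `<dataSource>` element, `type` names the DataSource implementation, while `driver` is meant for a JDBC driver class. A minimal sketch of a check for this mistake follows; the sample config, the data source names, and the "ends in DataSource" heuristic are all assumptions made for illustration, not part of the original thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down DIH config that reproduces the mistake:
# the second dataSource puts the implementation class in `driver`.
SAMPLE_CONFIG = """
<dataConfig>
  <dataSource name="db" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/example"/>
  <dataSource name="bin" driver="BinURLDataSource"/>
</dataConfig>
"""


def misplaced_datasource_classes(config_xml):
    """Return names of <dataSource> elements whose `driver` looks like a DIH DataSource class."""
    root = ET.fromstring(config_xml)
    suspects = []
    for ds in root.iter("dataSource"):
        driver = ds.get("driver", "")
        # Heuristic (an assumption of this sketch): DIH implementations are
        # named *DataSource, while JDBC driver class names usually are not.
        if driver.endswith("DataSource"):
            suspects.append(ds.get("name", "<unnamed>"))
    return suspects


print(misplaced_datasource_classes(SAMPLE_CONFIG))
```

For any element the check flags, the likely fix is the one reported above: move the class name from `driver` to `type`.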

Re: Solr Cloud Management Tools

2014-11-04 Thread Alexandre Rafalovitch
SemaText products are usually a good place to start fine tuning your requirements: http://sematext.com/index.html I believe they do trials as well. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr

Re: import 2 mysql tables into Solr 4

2014-11-04 Thread Alexandre Rafalovitch
I can't remember what document element does, but I am quite sure the entities just need to be side-by-side. See the example from my book: https://github.com/arafalov/solr-indexing-book/blob/master/published/dihdb/conf/dih-definition.xml Notice that you need preImportDeleteQuery for each definitio

Re: Solr Cloud Management Tools

2014-11-04 Thread Michael Della Bitta
http://sematext.com/spm/ Michael Della Bitta Senior Software Engineer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions | g+: plus.google.com/appinions

Solr Cloud Management Tools

2014-11-04 Thread elangovan palani
Hello. Can someone suggest a SolrCloud management tool? I'm looking to gather Collection/Documents/Shares metrics and also to collect data about cluster usage (memory, reads/writes, etc.). Thanks, Elan

import 2 mysql tables into Solr 4

2014-11-04 Thread Tim Dunphy
Hey all, I finally got mysql data into Solr 4 with your help. First off, thank you for that! But now I'm hoping to refine the resulting process a bit. What I'm trying to do, now that mysql imports are working, is to import 2 separate tables from the same mysql database. I tried this in my xml

Re: Consul instead of ZooKeeper anyone?

2014-11-04 Thread Jürgen Wagner (DVT)
Hello Greg, we run Zookeeper not on dedicated Zookeeper machines, but rather on admin nodes in search application clusters (that makes two instances), plus on at least one more node that does not have much load (e.g., a crawling node). Also, as long as you don't stuff too much data into Zookeeper

Re: Consul instead of ZooKeeper anyone?

2014-11-04 Thread Shawn Heisey
On 11/4/2014 12:23 PM, Greg Solovyev wrote: > Thanks for the answers Erick. I can see that this is a significant effort and > I am certainly not asking the community to undertake this work. I was > actually going to take a stab at it myself. Regarding $$ savings from not > requiring ZK my assump

Re: Consul instead of ZooKeeper anyone?

2014-11-04 Thread Greg Solovyev
Thanks for the answers Erick. I can see that this is a significant effort and I am certainly not asking the community to undertake this work. I was actually going to take a stab at it myself. Regarding $$ savings from not requiring ZK my assumption is that ZK in production demands a dedicated ho

RE: Missing Records

2014-11-04 Thread AJ Lemke
Another round of tests this morning. Ten rounds of imports, all done on the non-leader node: 902294, 900089, 899267, 898127, 901945, 901055, 899638, 899392, 899880, 901812. The expected number of records is 903990. I am getting this error: org.apache.solr.common.SolrException: Bad Request request: http:/
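To put the ten import counts above in perspective, the shortfall per run can be computed directly. This is only a sketch of the arithmetic, using the figures quoted in the message; the variable names are made up for illustration.

```python
# Figures quoted in the message above.
EXPECTED = 903990
COUNTS = [902294, 900089, 899267, 898127, 901945,
          901055, 899638, 899392, 899880, 901812]

# Number of documents missing from each import run.
shortfalls = [EXPECTED - count for count in COUNTS]

for run, (count, missing) in enumerate(zip(COUNTS, shortfalls), start=1):
    # Percentage of the expected documents that did not arrive in this run.
    pct = 100.0 * missing / EXPECTED
    print(f"run {run:2d}: indexed {count}, missing {missing} ({pct:.2f}%)")

print("smallest shortfall:", min(shortfalls))
print("largest shortfall:", max(shortfalls))
```

Every run falls short, and by a different amount each time, which points to documents being dropped intermittently rather than a fixed set of bad records.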

recovery process - node with stale data elected leader

2014-11-04 Thread francois.grollier
Hi, I'm running solrCloud 4.6.0 and I have a question/issue regarding the recovery process. My cluster is made of 2 shards with 2 replicas each. Nodes A1 and B1 are leaders, A2 and B2 followers. I start indexing docs and kill A2. I keep indexing for a while and then kill A1. At this point, th

Re: custom sorting of search result

2014-11-04 Thread Alexandre Rafalovitch
The latest versions of Solr have collapsing and expanding plugins, reranking plugins and post-filters. Some combination of these seems like it might be relevant. And, of course, there is always carrot2 clustering. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources

Re: Solr authentication

2014-11-04 Thread Alexandre Rafalovitch
Whichever way you run, I just want to remind people that if people have access to Solr, they can issue delete commands and - probably - a bunch of other things. If performance is not a critical aspect, I would look at isolating Solr in something like a Docker container. Regards, Alex. Personal: ht

Re: Solr authentication

2014-11-04 Thread Chris Hostetter
I am not a security expert, but in my opinion the safest way to run solr "securely" is to forget all about usernames & passwords and instead use SSL with client SSL certificates... https://cwiki.apache.org/confluence/display/solr/Enabling+SSL : Date: Tue, 4 Nov 2014 12:53:30 + : From: Sh

Re: Missing log entries with log4j log rotation

2014-11-04 Thread Shawn Heisey
On 11/4/2014 7:45 AM, Michael Sokolov wrote: > Shawn this is really weird -- we run log4j in lots of installations > and have never seen an issue like this. > > I wonder if you might be running some other log rotation software > (like logrotate) that is somehow getting in the way or conflicting? I

Re: DIH transformer problems

2014-11-04 Thread Alexandre Rafalovitch
On 4 November 2014 10:42, Lemke, Michael ST/HZA-ZSW wrote: > On Tuesday, November 04, 2014 4:07 PM > Alexandre Rafalovitch wrote: >> >>What are you actually trying to do on a business level? > > I am importing a wiki extract and the goal here is to extract the > wiki's language from the filename.

RE: DIH transformer problems

2014-11-04 Thread Lemke, Michael ST/HZA-ZSW
On Tuesday, November 04, 2014 4:07 PM Alexandre Rafalovitch wrote: > >What are you actually trying to do on a business level? I am importing a wiki extract and the goal here is to extract the wiki's language from the filename. The language is also in an attribute within the imported xml but it

Re: DIH transformer problems

2014-11-04 Thread Alexandre Rafalovitch
What are you actually trying to do on a business level? Maybe that's something that can be handled better by sticking an UpdateRequestProcessor chain _after_ DIH? As to your configuration, you have xxCONTENT column definition twice. It might be working, but I think it is non-deterministic. For ila

Re: Missing log entries with log4j log rotation

2014-11-04 Thread Michael Sokolov
Shawn this is really weird -- we run log4j in lots of installations and have never seen an issue like this. I wonder if you might be running some other log rotation software (like logrotate) that is somehow getting in the way or conflicting? -Mike On 11/01/2014 01:45 PM, Shawn Heisey wrote:

Re: Solr authentication

2014-11-04 Thread Tim Dunphy
Shay, > Thanks for the quick response. No problem. > > 1. I'm using Solr with Jetty. > Yes. I got that from the fact that you were running Solr over port 8983. That's the Jetty port. I just didn't mention that in the email cuz I thought it was pretty obvious. :) But what I am sayin

DIH transformer problems

2014-11-04 Thread Lemke, Michael ST/HZA-ZSW
I am having a little fight with the DataImportHandler and the application of RegexTransformer and TemplateTransformer. A stripped down version of what I try in data-config.xml, which is taken pretty much from the various solr wikis:

RE: Solr authentication

2014-11-04 Thread Shay Sofer
Thanks for the quick response. 1. I'm using Solr with Jetty. 2. I'm using Java to access Solr, so I need a way to pass / add this authentication as well. -Original Message- From: Tim Dunphy [mailto:bluethu...@gmail.com] Sent: Tuesday, November 04, 2014 3:22 PM To: so

Re: Solr authentication

2014-11-04 Thread Tim Dunphy
Hi Shay, I'm new to using Solr myself. But what I've done to solve this problem is to run Solr via Tomcat. Then I put Apache in front of Tomcat using mod_jk and made Solr accessible via SSL on port 443. I also put basic authentication in front of Apache. That way you have to enter a username an

Solr authentication

2014-11-04 Thread Shay Sofer
Hi, I want my Solr web connection to be protected by a username and password. When someone tries to reach 1.1.1.1:8983/Solr, they should be able to do so only after logging in (as a known user). Is that possible? Thanks, Shay.
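If Solr ends up behind something that enforces HTTP Basic authentication (a container or a front-end web server, for instance), every client request has to carry the credentials. The following is a rough sketch of what that looks like on the client side. The host name, core name, and credentials are placeholders, and nothing here is specific to Solr: it is plain HTTP Basic auth, which should only be sent over SSL.

```python
import base64
import urllib.parse
import urllib.request


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic `Authorization` header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_select_request(base_url, query, username, password):
    """Prepare (but do not send) an authenticated select request."""
    params = urllib.parse.urlencode({"q": query, "wt": "json"})
    request = urllib.request.Request(f"{base_url}/select?{params}")
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


# Placeholder endpoint and credentials, purely for illustration.
req = build_select_request("https://solr.example.com/solr/collection1",
                           "*:*", "user", "pass")
print(req.full_url)
print(req.get_header("Authorization"))
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a server.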

Analytics result for each Result Group

2014-11-04 Thread Talat Uyarer
Hi folks, We use the Analytics Component for median, max, etc. If I use the "group.field" parameter with the Analytics Component, how can I calculate analytics for each result group? Thanks -- Talat UYARER Website: http://talat.uyarer.com Twitter: http://twitter.com/talatuyarer Linkedin: http://tr.