Re: How to limit the number of result sets of the 'export' handler

2015-01-06 Thread Sandy Ding
Thanks Alexandre. I actually need the whole result set, but it is large (perhaps 10m-100m) and I find select is slow. How does export differ from select, except that select will make distributed requests and do the merge? Will select with ‘distrib=false’ have comparable performance to export? 201

Re: cloudsolrserver

2015-01-06 Thread tharpa
Thanks Anshum. If you say that "connect using CloudSolrServer" is more correct than saying, "connect to CloudSolrServer", I believe you. -- View this message in context: http://lucene.472066.n3.nabble.com/cloudsolrserver-tp4177724p4177728.html Sent from the Solr - User mailing list archive at

Re: cloudsolrserver

2015-01-06 Thread Anshum Gupta
To get started, the ref guide should be helpful. https://cwiki.apache.org/confluence/display/solr/Using+SolrJ You just need to pass the ZK host string to the constructor and then use the server. Also, what do you mean by *connect to CloudSolrServer*? You mean connect using, right? On Tue, Jan

Re: Vertical search Engine

2015-01-06 Thread Dominique Bejean
Hi, You can have a look at www.crawl-anywhere.com A web crawler on top of Solr. Used for following vertical search engines : http://www.hurisearch.org/ http://www.searchamnesty.org/ Regards Dominique 2015-01-06 15:22 GMT+01:00 Ahmet Arslan : > Hi, > > http://manifoldcf.apache.org is another

cloudsolrserver

2015-01-06 Thread tharpa
We are switching from a direct HTTP connection to CloudSolrServer. I have looked for, but failed to find, example code for connecting to CloudSolrServer. Are there any tutorials or code examples? -- View this message in context: http://lucene.472066.n3.nabble.com/cloudsolrserver-tp4177724.htm

Re: SOLR - any open source framework

2015-01-06 Thread Vishal Swaroop
Thanks a lot... We are in the process of analyzing what to use with SOLR... On Jan 6, 2015 5:30 PM, "Roman Chyla" wrote: > We've compared several projects before starting - AngularJS was among them, > it is great for stuff where you could find components (already prepared) > but writing custom comp

Re: SOLR - any open source framework

2015-01-06 Thread Roman Chyla
We've compared several projects before starting - AngularJS was among them, it is great for stuff where you could find components (already prepared) but writing custom components was easier in other frameworks (you need to take this statement with a grain of salt: it was specific to our situation), but

Re: SOLR - any open source framework

2015-01-06 Thread Vishal Swaroop
Thanks Roman... I will check it... Maybe it's off topic but how about Angular... On Jan 6, 2015 5:17 PM, "Roman Chyla" wrote: > Hi Vishal, Alexandre, > > Here is another one, using Backbone, just released v1.0.16 > > https://github.com/adsabs/bumblebee > > you can see it in action: http://ui.adsl

Re: SOLR - any open source framework

2015-01-06 Thread Vishal Swaroop
Great... Thanks for the inputs... I explored the Velocity response writer; some posts suggest it is good for prototyping but not for production... On Jan 6, 2015 4:59 PM, "Alexandre Rafalovitch" wrote: > That's a very general question. So, the following are three random ideas > just to get you started to

Re: SOLR - any open source framework

2015-01-06 Thread Roman Chyla
Hi Vishal, Alexandre, Here is another one, using Backbone, just released v1.0.16 https://github.com/adsabs/bumblebee you can see it in action: http://ui.adslabs.org/ While it primarily serves our own needs, I tried to architect it to be extendible (within reasonable limits of code, man power)

Re: SOLR - any open source framework

2015-01-06 Thread Erick Erickson
There's also the VelocityResponseWriter that comes with Solr. It takes some effort to modify, but not a lot. It's useful for very fast iterations. Best, Erick On Tue, Jan 6, 2015 at 1:58 PM, Alexandre Rafalovitch wrote: > That's a very general question. So, the following are three random ideas > j

Re: SOLR - any open source framework

2015-01-06 Thread Alexandre Rafalovitch
That's a very general question. So, the following are three random ideas just to get you started thinking about options. *) spring.io (Spring Data Solr) + Vaadin *) http://gethue.com/ (it's primarily Hadoop, but has a Solr UI builder too) *) http://projectblacklight.org/ Regards, Alex. Sign up

Re: solrcloud without faceting, i.e. for failover only

2015-01-06 Thread Chris Hostetter
: #1 is a trade off against being possibly more available to writes in the case : of a single down node. In the cloud case, you're still open for business. In : the classical replication case, you're no longer available for writes if the : downed node is the master. or to put it another way: clas

Re: solrcloud without faceting, i.e. for failover only

2015-01-06 Thread Michael Della Bitta
The downsides that come to mind: 1. Every write gets amplified by the number of nodes in the cloud. 1000 write requests end up creating 1000*N HTTP calls as the leader forwards those writes individually to all of the followers in the cloud. Contrast that with classical replication where only c
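The arithmetic in point 1 above can be sketched as a one-line estimate. This is only an illustration of the figure quoted in the message (1000 writes becoming 1000*N HTTP calls); the function name `amplified_http_calls` is made up for this note and is not part of Solr.

```python
def amplified_http_calls(write_requests: int, nodes: int) -> int:
    """Estimate HTTP calls for individually forwarded writes.

    Follows the figure quoted in the message: each write request is
    counted once per node in the cloud (write_requests * N).
    """
    if write_requests < 0 or nodes < 1:
        raise ValueError("write_requests must be >= 0 and nodes >= 1")
    return write_requests * nodes


# Example: the 1000 write requests from the message on a 3-node cloud.
print(amplified_http_calls(1000, 3))
```

Batching several documents into one update request reduces the number of write requests, and therefore the multiplied total, by the batch size.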

SOLR - any open source framework

2015-01-06 Thread Vishal Swaroop
I am new to SOLR and was able to configure it, run the samples, and index data using DIH (from a database). Just wondering if there are any open source frameworks to query and display/visualize the results. Regards

solrcloud without faceting, i.e. for failover only

2015-01-06 Thread Will Milspec
Hi all, We have a smallish index that performs well for searches and are considering using solrcloud --but just for high availability/redundancy, i.e. without any sharding. The indexes would be replicated, but not distributed. I know that "there are no stupid questions..Only stupid people"...but

Re: .htaccess / password

2015-01-06 Thread Michael Della Bitta
The Jetty servlet container that Solr uses doesn't understand those files. It would not use them to determine access, and would likely make them accessible to web requests in plain text. On 1/6/15 16:01, Craig Hoffman wrote: Thanks Otis. Do you think a .htaccess / .passwd file in the Solr admin di

Re: .htaccess / password

2015-01-06 Thread Craig Hoffman
Thanks Otis. Do you think a .htaccess / .passwd file in the Solr admin dir would interfere with its operation? -- Craig Hoffman w: http://www.craighoffmanphotography.com FB: www.facebook.com/CraigHoffmanPhotography TW: https://twitter.com/craiglhoffman > On Jan 6, 2015, at 1:09 PM, Otis G

facet.contains

2015-01-06 Thread Will Butler
https://issues.apache.org/jira/browse/SOLR-1387 contains a patch to support facet.contains and facet.contains.ignoreCase, making it possible to easily filter facet results without the facet.prefix limitations. I know that it is possible to appro

RE: Solr Memory Usage - How to reduce memory footprint for solr

2015-01-06 Thread Toke Eskildsen
Abhishek Sharma [abhishe...@unbxd.com] wrote: > *Q* - I am forced to set Java Xmx as high as 3.5g for my Solr app. If I > keep this low, my CPU hits 100% and response time for indexing increases a > lot. And I have hit OOM errors as well when this value is low. [...] > 2. Index Size - 2 g >

Solr Memory Usage - How to reduce memory footprint for solr

2015-01-06 Thread Abhishek Sharma
*Q* - I am forced to set Java Xmx as high as 3.5g for my Solr app. If I keep this low, my CPU hits 100% and response time for indexing increases a lot. And I have hit OOM errors as well when this value is low. Is this too high? If so, how can I reduce this? *Machine Details* 4 G RAM, SSD *Solr

Re: .htaccess / password

2015-01-06 Thread Otis Gospodnetic
Hi Craig, If you want to protect Solr, put it behind something like Apache / Nginx / HAProxy and put .htaccess at that level, in front of Solr. Or try something like http://blog.jelastic.com/2013/06/17/secure-access-to-your-jetty-web-application/ Otis -- Monitoring * Alerting * Anomaly Detection

RE: .htaccess / password

2015-01-06 Thread Ganesh.Yadav
Craig, 1. What is the .htaccess file meant for? 2. What are the contents of this file? 3. How will you, or how will Solr, know that it needs to look for this file to bring the needed security to this (which) area? 4. What event is causing you to re-index the engine eve

RE: PDF search functionality using Solr Schema.xml and SolrConfig.xml question

2015-01-06 Thread Ganesh.Yadav
Still looking for an answer on Schema.xml and SolrConfig.xml 1. Do I need to tell Solr to extract the Title from the PDF, look for the word Title, extract the entire line after the tag, collect all such occurrences from hundreds of PDFs, and build the Title column data and index it? 2. How

Re: htaccess

2015-01-06 Thread Gora Mohanty
Hi, Your message seems quite confused (even the URL is not right for most normal Solr setups), and it is not clear what you mean by "function properly". Solr is a search engine, and has no idea about .htaccess files. Are you asking whether Solr respects directives in .htaccess files? I am pre

Re: How large is your solr index?

2015-01-06 Thread Erick Erickson
Have you considered pre-supposing SolrCloud and using the SPLITSHARD API command? Even after that's done, the sub-shard needs to be physically moved to another machine (probably), but that too could be scripted. May not be desirable, but I thought I'd mention it. Best, Erick On Tue, Jan 6, 2015
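As a rough sketch of what scripting the SPLITSHARD call could look like, the helper below only builds the Collections API request URL. The `action`, `collection` and `shard` parameters are the documented ones for SPLITSHARD; the base URL, collection name and shard name in the example are placeholders, and the helper name `splitshard_url` is invented for this note.

```python
from urllib.parse import urlencode


def splitshard_url(base_url: str, collection: str, shard: str) -> str:
    """Build a Collections API URL that asks SolrCloud to split one shard."""
    params = {
        "action": "SPLITSHARD",
        "collection": collection,  # placeholder collection name
        "shard": shard,            # logical shard name, e.g. "shard1"
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Example with assumed host, collection and shard names.
print(splitshard_url("http://localhost:8983/solr", "mycollection", "shard1"))
```

Moving the resulting sub-shard to another machine, as the message notes, is a separate step that this sketch does not cover.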

Re: PDF search functionality using Solr

2015-01-06 Thread Erick Erickson
Seconding Jürgen's comment. 4G docs are almost, but not quite, totally useless to search. How many JIRAs each? That's _one_ document unless you do some fancy dancing. Pulling the data directly using the JIRA API sounds far superior. If you _must_ use the JIRA->PDF->Solr option, consider the followi

Re: How large is your solr index?

2015-01-06 Thread Peter Sturge
Yes, totally agree. We run 500m+ docs in a (non-cloud) Solr4, and it even performs reasonably well on commodity hardware with lots of faceting and concurrent indexing! Ok, you need a lot of RAM to keep faceting happy, but it works. ++1 for the automagic shard creator. We've been looking into doing

RE: Running Multiple Solr Instances

2015-01-06 Thread Ganesh.Yadav
Nishanth, 1. I understand you are implementing clustering for the web apps, which means running the same application on multiple different instances on one or more machines. 2. If each of your web apps starts pointing to a different index directory, how will it switch to the next web

.htaccess / password

2015-01-06 Thread Craig Hoffman
Quick question: If I put a .htaccess file in www.mydomin.com/8983/solr/#/ will Solr continue to function properly? One thing to note, I will have a CRON job that runs nightly that re-indexes the engine. In a nutshell I’m looking for a way to secure this area. Thanks, Craig -- Craig Hoffman w: ht

RE: PDF search functionality using Solr Schema.xml and SolrConfig.xml question

2015-01-06 Thread Ganesh.Yadav
Thanks Jürgen for your quick reply. Still looking for an answer on Schema.xml and SolrConfig.xml 1. Do I need to tell Solr to extract the Title from the PDF, look for the word Title, extract the entire line after the tag, collect all such occurrences from hundreds of PDFs, and build the Title co

Re: Running Multiple Solr Instances

2015-01-06 Thread Nishanth S
Thanks a lot, guys. As a beginner, these are very helpful for me. Thanks, Nishanth On Tue, Jan 6, 2015 at 5:12 AM, Michael Della Bitta < michael.della.bi...@appinions.com> wrote: > I would do one of either: > > 1. Set a different Solr home for each instance. I'd use the > -Dsolr.solr.home=/d/2 comm

htaccess

2015-01-06 Thread Craig Hoffman
Quick question: If I put a .htaccess file in www.mydomin.com/8983/solr/#/ will Solr continue to function properly? One thing to note, I will have a CRON job that runs nightly that re-indexes the engine. In a nutshell I’m looking for a way to secure this area. Thanks, Craig -- Craig Hoffman w: ht

Re: PDF search functionality using Solr

2015-01-06 Thread Jürgen Wagner (DVT)
Hello, no matter which search platform you will use, this will pose two challenges: - The size of the documents will render search less and less useful as the likelihood of matches increases with document size. So, without a proper semantic extraction (e.g., using decent NER or relationship extr

Re: Solr on HDFS in a Hadoop cluster

2015-01-06 Thread Otis Gospodnetic
Oh, and https://issues.apache.org/jira/browse/SOLR-6743 Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On Tue, Jan 6, 2015 at 12:52 PM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote: > Hi Charles, > > See

Re: Solr on HDFS in a Hadoop cluster

2015-01-06 Thread Otis Gospodnetic
Hi Charles, See http://search-lucene.com/?q=solr+hdfs and https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On Tue, Jan 6, 2015 at 11:02 AM, Cha

PDF search functionality using Solr

2015-01-06 Thread Ganesh.Yadav
Hello Solr-users and developers, Can you please suggest: 1. What should I do to index PDF content information column-wise? 2. Do I need to extract the contents using one of the Analyzer, Tokenizer and Filter combinations and then add it to the index? How can I test the results on command pr

Re: IstvanKulcsar - Wiki Solr

2015-01-06 Thread Shawn Heisey
On 1/6/2015 7:28 AM, ikulc...@precognox.com wrote: > I would like to suggest pages that use Solr and were developed by my company. > > Please add these pages to this site: > http://wiki.apache.org/solr/PublicServers > > http://www.odrportal.hu/kereso/ > http://idea.unideb.hu/idealista/ > http://www.jobmonitor.h

Solr on HDFS in a Hadoop cluster

2015-01-06 Thread Charles VALLEE
I am considering using Solr to extend Hortonworks Data Platform capabilities to search. - I found tutorials to index documents into a Solr instance from HDFS, but I guess this solution would require a Solr cluster distinct from the Hadoop cluster. Is it possible to have a Solr integrated into the

IstvanKulcsar - Wiki Solr

2015-01-06 Thread ikulcsar
Hi, I would like to suggest pages that use Solr and were developed by my company. Please add these pages to this site: http://wiki.apache.org/solr/PublicServers http://www.odrportal.hu/kereso/ http://idea.unideb.hu/idealista/ http://www.jobmonitor.hu http://www.profession.hu/ http://webicina.com/ http://www.c

Re: Vertical search Engine

2015-01-06 Thread Ahmet Arslan
Hi, http://manifoldcf.apache.org is another option to consider. It is useful for crawling protected pages. Free resources: http://www.manning.com/wright/ManifoldCFinAction_manuscript.pdf https://manifoldcfinaction.googlecode.com/svn/trunk/pdfs/ Ahmet On Tuesday, January 6, 2015 1:56 PM, Jack

RE: Frequent deletions

2015-01-06 Thread Amey Jadiye
Well, we are doing the same thing (in a way). We have to do frequent mass deletions; at a time we are deleting around 20M+ documents. All I am doing after a deletion is firing the command below on each of our Solr nodes and keeping some patience, as it takes a long time. curl -vvv "http://node1.so

Re: How to limit the number of result sets of the 'export' handler

2015-01-06 Thread Alexandre Rafalovitch
Export was specifically designed to get everything which is very expensive otherwise. If you just want the subset, you might be better off with normal queries and/or with deep paging (cursor). Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 6 Jan
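For the deep-paging (cursor) alternative mentioned above, here is a minimal sketch of the loop. It relies on the documented cursor contract: send `cursorMark=*` first, sort on the uniqueKey field, read `nextCursorMark` from each response, and stop when it equals the mark that was sent. The `fetch` callable is an assumption standing in for whatever HTTP client you use; it takes the request parameters and returns the parsed JSON response.

```python
from typing import Callable, Dict, Iterator


def iterate_with_cursor(
    fetch: Callable[[Dict[str, str]], dict],
    query: str,
    unique_key: str = "id",
    rows: int = 1000,
) -> Iterator[dict]:
    """Yield every matching document by following Solr's cursorMark."""
    cursor = "*"  # "*" starts a new cursor
    while True:
        params = {
            "q": query,
            "rows": str(rows),
            "sort": f"{unique_key} asc",  # cursor sorts must include the uniqueKey
            "cursorMark": cursor,
            "wt": "json",
        }
        response = fetch(params)
        for doc in response["response"]["docs"]:
            yield doc
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # unchanged mark means no more results
            break
        cursor = next_cursor
```

A caller would pass a `fetch` that issues the request against the `/select` handler and decodes the JSON body; the default field name `id` is only a guess at the schema's uniqueKey.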

Re: Running Multiple Solr Instances

2015-01-06 Thread Michael Della Bitta
I would do one of the following: 1. Set a different Solr home for each instance. I'd use the -Dsolr.solr.home=/d/2 command line switch when launching Solr to do so. 2. RAID 10 the drives. If you expect the Solr instances to get uneven traffic, pooling the drives will allow a given Solr instance to s

Re: Vertical search Engine

2015-01-06 Thread Jack Krupansky
Consider the Fusion product from LucidWorks: http://lucidworks.com/product/fusion/ Structuring of your data should be driven by your queries and access patterns - what are the most common queries and what are the most extreme and complex queries that you expect to handle, both in terms of the que

Re: edismax with multiple words for keyword tokenizer splitting on space

2015-01-06 Thread Jack Krupansky
You need to escape the space in your query (using backslash or quotes around the term) - the query parser doesn't parse based on the analyzer/tokenizer for each field. -- Jack Krupansky On Tue, Jan 6, 2015 at 4:05 AM, Sankalp Gupta wrote: > Hi > I come across this weird behaviour in solr. I'm n
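The two options named above (backslash or quotes) can be sketched as small client-side helpers applied to the term before it is placed in the query string. The function names are made up for this note, and the sketch covers only whitespace and embedded quotes, not the full set of query-parser special characters.

```python
import re


def escape_whitespace(term: str) -> str:
    """Backslash-escape each whitespace character so the term stays whole."""
    return re.sub(r"(\s)", r"\\\1", term)


def quote_term(term: str) -> str:
    """Wrap the term in double quotes, escaping embedded backslashes and quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Example with a made-up two-word value for a keyword-tokenized field.
print(escape_whitespace("new york"))
print(quote_term("new york"))
```

Either form keeps the query parser from splitting on the space before the field's analyzer ever sees the text.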

Re: FOSDEM Open source search devroom

2015-01-06 Thread Charlie Hull
On 02/01/2015 08:37, Bram Van Dam wrote: Hi folks, There will be an Open source search devroom[1] at this year's FOSDEM in Brussels, 31st of January & 1st of February. I don't know if there will be a Lucene/Solr presence (there's no schedule for the dev room yet), but this seems like a good pla

Solr startup script in version 4.10.3

2015-01-06 Thread Dominique Bejean
Hi, In release 4.10.3, the following lines were removed from solr starting script (bin/solr) # TODO: see SOLR-3619, need to support server or example # depending on the version of Solr if [ -e "$SOLR_TIP/server/start.jar" ]; then DEFAULT_SERVER_DIR=$SOLR_TIP/server else DEFAULT_SERVER_DIR=$SO

edismax with multiple words for keyword tokenizer splitting on space

2015-01-06 Thread Sankalp Gupta
Hi, I came across this weird behaviour in Solr. I'm not sure why this is desired. I have filed this on Stack Overflow. Please check http://stackoverflow.com/questions/27795177/edismax-with-multiple-words-for-keyword-tokenizer-splitting-on-space Thanks Sankalp Gupta

Re: Vertical search Engine

2015-01-06 Thread Furkan KAMACI
Hi, You should estimate the size of the data you will index before you decide on a crawler. Crawlers are out of scope for this mailing list. If you will crawl a large amount of data, you can check the Apache Nutch user list. Furkan KAMACI 2015-01-06 10:39 GMT+02:00 klunwebale : > hello > > i want to create a vertica

Vertical search Engine

2015-01-06 Thread klunwebale
Hello, I want to create a vertical search engine like trovit.com. I have installed Solr and Solarium. What else do I need? Can you recommend a suitable crawler, and how should I structure my data to be indexed? -- View this message in context: http://lucene.472066.n3.nabble.com/Vertical-search-Eng

Re: Running Multiple Solr Instances

2015-01-06 Thread Shawn Heisey
On 1/5/2015 9:31 PM, Nishanth S wrote: > I am running multiple Solr instances (Solr 4.10.3 on Tomcat 8). There are > 3 physical machines and I have 4 Solr instances running on each machine > on ports 8080, 8081, 8082 and 8083. The setup is fine up to this point. Now I > want to point each of thes