Re: how to index 20 MB plain-text xml

2014-03-30 Thread primoz . skale
Hi! I had the same issue with XML files. Even small XML files produced an OOM exception. I read that the way XML is parsed can sometimes blow up memory requirements so much that Java runs out of heap. My solution was: 1. Don't parse XML files 2. Parse only small XML files and hope for th

Re: Updating an entry in Solr

2013-11-13 Thread primoz . skale
Yes, that's correct. You can also update a document "per field", but all fields need to be stored=true, because Solr (version >= 4.0) first gets your document from the index, creates a new document with the modified field, and adds it to the index again... Primoz From: gohome190 To: solr-user@
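The "per field" update described above can be sketched as a request body. This is a minimal sketch, assuming the JSON update format with the "set" modifier; the unique key name `id`, the document id and the field name are hypothetical, and the body would still have to be POSTed to the core's /update handler.

```python
import json


def build_atomic_update(doc_id, field, new_value):
    """Build a JSON body for a Solr atomic ("per field") update.

    The {"set": value} modifier asks Solr to replace a single field.
    Solr rebuilds the rest of the document from its stored fields,
    which is why every field needs stored=true.
    """
    # Assumption: the schema's unique key field is called "id".
    return json.dumps([{"id": doc_id, field: {"set": new_value}}])


# Hypothetical document id and field name, for illustration only.
body = build_atomic_update("doc-1", "price", 19.99)
print(body)
```

Any field that is not stored would be lost when Solr re-adds the rebuilt document, so it is worth checking the schema before relying on this.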

Re: Adding a server to an existing SOLR cloud cluster

2013-11-11 Thread primoz . skale
According to the wiki pages it should, but I have not really tried it yet - I like to do the "bookkeeping" myself :) I am sorry, but someone with more knowledge of Solr will have to answer your question. Primoz From: ade-b To: solr-user@lucene.apache.org Date: 11.11.2013 15:44 Subj

Re: Adding a server to an existing SOLR cloud cluster

2013-11-11 Thread primoz . skale
Try manually creating shard replicas on the new server. I think the new server is only used automatically when you start your Solr server instance with the "correct command line" option (aka -DnumShards) - I never liked this kind of behaviour. The server is not present in clusterstate.json file,

Re: A few questions about solr and tika

2013-10-17 Thread primoz . skale
Everything about Tika extraction is written under those links. Basically, what you need is the following: 1) a requestHandler for Tika in solrconfig.xml 2) keep all the fields in schema.xml that are needed for Tika (they are marked in the example schema.xml) and set those you don't need to indexed=fals

Re: A few questions about solr and tika

2013-10-17 Thread primoz . skale
Why don't you check these: - Content extraction with Apache Tika ( http://www.youtube.com/watch?v=ifgFjAeTOws) - ExtractingRequestHandler ( http://wiki.apache.org/solr/ExtractingRequestHandler) - Uploading Data with Solr Cell using Apache Tika ( https://cwiki.apache.org/confluence/display/solr/Upl

Re: SolrCloud Performance Issue

2013-10-16 Thread primoz . skale
The query result cache hit rate might be low due to using NOW in bf. NOW is always translated to the current time, and that of course changes from ms to ms... :) Primoz From: Shamik Bandopadhyay To: solr-user@lucene.apache.org Date: 17.10.2013 00:14 Subject: SolrCloud Performance Issue Hi
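One common workaround, which the post itself does not spell out, is to round NOW with Solr date math so that the boost function text stays identical for a while. The sketch below only builds the request parameters; the field name `published` and the helper name are hypothetical, and the constants in `recip` are the usual example values, not tuned ones.

```python
def boost_params(date_field, rounding="HOUR"):
    """Return query parameters with a recency boost function (bf).

    With rounding="HOUR" the function uses NOW/HOUR, so the bf string
    is the same for every request within the same hour and identical
    queries can hit the query result cache. With rounding=None it uses
    plain NOW, which changes every millisecond.
    """
    now = "NOW/" + rounding if rounding else "NOW"
    return {
        "defType": "edismax",
        "bf": "recip(ms(%s,%s),3.16e-11,1,1)" % (now, date_field),
    }


# Hypothetical date field name, for illustration only.
print(boost_params("published")["bf"])
print(boost_params("published", rounding=None)["bf"])
```

Coarser rounding such as NOW/DAY trades boost precision for a longer cache lifetime.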

Re: howto increase indexing speed?

2013-10-16 Thread primoz . skale
I think DIH uses only one core per instance. IMHO 300 docs/sec is quite good. If you would like to use more cores you need to use SolrJ. Or maybe more than one DIH, and more cores of course. Primoz From: Giovanni Bricconi To: solr-user Date: 16.10.2013 16:25 Subject: howto incr

Re: Error when i want to create a CORE

2013-10-16 Thread primoz . skale
Can you try with a directory path that contains *no* spaces? Primoz From: raige To: solr-user@lucene.apache.org Date: 16.10.2013 14:46 Subject: Error when i want to create a CORE I install the version solr 4.5 on windows. I launch with Jetty web server the example. I have no

Re: Regarding Solr Cloud issue...

2013-10-16 Thread primoz . skale
Hm, good question. I haven't really done any upgrading yet, because I just reinstall and reindex everything. I would replace jars with the new ones (if needed - check release notes for version 4.4.0 and 4.5.0 where all the versions of external tools [tika, maven, etc.] are stated) and deploy the

Re: Regarding Solr Cloud issue...

2013-10-16 Thread primoz . skale
>>> Also, another issue that needs to be raised is the creation of cores from >>> the "core admin" section of the gui, doesnt really work well, it creates >>> files but then they do not work (again i am using 4.4) From my experience the "core admin" section of the GUI does not work well in SolrClo

Re: Regarding Solr Cloud issue...

2013-10-16 Thread primoz . skale
Yep, you are right - I only created extra replicas with the cores API. For a new shard I had to use the "split shard" command. My apologies. Primož From: Shalin Shekhar Mangar To: solr-user@lucene.apache.org Date: 16.10.2013 10:45 Subject: Re: Regarding Solr Cloud issue... If the i

Re: Regarding Solr Cloud issue...

2013-10-16 Thread primoz . skale
If I am not mistaken, the only way to create a new shard from a collection in 4.4.0 was to use the cores API. That worked fine for me until I used *other* cores API commands. Those usually produced null ranges. In 4.5.0 this is fixed with the newly added commands "createshard" etc. to the collections A

Re: Cores with lot of folders with prefix index.XXXXXXX

2013-10-15 Thread primoz . skale
I will certainly try, but give me some time :) Primoz From: Shalin Shekhar Mangar To: solr-user@lucene.apache.org Date: 16.10.2013 07:05 Subject: Re: Cores with lot of folders with prefix index.XXX I think that's an acceptable strategy. Can you put up a patch? On Tue, O

Re: Regarding Solr Cloud issue...

2013-10-15 Thread primoz . skale
I sometimes also get null ranges when doing collections/cores API actions CREATE and/or UNLOAD, etc... In 4.4.0 that was not easily fixed because zkCli had problems with the "putfile" command, but in 4.5.0 it works OK. All you have to do is "download" clusterstate.json from ZK ("get /clusterstate
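Once clusterstate.json has been downloaded, the shards that need fixing can be located with a short script. This is a minimal sketch, assuming the 4.x layout in which each collection has a "shards" map and each shard has a "range" entry; the collection and shard names in the sample are hypothetical, and the script only reports problems, it does not repair or upload anything.

```python
import json


def shards_with_null_range(clusterstate_text):
    """Return sorted (collection, shard) pairs whose hash range is null.

    Assumption: the file maps collection names to objects that contain
    a "shards" map, and every healthy shard carries a "range" string
    such as "80000000-ffffffff".
    """
    state = json.loads(clusterstate_text)
    broken = []
    for collection, info in state.items():
        for shard, shard_info in info.get("shards", {}).items():
            if shard_info.get("range") is None:
                broken.append((collection, shard))
    return sorted(broken)


# Hypothetical sample standing in for a downloaded clusterstate.json.
sample = json.dumps({
    "collection1": {
        "shards": {
            "shard1": {"range": "80000000-ffffffff"},
            "shard2": {"range": None},
        }
    }
})
print(shards_with_null_range(sample))
```

The reported shards would then be corrected by hand in the file before it is uploaded back to ZooKeeper.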

Re: Cores with lot of folders with prefix index.XXXXXXX

2013-10-15 Thread primoz . skale
I have a question for developers of Solr regarding the issue of "left-over" index folders when replication fails. Could this issue be resolved quickly if, when replication starts, Solr creates a "flag file" in the "index." folder, and when replication ends (and commits) this file is deleted? In th
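The flag-file idea can be illustrated outside Solr with a few lines of file handling. This is only a sketch of the proposal as worded above, not Solr's replication code; the flag file name and the directory names are hypothetical.

```python
import os
import tempfile

# Hypothetical marker name; the proposal does not specify one.
FLAG_NAME = "replication.inprogress"


def begin_replication(index_dir):
    """Create the index folder and drop a flag file into it."""
    os.makedirs(index_dir, exist_ok=True)
    open(os.path.join(index_dir, FLAG_NAME), "w").close()


def finish_replication(index_dir):
    """Remove the flag file once replication has ended and committed."""
    os.remove(os.path.join(index_dir, FLAG_NAME))


def leftover_index_dirs(data_dir):
    """List index.* folders that still contain a flag file."""
    return sorted(
        name for name in os.listdir(data_dir)
        if name.startswith("index.")
        and os.path.isfile(os.path.join(data_dir, name, FLAG_NAME))
    )


with tempfile.TemporaryDirectory() as data_dir:
    ok = os.path.join(data_dir, "index.001")
    failed = os.path.join(data_dir, "index.002")
    begin_replication(ok)
    finish_replication(ok)      # completed replication: flag removed
    begin_replication(failed)   # simulated failure: flag stays behind
    leftovers = leftover_index_dirs(data_dir)

print(leftovers)
```

A folder that still holds the flag belongs to a replication that never finished, so a cleanup step could tell it apart from the live index.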

Re: Cores with lot of folders with prefix index.XXXXXXX

2013-10-11 Thread primoz . skale
Thanks, I guess I was wrong after all in my last post. Primož From: Shalin Shekhar Mangar To: solr-user@lucene.apache.org Date: 11.10.2013 12:43 Subject: Re: Cores with lot of folders with prefix index.XXX There are open issues related to extra index.XXX folders lying ar

Re: Cores with lot of folders with prefix index.XXXXXXX

2013-10-11 Thread primoz . skale
Honestly, I don't know for sure if you can delete them. Maybe make a backup, then delete them and see if it still works :) As far as I currently know, replication works differently in the SolrCloud world. I don't think there are any additional index.* folders because fallback does not work in SolrCloud (some

Re: Cores with lot of folders with prefix index.XXXXXXX

2013-10-11 Thread primoz . skale
Do you have a lot of failed replications? Maybe those folders have something to do with this (please see the last answer at http://stackoverflow.com/questions/3145192/why-does-my-solr-slave-index-keep-growing ). If your disk space is valuable, check the index.properties file under the data folder and try
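Checking index.properties can be scripted. This is a minimal sketch, assuming the file contains a line of the form index=<folder name> that names the live index folder; the folder names in the demo are hypothetical, and the script only lists candidates, it deletes nothing.

```python
import os
import tempfile


def active_index_dir(data_dir):
    """Return the index folder named in index.properties.

    Assumption: the file holds a line "index=<folder name>". If the
    file or the line is missing, the plain "index" folder is assumed.
    """
    props = os.path.join(data_dir, "index.properties")
    if not os.path.isfile(props):
        return "index"
    with open(props) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith("index="):
                return line.split("=", 1)[1]
    return "index"


def inactive_index_dirs(data_dir):
    """List index.* folders that index.properties does not point to."""
    active = active_index_dir(data_dir)
    return sorted(
        name for name in os.listdir(data_dir)
        if name.startswith("index.")
        and os.path.isdir(os.path.join(data_dir, name))
        and name != active
    )


with tempfile.TemporaryDirectory() as data_dir:
    # Hypothetical folder names standing in for real index.* folders.
    for name in ("index.001", "index.002", "index.003"):
        os.mkdir(os.path.join(data_dir, name))
    with open(os.path.join(data_dir, "index.properties"), "w") as fh:
        fh.write("#index properties\nindex=index.003\n")
    active = active_index_dir(data_dir)
    inactive = inactive_index_dirs(data_dir)

print(active, inactive)
```

Taking a backup before removing anything from the resulting list is the safer route.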

Re: Cores with lot of folders with prefix index.XXXXXXX

2013-10-11 Thread primoz . skale
I think this is connected to replications being made. I also have quite a few of them, but currently I am not worried :) Primož From: yriveiro To: solr-user@lucene.apache.org Date: 11.10.2013 11:54 Subject: Cores with lot of folders with prefix index.XXX Hi, I have some c

Re: Solr Cloud Basic Authentification

2013-10-11 Thread primoz . skale
If you want to deploy basic authentication in a way that a login is required when creating collections, it is only a simple matter of constraining a URL pattern (e.g. /solr/admin/collections/*). Maybe this link will help: http://stackoverflow.com/questions/5323855/jetty-webserver-security/533204

Re: Solr Cloud Basic Authentification

2013-10-11 Thread primoz . skale
One possible solution is to "firewall" access to SolrCloud server(s). Only proxy/load-balancing servers should have unrestricted access to Solr infrastructure. Then you can implement basic/advanced authentication on the proxy/LB side. Primož From: maephisto To: solr-user@lucene.apache.

Re: Solr Cloud Basic Authentification

2013-10-11 Thread primoz . skale
For pre-4.x Solr (aka Solr 3.x) basic authentication works fine. Check this site: http://wiki.apache.org/solr/SolrSecurity Even "master-slave replication architecture" (*not* SolrCloud) works for me. There could be some problems with *cross-shard* queries etc. though (see SOLR-1861, SOLR-3421).

Re: Collection API wrong configuration

2013-10-09 Thread primoz . skale
Works fine at my end. I use Solr 4.5.0 on Windows 7. I tried: >zkcli.bat -cmd upconfig -zkhost localhost:9000 -d ..\solr\collection2\conf -n my_custom_collection >java -Djetty.port=8001 -DzkHost=localhost:9000 -jar start.jar and finally http://localhost:8001/solr/admin/collections?action=CRE

Re: Hardware dimension for new SolrCloud cluster

2013-10-08 Thread primoz . skale
I think Mr. Erickson summarized the issue of hardware sizing quite well in the following article: http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Best regards, Primož From: Henrik Ossipoff Hansen To: "solr-user@lucene.apache.org"