Re: Error creating collection
I am also facing this issue recently. Is there any solution for it? I have created almost 3,000 cores and am adding more. Please let me know if there is a restriction on the number of cores, shards, or collections. Here is the trace:

Jun 23, 2014 9:01:45 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error CREATEing SolrCore 'test_core_3005': Could not get shard_id for core: test_core_3005
        at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:521)
        at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:372)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:368)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: org.apache.solr.common.SolrException: Could not get shard_id for core: core_t3nant778_com
        at org.apache.solr.cloud.ZkController.doGetShardIdProcess(ZkController.java:995)
        at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1053)
        at org.apache.solr.core.CoreContainer.register(CoreContainer.java:662)
        at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:517)
        ... 21 more
Re: Error creating collection
Thanks, Eric, for your suggestion. Increasing the znode data size from 1 MB to 2 MB helped. Here is the reference for changing this configuration: https://zookeeper.apache.org/doc/r3.3.2/zookeeperAdmin.html. I added the parameter -Djute.maxbuffer=2M to JAVA_OPTS, which got me going. I will also go over the other suggestions you mentioned about reducing the number of cores and so on. Thanks again.
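For later readers: the ZooKeeper admin documentation defines jute.maxbuffer in bytes and warns that it should be set to the same value on every ZooKeeper server and on every client JVM (here, the Solr nodes), or the two sides will disagree about the znode size limit. A minimal sketch, assuming a start script that honors JAVA_OPTS:

    # 2 MB spelled out in bytes; apply to the ZooKeeper servers and the Solr nodes alike
    JAVA_OPTS="$JAVA_OPTS -Djute.maxbuffer=2097152"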
Re: NGramFilterFactory for auto-complete that matches the middle of multi-lingual tags?
Hello Andy, did you ever get a final answer to your question? I am trying to do something similar. Please give me pointers if you have any. Basically, I need to use NGram with WhitespaceTokenizer; any help will be appreciated.
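For anyone searching the archive, a minimal sketch of the kind of analyzer chain being discussed (field name and gram sizes are placeholder assumptions, not from the original thread): NGramFilterFactory behind a WhitespaceTokenizer at index time, plain tokens at query time, so a query can match the middle of a tag.

    <fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- grams are built per whitespace token, so multi-word tags still match infixes -->
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

With this split, a query like "erlin" matches "Berlin", because the grams exist only on the index side.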
Re: Solr 3.5 optimization makes index file size almost double
Hi Viresh,

How much free disk space do you have? If you don't have enough space on disk, the optimization process stops and rolls back to some intermediate state.

Pravin

On Fri, Jun 14, 2013 at 2:50 AM, Viresh Modi wrote:
> Hi Rafal
>
> I have attached a snapshot of the Solr index files as well. Can you look into this, and let me know if any other information is required?
>
> Thanks & Regards,
> Viresh Modi
> Mobile: 91 (0) 9714567430
>
> On 13 June 2013 17:41, Rafał Kuć wrote:
>> Hello!
>>
>> Do you have some backup-after-commit in your configuration? It would also be good to see what your index directory looks like; can you list it?
>>
>> Regards,
>> Rafał Kuć
>> Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch
>>
>>> Thanks, Rafal, for the reply.
>>>
>>> I agree with you, but after optimization the size is actually not reduced; it remains double. Is there anything we missed, or need to do, to achieve the index size reduction?
>>>
>>> Is there any special setting we need to configure for replication?
>>>
>>> On 13 June 2013 16:53, Rafał Kuć wrote:
>>>> Hello!
>>>>
>>>> The optimize command needs to rewrite the segments, so while it is still working you may see the index size double. However, after it is finished, the index size will usually be lower than before the optimize.
>>>>
>>>> Regards,
>>>> Rafał Kuć
>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch
>>>>
>>>>> Hi,
>>>>> I have a Solr 1.4.1 server with an index file size of 428 GB. When I upgrade the server from Solr 1.4.1 to Solr 3.5.0 by the replication method, the size remains the same. But when I optimize the index on the Solr 3.5.0 instance, its size reaches 791 GB. What is the solution for keeping the size the same or smaller?
>>>>> I optimize Solr 3.5 with the query: /update?optimize=true&commit=true
>>>>>
>>>>> Thanks & regards,
>>>>> Viresh Modi
Re: Solr 3.5 optimization makes index file size almost double
One thing that you can try is optimizing incrementally. Instead of optimizing down to 1 segment, optimize to 100 segments, then 50, 25, 10, 5, 2, 1. After each step the index size should go down, and this way you don't have to wait 7 hours to get some results.

Pravin

On Fri, Jun 14, 2013 at 10:45 AM, Viresh Modi <viresh.m...@highqsolutions.com> wrote:
> Hi Pravin
>
> I have nearly 2 TB of disk space for the optimization, and after the optimization I get a response with a QTime of nearly 7 hours (reported in milliseconds, of course). So I don't think it is a disk-space issue.
>
> Thanks & Regards,
> Viresh Modi
> Mobile: 91 (0) 9714567430
>
> [Earlier quoted messages trimmed; they repeat the thread above in full.]
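For the archive, the stepped optimize can be expressed with the maxSegments parameter on the same update URL the thread already uses (a sketch, assuming Solr 3.x, whose update handler accepts maxSegments as a request parameter):

    /update?optimize=true&maxSegments=100&commit=true
    /update?optimize=true&maxSegments=50&commit=true
    /update?optimize=true&maxSegments=25&commit=true
    ...
    /update?optimize=true&maxSegments=2&commit=true
    /update?optimize=true&commit=true        (final pass, down to one segment)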
Spellchecker issue related to exact match of query in spellcheck index
Hi All,

I am trying to use the file-based spellchecker in Solr 3.4 and am facing the issue below. My dictionary file contains the following terms:

abcd
abcde
abcdef
abcdefg

However, when checking the spelling of "abcd", it gives the suggestion "abcde" even though the word "abcd" is present in the dictionary file. Here is a sample request:

http://10.88.36.192:8080/solr/spell?spellcheck.build=true&spellcheck=true&spellcheck.collate=true&q=abcd

The response contains one suggestion for the token at offsets 0-4 ("abcd"): the word "abcde", and the collation is also "abcde". (The response XML itself was stripped by the mail archive; only those values survive.)

I expect the spellchecker to give no suggestion if the word is already present in the dictionary, but that is not the case, as shown above. I am using the configuration given below. Please let me know if I am missing something or if this is expected behavior, and also what should be done to get my desired output (i.e. no suggestion if the word is already in the dictionary). Thanks in advance.

Configuration (the XML markup was stripped by the archive; the surviving values are): spellcheck_text, solr.FileBasedSpellChecker, default, score, spellings.txt, UTF-8, ./spellcheckerFile, false, false, 1, spellcheck. Schema.xml has the corresponding fieldtype (also stripped).

Thanks
Pravin
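Since the archive ate the XML, here is one plausible reconstruction of a Solr 3.4 FileBasedSpellChecker setup matching the surviving values (a sketch: the element names are the standard ones, but the mapping of the trailing values false/false/1 to spellcheck.onlyMorePopular, spellcheck.extendedResults, and spellcheck.count is a guess):

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">spellcheck_text</str>
      <lst name="spellchecker">
        <str name="classname">solr.FileBasedSpellChecker</str>
        <str name="name">default</str>
        <str name="comparatorClass">score</str>
        <str name="sourceLocation">spellings.txt</str>
        <str name="characterEncoding">UTF-8</str>
        <str name="spellcheckIndexDir">./spellcheckerFile</str>
      </lst>
    </searchComponent>

    <requestHandler name="/spell" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="spellcheck.onlyMorePopular">false</str>
        <str name="spellcheck.extendedResults">false</str>
        <str name="spellcheck.count">1</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>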
RE: Spellchecker issue related to exact match of query in spellcheck index
Hi James,

Thanks a lot for your reply. The workaround that you suggested is working fine for me. I hope to see this enhancement in a future release of Solr.

-Pravin

From: Dyer, James [james.d...@ingrambook.com]
Sent: Monday, December 19, 2011 11:11 PM
To: solr-user@lucene.apache.org
Subject: RE: Spellchecker issue related to exact match of query in spellcheck index

Pravin,

When using the "file-based" spell checking option, it will try to give you suggestions for every query term, regardless of whether or not they are in your spelling dictionary. Getting the behavior you want would seem to be a worthy enhancement, but I don't think it is currently supported.

You might be able to work around this if you could get your dictionary terms into the index and then use the "index-based" option instead.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

[The original question, quoted in full below James's reply, is trimmed; see the previous message.]
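A sketch of the index-based workaround James describes (the field name and directory are assumptions): copy the dictionary terms into an indexed field and point an IndexBasedSpellChecker at it; a query term that already exists in that field then comes back as correctly spelled instead of drawing suggestions.

    <lst name="spellchecker">
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <str name="name">default</str>
      <!-- hypothetical field holding the dictionary terms, e.g. via copyField -->
      <str name="field">spell</str>
      <str name="spellcheckIndexDir">./spellcheckerIndex</str>
      <str name="buildOnCommit">true</str>
    </lst>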
Performance improvement for Solr faceting on a large index
Hi All,

We are using Solr 3.4. (The schema snippet was stripped by the mail archive; a copy survives in the quoted text of a later reply: a text fieldtype with positionIncrementGap="100" whose analyzer includes a ShingleFilterFactory with maxShingleSize="5" and outputUnigrams="true" and a PatternReplaceFilterFactory that drops purely numeric shingles, plus an indexed, multivalued autoSuggestContent field and a site field.)

The index on the above schema is distributed over two Solr shards, each with about 1.2 million documents and about 195 GB on disk.

We want to retrieve (site, autoSuggestContent term, frequency of the term) information from our main Solr index. site is a field in the document and contains the name of the site to which that document belongs. The terms are retrieved from the multivalued field autoSuggestContent, which is built from shingles of the content and title of the web page.

As of now, we are using a facet query to retrieve (term, frequency of term) for each site. Below is a sample query (you may ignore the initial part of the query):

http://localhost:8080/solr/select?indent=on&q=*:*&fq=site:www.abc.com&start=0&rows=0&fl=id&qt=dismax&facet=true&facet.field=autoSuggestContent&facet.mincount=25&facet.limit=-1&facet.method=enum&facet.sort=index

The problem is that with the increase in index size, this method has started taking a huge amount of time. It used to take 7 minutes per site with an index of 0.4 million docs, but takes around 60-90 minutes with an index of 2.5 million. At this speed it will take around 5-6 days to process all 1,500 sites. We also expect the index to grow with more documents and more sites, so the time to get this information will increase further.

Please let us know if there is a better way to extract the (site, term, frequency) information than the current method.

Thanks,
Pravin Agrawal
RE: Performance improvement for Solr faceting on a large index
Thanks, Yuval and Otis, for the replies.

Yuval: I tried different combinations of facet.method (fc and enum) and filterCache size, but there was not much improvement in the processing time.

Otis: We plan to move this processing out of Solr in the future, but it would be a large code change at this point in time. I know that outputting unigrams can be expensive, but we need to keep them :(. The Solr server has 128 GB of memory, of which we have assigned 64 GB to Solr. We observed that the Solr threads use 100% CPU while a request is in process. We are trying to divide the index further, onto 4 shards, to reduce the index size per shard.

A few more questions: given that we have a large number of unique terms in our index, is facet.method=fc or enum better? And can a large facet.enum.cache.minDf value help?

Thanks,
Pravin Agrawal

-----Original Message-----
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
Sent: Friday, November 23, 2012 6:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Performance improvement for Solr faceting on a large index

Hi,

I don't quite follow what you are trying to do, but it almost sounds like you may be better off using something other than Solr if all you are doing is filtering by site and counting something. I see unigrams in what looks like it could be a big field, and that's a red flag. Your index is quite big - how much memory have you got? Do those queries produce a lot of disk IO? I have a feeling they do. If so, your shards may be too large for your hardware.

Otis

From: Yuval Dotan [yuvaldo...@gmail.com]
Sent: Thursday, November 22, 2012 7:34 PM
To: solr-user@lucene.apache.org
Subject: Re: Performance improvement for Solr faceting on a large index

You could always try the fc facet method and maybe increase the filterCache size.

On Thu, Nov 22, 2012 at 2:53 PM, Pravin Agrawal <pravin_agra...@persistent.co.in> wrote:
> [The original question is quoted here in full and is trimmed; see the first message in this thread. The quoted copy preserves the schema fragment that the archive stripped there: a text fieldtype with positionIncrementGap="100", a ShingleFilterFactory with maxShingleSize="5" and outputUnigrams="true", a PatternReplaceFilterFactory with pattern="^([0-9. ])*$" replacement="" replace="all", and an indexed, multivalued field.]
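For later readers, the two knobs discussed above live in solrconfig.xml and on the request; a sketch (the sizes are placeholders, not recommendations):

    <!-- solrconfig.xml: facet.method=enum builds a filter per facet term and caches it here -->
    <filterCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="0"/>

On the request, facet.enum.cache.minDf=50 tells enum faceting to bypass the filterCache for terms whose document frequency is below 50, trading cache memory for per-term query work; with very many unique terms, facet.method=fc instead uses the field cache and avoids a filter per term entirely.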
Problem with multi-word synonyms in Solr 3.4
Hi All,

I am trying to use synonyms in Solr 3.4 and am facing the issue below with multi-word synonyms. I am using the edismax query parser with the following fields in qf and pf:

qf: name^1.2,name_synonym^0.5
pf: phrase_name^3

(The analyzer definition for name_synonym was stripped by the mail archive.)

With the above configuration, the following kinds of synonym entries work fine:

foobar => foo bar
FnB => foo and bar
aaa,bbb,ccc

However, for the following multi-word synonym line, the dismax query is formed incorrectly for the qf field:

xxx zzz, aaa bbb, mmm nnn, aaabbb

The parsed query that gets formed for the query "aaabbb" is:

+(name:aaabbb^1.2 | name_synonym:" xxx zzz aaa bbb mmm (nnn aaabbb)"^0.5)~0.5 (phrase_name:" xxx zzz aaa bbb mmm (nnn aaabbb)"~5^3.0)~0.5

I am expecting a query like:

+(name:aaabbb^1.2 | ((name_synonym:xxx zzz name_synonym:aaa bbb name_synonym:mmm nnn name_synonym:aaabbb)^0.5))~0.5

Similarly, for the query "xxx zzz" I get the following parsed query from dismax:

+((name:xxx^1.2 | name_synonym:xxx^0.5 | name:zzz^1.2 | name_synonym:zzz^0.5)~0.5) (phrase_name:"xxx zzz"~5^3.0)~0.5

But I am expecting:

+((name:xxx^1.2 | name_synonym:xxx^0.5 | name:zzz^1.2 | name_synonym:zzz^0.5)~0.5) (phrase_name:"xxx zzz"~5^3.0 | phrase_name:"aaa bbb"~5^3.0 | phrase_name:"mmm nnn"~5^3.0 | phrase_name:"aaabbb"~5^3.0)~0.5

However, that is not the case. Please let me know if I am missing something or if this is expected behavior, and also what should be done to get my desired output. Thanks in advance.

Pravin
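For later readers: this is the well-known multi-word synonym limitation. The query parser splits on whitespace before the analyzer runs, so a query-time SynonymFilter never sees "xxx zzz" as one unit. The usual workaround is to expand multi-word synonyms at index time only; a minimal sketch (the analyzer details are assumptions, since the original analyzer config was stripped from the post):

    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- expand=true writes all variants into the index, so no query-time synonyms are needed -->
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>

The trade-off is that changing synonyms.txt then requires reindexing.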
Performance problem with DIH in Solr 3.3
Hi All,

I am using the delta import handler (Solr 3.3) to index data from my database (using 19 tables). The total number of Solr documents created from these 19 tables is 444. The total number of requests sent to the data source during a clean full import is 91,083.

My problem is that DIH makes too many calls and puts load on my database.

1. Can we batch these calls?
2. Can we use a view instead? If yes, can I get some examples of using a view with DIH? (See the sketch below.)
3. What kind of locks does the Solr DIH acquire while querying the DB?

Note: we are using both the full-import and delta-import handlers.

Thanks in advance,
Pravin Agrawal
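A sketch of the usual answers to questions 1 and 2 (the driver, view name, and column names are hypothetical): JdbcDataSource's batchSize sets the JDBC fetch size, and a flattened database view lets a single query per document replace the per-row sub-entity queries across 19 tables. CachedSqlEntityProcessor is the other standard way to stop DIH from re-querying a child table for every parent row.

    <dataConfig>
      <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://dbhost/mydb" user="user" password="pass"
                  batchSize="500"/>  <!-- JDBC fetch size per round-trip, not a row cap -->
      <document>
        <!-- solr_doc_view is a hypothetical DB view flattening the 19-table join -->
        <entity name="doc"
                query="SELECT * FROM solr_doc_view"
                deltaQuery="SELECT id FROM solr_doc_view
                            WHERE last_modified &gt; '${dataimporter.last_index_time}'"/>
      </document>
    </dataConfig>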
Re: Delay while adding documents to the Solr index
Swapna,

My answers are inline.

2009/9/30 swapna_here:
> hi all,
>
> I have indexed 10 documents (daily around 5000 documents will be indexed, one at a time, to Solr). At the same time, daily a few (around 2000) indexed documents (added 30 days back) will be deleted using DeleteByQuery of SolrJ. Previously each document used to be indexed within 5 ms, but recently I am facing a delay (sometimes 2 min to 10 min) while adding a document to the index. And my index (folder) size has also increased to 625 MB, which is very large; previously it was around 230 MB.
>
> My questions are:
>
> 1) Is Solr not deleting the older documents (added 30 days back) permanently from the index, even after committing?

Have you run optimize?

> 2) Why has the index size increased?

If 5000 docs are added daily and only 2000 deleted, the index size would increase because of the remaining 3000 documents.

> 3) What is the reason for the delay (2 min to 10 min) while adding documents one at a time to the index?

I don't know why this would happen. Is your disk nearly full? Which OS are you running on? What is the configuration of Solr?

Hope this helps,
Pravin
Re: Solr Porting to .Net
You may want to check out http://code.google.com/p/solrnet/

2009/9/30 Antonio Calò:
> Hi All
>
> I'm wondering if a Solr version for .Net is already available, or if it is still under development/planning. I've searched the Solr website but found only info on the Lucene.Net project.
>
> Best Regards,
> Antonio
>
> --
> Antonio Calò
> Software Developer Engineer @ Intellisemantic
> Mail anton.c...@gmail.com
> Tel. 011-56.90.429
Re: Delay while adding documents to the Solr index
Also, what is your mergeFactor set to?

Pravin

2009/9/30 Pravin Paratey:
> [Previous message quoted in full; trimmed - see above.]
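For context, mergeFactor lives in solrconfig.xml (the value 10 shown here is just Solr's usual default, as an example):

    <mainIndex>
      <!-- lower values merge segments more aggressively: slower indexing, fewer segments to search -->
      <mergeFactor>10</mergeFactor>
    </mainIndex>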
Re: Delay while adding documents to the Solr index
Swapna,

While the disk space does increase during optimization, it should almost always return to the original size, or slightly less, afterwards.

This is a silly question, but off the top of my head I can't think of any other reason why the index size would increase - are you running a <commit/> after adding documents? If you are, you might want to compare the size of each document being currently indexed with the ones you indexed a few months back.

To optimize the index, simply post an <optimize/> message to Solr, or read http://wiki.apache.org/solr/SolrOperationsTools

Pravin

2009/9/30 swapna_here:
> thanks for your reply
> i have not optimized at all
> my understanding is that optimize improves query performance but takes more disk space; except for that, i have no idea how to use it
>
> previously, for 10 documents, the size occupied was around 250MB
> but after 2 months it is 625MB
> why did this happen? is it because i have not optimized the index?
> can anybody tell me when and how to optimize the index (with configuration details)?
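A minimal sketch of issuing the optimize over HTTP (host, port, and path assume the default single-core setup):

    curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<optimize/>'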
Solr Queries
Hi,

I am new to Solr and have the following queries:

1. Does Solr work in a distributed environment? If yes, how do I configure it?

2. Does Solr have Hadoop support? If yes, how do I set it up with Hadoop/HDFS? (Note: I am familiar with Hadoop.)

3. I have employee information (id, name, address, cell no, personal info) totalling 1 TB. To post (index) this data to the Solr server, do I have to create an XML file with this data and then post it, or is there a more optimal way? In the future my data will grow up to 10 TB; how can I index that much data? (Creating XML is a headache.)

Thanks in advance,
-Pravin
How to post (index) a large file of 5 GB or larger
Hi,

I am new to Solr. I am able to index, search, and update with small files (around 500 MB), but if I try to index a file of 5 to 10 GB or larger, it throws a heap-memory exception. While investigating, I found that post.jar and post.sh load the whole file into memory.

As a workaround I am splitting the large file into small files, and that works. But is there any other way to post a large file? The workaround above is not feasible for a 1 TB file.

Thanks,
-Pravin
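A sketch of the usual streaming alternative in SolrJ (the era-appropriate CommonsHttpSolrServer; the record parser, field names, and batch size are hypothetical): documents are sent in bounded batches, so heap use stays flat no matter how large the source data is.

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchIndexer {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            String[] record;
            while ((record = readNextRecord()) != null) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", record[0]);
                doc.addField("text", record[1]);
                batch.add(doc);
                if (batch.size() == 1000) {
                    server.add(batch);   // one HTTP request per 1000 docs
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                server.add(batch);
            }
            server.commit();
        }

        // Placeholder: substitute a streaming parser over the real source file.
        private static String[] readNextRecord() {
            return null;
        }
    }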
RE: Solr Queries
Thanks for your help.

Can you please provide the detailed configuration for a Solr distributed environment? How do I set up master and slave, and which file(s) do I have to change for this? What are the shard parameters? Can we integrate ZooKeeper with this? Please provide details.

Thanks in advance,
-Pravin

-----Original Message-----
From: Sandeep Tagore [mailto:sandeep.tag...@gmail.com]
Sent: Wednesday, October 07, 2009 4:29 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Queries

Hi Pravin,

1. Does Solr work in a distributed environment? If yes, how do I configure it?
Yes. You can achieve this with sharding. For example: install and configure Solr on two machines and declare one of them as master. Insert shard parameters when you index and search your data.

2. Does Solr have Hadoop support? If yes, how do I set it up with Hadoop/HDFS?
Sorry, no idea.

3. I have employee information (id, name, address, cell no, personal info) totalling 1 TB. Do I have to create an XML file with this data and post it, or is there a more optimal way?
I think XML is not the best way; I don't suggest it. If you have that 1 TB of data in a database, you can do this simply using the full-import command. Configure your DB details in solrconfig.xml and data-config.xml, and add your DB driver jar to the Solr lib directory. Then import the data in slices (say, department-wise, or by some other category). In the future, you can import the data from a DB, or index the data directly using the client API with simple Java beans.

Hope this info helps you.

Regards,
Sandeep Tagore
RE: Solr Queries
Thanks for your reply.

I have one more query regarding the Solr distributed environment. I have configured Solr on two machines as per http://wiki.apache.org/solr/DistributedSearch, but I have the following test case. Suppose I have two machines, server1 and server2. I post a record with id 1 on server1, and another record with the same id, 1, on server2. Then a query like

http://server1:8983/solr/select?shards=server1:8983/solr,server2:8983/solr&q=1

gives the result from server1, while

http://server2:8983/solr/select?shards=server2:8983/solr,server1:8983/solr&q=1

gives the result from server2. How do I solve this? Is any other setting required?

Thanks in advance,
-Pravin

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Wednesday, October 07, 2009 3:37 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Queries

First, please do not cross-post messages to both solr-dev and solr-user. Solr-dev is only for development-related discussions. Comments inline:

On Wed, Oct 7, 2009 at 9:59 AM, Pravin Karne wrote:
> 1. Does Solr work in a distributed environment? If yes, how do I configure it?

Yes, Solr works in a distributed environment. See http://wiki.apache.org/solr/DistributedSearch

> 2. Does Solr have Hadoop support? If yes, how do I set it up with Hadoop/HDFS?

Not currently. There is some work going on at https://issues.apache.org/jira/browse/SOLR-1457

> 3. I have employee information (id, name, address, cell no, personal info) totalling 1 TB. Do I have to create an XML file with this data and then post it to the Solr server, or is there a more optimal way?

XML is just one way. You could also use CSV. If you use the SolrJ Java client with Solr 1.4 (soon to be released), it uses an efficient binary format for posting data to Solr.

Regards,
Shalin Shekhar Mangar.
How to deploy an index on Solr
Hi,

I have data indexed with Lucene, and I want to deploy these indexes on Solr for searching. Generally we index and search data with Solr itself, but now I want to search pre-built Lucene indexes through Solr. How can we do this?

-Pravin
Does Solr support distributed index storage?
Hi,

I am new to Solr. I have configured Solr successfully and it is working smoothly. I have one query:

I want to index a large amount of data (around 100 GB). Can we store these indexes on different machines, as a distributed system? There would be one master and multiple slaves, and we have to keep the data in sync across all the nodes, so that when I send an update request, Solr updates that record on the corresponding node.

In short, I want to create a scalable and optimal search system. Is this possible with Solr? Please help with this; any pointers will be highly appreciated.

Thanks in advance,
-Pravin
RE: Does Solr support distributed index storage?
How do I set up a master/slave configuration for Solr? What are the configuration steps for this?

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Friday, October 09, 2009 6:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Does Solr support distributed index storage?

On Fri, Oct 9, 2009 at 6:10 PM, Pravin Karne wrote:
> I want to index a large amount of data (around 100 GB). Can we store these indexes on different machines, as a distributed system?

Are you talking about one large index with 100 GB of data? Or do you plan to shard the data into multiple smaller indexes and use Solr's distributed search?

> So there will be one master and more slaves. Also we have to keep the data in sync over all the nodes, so that when I send an update request, Solr will update that record on the corresponding node.

Solr will not update the corresponding node automatically. You have to make sure to send the add/delete request to the master of the correct shard. Solr does not support an update operation (it is always a replace by uniqueKey).

> In short, I want to create a scalable and optimal search system. Is this possible with Solr?

Of course you can create a scalable and optimal search system with Solr. We do that all the time ;)

Regards,
Shalin Shekhar Mangar.
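For the archive, the Java-based replication that ships with Solr 1.4 is configured roughly like this (hostnames and the poll interval are placeholders; see http://wiki.apache.org/solr/SolrReplication for the full options):

    <!-- on the master, in solrconfig.xml -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
    </requestHandler>

    <!-- on each slave, in solrconfig.xml -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master-host:8983/solr/replication</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>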
RE: Does Solr support distributed index storage?
I am looking for one large index with 100 GB of data. How do I store this on a distributed system?

Thanks,
-Pravin

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Friday, October 09, 2009 6:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Does Solr support distributed index storage?

[Quoted reply trimmed; it is the same message quoted in full in the previous post.]
Hadoop configurations for the SOLR-1301 patch
Hi,

I am using the SOLR-1301 patch. I have built Solr with the patch applied, but I am not able to configure Hadoop for the resulting war.

I want to run Solr (create indexes) on a 3-node (1+2) cluster.

How do I do the Hadoop configuration for this patch? How do I set master and slave?

Thanks,
-Pravin
RE: Hadoop configurations for the SOLR-1301 patch
Hi,

The patch (SOLR-1301) provides distributed indexing (using Hadoop). Now I have a Hadoop cluster with 1 master and 2 slaves, and I have applied the patch to Solr and built it. So how do I integrate the Solr executables with the Hadoop cluster? Can you please tell me the steps for this? Shall I just copy the Solr war to the Hadoop cluster, or is there something else?

(Note: I have two setups: 1. the Hadoop setup; 2. the Solr setup.) How do I bridge these two setups to run distributed indexing?

Thanks,
-Pravin

-----Original Message-----
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
Sent: Friday, October 16, 2009 7:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Hadoop configurations for the SOLR-1301 patch

Hi Pravin,

You'll need to set up a Hadoop cluster, which is independent of SOLR-1301. SOLR-1301 is for building Solr indexes only, so there isn't a master and slave. After building the indexes, one needs to provision the indexes to Solr servers. In my case I only have slaves, because I'm not incrementally indexing on the Hadoop-generated shards.

SOLR-1301 does need a Hadoop-specific unit test, which I got started and need to complete; that could help a little in understanding.

-J

On Wed, Oct 14, 2009 at 5:45 AM, Pravin Karne wrote:
> [Original question quoted in full; trimmed - see the previous message.]
Re: Solr configuration with Text files
AFAIK, you're going to have to code something up. Do remember to add CDATA tags to your XML.

On Tue, Mar 10, 2009 at 11:31 PM, KennyN wrote:
> This functionality is possible 'out of the box', right? Or am I going to need to code up something that reads in the id-named files and generates the XML file?
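A minimal example of what such a generator would emit, with the file body wrapped in CDATA so embedded markup characters don't break the XML (the field names are hypothetical):

    <add>
      <doc>
        <field name="id">file_0001</field>
        <field name="text"><![CDATA[Raw file contents, safe even with <, > and & characters inside.]]></field>
      </doc>
    </add>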