Re: Error creating collection

2014-06-23 Thread pravin
I have recently started hitting this issue as well.
Is there a known fix? I have almost 3000 cores created and am adding
some more.
Please let me know if there is a restriction on the number of cores, shards,
and collections.

Here is the trace:

Jun 23, 2014 9:01:45 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error CREATEing SolrCore
'test_core_3005': Could not get shard_id for core: test_core_3005
        at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:521)
        at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:372)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:368)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: org.apache.solr.common.SolrException: Could not get shard_id for
core: core_t3nant778_com
        at org.apache.solr.cloud.ZkController.doGetShardIdProcess(ZkController.java:995)
        at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1053)
        at org.apache.solr.core.CoreContainer.register(CoreContainer.java:662)
        at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:517)
        ... 21 more




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-creating-collection-tp4057859p4143444.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Error creating collection

2014-06-23 Thread pravin
Thanks Eric for your suggestion.
Increasing the znode data size from 1M to 2M fixed it for me.
Here is the reference for changing this configuration:
https://zookeeper.apache.org/doc/r3.3.2/zookeeperAdmin.html

I added the parameter -Djute.maxbuffer=2M to JAVA_OPTS, which got me going.
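One caveat worth noting: the ZooKeeper docs describe jute.maxbuffer as a byte count set as a Java system property, so a 2MB limit is normally spelled out in full rather than with an "M" suffix, and the same value has to be passed to the ZooKeeper servers and the Solr JVMs. A sketch (variable names are illustrative):

```shell
# jute.maxbuffer is documented as a value in bytes; 2 MB = 2097152.
JUTE_MAXBUFFER=$((2 * 1024 * 1024))
# Append it to JAVA_OPTS for every JVM that talks to ZooKeeper.
JAVA_OPTS="${JAVA_OPTS:-} -Djute.maxbuffer=$JUTE_MAXBUFFER"
echo "$JAVA_OPTS"
```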

I will also go over the other suggestions you mentioned about reducing the
number of cores. Thanks again.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-creating-collection-tp4057859p4143621.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: NGramFilterFactory for auto-complete that matches the middle of multi-lingual tags?

2011-02-09 Thread pravin

Hello,
Andy, did you ever get a final answer to your question?
I am trying to do something similar; please share any pointers if you
have them.
Basically, I need to use NGram with WhitespaceTokenizer. Any help will be
appreciated.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/NGramFilterFactory-for-auto-complete-that-matches-the-middle-of-multi-lingual-tags-tp1619234p2459466.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 3.5 Optimization takes index file size almost double

2013-06-14 Thread Pravin Bhutada
Hi Viresh,

How much free disk space do you have? If you don't have enough space on
disk, the optimization process stops and rolls back to some intermediate
state.


Pravin




On Fri, Jun 14, 2013 at 2:50 AM, Viresh Modi  wrote:

> Hi Rafal
>
> Here i attached solr index file snapshot as well ..
> So can you look into this and any another information required regarding
> it then let me know.
>
>
> Thanks & Regards,
> Viresh modi
> Mobile: 91 (0) 9714567430
>
>
> On 13 June 2013 17:41, Rafał Kuć  wrote:
>
>> Hello!
>>
>> Do you have some backup after commit in your configuration? It would
>> also be good to see how your index directory looks like, can you list
>> that ?
>>
>> --
>> Regards,
>>  Rafał Kuć
>>  Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch
>>
>> > Thanks Rafal for the reply...
>>
>> > I agree with you, but after optimization the size does not actually
>> > come down; it remains double. So is there anything we missed or need
>> > to do to achieve the index size reduction?
>>
>> > Is there any special setting we need to configure for replication?
>>
>>
>>
>> > On 13 June 2013 16:53, Rafał Kuć  wrote:
>>
>> >> Hello!
>> >>
>> >> Optimize command needs to rewrite the segments, so while it is
>> >> still working you may see the index size to be doubled. However after
>> >> it is finished the index size will be usually lowered comparing to the
>> >> index size before optimize.
>> >>
>> >> --
>> >> Regards,
>> >>  Rafał Kuć
>> >>  Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch
>> >>
>> >> > Hi,
>> >> > I have a Solr 1.4.1 server with an index of 428GB. When I upgrade
>> >> > from Solr 1.4.1 to Solr 3.5.0 by the replication method, the size
>> >> > stays the same. But when I optimize the index on the Solr 3.5.0
>> >> > instance, its size reaches 791GB. What is the solution to keep the
>> >> > size the same or smaller?
>> >> > I optimize Solr 3.5 with the query:
>> >> > /update?optimize=true&commit=true
>> >>
>> >> > Thanks & regards
>> >> > Viresh Modi
>> >>
>> >>
>>
>>
>
> --
> This email and its attachments are intended for the above named only and
> may be confidential. If they have come to you in error you must take no
> action based on them, nor must you copy or show them to anyone; please
> reply to this email and highlight the error.
>


Re: Solr 3.5 Optimization takes index file size almost double

2013-06-14 Thread Pravin Bhutada
One thing you can try is optimizing incrementally: instead of optimizing
straight down to 1 segment, optimize to 100 segments, then 50, 25, 10, 5, 2, 1.
After each step the index size should go down, so you don't have to wait
7 hours to get some results.
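The stepped optimize above can be sketched with the standard maxSegments parameter of the optimize command; the host, port, and core path below are assumptions, and the commands are only echoed here so the sketch is inert:

```shell
# Incremental optimize sketch: each pass asks Solr to merge down to at
# most $SEGS segments. Replace echo with a real curl to run it.
SOLR_UPDATE="http://localhost:8080/solr/update"   # adjust to your core
for SEGS in 100 50 25 10 5 2 1; do
  echo curl "$SOLR_UPDATE?optimize=true&maxSegments=$SEGS"
done
```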


Pravin


On Fri, Jun 14, 2013 at 10:45 AM, Viresh Modi <
viresh.m...@highqsolutions.com> wrote:

> Hi pravin
>
> I have nearly 2 TB of disk space for the optimization, and after
> optimization I get a QTime of nearly 7 hours (obviously reported in
> milliseconds). So I don't think it is a disk space issue.
>
>
> Thanks & Regards,
> Viresh modi
> Mobile: 91 (0) 9714567430
>


Spellchecker issue related to exact match of query in spellcheck index

2011-12-17 Thread Pravin Agrawal
Hi All,

I am trying to use the file-based spellchecker in Solr 3.4 and am facing the
issue below.

My dictionary file contains the following terms:

abcd
abcde
abcdef
abcdefg


However, when checking the spelling of abcd, it suggests abcde even though
the word abcd is present in the dictionary file. Here is sample output:

http://10.88.36.192:8080/solr/spell?spellcheck.build=true&spellcheck=true&spellcheck.collate=true&q=abcd


<response>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="abcd">
        <int name="numFound">1</int>
        <int name="startOffset">0</int>
        <int name="endOffset">4</int>
        <arr name="suggestion">
          <str>abcde</str>
        </arr>
      </lst>
      <str name="collation">abcde</str>
    </lst>
  </lst>
</response>

I expect the spellchecker to give no suggestion if the word is already
present in the dictionary, but that is not the case, as shown above. I am
using the configuration given below. Please let me know if I am missing
something or if this is expected behavior, and what should be done to get my
desired output (i.e. no suggestion if the word is already in the dictionary).

Thanks in advance.


Configuration:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">spellcheck_text</str>
  <lst name="spellchecker">
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="name">default</str>
    <str name="comparatorClass">score</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./spellcheckerFile</str>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

Schema.xml has the following fieldtype:

[fieldType definition stripped by the mail archive]

Thanks
Pravin



DISCLAIMER
==
This e-mail may contain privileged and confidential information which is the 
property of Persistent Systems Ltd. It is intended only for the use of the 
individual or entity to which it is addressed. If you are not the intended 
recipient, you are not authorized to read, retain, copy, print, distribute or 
use this message. If you have received this communication in error, please 
notify the sender and delete all copies of this message. Persistent Systems 
Ltd. does not accept any liability for virus infected mails.


RE: Spellchecker issue related to exact match of query in spellcheck index

2011-12-21 Thread Pravin Agrawal
Hi James,

Thanks a lot for your reply. The workaround you suggested is working fine
for me. I hope to see this enhancement in a future release of Solr.

-Pravin

From: Dyer, James [james.d...@ingrambook.com]
Sent: Monday, December 19, 2011 11:11 PM
To: solr-user@lucene.apache.org
Subject: RE: Spellchecker issue related to exact match of query in spellcheck 
index

Pravin,

When using the "file-based" spell checking option, Solr will try to give you
suggestions for every query term regardless of whether or not they are in your
spelling dictionary. Getting the behavior you want would seem to be a worthy
enhancement, but I don't think it is currently supported. You might be able to
work around this if you could get your dictionary terms into the index and then
use the "index-based" option instead.
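The index-based option James mentions might be configured roughly like this in solrconfig.xml; this is a sketch only, and the field name, index directory, and other values are assumptions rather than anything from this thread:

```xml
<!-- Sketch: index-based spellchecker reading its terms from an indexed
     field. Field and directory names here are hypothetical. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">spellcheck_text</str>
    <str name="spellcheckIndexDir">./spellcheckerIndex</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

With this setup, a term that already exists in the source field is present in the spelling index, which is the prerequisite for the "no suggestion for correctly spelled words" behavior discussed above.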

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311



Performance improvement for solr faceting on large index

2012-11-22 Thread Pravin Agrawal
Hi All,

We are using Solr 3.4 with the following schema fields.

---

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="..."/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5"
            outputUnigrams="true"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^([0-9. ])*$"
            replacement="" replace="all"/>
  </analyzer>
</fieldType>

<field name="autoSuggestContent" type="..." indexed="true" multiValued="true"/>

[remaining field definitions stripped by the mail archive]

---

The index for the above schema is distributed across two Solr shards, each
with about 1.2 million documents and about 195GB on disk per shard.

We want to retrieve (site, autoSuggestContent term, frequency of the term)
tuples from our main Solr index. site is a field in the document and contains
the name of the site the document belongs to. The terms are retrieved from
the multivalued field autoSuggestContent, which is built from shingles of the
content and title of the web page.

As of now, we use a facet query to retrieve (term, frequency of term) for
each site. Below is a sample query (you may ignore the initial part):

http://localhost:8080/solr/select?indent=on&q=*:*&fq=site:www.abc.com&start=0&rows=0&fl=id&qt=dismax&facet=true&facet.field=autoSuggestContent&facet.mincount=25&facet.limit=-1&facet.method=enum&facet.sort=index

The problem is that as the index grows, this method has started taking a
huge amount of time. It used to take 7 minutes per site with an index of
0.4 million docs but takes around 60-90 minutes with an index of 2.5
million. At this speed it will take around 5-6 days to process all 1500
sites. We also expect the index to keep growing with more documents and
more sites, so the time to get this information will increase further.

Please let us know if there is a better way to extract the (site, term,
frequency) information than the current method.

Thanks,
Pravin Agrawal






RE: Performance improvement for solr faceting on large index

2012-11-23 Thread Pravin Agrawal
Thanks Yuval and Otis for the reply.

Yuval: I tried different combinations of facet.method (fc and enum) and
filterCache size, but there was not much improvement in processing time.

Otis: We plan to move this processing out of Solr in the future, but that
would be a large code change at this point.
I know that outputting unigrams can be expensive, but we need to keep them :(.
The Solr server has 128GB of memory, of which we have assigned 64GB to Solr.
We observed that the Solr threads use 100% CPU while a request is being
processed.
We are also trying to split the index across 4 shards to reduce the index
size per shard.

A few more questions: we have a large number of unique terms in our index,
so is facet method fc or enum better? And can a large
facet.enum.cache.minDf value help?


Thanks,
Pravin Agrawal

-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
Sent: Friday, November 23, 2012 6:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Performance improvement for solr faceting on large index

Hi,

I don't quite follow what you are trying to do, but it almost sounds
like you may be better off using something other than Solr if all you are
doing is filtering by site and counting something.
I see unigrams in what looks like it could be a big field and that's a red
flag.
Your index is quite big - how much memory have you got? Do those queries
produce a lot of disk IO? I have a feeling they do. If so, your shards may
be too large for your hardware.

Otis
--
_
From: Yuval Dotan [yuvaldo...@gmail.com]
Sent: Thursday, November 22, 2012 7:34 PM
To: solr-user@lucene.apache.org
Subject: Re: Performance improvement for solr faceting on large index

you could always try the fc facet method and maybe increase the filtercache
size


Problem Multi word synonyms in solr 3.4

2012-02-09 Thread Pravin Agrawal
Hi All,

I am trying to use synonyms in Solr 3.4 and am facing the issue below with
multiword synonyms.

I am using the edismax query parser with the following fields in qf and pf:

qf: name^1.2,name_synonym^0.5
pf: phrase_name^3

The analyzer I am using for name_synonym is as follows:

[analyzer definition stripped by the mail archive]

With the above configuration, the following types of synonyms work fine:

foobar => foo bar
FnB => foo and bar
aaa,bbb,ccc

However, for the following multiword synonym, the dismax query is formed
incorrectly for the qf field:

xxx zzz, aaa bbb, mmm nnn, aaabbb

The parsedquery_tostring that gets formed for the query aaabbb is:

+(name:aaabbb^1.2 | name_synonym:" xxx zzz aaa bbb mmm (nnn aaabbb)"^0.5)~0.5
(phrase_name:" xxx zzz aaa bbb mmm (nnn aaabbb)"~5^3.0)~0.5

I am expecting a query like:

+(name:aaabbb^1.2 | ((name_synonym:xxx zzz name_synonym:aaa bbb
name_synonym:mmm nnn name_synonym:aaabbb)^0.5))~0.5

Similarly, for the query xxx zzz I get the following parsedquery_tostring
from dismax:

+((name:xxx^1.2 | name_synonym:xxx^0.5 | name:zzz^1.2 |
name_synonym:zzz^0.5)~0.5) (phrase_name:"xxx zzz"~5^3.0)~0.5

But I am expecting the following query:

+((name:xxx^1.2 | name_synonym:xxx^0.5 | name:zzz^1.2 |
name_synonym:zzz^0.5)~0.5) (phrase_name:"xxx zzz"~5^3.0 | phrase_name:"aaa
bbb"~5^3.0 | phrase_name:"mmm nnn"~5^3.0 | phrase_name:"aaabbb"~5^3.0)~0.5

However, that is not the case.

Please let me know if I am missing something or if this is expected
behavior, and what should be done to get my desired output.

Thanks in advance.

Pravin



Performance problem with DIH in solr 3.3

2012-04-23 Thread Pravin Agrawal
Hi All,

I am using the delta import handler (Solr 3.3) to index data from my
database (using 19 tables).
  The total number of Solr documents created from these 19 tables is 444.
  The total number of requests sent to the data source during a clean full
import is 91083.

My problem is that DIH makes too many calls and puts load on my database.
  1. Can we batch these calls?
  2. Can we use a view instead? If yes, can I get some examples of using a
view with DIH?
  3. What kind of locks does Solr DIH acquire while querying the DB?

Note: we are using both the full-import and delta-import handlers.
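One way to attack questions 1 and 2 (a sketch, not something confirmed in this thread): pre-join the 19 tables into a single database view and point one DIH entity at it, so each document comes from one SELECT instead of many sub-entity queries; batchSize on the JDBC data source controls the fetch size. The view name, JDBC URL, and column names below are all hypothetical:

```xml
<dataConfig>
  <!-- batchSize sets the JDBC fetch size; solr_doc_view is a
       hypothetical view that flattens the 19 tables. -->
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" batchSize="500"/>
  <document>
    <entity name="doc"
            query="SELECT * FROM solr_doc_view"
            deltaQuery="SELECT id FROM solr_doc_view
                        WHERE last_modified &gt; '${dataimporter.last_index_time}'"/>
  </document>
</dataConfig>
```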

Thanks in advance
Pravin Agrawal



Re: delay while adding document to solr index

2009-09-30 Thread Pravin Paratey
Swapna,

Answers to your questions are inline.

2009/9/30 swapna_here :
>
> hi all,
>
> I have indexed 10 documents (daily around 5000 documents will be indexed
> one at a time to solr)
> at the same time daily few(around 2000) indexed documents (added 30 days
> back) will be deleted using DeleteByQuery of SolrJ
> Previously each document used to be indexed within 5ms..
> but recently i am facing a delay (sometimes 2min to 10 min) while adding
> document to index.
> And my index (folder) size is also increased to 625MB which is very large
> Previously it was around 230MB
>
> My Questions are:
>
> 1) is solr not deleting the older documents(added 30 days back) permenently
> from index event after committing

Have you run optimize?

> 2)Why the index size is increased

If 5000 docs are added daily and only 2000 deleted, the index size
would increase because of the remaining 3000 documents.

> 3)reason for delay (2min to 10 mins) while adding the document one at a time
> to index

I don't know why this would happen. Is your disk nearly full? Which OS
are you running on? What is the configuration of Solr?

> Help is appreciated
>
> Thanks in advance..
>
> --
> View this message in context: 
> http://www.nabble.com/delay-while-adding-document-to-solr-index-tp25676777p25676777.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>

Hope this helps
Pravin


Re: Solr Porting to .Net

2009-09-30 Thread Pravin Paratey
You may want to check out - http://code.google.com/p/solrnet/

2009/9/30 Antonio Calò :
> Hi All
>
> I'm wondering if is already available a Solr version for .Net or if it is
> still under development/planning. I've searched on Solr website but I've
> found only info on Lucene .Net project.
>
> Best Regards
>
> Antonio
>
> --
> Antonio Calò
> --
> Software Developer Engineer
> @ Intellisemantic
> Mail anton.c...@gmail.com
> Tel. 011-56.90.429
> --
>


Re: delay while adding document to solr index

2009-09-30 Thread Pravin Paratey
Also, what is your merge factor set to?

Pravin

2009/9/30 Pravin Paratey :
> Swapna,
>
> Your answers are inline.
>
> 2009/9/30 swapna_here :
>>
>> hi all,
>>
>> I have indexed 10 documents (daily around 5000 documents will be indexed
>> one at a time to solr)
>> at the same time daily few(around 2000) indexed documents (added 30 days
>> back) will be deleted using DeleteByQuery of SolrJ
>> Previously each document used to be indexed within 5ms..
>> but recently i am facing a delay (sometimes 2min to 10 min) while adding
>> document to index.
>> And my index (folder) size is also increased to 625MB which is very large
>> Previously it was around 230MB
>>
>> My Questions are:
>>
>> 1) is solr not deleting the older documents(added 30 days back) permenently
>> from index event after committing
>
> Have you run optimize?
>
>> 2)Why the index size is increased
>
> If 5000 docs are added daily and only 2000 deleted, the index size
> would increase because of the remaining 3000 documents.
>
>> 3)reason for delay (2min to 10 mins) while adding the document one at a time
>> to index
>
> I don't know why this would happen. Is your disk nearly full? Which OS
> are you running on? What is the configuration of Solr?
>
>> Help is appreciated
>>
>> Thanks in advance..
>>
>> --
>> View this message in context: 
>> http://www.nabble.com/delay-while-adding-document-to-solr-index-tp25676777p25676777.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
> Hope this helps
> Pravin
>


Re: delay while adding document to solr index

2009-09-30 Thread Pravin Paratey
Swapna

While the disk space does increase during the process of optimization,
it should almost always return to the original size or slightly less.

This is a silly question, but off the top of my head I can't think of any
other reason why the index size would increase - are you running a
<commit/> after adding documents?

If you are, you might want to compare the size of each document being
currently indexed with the ones you indexed a few months back.

To optimize the index, simply post <optimize/> to Solr, or read
[http://wiki.apache.org/solr/SolrOperationsTools].
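For example, posting the optimize message could look like this (host, port, and path are assumptions; the command is echoed rather than executed in this sketch):

```shell
# Sketch: an explicit optimize is an XML update message posted to /update.
OPTIMIZE_CMD='curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary "<optimize/>"'
echo "$OPTIMIZE_CMD"
```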

Pravin

2009/9/30 swapna_here :
>
> thanks for your reply
> i have not optimized at all
> my knowledge is optimize improves the query performance but it will take
> more disk space
> except that i have no idea how to use it
>
> previously for 10 documents the size occupied was around 250MB
>
> But after 2 months it is 625MB
>
> why this happened ?
> is it because i have not optimized the index
> can any body tell me when and how to optimize the index(with configuration
> details) .
> --
> View this message in context: 
> http://www.nabble.com/delay-while-adding-document-to-solr-index-tp25676777p25678531.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Solr Queries

2009-10-06 Thread Pravin Karne
Hi,
I am new to Solr. I have the following queries:


1.   Does Solr work in a distributed environment? If yes, how do I
configure it?


2.   Does Solr have Hadoop support? If yes, how do I set it up with
Hadoop/HDFS? (Note: I am familiar with Hadoop.)


3.   I have employee information (id, name, address, cell no, personal
info) of 1 TB. To post (index) this data to the Solr server, do I have to
create an XML file with this data and then post it, or is there a more
optimal way? In the future my data will grow up to 10 TB; how can I index
that much data? (Creating XML is a real headache.)





Thanks in advance

-Pravin






how to post(index) large file of 5 GB or greater than this

2009-10-08 Thread Pravin Karne
Hi,
I am new to Solr. I can index, search, and update with small files (around
500MB), but if I try to index a file of 5 to 10GB or more, I get a heap
memory exception.
While investigating I found that post.jar / post.sh loads the whole file
into memory.

I am using a workaround of dividing the file into smaller files, and it
works.

Is there any other way to post a large file? The above workaround is not
feasible for a 1 TB file.

Thanks
-Pravin




RE: Solr Queries

2009-10-08 Thread Pravin Karne
Thanks for your help.
Can you please provide the detailed configuration for a Solr distributed 
environment? How do I set up a master and a slave? Which file(s) do I have to 
change for this? What are the shard parameters?

Can we integrate ZooKeeper with this?

Please provide details for this.

Thanks in advance.
-Pravin

-Original Message-
From: Sandeep Tagore [mailto:sandeep.tag...@gmail.com]
Sent: Wednesday, October 07, 2009 4:29 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Queries


Hi Pravin,

1. Does Solr work in a distributed environment? If yes, how to configure it?
Yep. You can achieve this with sharding.
For example: install and configure Solr on two machines and declare one of them
as the master. Insert shard parameters when you index and search your data.
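For illustration only (a sketch, not from this thread; host names are
hypothetical), the shards parameter of a distributed search request can be
assembled like this:

```python
from urllib.parse import urlencode

def distributed_query_url(host, shards, query, rows=10):
    """Build a Solr distributed-search URL. Entries in `shards` are
    host:port/core pairs WITHOUT the http:// prefix, which is what
    Solr's shards parameter expects."""
    params = urlencode({"q": query, "shards": ",".join(shards), "rows": rows})
    return f"http://{host}/solr/select?{params}"

url = distributed_query_url(
    "server1:8983",
    ["server1:8983/solr", "server2:8983/solr"],
    "name:pravin",
)
```

The node the request is sent to merges results from every listed shard.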

2. Is solr have Hadoop support? if yes, how to setup it with Hadoop/HDFS?
(Note: I am familiar with Hadoop)
Sorry. No idea.

3. I have employee information(id, name ,address, cell no, personal info) of
1 TB ,To post(index)this data on solr server, shall I have to create xml
file with this data and then post it to solr server? Or is there any other
optimal way?  In future my data will grow upto 10 TB , then how can I index
this data ?(because creating xml is more headache )
I don't think XML is the best way, and I don't suggest it. If you have that 1 TB
of data in a database, you can achieve this simply using the full-import
command. Configure your DB details in solrconfig.xml and data-config.xml, and
add your DB driver jar to the Solr lib directory. Then import the data in slices
(say, department-wise, or by some other category). In the future you can import
the data from a DB, or index it directly using the client API with simple Java
beans.
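A sketch of slice-wise imports (the handler path and the slice parameter are
hypothetical; DataImportHandler can read extra request parameters via
${dataimporter.request.*} in data-config.xml):

```python
from urllib.parse import urlencode

# Hypothetical department slices; data-config.xml would reference the
# extra request parameter in its SQL, e.g.:
#   query="SELECT ... FROM employee WHERE dept = '${dataimporter.request.dept}'"
departments = ["sales", "engineering", "hr"]

def full_import_url(host, dept):
    # clean=false so each slice adds to the index instead of wiping it
    params = urlencode({"command": "full-import", "clean": "false", "dept": dept})
    return f"http://{host}/solr/dataimport?{params}"

urls = [full_import_url("localhost:8983", d) for d in departments]
```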

Hope this info helps you.

Regards,
Sandeep Tagore
--
View this message in context: 
http://www.nabble.com/Solr-Quries-tp25780371p25783891.html
Sent from the Solr - User mailing list archive at Nabble.com.


DISCLAIMER
==
This e-mail may contain privileged and confidential information which is the 
property of Persistent Systems Ltd. It is intended only for the use of the 
individual or entity to which it is addressed. If you are not the intended 
recipient, you are not authorized to read, retain, copy, print, distribute or 
use this message. If you have received this communication in error, please 
notify the sender and delete all copies of this message. Persistent Systems 
Ltd. does not accept any liability for virus infected mails.


RE: Solr Queries

2009-10-08 Thread Pravin Karne
Thanks for your reply.
I have one more query regarding the Solr distributed environment.

I have configured Solr on two machines as per 
http://wiki.apache.org/solr/DistributedSearch

But I have the following test case:

Suppose I have two machines, Server1 and Server2.

I posted a record with id 1 on Server1 and another record with the same id 
(1) on Server2.

So when I issue a query like 
http://server1:8983/solr/select?shards=server1:8983/solr,server2:8983/solr&q=1
it returns the result from Server1, while

http://server2:8983/solr/select?shards=server2:8983/solr,server1:8983/solr&q=1
returns the result from Server2.

How do I resolve this? Is any other setting required?

Thanks in advance
-Pravin

-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: Wednesday, October 07, 2009 3:37 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Queries

First, please do not cross-post messages to both solr-dev and solr-user.
Solr-dev is only for development related discussions.

Comments inline:

On Wed, Oct 7, 2009 at 9:59 AM, Pravin Karne
wrote:

> Hi,
> I am new to solr. I have following queries :
>
>
> 1.   Is solr work in distributed environment ? if yes, how to configure
> it?
>

Yes, Solr works in a distributed environment. See
http://wiki.apache.org/solr/DistributedSearch


>
>
>
> 2.   Is solr have Hadoop support? if yes, how to setup it with
> Hadoop/HDFS? (Note: I am familiar with Hadoop)
>
>
Not currently. There is some work going on at
https://issues.apache.org/jira/browse/SOLR-1457


>
>
> 3.   I have employee information(id, name ,address, cell no, personal
> info) of 1 TB ,To post(index)this data on solr server, shall I have to
> create xml file with this data and then post it to solr server? Or is there
> any other optimal way?  In future my data will grow upto 10 TB , then how
> can I index this data ?(because creating xml is more headache )
>
>
XML is just one way; you could also use CSV. If you use the SolrJ Java client
with Solr 1.4 (soon to be released), it uses an efficient binary format for
posting data to Solr.
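As a sketch of the XML option (field names are hypothetical), an add payload can
be built like this, with XML special characters escaped so raw text cannot break
the markup:

```python
from xml.sax.saxutils import escape

def to_add_xml(docs):
    """Render a list of dicts as a Solr <add> XML payload."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for field, value in doc.items():
            # escape() handles &, <, > in both field names and values
            parts.append(f'<field name="{escape(field)}">{escape(str(value))}</field>')
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)

payload = to_add_xml([{"id": 1, "name": "A & B"}])
```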

-- 
Regards,
Shalin Shekhar Mangar.



How to deploy an index on Solr

2009-10-09 Thread Pravin Karne
Hi
I have data indexed with Lucene. I want to deploy these indexes on Solr for 
search.

Generally we index and search data with Solr, but now I want to search existing 
Lucene indexes.

How can we do this?

-Pravin



Does Solr support distributed index storage?

2009-10-09 Thread Pravin Karne
Hi,
I am new to Solr. I have configured Solr successfully and it is working 
smoothly.

I have one query:

I want to index a large amount of data (around 100 GB). Can we store these 
indexes on different machines as a distributed system?

There would be one master and several slaves, and we have to keep the data in 
sync across all the nodes.

Then, when I send an update request, Solr should update that record on the 
corresponding node.

In short, I want to create a scalable and optimal search system.

Is this possible with Solr?

Please help with this. Any pointers will be highly appreciated.

Thanks in advance


-Pravin



RE: Does Solr support distributed index storage?

2009-10-11 Thread Pravin Karne
How do I set up a master/slave configuration for Solr?

What are the configuration steps?


-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: Friday, October 09, 2009 6:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Does Solr support distributed index storage?

On Fri, Oct 9, 2009 at 6:10 PM, Pravin Karne
wrote:

> Hi,
> I am new to solr. I have configured solr successfully and its working
> smoothly.
>
> I have one query:
>
> I want index large data(around 100GB).So can we store these indexes on
> different machine as distributed system.
>
>
Are you talking about one large index with 100GB of data? Or do you plan to
shard the data into multiple smaller indexes and use Solr's distributed
search?


> So there will be one master and more slave . Also we have to keep these
> data in sync over all the node.
>
> So when I send update request solr will update that record from
> corresponding node.
>
>
Solr will not update the corresponding node automatically. You have to make sure
to send the add/delete request to the master of the correct shard. Solr does not
support an update operation (it is always a replace by uniqueKey).
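A sketch of that client-side routing (host names are hypothetical; the point is
only that the same uniqueKey must always map to the same shard master):

```python
import hashlib

# Hypothetical shard masters; routing of adds/deletes is left to the
# client, so every update for a given uniqueKey must always go to the
# same shard master.
SHARD_MASTERS = ["http://server1:8983/solr", "http://server2:8983/solr"]

def shard_for(doc_id, masters=SHARD_MASTERS):
    # Use a stable hash (not Python's built-in hash(), which is
    # randomized per process) so routing survives restarts.
    digest = hashlib.md5(str(doc_id).encode("utf-8")).hexdigest()
    return masters[int(digest, 16) % len(masters)]
```

With this scheme, two records with the same id can never land on different
shards, which avoids duplicate hits in distributed search.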


> In short I want to create scalable and optimal search system.
>
> Is this possible with solr?
>
>
Of course you can create a scalable and optimal search system with Solr. We
do that all the time ;)

-- 
Regards,
Shalin Shekhar Mangar.



RE: Does Solr support distributed index storage?

2009-10-11 Thread Pravin Karne
I am looking at one large index with 100 GB of data.

How do I store this on a distributed system?

-Thanks

-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: Friday, October 09, 2009 6:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Does Solr support distributed index storage?

On Fri, Oct 9, 2009 at 6:10 PM, Pravin Karne
wrote:

> Hi,
> I am new to solr. I have configured solr successfully and its working
> smoothly.
>
> I have one query:
>
> I want index large data(around 100GB).So can we store these indexes on
> different machine as distributed system.
>
>
Are you talking about one large index with 100GB of data? Or do you plan to
shard the data into multiple smaller indexes and use Solr's distributed
search?





-- 
Regards,
Shalin Shekhar Mangar.



Hadoop configurations for SOLR-1301 patch

2009-10-14 Thread Pravin Karne
Hi,
I am using the SOLR-1301 patch. I have built Solr with the patch, but I am not
able to configure Hadoop for the resulting war.

I want to run Solr (create indexes) on a 3-node (1+2) cluster.

What are the Hadoop configurations for the above patch?
How do I set the master and slave?


Thanks
-Pravin






RE: Hadoop configurations for SOLR-1301 patch

2009-10-15 Thread Pravin Karne
Hi,
The SOLR-1301 patch provides distributed indexing (using Hadoop).

Now I have a Hadoop cluster with 1 master and 2 slaves.

I have also applied the patch to Solr and built it.

So how do I integrate the Solr executables with the Hadoop cluster?

Can you please tell me the steps for this?

Shall I just copy the Solr war to the Hadoop cluster, or is there something else?

(Note: I have two setups:
  1. Hadoop setup
  2. Solr setup)

How do I bridge these two setups to run distributed indexing?

Thanks
-Pravin
-Original Message-
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
Sent: Friday, October 16, 2009 7:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Hadoop configurations for SOLR-1301 patch

Hi Pravin,

You'll need to set up a Hadoop cluster, which is independent of
SOLR-1301. SOLR-1301 is for building Solr indexes only, so there
isn't a master and slave. After building the indexes, one needs
to provision them to Solr servers. In my case I only have slaves
because I'm not incrementally indexing on the Hadoop-generated shards.

SOLR-1301 does need a Hadoop-specific unit test, which I got started
and need to complete; that could help a little in understanding.

-J

On Wed, Oct 14, 2009 at 5:45 AM, Pravin Karne
 wrote:
> Hi,
> I am using SOLR-1301 path. I have build the solr with given patch.
> But I am not able to configure Hadoop for above war.
>
> I want to run solr(create index) with 3 nodes (1+2) cluster.
>
> How to do the Hadoop configurations for above patch?
> How to set master and slave?
>
>
> Thanks
> -Pravin
>
>
>
>



Re: Solr configuration with Text files

2009-03-10 Thread Pravin Paratey
AFAIK, you're going to have to code something up. Remember to wrap the text in
CDATA sections in your XML.
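A minimal sketch of the CDATA wrapping, handling the one tricky case (a literal
"]]>" inside the text, which would otherwise end the section early):

```python
def cdata(text):
    # A CDATA section cannot contain the literal terminator "]]>",
    # so split any occurrence across two adjacent CDATA sections.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"
```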

On Tue, Mar 10, 2009 at 11:31 PM, KennyN  wrote:

>
> This functionality is possible 'out of the box', right? Or am I going to
> need
> to code up something that reads in the id named files and generates the xml
> file?
> --
> View this message in context:
> http://www.nabble.com/Solr-configuration-with-Text-files-tp22438201p22440095.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>