Re: ERROR on posting update request using CURL in php

2011-06-09 Thread Naveen

Hi,

Basically I need to post something like this using curl in PHP.

The example explained in the earlier thread was:

curl http://localhost:8983/solr/update?commit=true -H "Content-Type:
text/xml" --data-binary 'testdoc'

Do we need to create a temp file and use a PUT command, or can we do it
using POST?

Regards
Naveen





FW: SolrCloud App Unit Testing

2016-03-18 Thread Madhire, Naveen

Hi,

I am writing a Solr application; can anyone please let me know how to unit
test it?

I see the MiniSolrCloudCluster class is available in Solr, but I am confused
about how to use it for unit testing.

How should I create an embedded server for unit testing?



Thanks,
Naveen
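
A minimal sketch of how MiniSolrCloudCluster is commonly wired into a JUnit 4
test, assuming the solr-test-framework artifact is on the test classpath and
using the same 5.x-era constructor that appears in the snippet in the next
thread; class names, paths and the test body are illustrative, not from the
original message:

import java.io.File;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.cloud.MiniSolrCloudCluster;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class SolrCloudAppTest {

    private static MiniSolrCloudCluster cluster;

    @BeforeClass
    public static void startCluster() throws Exception {
        File baseDir = new File("target/mini-solr-cloud");       // scratch dir for the embedded nodes
        File solrXml = new File("src/test/resources/solr.xml");  // a minimal solr.xml for the test
        // 1 node, default host context, embedded ZooKeeper started automatically
        cluster = new MiniSolrCloudCluster(1, null, baseDir, solrXml, null, null);
    }

    @AfterClass
    public static void stopCluster() throws Exception {
        cluster.shutdown();
    }

    @Test
    public void clusterStarts() throws Exception {
        // Point a CloudSolrClient at the embedded cluster's ZooKeeper:
        String zkHost = cluster.getZkServer().getZkAddress();
        try (CloudSolrClient client = new CloudSolrClient(zkHost)) {
            // create a collection, index documents and run assertions against 'client' here
        }
    }
}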




Solr MiniSolrCloudCluster Issue

2015-10-27 Thread Madhire, Naveen
Hi,

I am using the MiniSolrCloudCluster class to write unit tests for our Solr
application.

It looks like there is an HttpClient library version mismatch with the Solr
version, and I am getting the error below:

java.lang.VerifyError: Bad return type
Exception Details:
  Location:

org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;)Lorg/apache/http/impl/client/CloseableHttpClient;
 @57: areturn
  Reason:
Type 'org/apache/http/impl/client/SystemDefaultHttpClient' (current frame, 
stack[0]) is not assignable to 'org/apache/http/impl/client/CloseableHttp

I am using Solr 5.3.1.
I see a similar issue here https://issues.apache.org/jira/browse/SOLR-7948 but
there doesn’t seem to be a workaround for it.

Can anyone please tell me how to fix this issue?


Below is a code snippet:


dataDir = tempFolder.newFolder();
File solrXml = new File("src/test/resources/solr.xml");

// the base directory passed to the cluster is the temp folder created above
MiniSolrCloudCluster cluster = new
MiniSolrCloudCluster(1, null, dataDir, solrXml, null, null);


Thanks.






Question about CloudSolrServer

2016-06-08 Thread Naveen Pajjuri
Hi,
I am trying to migrate from HttpSolrServer to CloudSolrServer and am getting
the following exception while adding docs using CloudSolrServer:


org.apache.solr.common.SolrException: Unknown document router
'{name=compositeId}'

at org.apache.solr.common.cloud.DocRouter.getDocRouter(DocRouter.java:46)

whereas my clusterstate JSON says:

  "maxShardsPerNode":"1",
"router":{"name":"compositeId"},
"replicationFactor":"1".


Please advise.

PS: I'm using Solr 4.10.4.

Thanks,
Naveen.


Re: Question about CloudSolrServer

2016-06-09 Thread Naveen Pajjuri
Thanks Shawn.
I was using an older version of SolrJ; upgrading it to a newer version worked.

Thank you.

On Thu, Jun 9, 2016 at 11:41 AM, Shawn Heisey  wrote:

> On 6/8/2016 11:44 PM, Naveen Pajjuri wrote:
> > Trying to migrate from HttpSolrServer to CloudSolrServer. getting the
> > following exception while adding docs using CloudSolrServer.
> >
> >
> > org.apache.solr.common.SolrException: Unknown document router
> > '{name=compositeId}'
> >
> > at org.apache.solr.common.cloud.DocRouter.getDocRouter(DocRouter.java:46)
> >
> > whereas my cluterstate json says --
> >
> >   "maxShardsPerNode":"1",
> > "router":{"name":"compositeId"},
> > "replicationFactor":"1".
>
> I am guessing that you are using a much older version of SolrJ than the
> Solr version it is talking to.  The '{"name":"compositeId"}' structure
> appears to be the way that newer versions of Solr record the router in
> zookeeper, which is something that the older versions of SolrJ will not
> know how to handle.
>
> Mixing different versions of Solr and SolrJ will work very well, as long
> as you're not using the cloud client.  That client is so tightly coupled
> to SolrCloud internals that it does not work well with a large version
> difference, especially if the client is older than the server.
>
> Most likely you'll need to upgrade your SolrJ version.  At the same
> time, switching to CloudSolrClient is probably a good idea -- the class
> names that end in Server are deprecated in 5.x and gone in 6.x.
>
> Thanks,
> Shawn
>
>
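
For reference, a hedged sketch of the newer client class Shawn mentions
(SolrJ 5.x/6.x); the ZooKeeper addresses and collection name below are
placeholders, not values from this thread:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CloudClientExample {
    public static void main(String[] args) throws Exception {
        // Comma-separated ZooKeeper ensemble, optionally followed by a chroot such as /solr
        try (CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181")) {
            client.setDefaultCollection("collection1");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "1");
            client.add(doc);        // routed to the correct shard leader by the client
            client.commit();

            client.query(new SolrQuery("*:*"));
        }
    }
}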


Boosting exact match fields.

2016-06-14 Thread Naveen Pajjuri
Hi,

I have documents with a field whose values include "ear phones", "sony ear
phones" and "philips ear phones". When I query for "earphones", "sony ear
phones" is the top result, whereas I want "ear phones" as the top result.
Please suggest how to boost exact matches. PS: I have earphones => ear phones
in my synonyms.txt, and the data type definition for that field ("keywords")
is:

Regards,
Naveen
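
One common approach (a sketch of the edismax phrase-field technique, not an
answer from this thread; the field name and boost value are illustrative) is
to add a phrase-field boost so that documents where the query terms occur
together score higher than documents that merely contain each term:

import org.apache.solr.client.solrj.SolrQuery;

public class ExactMatchBoostExample {
    public static SolrQuery build(String userQuery) {
        SolrQuery q = new SolrQuery(userQuery);   // e.g. "earphones"
        q.set("defType", "edismax");
        q.set("qf", "keywords");                  // normal tokenized matching
        q.set("pf", "keywords^10");               // boost docs where all terms occur as a phrase
        return q;
    }
}

Another option sometimes used is to copyField the value into an untokenized
(string or keyword-tokenized) field and boost matches on that exact field.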


CloudSolrServer with multiple zookeeper cluster setup.

2016-07-06 Thread Naveen Pajjuri
Hi,
In our production setup we have a Solr Cloud cluster backed by a ZooKeeper
ensemble. I want to switch from HttpSolrServer to CloudSolrServer. Is there
any way to specify all the IP addresses of the ZooKeeper machines while
instantiating CloudSolrServer, so that I get automatic failover?

PS: right now I'm instantiating CloudSolrServer with the IP of a single
ZooKeeper machine from the ensemble, but if ZooKeeper on that machine dies,
my production systems may break.


Thanks,
Naveen.
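
For the record, the zkHost argument accepts a comma-separated list of the
whole ensemble. A sketch against the SolrJ 4.10.x API mentioned earlier in
the archive (host names, chroot and collection name are placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class ZkEnsembleExample {
    public static void main(String[] args) throws Exception {
        // All ZooKeeper nodes of the ensemble, optionally followed by a chroot (e.g. /solr)
        String zkHosts = "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181";

        CloudSolrServer server = new CloudSolrServer(zkHosts);
        server.setDefaultCollection("collection1");
        server.connect();   // optional: fail fast if ZooKeeper is unreachable

        // ... index and query as usual; the client fails over between ZooKeeper nodes ...

        server.shutdown();
    }
}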


Sorting in solr

2016-07-11 Thread Naveen Pajjuri
Hi,
If I apply a sort order in Solr, when are the documents sorted?

   1. Are the documents sorted (by the client) after fetching the results?
   2. Or do we get the documents back already sorted by Solr?

Regards,
Naveen
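
For what it's worth, sorting happens inside Solr as part of executing the
query: the matching documents are ordered by the sort criteria on the server,
and only the requested rows are returned, already in sorted order. A hedged
SolrJ sketch (the field name is illustrative):

import org.apache.solr.client.solrj.SolrQuery;

public class SortedQueryExample {
    public static SolrQuery build() {
        SolrQuery q = new SolrQuery("*:*");
        q.addSort("price", SolrQuery.ORDER.asc);   // Solr sorts the full result set by this field
        q.setRows(10);                             // only the top 10 already-sorted docs are returned
        return q;
    }
}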


CloudSolrServer instead of httpSolrServer

2016-07-30 Thread Naveen Pajjuri
Hi,
While sending updates to Solr Cloud, I randomly send updates to one of the
nodes in my cloud directly using HttpSolrServer. If I use CloudSolrServer
(by passing the ZooKeeper IPs) instead of HttpSolrServer, can I expect any
improvement in performance?

My basic question is how updates propagate when I directly send updates to
one of the nodes using HttpSolrServer in the cloud model:

   - will the update bounce back to the leader directly?
   - or will it be sent to every node until it finds the leader?


Thank You.


custom field types in solr 6.1.0

2016-08-02 Thread Naveen Pajjuri
Hi,
I'm trying to move from 4.10.4 to 6.1.0.
I want to define and use custom field types, but I read that it is not
advisable to edit the managed-schema file by hand. How do I create custom
field types?

Thanks in advance,
Naveen Reddy
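
A hedged sketch of one way to do this without hand-editing managed-schema:
the Schema API, here driven through SolrJ's SchemaRequest classes. The field
type name, attributes and analyzer chain below are illustrative placeholders,
not something recommended in this thread:

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.request.schema.AnalyzerDefinition;
import org.apache.solr.client.solrj.request.schema.FieldTypeDefinition;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;

public class AddFieldTypeExample {
    public static void addKeywordsType(SolrClient client, String collection) throws Exception {
        Map<String, Object> attrs = new HashMap<>();
        attrs.put("name", "text_keywords");
        attrs.put("class", "solr.TextField");
        attrs.put("positionIncrementGap", "100");

        AnalyzerDefinition analyzer = new AnalyzerDefinition();
        Map<String, Object> tokenizer = new HashMap<>();
        tokenizer.put("class", "solr.StandardTokenizerFactory");
        analyzer.setTokenizer(tokenizer);
        Map<String, Object> lowercase = new HashMap<>();
        lowercase.put("class", "solr.LowerCaseFilterFactory");
        analyzer.setFilters(Arrays.asList(lowercase));

        FieldTypeDefinition def = new FieldTypeDefinition();
        def.setAttributes(attrs);
        def.setAnalyzer(analyzer);

        // Issues an "add-field-type" call against the managed schema of the collection
        new SchemaRequest.AddFieldType(def).process(client, collection);
    }
}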


Issue faced while re-starting solr 6.1.0 after cleaning zk data.

2016-08-07 Thread Naveen Pajjuri
Hi,
I'm trying to move to Solr 6.1.0. It was working fine until I cleaned up the
ZooKeeper data (the version folder) and restarted Solr and ZooKeeper. I then
started getting this error:


   - sample_shard1_replica1:
org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
   Specified config does not exist in ZooKeeper: sample.


Please let me know what I'm missing.

Regards,
Naveen Reddy.


Re: Issue faced while re-starting solr 6.1.0 after cleaning zk data.

2016-08-07 Thread Naveen Pajjuri
Here sample is the name of my collection.

Thanks

On Sun, Aug 7, 2016 at 3:10 PM, Naveen Pajjuri 
wrote:

> Hi,
> I'm trying to move to solr-6.1.0. it was working fine and i cleaned up zk
> data (version folder) and restarted solr and zookeeper. I started getting
> this error.
>
>
>- *sample_shard1_replica1:* org.apache.solr.common.cloud.
>ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
>Specified config does not exist in ZooKeeper: sample.
>
>
> Please let me know what i'm missing.
>
> Regards,
> Naveen Reddy.
>
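
Clearing the ZooKeeper data removes any uploaded configsets, so the config
named "sample" that the collection was created with no longer exists in
ZooKeeper. A hedged sketch of one way to re-upload it with Solr's own
classes (the ZooKeeper address and config path are placeholders; the
bin/solr zk upconfig command or zkcli.sh upconfig achieve the same thing):

import java.nio.file.Paths;

import org.apache.solr.common.cloud.SolrZkClient;
import org.apache.solr.common.cloud.ZkConfigManager;

public class UploadConfigExample {
    public static void main(String[] args) throws Exception {
        // ZooKeeper ensemble used by the SolrCloud cluster (placeholder address)
        SolrZkClient zkClient = new SolrZkClient("localhost:2181", 30000);
        try {
            // Re-upload the configset the 'sample' collection was created with
            new ZkConfigManager(zkClient)
                    .uploadConfigDir(Paths.get("/path/to/sample_config/conf"), "sample");
        } finally {
            zkClient.close();
        }
    }
}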


How to exclude stop words in spellcheck collations

2017-07-16 Thread Naveen Pajjuri
Hi,
Is there any way I can exclude stop words from the collations and
suggestions returned by the spellcheck component?

Regards,
Naveen Pajjuri.


tika and solr 3.1 integration

2011-06-02 Thread Naveen Gupta
Hi

I am trying to integrate Solr 3.1 and Tika (which comes by default with that
version).

Using a curl command to index a few documents, I am getting an error: the
attr_meta field is unknown. I checked solrconfig and it looks fine to me.

Can you please tell me what I am missing?

I copied all the jars from contrib/extraction/lib to the solr/lib folder,
which is in the same place as conf.


I am using the same request handler that comes with the default
configuration:

<requestHandler name="/update/extract" startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>

    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>


* curl "
http://dev.grexit.com:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&attr_&fmap.content=attr_content&commit=true";
-F "myfile=@/root/apache-solr-3.1.0/docs/who.pdf"*


HTTP Status 400 - ERROR: unknown field 'attr_meta'
type: Status report
message: ERROR: unknown field 'attr_meta'
description: The request sent by the client was syntactically incorrect
(ERROR: unknown field 'attr_meta').
Apache Tomcat/6.0.18


Please note: I integrated Apache Tika 0.9 with apache-solr-1.4 locally on a
Windows machine using Solr Cell, and calling the program works fine there
without any configuration changes.
Thanks
Naveen



Re: tika and solr 3.1 integration

2011-06-02 Thread Naveen Gupta
Hi

This is fixed. Yes, schema.xml was the culprit, and I fixed it by looking at
the sample schema provided with the example.

But on Windows I am getting an slf4j error (IllegalAccessException), which
looks like a jar problem. The fixes suggested in their FAQ say to use
version 1.5.5, which is already in the lib folder.

I have had to deploy a lot of jars, and I am afraid that is causing the
problem.

Has somebody experienced the same?

Thanks
Naveen


On Fri, Jun 3, 2011 at 2:41 AM, Juan Grande  wrote:

> Hi Naveen,
>
> Check if there is a dynamic field named "attr_*" in the schema. The
> "uprefix=attr_" parameter means that if Solr can't find an extracted field
> in the schema, it'll add the prefix "attr_" and try again.
>
> *Juan*
>
>
>
> On Thu, Jun 2, 2011 at 4:21 AM, Naveen Gupta  wrote:
>
> > Hi
> >
> > I am trying to integrate solr 3.1 and tika (which comes default with the
> > version)
> >
> > and using curl command trying to index few of the documents, i am getting
> > this error. the error is attr_meta field is unknown. i checked the
> > solrconfig, it looks perfect to me.
> >
> > can you please tell me what i am missing.
> >
> > I copied all the jars from contrib/extraction/lib to solr/lib folder that
> > is
> > there in same place where conf is there 
> >
> >
> > I am using the same request handler which is coming with default
> > (the stock /update/extract ExtractingRequestHandler configuration shown
> > in the original message above)
> >
> > curl "http://dev.grexit.com:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&attr_&fmap.content=attr_content&commit=true"
> > -F "myfile=@/root/apache-solr-3.1.0/docs/who.pdf"
> >
> > HTTP Status 400 - ERROR: unknown field 'attr_meta'
> > description: The request sent by the client was syntactically incorrect
> > (ERROR: unknown field 'attr_meta').
> > Apache Tomcat/6.0.18
> >
> >
> > Please note
> >
> > i integrated apacha tika 0.9 with apache-solr-1.4 locally on windows
> > machine
> > and using solr cell
> >
> > calling the program works fine without any changes in configuration.
> >
> > Thanks
> > Naveen
> >
>


Strategy --> Frequent updates in our application

2011-06-02 Thread Naveen Gupta
Hi

We have an application where every 10 minutes we index each user's document
repository, and if a new message is added to a particular discussion we need
to index that thread again (please note we are not blindly re-indexing each
time; we have various rules to filter out which threads are new and are
therefore candidates for indexing, plus the new ones that have arrived).

So we are doing updates for each user's document repository, and so far the
performance is not looking very good. In the future we are going to get hits
in volume (1,000 to 10,000 hits per minute), so we are looking for a strategy
to tune Solr so that it can index the data in (near) real time.

And what about NRT, is it fine to apply in this kind of scenario? I read that
Solr NRT is not very good in performance, but I am not going to believe that,
since Solr is one of the best open-source engines, so this will surely be
sorted out in the near future. If any benchmark is available, kindly share it
with me; we would like to analyze it against our requirements.

Is there any way to add incremental indexes, as we generally find in other
search engines like Endeca? I don't know much detail about Solr, since I am a
newbie, so can you please tell me if there are settings which can keep track
of incremental indexing?


Thanks
Naveen


different indexes for multitenant approach

2011-06-02 Thread Naveen Gupta
Hi

I want to implement a different index strategy, where we keep and maintain
separate indexes per tenant:

first level of category -- company name

second level of category -- company name + fields to be indexed

then further categories -- groups of different company names based on some
heuristic (hashing), if it grows further

I want to do this within the same Solr instance. Is that possible?

Thanks
Naveen


Re: How to display search results of solr in to other application.

2011-06-02 Thread Naveen Gupta
Hi Romi

As I see it, you first need to understand how AJAX with jQuery works, then
JSON, and then JSONP (if you are fetching from a different domain).

queryString here is the dynamic query you will send to Solr (it could be
simple text or a more advanced query string):

http://wiki.apache.org/solr/CommonQueryParameters

The callback is the method name you define; after the response comes back,
this method is called (the callback mechanism).

Using the response from Solr (in JSON format), you then show or analyze the
results as your business needs require.

Thanks
Naveen


On Fri, Jun 3, 2011 at 12:00 PM, Romi  wrote:

> $.getJSON(
>   "http://[server]:[port]/solr/select/?jsoncallback=?";,
>   {"q": queryString,
>   "version": "2.2",
>   "start": "0",
>   "rows": "10",
>   "indent": "on",
>   "json.wrf": "callbackFunctionToDoSomethingWithOurData",
>   "wt": "json",
>   "fl": "field1"}
>   );
>
> would you please explain what are  queryString and "json.wrf":
> "callbackFunctionToDoSomethingWithOurData". and what if i want to change my
> query string each time.
>
> -
> Thanks & Regards
> Romi
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-display-search-results-of-solr-in-to-other-application-tp3014101p3018740.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


php library for extractrequest handler

2011-06-03 Thread Naveen Gupta
Hi

We want to post some files (rtf, doc, etc.) to the Solr server using PHP.
One way is to post using curl.

Is there a client library like the Java client (Solr Cell)?

URLs would also help.

Thanks
Naveen


Re: Strategy --> Frequent updates in our application

2011-06-03 Thread Naveen Gupta
Hi Pravesh

We don't have that setup right now, but we are thinking of doing it:

one instance for writes, and another instance for reads.

If you have another design in mind, kindly share it.

Thanks
Naveen

On Fri, Jun 3, 2011 at 2:50 PM, pravesh  wrote:

> You can use DataImportHandler for your full/incremental indexing. Now NRT
> indexing could vary as per business requirements (i mean delay cud be
> 5-mins
> ,10-mins,15-mins,OR, 30-mins). Then it also depends on how much volume will
> be indexed incrementally.
> BTW, r u having Master+Slave SOLR setup?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Strategy-Frequent-updates-in-our-application-tp3018386p3019040.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: php library for extractrequest handler

2011-06-03 Thread Naveen Gupta
Yes,

that is the one I used and it is working fine. Thanks to Nabble.

Thanks
Naveen

On Fri, Jun 3, 2011 at 4:02 PM, Gora Mohanty  wrote:

> On Fri, Jun 3, 2011 at 3:55 PM, Naveen Gupta  wrote:
> > Hi
> >
> > We want to post to solr server with some of the files (rtf,doc,etc) using
> > php .. one way is to post using curl
>
> Do not normally use PHP, and have not tried it myself.
> However, there is a PHP extension for Solr:
>  http://wiki.apache.org/solr/SolPHP
>  http://php.net/manual/en/book.solr.php
>
> Regards,
> Gora
>


TIKA INTEGRATION PERFORMANCE

2011-06-05 Thread Naveen Gupta
Hi

Since it is PHP, we are using SolPHP for making curl-based calls.

My concern here is that for each user we might have 20-40 attachments to be
indexed each day, and there are many users; daily we are targeting around
500-1000 users.

Right now we do this:

<?php
 $ch = curl_init('http://localhost:8010/solr/update/extract?literal.id=doc2&commit=true');
 curl_setopt ($ch, CURLOPT_POST, 1);
 curl_setopt ($ch, CURLOPT_POSTFIELDS, array('myfile'=>"@paper.pdf"));
 $result= curl_exec ($ch);
?>

We are also planning to use other fields, which are to be indexed and
stored.

There are a couple of questions here:

1. What would be the best commit strategy? If we take all the documents in
an array, iterate over them one by one and fire the curl call, and commit
only for the last doc, will it work, or do we need to commit for each doc?

2. We have several fields already defined in the schema, and a few of them
were required earlier but are not wanted for this purpose; how do we have
both requirements together in the same schema?

3. Since commits are frequent, how can we use Solr multicore to separate
write and read operations?

Thanks
Naveen


Re: TIKA INTEGRATION PERFORMANCE

2011-06-06 Thread Naveen Gupta
Hi Tomas,

1. Regarding SolrInputDocument:

We are not using the Java client; we are using PHP for Solr. Wrapping content
in a SolrInputDocument -- I am not sure how to do that from a PHP client. In
this case we need the Tika-related jars to extract metadata such as content,
and we certainly don't want to handle all of that in the PHP client.

Secondly, what I was asking about the commit strategy:

Suppose you have 100 docs. We iterate over 99 docs and fire curl without
commit in the URL, and for the 100th doc the URL includes commit:

while (upto 99) {
  curl_command = url without commit;
}

when i == 100, the url would include commit

Doing so, will the commit also update the indexes for the first 99 docs?

I wanted to achieve something similar to an optimize.

Why aren't these kinds of general-purpose use cases included in the examples
(especially for other languages; Java folks can easily do it using the API)?

I am basically a Java guy, so I can feel the problem.

Thanks
Naveen
2011/6/6 Tomás Fernández Löbbe 

> 1. About the commit strategy, all the ExtractingRequestHandler (request
> handler that uses Tika to extract content from the input file) will do is
> extract the content of your file and add it to a SolrInputDocument. The
> commit strategy should not change because of this, compared to other
> documents you might be indexing. It is usually not recommended to commit on
> every new / updated document.
>
> 2. Don't know if I understand the question. you can add all the static
> fields you want to the document by adding the "literal." prefix to the name
> of the fields when using ExtractingRequestHandler (as you are doing with "
> literal.id"). You can also leave empty fields if they are not marked as
> "required" at the schema.xml file. See:
> http://wiki.apache.org/solr/ExtractingRequestHandler#Literals
>
> 3. Solr cores can work almost as completely different Solr instances. You
> could tell one core to replicate from another core. I don't think this
> would
> be of any help here. If you want to separate the indexing operations from
> the query operations, you could probably use different machines, that's
> usually a better option. Configure the indexing box as master and the query
> box as slave. Here you have some more information about it:
> http://wiki.apache.org/solr/SolrReplication
>
> Were this the answers you were looking for or did I misunderstand your
> questions?
>
> Tomás
>
> On Mon, Jun 6, 2011 at 2:54 AM, Naveen Gupta  wrote:
>
> > Hi
> >
> > Since it is php, we are using solphp for calling curl based call,
> >
> > what my concern here is that for each user, we might be having 20-40
> > attachments needed to be indexed each day, and there are various users
> > ..daily we are targeting around 500-1000 users ..
> >
> > right now if you see, we
> >
> >  > $ch = curl_init('
> > http://localhost:8010/solr/update/extract?literal.id=doc2&commit=true');
> >  curl_setopt ($ch, CURLOPT_POST, 1);
> >  curl_setopt ($ch, CURLOPT_POSTFIELDS, array('myfile'=>"@paper.pdf"));
> >  $result= curl_exec ($ch);
> > ?>
> >
> > also we are planning to use other fields which are to be indexed and
> stored
> > ...
> >
> >
> > There are couple of questions here
> >
> > 1. what would be the best strategies for commit. if we take all the
> > documents in an array and iterating one by one and fire the curl and for
> > the
> > last doc, if we commit, will it work or for each doc, we need to commit?
> >
> > 2. we are having several fields which are already defined in schema and
> few
> > of the them are required earlier, but for this purpose, we don't want,
> how
> > to have two requirement together in the same schema?
> >
> > 3. since it is frequent commit, how to use solr multicore for write and
> > read
> > operations separately ?
> >
> > Thanks
> > Naveen
> >
>
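
To illustrate the point Tomás makes (commit once per batch rather than per
document), a hedged SolrJ sketch of the equivalent pattern; the same idea
applies to the curl calls, where only the final request (or a separate
request) carries commit=true. The URL below is a placeholder:

import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchCommitExample {
    public static void index(List<SolrInputDocument> docs) throws Exception {
        // SolrJ 3.x client class; later releases renamed it to HttpSolrServer / HttpSolrClient
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8010/solr");
        for (SolrInputDocument doc : docs) {
            server.add(doc);   // documents are added but not yet visible to searches
        }
        server.commit();       // a single commit makes all previously added docs searchable
    }
}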


getting numberformat exception while using tika

2011-06-07 Thread Naveen Gupta
Hi

We are using ExtractingRequestHandler and we are getting the following error
when we submit a Microsoft .docx file for indexing.

I think this has something to do with the date field definition, but I am not
very sure. What field type should we use?

2. We are also trying to index a .jpg; when we search on the name of the jpg
it does not come back, though I am passing an id.

3. What about zip files or rar files? Does Tika with Solr handle those?

java.lang.NumberFormatException: For input string:
"2011-01-27T07:18:00Z"
at
java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Long.parseLong(Long.java:412)
at java.lang.Long.parseLong(Long.java:461)
at org.apache.solr.schema.TrieField.createField(TrieField.java:434)
at
org.apache.solr.schema.SchemaField.createField(SchemaField.java:98)
at
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:121)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:198)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:238)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)

Thanks
Naveen


tika integration exception and other related queries

2011-06-07 Thread Naveen Gupta
Hi, can somebody answer this:

3. Can somebody give me an idea of how to index a zip file?

1. While sending a docx, we are getting the following error:

> java.lang.NumberFormatException: For input string: "2011-01-27T07:18:00Z"
> at
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
> at java.lang.Long.parseLong(Long.java:412)
> at java.lang.Long.parseLong(Long.java:461)
> at org.apache.solr.schema.TrieField.createField(TrieField.java:434)
> at
> org.apache.solr.schema.SchemaField.createField(SchemaField.java:98)
> at
> org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
> at
> org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)
> at
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
> at
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:121)
> at
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
> at
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:198)
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:238)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
> at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
> at
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
> at
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
> at java.lang.Thread.run(Thread.java:619)
>


Thanks
Naveen



On Tue, Jun 7, 2011 at 3:33 PM, Naveen Gupta  wrote:

> Hi
>
> We are using requestextractinghandler and we are getting following error.
> we are giving microsoft docx file for indexing.
>
> I think that this is something to do with field date definition .. but now
> very sure ...what field type should we use?
>
> 2. we are trying to index jpg (when we search over the name of the jpg, it
> is not coming .. though in id i am passing one)
>
> 3. what about zip files or rar files.. does tika with solr handle this one
> ?
>




>
> java.lang.NumberFormatException: For input string:
> "2011-01-27T07:18:00Z"
> at
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
> at java.lang.Long.parseLong(Long.java:412)
> at java.lang.Long.parseLong(Long.java:461)
> at org.apache.solr.schema.TrieField.createField(TrieField.java:434)
> at
> org.apache.solr.schema.SchemaField.createField(SchemaField.java:98)
> at
> org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
> at
> org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)
> at
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
> at
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:121)
> at
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
> at
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:198)
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHa

Re: tika integration exception and other related queries

2011-06-08 Thread Naveen Gupta
Hi Gary

It started working. Though I have not tested zip files yet, it is working
fine for rar files.

The only thing I wanted was to index the metadata (text mapped to content)
without storing the data. Also, in the search results I want to filter
things out, and that is working fine. I don't want to show the extracted
content to the end user, since the way it is extracted is not very helpful;
although we could apply a few analyzers and filters to remove the
unnecessary tags, the information would still not be of much help. I am
looking for your opinion: what did you do to filter out the content, or are
you showing the extracted content to the end user?

Even if we do show the text to the end user, how can I limit the number of
characters returned when querying the search results? Is there any feature
for this, something like a snippet?

Thanks
Naveen

On Wed, Jun 8, 2011 at 1:45 PM, Gary Taylor  wrote:

> Naveen,
>
> For indexing Zip files with Tika, take a look at the following thread :
>
>
> http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html
>
> I got it to work with the 3.1 source and a couple of patches.
>
> Hope this helps.
>
> Regards,
> Gary.
>
>
>
> On 08/06/2011 04:12, Naveen Gupta wrote:
>
>> Hi Can somebody answer this ...
>>
>> 3. can somebody tell me an idea how to do indexing for a zip file ?
>>
>> 1. while sending docx, we are getting following error.
>>
>
>


Re: tika integration exception and other related queries

2011-06-09 Thread Naveen Gupta
Hi Gary,

We are doing something similar, but we are not creating an XML doc; rather
we let Tika extract the content and rely on dynamic fields. We are not
storing the text either, though that may change in the future.

What about Microsoft Office 2007-and-later attachments (.docx)? Is that
working for you? We always get a NumberFormatException. I posted about it on
the list as well, but so far no response has come.

Thanks
Naveen

On Thu, Jun 9, 2011 at 6:43 PM, Gary Taylor  wrote:

> Naveen,
>
> Not sure our requirement matches yours, but one of the things we index is a
> "comment" item that can have one or more files attached to it.  To index the
> whole thing as a single Solr document we create a zipfile containing a file
> with the comment details in it and any additional attached files.  This is
> submitted to Solr as a TEXT field in an XML doc, along with other meta-data
> fields from the comment.  In our schema the TEXT field is indexed but not
> stored, so when we search and get a match back it doesn't contain all of the
> contents from the attached files etc., only the stored fields in our schema.
>   Admittedly, the user can therefore get back a "comment" match with no
> indication as to WHERE the match occurred (ie. was it in the meta-data or
> the contents of the attached files), but at the moment we're only interested
> in getting appropriate matches, not explaining where the match is.
>
> Hope that helps.
>
> Kind regards,
> Gary.
>
>
>
>
> On 09/06/2011 03:00, Naveen Gupta wrote:
>
>> Hi Gary
>>
>> It started working .. though i did not test for Zip files, but for rar
>> files, it is working fine ..
>>
>> only thing what i wanted to do is to index the metadata (text mapped to
>> content) not store the data  Also in search result, i want to filter
>> the
>> stuffs ... and it started working fine .. i don't want to show the content
>> stuffs to the end user, since the way it extracts the information is not
>> very helpful to the user .. although we can apply few of the analyzers and
>> filters to remove the unnecessary tags ..still the information would not
>> be
>> of much help .. looking for your opinion ... what you did in order to
>> filter
>> out the content or are you showing the content extracted to the end user?
>>
>> Even in case, we are showing the text part to the end user, how can i
>> limit
>> the number of characters while querying the search results ... is there
>> any
>> feature where we can achieve this ... the concept of snippet kind of thing
>> ...
>>
>> Thanks
>> Naveen
>>
>> On Wed, Jun 8, 2011 at 1:45 PM, Gary Taylor  wrote:
>>
>>  Naveen,
>>>
>>> For indexing Zip files with Tika, take a look at the following thread :
>>>
>>>
>>>
>>> http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html
>>>
>>> I got it to work with the 3.1 source and a couple of patches.
>>>
>>> Hope this helps.
>>>
>>> Regards,
>>> Gary.
>>>
>>>
>>>
>>> On 08/06/2011 04:12, Naveen Gupta wrote:
>>>
>>>  Hi Can somebody answer this ...
>>>>
>>>> 3. can somebody tell me an idea how to do indexing for a zip file ?
>>>>
>>>> 1. while sending docx, we are getting following error.
>>>>
>>>>
>


ERROR on posting update request using CURL in php

2011-06-09 Thread Naveen Gupta
Hi

This is my document

in php

$xmldoc = 'F_14674gmail.com121sample.pptx';

  $ch = curl_init("http://localhost:8080/solr/update";);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
  curl_setopt ($ch, CURLOPT_POST, 1);
  curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type:
text/xml") );
  curl_setopt($ch, CURLOPT_POSTFIELDS,$xmldoc);

   $result= curl_exec($ch);
   if(!curl_errno($ch))
   {
   $info = curl_getinfo($ch);
   $header = substr($result, 0, $info['header_size']);
   echo 'Took ' . $info['total_time'] . ' seconds to send a
request to ' . $info['url'];
 }else{
 print_r('no idea');
}
echo 'result of query'.'  '.' -> '.$result;

It is throwing error

HTTP Status 400 - Unexpected character ''' (code 39) in prolog; expected '<'
 at [row,col {unknown-source}]: [1,1]
type: Status report
message: Unexpected character ''' (code 39) in prolog; expected '<'
 at [row,col {unknown-source}]: [1,1]
description: The request sent by the client was syntactically incorrect
(Unexpected character ''' (code 39) in prolog; expected '<'
 at [row,col {unknown-source}]: [1,1]).
Apache Tomcat/6.0.18


Thanks
Naveen


Re: ERROR on posting update request using CURL in php

2011-06-09 Thread Naveen Gupta
Hi,


curl http://localhost:8983/solr/update?commit=true -H "Content-Type:
text/xml" --data-binary 'testdoc'

Regards
Naveen

On Fri, Jun 10, 2011 at 10:18 AM, Naveen Gupta  wrote:

> Hi
>
> This is my document
>
> in php
>
> $xmldoc = 'F_146 name="userid">74gmail.com name="attachment_size">121 name="attachment_name">sample.pptx';
>
>   $ch = curl_init("http://localhost:8080/solr/update";);
>   curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
>   curl_setopt ($ch, CURLOPT_POST, 1);
>   curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type:
> text/xml") );
>   curl_setopt($ch, CURLOPT_POSTFIELDS,$xmldoc);
>
>$result= curl_exec($ch);
>if(!curl_errno($ch))
>{
>$info = curl_getinfo($ch);
>$header = substr($response, 0, $info['header_size']);
>echo 'Took ' . $info['total_time'] . ' seconds to send a
> request to ' . $info['url'];
>  }else{
>  print_r('no idea');
> }
> println('result of query'.'  '.' -> '.$result);
>
> It is throwing error
>
> HTTP Status 400 - Unexpected character ''' (code 39) in prolog; expected '<'
>  at [row,col {unknown-source}]: [1,1]
> type: Status report
> message: Unexpected character ''' (code 39) in prolog; expected '<'
>  at [row,col {unknown-source}]: [1,1]
> description: The request sent by the client was syntactically incorrect
> (Unexpected character ''' (code 39) in prolog; expected '<'
>  at [row,col {unknown-source}]: [1,1]).
> Apache Tomcat/6.0.18
>
>
> Thanks
> Naveen
>
>
>


relevant result for query with boost factor on parameters

2011-06-18 Thread Naveen Gupta
Hi,
I am trying to achieve this use case with the following expectation.

Three fields:

1. field1
2. field2
3. field3

field1 should have the maximum relevance, field2 the next, and field3 the
last.

The term will be entered by the end user (say "rock roll").

I want to show first the results which contain both "rock" and "roll" in
field1, and then the results which contain both "rock" and "roll" in field2.

This should only be done for a given field3 (x...@gmail.com).

But if field1 does not contain both the terms "rock" and "roll" (special
attention here), then the field2 results should take priority: show the
results which have both the terms first, and then show the remaining results
by boost factor or relevance.

If neither field contains the terms together, show them as normal, with
field1 having more relevance than field2.

How do I join the results for field3? That is, for a given field3, the above
results should be filtered.

I am trying this one, which gives satisfactory but not the best results:

field1:(rock roll)^20 field2:(rock roll)^4 field3:x...@gmail.com

I was thinking of giving

field1 field2 && field3

but that is not working.

Can you help in this regard?

What other configuration should I consider in this context?


Thanks
Naveen
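
One way to express this field-priority-plus-phrase preference (a hedged
sketch of the edismax approach, not an answer given in this thread; the
field names come from the question, the boost values are illustrative):

import org.apache.solr.client.solrj.SolrQuery;

public class BoostedFieldsQueryExample {
    public static SolrQuery build(String userTerms, String field3Value) {
        SolrQuery q = new SolrQuery(userTerms);    // e.g. "rock roll"
        q.set("defType", "edismax");
        q.set("qf", "field1^20 field2^4");         // per-field term boosts
        q.set("pf", "field1^40 field2^8");         // extra boost when the terms occur together
        q.addFilterQuery("field3:" + field3Value); // e.g. the user's email address
        return q;
    }
}

Using a filter query for field3 keeps the restriction out of the relevance
calculation, so the ranking is driven only by field1 and field2.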


indexing taking very long time

2011-08-02 Thread Naveen Gupta
Hi

We have a requirement where we index all the messages of a thread; a thread
may have attachments too. We are adding them to Solr for indexing and
searching, to apply a few business rules.

For a user we have a very large number of threads (100k), and each thread
may have 10-20 messages.

What we are finding is that it takes 30 minutes to index all the threads.

When we run optimize, indexing becomes faster.

The question here is how frequently this optimize should be called, and when?

Please note that we are following a commit strategy of committing after
every 10k threads; we are not calling commit after every doc.

Secondly, how can we use multithreading from the Solr perspective in order
to improve JVM and resource utilization?


Thanks
Naveen


Re: IMP: indexing taking very long time

2011-08-02 Thread Naveen Gupta
Can somebody answer this?

What should be the best strategy for optimize (when we are indexing millions
of messages for a newly registered user)?

Thanks
Naveen

On Tue, Aug 2, 2011 at 5:36 PM, Naveen Gupta  wrote:

> Hi
>
> We have a requirement where we are indexing all the messages of a a thread,
> a thread may have attachment too . We are adding to the solr for indexing
> and searching for applying few business rule.
>
> For a user, we have almost many threads (100k) in number and each thread
> may be having 10-20 messages.
>
> Now what we are finding is that it is taking 30 mins to index the entire
> threads.
>
> When we run optimize then it is taking faster time.
>
> The question here is that how frequently this optimize should be called and
> when ?
>
> Please note that we are following commit strategy (that is every after 10k
> threads, commit is called). we are not calling commit after every doc.
>
> Secondly how can we use multi threading from solr perspective in order to
> improve jvm and other utilization ?
>
>
> Thanks
> Naveen
>


merge factor performance

2011-08-04 Thread Naveen Gupta
Hi,

We have a requirement where we have almost 100,000 documents to be indexed
(at least 20 fields each). None of these fields is longer than 10 KB.

We are also running searches against the same index in parallel.

We found that it is taking almost 3 minutes to index the documents.

The strategy we are using is this:

We make a commit after every 15,000 docs (one single large XML doc).

We have a merge factor of 10 as of now.

I am wondering if increasing the merge factor to 25 or 50 would improve
performance.

Also, what about the RAM buffer size (the default is 32 MB)?

Which other factors do we need to consider?

When should we consider an optimize?

Any other deviation from the defaults that would help us achieve the target?

We are allocating a JVM max heap of 512 MB, and the default concurrent
mark-sweep collector is set for garbage collection.


Thanks
Naveen


Re: merge factor performance

2011-08-04 Thread Naveen Gupta
Sorry, to clarify: it is 15,000 docs that take 3 minutes.

On Thu, Aug 4, 2011 at 10:07 PM, Naveen Gupta  wrote:

> Hi,
>
> We are having a requirement where we are having almost 100,000 documents to
> be indexed (atleast 20 fields). These fields are not having length greater
> than 10 KB.
>
> Also we are running parallel search for the same index.
>
> We found that it is taking almost 3 min to index the entire documents.
>
> Strategy what we are doing is that
>
> We are making a commit after  15000 docs (single large xml doc)
>
> We are having merge factor of 10 as if now
>
> I am wondering if increasing the merge factor to 25 or 50 would increase
> the performance.
>
> also what about RAM Size (default is 32 MB) ?
>
> Which other factors we need to consider ?
>
> When should we consider optimize ?
>
> Any other deviation from default would help us in achieving the target.
>
> We are allocating JVM max heap size allocation 512 MB, default concurrent
> mark sweep is set for garbage collection.
>
>
> Thanks
> Naveen
>
>
>
>


Re: indexing taking very long time

2011-08-05 Thread Naveen Gupta
Hi Erick,

We have a requirement where we have almost 100,000 documents to be indexed
(at least 20 fields each). None of these fields is longer than 10 KB.

We are also running searches against the same index in parallel.

We found that it is taking almost 3 minutes to index the documents.

The strategy we are using is this:

We make a commit after every 15,000 docs (one single large XML doc, streamed
as an update using curl from PHP).

We have a merge factor of 10 as of now.

I am wondering if increasing the merge factor to 25 or 50 would improve
performance.

Also, what about the RAM buffer size (the default is 32 MB)?

Which other factors do we need to consider?

When should we consider an optimize?

Any other deviation from the defaults that would help us achieve the target?

We are allocating a JVM max heap of 512 MB, and the default concurrent
mark-sweep collector is set for garbage collection.

One more thing: CPU utilization is 20-25% across all 4 cores (using htop).

Thanks
Naveen

On Thu, Aug 4, 2011 at 7:05 AM, Erick Erickson wrote:

> What version of Solr are you using? If it's a recent version, then
> optimizing is not that  essential, you can do it during off hours, perhaps
> nightly or weekly.
>
> As far as indexing speed, have you profiled your application to see whether
> it's Solr or your indexing process that's the bottleneck? A quick check
> would be to monitor the CPU utilization on the server and see if it's high.
>
> As far as multithreading, one option is to simply have multiple clients
> indexing simultaneously. But you haven't indicated how the indexing is
> being
> done. Are you using DIH? SolrJ? Streaming documents to Solr? You have to
> provide those kinds of details to get meaningful help.
>
> Best
> Erick
> On Aug 2, 2011 8:06 AM, "Naveen Gupta"  wrote:
> > Hi
> >
> > We have a requirement where we are indexing all the messages of a a
> thread,
> > a thread may have attachment too . We are adding to the solr for indexing
> > and searching for applying few business rule.
> >
> > For a user, we have almost many threads (100k) in number and each thread
> may
> > be having 10-20 messages.
> >
> > Now what we are finding is that it is taking 30 mins to index the entire
> > threads.
> >
> > When we run optimize then it is taking faster time.
> >
> > The question here is that how frequently this optimize should be called
> and
> > when ?
> >
> > Please note that we are following commit strategy (that is every after
> 10k
> > threads, commit is called). we are not calling commit after every doc.
> >
> > Secondly how can we use multi threading from solr perspective in order to
> > improve jvm and other utilization ?
> >
> >
> > Thanks
> > Naveen
>


Re: indexing taking very long time

2011-08-05 Thread Naveen Gupta
Hi ERick,

The Solr version is 3.0.

We are indexing the data with a curl call from a C program to the Solr
server over REST.

We merge 15,000 docs into a single XML doc, use curl to index the data
directly, and then call commit (update).

For each client we create a new connection (a PHP script uses exec() to
start a new C process for every user) and hit the Solr server.

We are using the default solrconfig except for a few field changes in
schema.xml.

Max JVM heap allocation is 512 MB (the Linux box has 512 MB RAM as well).

Initially I increased the merge factor to 50 and the RAM buffer size to
50 MB, but I had to reduce them since we were getting
java.lang.OutOfMemoryError: Java heap space.

It is taking 3 minutes to index 15,000 docs (a client can have 100,000 docs
and we have many clients). We also run search queries from other clients
against this index in parallel.

That is the time between the curl call being made and the response coming
back.

When we commit, CPU usage goes up to 25% (not all the cores, but a few of
them). The total number of cores is 4.

Can you please advise where to start from a tuning perspective?

A blog post I was reading said it should take about 40 seconds to index
100,000 docs (with 10-12 fields defined); I forgot the link. They talked
about increasing the merge factor.

Thanks
Naveen

On Thu, Aug 4, 2011 at 7:05 AM, Erick Erickson wrote:

> What version of Solr are you using? If it's a recent version, then
> optimizing is not that  essential, you can do it during off hours, perhaps
> nightly or weekly.
>
> As far as indexing speed, have you profiled your application to see whether
> it's Solr or your indexing process that's the bottleneck? A quick check
> would be to monitor the CPU utilization on the server and see if it's high.
>
> As far as multithreading, one option is to simply have multiple clients
> indexing simultaneously. But you haven't indicated how the indexing is
> being
> done. Are you using DIH? SolrJ? Streaming documents to Solr? You have to
> provide those kinds of details to get meaningful help.
>
> Best
> Erick
> On Aug 2, 2011 8:06 AM, "Naveen Gupta"  wrote:
> > Hi
> >
> > We have a requirement where we are indexing all the messages of a a
> thread,
> > a thread may have attachment too . We are adding to the solr for indexing
> > and searching for applying few business rule.
> >
> > For a user, we have almost many threads (100k) in number and each thread
> may
> > be having 10-20 messages.
> >
> > Now what we are finding is that it is taking 30 mins to index the entire
> > threads.
> >
> > When we run optimize then it is taking faster time.
> >
> > The question here is that how frequently this optimize should be called
> and
> > when ?
> >
> > Please note that we are following commit strategy (that is every after
> 10k
> > threads, commit is called). we are not calling commit after every doc.
> >
> > Secondly how can we use multi threading from solr perspective in order to
> > improve jvm and other utilization ?
> >
> >
> > Thanks
> > Naveen
>


LockObtainFailedException

2011-08-10 Thread Naveen Gupta
:59:56 PM org.apache.solr.update.SolrIndexWriter finalize
SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a bug
-- POSSIBLE RESOURCE LEAK!!!

Kindly tell me where it is failing.

We have increased the lock timeout, but it still gives the same problem.

Thanks
Naveen


Re: LockObtainFailedException

2011-08-11 Thread Naveen Gupta
Yes, this was happening because of the JVM heap size.

But the real issue is that as our index size grows very large, indexing time
becomes very long (using streaming updates).

Earlier, indexing 15,000 docs at a time (commit after 15,000 docs) was
taking 3 minutes 20 seconds; after deleting the index data, it takes 9
seconds.

What would be the right approach to get good indexing performance while the
index size keeps growing at the same time?

The index size was around 4.5 GB.

Thanks
Naveen

On Thu, Aug 11, 2011 at 3:47 PM, Peter Sturge wrote:

> Hi,
>
> When you get this exception with no other error or explananation in
> the logs, this is almost always because the JVM has run out of memory.
> Have you checked/profiled your mem usage/GC during the stream operation?
>
>
>
> On Thu, Aug 11, 2011 at 3:18 AM, Naveen Gupta  wrote:
> > Hi,
> >
> > We are doing streaming update to solr for multiple user,
> >
> > We are getting
> >
> >
> > Aug 10, 2011 11:56:55 AM org.apache.solr.common.SolrException log
> >
> > SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain
> timed
> > out: NativeFSLock@/var/lib/solr/data/index/write.lock
> >at org.apache.lucene.store.Lock.obtain(Lock.java:84)
> >at
> org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1097)
> >at
> > org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:83)
> >at
> >
> org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
> >at
> >
> org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
> >at
> >
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
> >at
> >
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
> >at
> > org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
> >at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
> >at
> >
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
> >at
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> >at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
> >at
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
> >at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
> >at
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> >at
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >at
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> >at
> >
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> >at
> >
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> >at
> >
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> >at
> >
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> >at
> >
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> >at
> >
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
> >at
> >
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
> >at org.apache.tomcat.util.net.JIoEndpoint
> >
> > Aug 10, 2011 12:00:16 PM org.apache.solr.common.SolrException log
> > SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain
> timed
> > out: NativeFSLock@/var/lib/solr/data/index/write.lock
> >at org.apache.lucene.store.Lock.obtain(Lock.java:84)
> >at
> org.apache.lucene.index.IndexWriter.(IndexWriter.java:1097)
> >at
> > org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:83)
> >at
> >
> org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
> >at
> >
> org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
> >at
> >
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
> >at
> >
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
> >a

Re: LockObtainFailedException

2011-08-12 Thread Naveen Gupta
Hi Peter,

I found the issue,

Actually, we were getting this exception because of JVM heap space. I allocated
512m Xms and 1024m Xmx, and finally increased the write lock time limit to
20 secs. Things kept running, but that alone still did not help.

On closer analysis of the docs we were indexing, we saw we were using
commitWithin of 10 secs, which was the root cause of indexing taking so long,
because of the many segments to be committed.

Issuing a separate commit command using curl solved the issue.

The performance improved from 3 mins to 1.5 secs :)

Thanks a lot
Naveen
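
To make the fix concrete: the difference is between attaching a commitWithin window to every update and adding everything first, then issuing one explicit commit at the end. A rough SolrJ equivalent of the curl workflow described above, with an assumed core URL; the commitWithin window and field names are illustrative.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CommitComparison {
    public static void main(String[] args) throws Exception {
        SolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr/threads").build();

        for (int i = 0; i < 15_000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "msg-" + i);

            // Slow variant: every add carries a commitWithin window, so Solr keeps
            // triggering commits while the batch is still streaming in.
            // client.add(doc, 10_000);

            // Faster variant: just add; no commit is triggered per document.
            client.add(doc);
        }

        client.commit();   // single explicit commit once the whole batch is in
        client.close();
    }
}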

On Thu, Aug 11, 2011 at 6:27 PM, Peter Sturge wrote:

> Optimizing indexing time is a very different question.
> I'm guessing the 3mins+ time you refer to is the commit time.
>
> There are a whole host of things to take into account regarding
> indexing, like: number of segments, schema, how many fields, storing
> fields, omitting norms, caching, autowarming, search activity etc. -
> the list goes on...
> The trouble is, you can look at 100 different Solr installations with
> slow indexing, and find 200 different reasons why each is slow.
>
> The best place to start is to get a full understanding of precisely
> how your data is being stored in the index, starting with adding docs,
> going through your schema, Lucene segments, solrconfig.xml etc,
> looking at caches, commit triggers etc. - really getting to know how
> each step is affecting performance.
> Once you really have a handle on all the indexing steps, you'll be
> able to spot the bottlenecks that relate to your particular
> environment.
>
> An index of 4.5GB isn't that big (but the number of documents tends to
> have more of an effect than the physical size), so the bottleneck(s)
> should be findable once you trace through the indexing operations.
>
>
>
> On Thu, Aug 11, 2011 at 1:02 PM, Naveen Gupta  wrote:
> > Yes, this was happening because of the JVM heap size.
> >
> > But the real issue is that as our index size grows (very large),
> >
> > indexing time becomes very long (using streaming).
> >
> > Earlier, indexing 15,000 docs at a time (commit after 15,000 docs)
> > was taking 3 mins 20 secs;
> >
> > after deleting the index data, it takes 9 secs.
> >
> > What would be the approach to get better indexing performance while also
> > keeping the index size under control at the same time?
> >
> > The index size was around 4.5 GB
> >
> > Thanks
> > Naveen
> >
> > On Thu, Aug 11, 2011 at 3:47 PM, Peter Sturge  >wrote:
> >
> >> Hi,
> >>
> >> When you get this exception with no other error or explanation in
> >> the logs, this is almost always because the JVM has run out of memory.
> >> Have you checked/profiled your mem usage/GC during the stream operation?
> >>
> >>
> >>
> >> On Thu, Aug 11, 2011 at 3:18 AM, Naveen Gupta 
> wrote:
> >> > Hi,
> >> >
> >> > We are doing streaming update to solr for multiple user,
> >> >
> >> > We are getting
> >> >
> >> >
> >> > Aug 10, 2011 11:56:55 AM org.apache.solr.common.SolrException log
> >> >
> >> > SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain
> >> timed
> >> > out: NativeFSLock@/var/lib/solr/data/index/write.lock
> >> >at org.apache.lucene.store.Lock.obtain(Lock.java:84)
> >> >at
> >> org.apache.lucene.index.IndexWriter.(IndexWriter.java:1097)
> >> >at
> >> > org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:83)
> >> >at
> >> >
> >>
> org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
> >> >at
> >> >
> >>
> org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
> >> >at
> >> >
> >>
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
> >> >at
> >> >
> >>
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
> >> >at
> >> > org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
> >> >at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
> >> >at
> >> >
> >>
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
> >> >at
> >> >
> >>
> org.ap

exceeded limit of maxWarmingSearchers ERROR

2011-08-13 Thread Naveen Gupta
Hi,

Most of the settings are default.

We have a single node (memory 1 GB, index size 4 GB).

We have a requirement where we are doing very fast commits. This is a kind of
real-time requirement where we are polling many threads from a third party and
indexing them into our system.

We want these results to be available soon.

We are committing for each user (a user may have 10k threads, and inside that,
1 thread may have 10 messages). So overall, documents per user will be
around 0.1 million (100k).

Earlier we were using commitWithin of 10 milliseconds inside the document;
that was slowing down the indexing, but we were not getting any error.

Once we removed the commitWithin, indexing became very fast. But after that
we started seeing the error below in the system.

From reading many forums, everybody says this happens because of a very
fast commit rate, but what is the solution for our problem?

We are using curl to post the data and commit.

Also, till now we are using the default solrconfig.

Aug 14, 2011 12:12:04 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
exceeded limit of maxWarmingSearchers=2, try again later.
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1052)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:424)
at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
at
org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:177)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)
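
The error above is Solr refusing to open yet another warming searcher: every hard commit opens a new searcher and warms it, and when commits arrive faster than warming finishes, the maxWarmingSearchers limit (2 here, configurable in solrconfig.xml) is exceeded. One client-side mitigation is simply to space the explicit commits out. A minimal sketch, assuming a recent SolrJ client, a hypothetical core URL, and an arbitrary 60-second minimum interval; the class and field names are made up for illustration.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class ThrottledCommitter {
    private final SolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/threads").build();
    private long lastCommitMillis = 0L;

    // Called after each user's batch has been posted; commits at most once a
    // minute so overlapping searcher warm-ups cannot pile up.
    public synchronized void maybeCommit() throws Exception {
        long now = System.currentTimeMillis();
        if (now - lastCommitMillis >= 60_000) {
            client.commit();
            lastCommitMillis = now;
        }
    }
}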


Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-14 Thread Naveen Gupta
Hi Mark/Erick/Nagendra,

I was not very confident about NRT at that point in time, when we started
the project almost 1 year ago; I will definitely try NRT now and see how it
performs.

The current requirement was working fine while we were using commitWithin of
10 millisecs in the XML document which we were posting to Solr.

But because of that, we were getting very poor performance (almost 3 mins for
15,000 docs) per user. There are many parallel users committing to our Solr.

So we removed the commitWithin, and hence performance was much better.

But then we started getting this maxWarmingSearchers error, because we are
committing separately with a curl request once the entire set of docs has
been submitted for indexing.

The question here is: what is the difference between commitWithin and commit
(apart from the fact that commit takes memory, processing, and additional
hardware usage)?

The reason we want documents to be visible as soon as possible is that we are
applying many business rules on top of the results (older indexes as well as
new ones) and applying different filters.

Up to 5 mins is fine for us, but more than that and we need to think about
other optimizations.

We will definitely try NRT, but please tell me what other options we can
apply in order to optimize.

Thanks
Naveen
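
As a rough sketch of the distinction being asked about: commitWithin is a hint attached to the update telling Solr to make the documents visible within the given window (Solr decides when the actual commit happens), while an explicit commit forces a hard commit immediately, flushing segments and opening plus warming a new searcher. In Solr 4.x and later, NRT soft commits add a cheaper way to gain visibility without the full hard-commit cost. The SolrJ calls below assume a recent client and a hypothetical core URL.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CommitModes {
    public static void main(String[] args) throws Exception {
        SolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr/threads").build();

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "msg-1");

        client.add(doc, 10_000);          // commitWithin: visible within ~10s, Solr batches the commit
        client.commit();                  // explicit hard commit: flush + open and warm a new searcher
        client.commit(true, true, true);  // (waitFlush, waitSearcher, softCommit): NRT-style soft commit in 4.x+
        client.close();
    }
}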


On Sun, Aug 14, 2011 at 9:42 PM, Erick Erickson wrote:

> Ah, thanks, Mark... I must have been looking at the wrong JIRAs.
>
> Erick
>
> On Sun, Aug 14, 2011 at 10:02 AM, Mark Miller 
> wrote:
> >
> > On Aug 14, 2011, at 9:03 AM, Erick Erickson wrote:
> >
> >> You either have to go to near real time (NRT), which is under
> >> development, but not committed to trunk yet
> >
> > NRT support is committed to trunk.
> >
> > - Mark Miller
> > lucidimagination.com
> >
> >
> >
> >
> >
> >
> >
> >
> >
>


Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-15 Thread Naveen Gupta
Nagendra

You wrote,

Naveen:

*NRT with Apache Solr 3.3 and RankingAlgorithm does need a commit for a
document to become searchable*. Any document that you add through update
becomes  immediately searchable. So no need to commit from within your
update client code.  Since there is no commit, the cache does not have to be
cleared or the old searchers closed or  new searchers opened, and warmed
(error that you are facing).


Looking at the link you mentioned, it is clearly what we wanted. But the thing
is that you wrote "RA does need a commit for a document to become searchable"
(please take a look at the bold sentence).

In future, for higher loads, can it work with master/slave (replication) etc.
to scale and perform better? If yes, we would like to go for NRT, and the
performance described in the article is acceptable. We were expecting the
same real-time performance for a single user.

What about multiple users? Should we wait 1-2 secs before calling the curl
request to make Solr perform better, or will it internally handle multiple
requests (multithreading etc.)?

What would be a good doc batch size (10,000 docs?) to let the JVM perform
better? Have you done any kind of benchmarking in terms of multithreaded,
multi-user use for NRT, and also JVM tuning in terms of Solr server
performance? Any kind of performance analysis would help us decide quickly
whether to switch over to NRT.

Questions in terms of switching over to NRT:


1. Should we upgrade to Solr 4.x?

2. Any benchmarking (10,000 docs/sec)? The question here is more specifically
about the details of an individual doc (fields, number of fields, field sizes,
parameters affecting performance with or without faceting).

3. What about multiple users?

A user in real time might have a large doc count of 0.1 million. How do we
break this up and analyze which approach is better (though that is our task
to do)? Still, any kind of breakdown will help us. Imagine a user's inbox.

4. JVM tuning and performance results in a multithreaded environment.

5. Machine details (RAM, CPU, and settings from a Solr perspective).

I hope you are getting my point: we want to benchmark the performance. If you
can involve me in your group, that would be great.

Thanks
Naveen



2011/8/15 Nagendra Nagarajayya 

> Bill:
>
> I did look at Marks performance tests. Looks very interesting.
>
> Here is the Apacle Solr 3.3 with RankingAlgorithm NRT performance:
> http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x
>
>
> Regards
>
> - Nagendra Nagarajayya
> http://solr-ra.tgels.org
> http://rankingalgorithm.tgels.org
>
>
>
> On 8/14/2011 7:47 PM, Bill Bell wrote:
>
>> I understand.
>>
>> Have you looked at Mark's patch? From his performance tests, it looks
>> pretty good.
>>
>> When would RA work better?
>>
>> Bill
>>
>>
>> On 8/14/11 8:40 PM, "Nagendra Nagarajayya"> transaxtions.com >
>> wrote:
>>
>>  Bill:
>>>
>>> The technical details of the NRT implementation in Apache Solr with
>>> RankingAlgorithm (SOLR-RA) is available here:
>>>
>>> http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf
>>>
>>> (Some changes for Solr 3.x, but for most it is as above)
>>>
>>> Regarding support for 4.0 trunk, should happen sometime soon.
>>>
>>> Regards
>>>
>>> - Nagendra Nagarajayya
>>> http://solr-ra.tgels.org
>>> http://rankingalgorithm.tgels.org
>>>
>>>
>>>
>>>
>>>
>>> On 8/14/2011 7:11 PM, Bill Bell wrote:
>>>
>>>> OK,
>>>>
>>>> I'll ask the elephant in the room...
>>>>
>>>> What is the difference between the new UpdateHandler from Mark and the
>>>> SOLR-RA?
>>>>
>>>> The UpdateHandler works with 4.0 does SOLR-RA work with 4.0 trunk?
>>>>
>>>> Pros/Cons?
>>>>
>>>>
>>>> On 8/14/11 8:10 PM, "Nagendra
>>>> Nagarajayya"
>>>> >
>>>> wrote:
>>>>
>>>>  Naveen:
>>>>>
>>>>> NRT with Apache Solr 3.3 and RankingAlgorithm does need a commit for a
>>>>> document to become searchable. Any document that you add through update
>>>>> becomes  immediately searchable. So no need to commit from within your
>>>>> update client code.  Since there is no commit, the cache does not have
>>>>> 

Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-17 Thread Naveen Gupta
Hi Nagendra,

Thanks a lot. I will start working on NRT today; meanwhile the old settings
(the increased warming-searcher limit on the master) have not given me any
trouble till now.

But NRT will be more suitable for us. I will work on it, analyze the
performance, and share the results with you.

Thanks
Naveen

2011/8/17 Nagendra Nagarajayya 

> Naveen:
>
> See below:
>
>> *NRT with Apache Solr 3.3 and RankingAlgorithm does need a commit for a
>>
>> document to become searchable*. Any document that you add through update
>> becomes  immediately searchable. So no need to commit from within your
>> update client code.  Since there is no commit, the cache does not have to
>> be
>> cleared or the old searchers closed or  new searchers opened, and warmed
>> (error that you are facing).
>>
>>
>> Looking at the link which you mentioned is clearly what we wanted. But the
>> real thing is that you have "RA does need a commit for  a document to
>> become
>> searchable" (please take a look at bold sentence) .
>>
>>
> Yes, as said earlier you do not need a commit. A document becomes
> searchable as soon as you add it. Below is an example of adding a document
> with curl (this from the wiki at http://solr-ra.tgels.com/wiki/**
> en/Near_Real_Time_Search_ver_**3.x<http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x>
> ):
>
> curl "http://localhost:8983/solr/**update/csv?stream.file=/tmp/**
> x1.csv&encapsulator=%1f<http://localhost:8983/solr/update/csv?stream.file=/tmp/x1.csv&encapsulator=%1f>
> "
>
>
> There is no commit included. The contents of the document become
> immediately searchable.
>
>
>  In future, for more loads, can it cater to Master Slave (Replication) and
>> etc to scale and perform better? If yes, we would like to go for NRT and
>> looking at the performance described in the article is acceptable. We were
>> expecting the same real time performance for a single user.
>>
>>
> There are no changes to the master/slave (replication) process. So anything
> you have currently will work as before, and if you enable replication later,
> it should still work just as it does without NRT.
>
>
>  What about multiple users, should we wait for 1-2 secs before calling the
>> curl request to make SOLR perform better. Or internally it will handle
>> with
>> multiple request (multithreaded and etc).
>>
>
> Again for updating documents, you do not have to change your current
> process or code. Everything remains the same, except that if you were
> including commit, you do not include commit in your update statements. There
> is no change to the existing update process so internally it will not queue
> or multi-thread updates. It is as in existing Solr functionality, there no
> changes to the existing setup.
>
> Regarding performing better: in the wiki paper, every update through curl
> adds (streams) 500 documents. So you could take this approach (this was
> something that I chose somewhat arbitrarily to test the performance, but it
> seems to work well).
>
>
>  What would be doc size (10,000 docs) to allow JVM perform better? Have you
>> done any kind of benchmarking in terms of multi threaded and multi user
>> for
>> NRT and also JVM tuning in terms of SOLR sever performance. Any kind of
>> performance analysis would help us to decide quickly to switch over to
>> NRT.
>>
>>
> The performance discussed in the wiki paper uses the MBArtists index. The
> MBArtists index is the index used as one of the examples in the book, Solr
> 1.4 Enterprise Search Server. You can download and build this index if you
> have the book or can also download the contents from musicbrainz.org.
> Each doc is maybe about 100 bytes and has about 7 fields. For performance with
> Wikipedia's XML dump: with the skipdoc field commented out (redirects included)
> in the data-config.xml [DataImportHandler], the update performance is about
> 15,000 docs/sec (100 million docs); with skipdoc enabled (redirects skipped),
> the performance is about 1,350 docs/sec [time spent mostly on validating and
> converting XML rather than the actual update] (about 11 million docs).
> Documents in Wikipedia can be quite big, with an average size of about
> 2500-5000 bytes or more.
>
> I would suggest that you download and give NRT with Apache Solr 3.3 and
> RankingAlgorithm a try and get a feel of it as this would be the best way to
> see how your config works with it.
>
>
>  Questions in terms for switching over to NRT,
>>
>>
>> 1.Should we upgrade to SOLR 4.x ?
>>
>> 2. Any benchmarking (10,000 docs/secs).  The question here is more
>> specific
>>
>> the detail of indi

Disabling jvm properties from ui

2018-11-07 Thread Naveen M
Hi,

Is there a way to disable the JVM properties display in the Solr UI?

It has some information which we don’t want to expose. Any pointers would
be helpful.


Thanks


Solr index writing to s3

2019-01-16 Thread Naveen M
hi,

My requirement is to write the index data to S3; we have Solr installed
on AWS instances. Please let me know if there is any documentation on how
to write the index data to S3.

Thanks