Re: Boosting for most recent documents
Hi, A related question to "getting the latest records first". After trying a few of the suggested ways (function query, index-time boosting) of getting the latest records first, I settled for the simple "sort" parameter, sort=field+asc.

As per the wiki, http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), Lucene would cache "4 bytes * the number of documents" plus the unique terms for the sorted field in the FieldCache. This is done so subsequent sort requests can be served from the cache. So, for example, the memory usage if I have 1 billion records in one Indexer instance:

1) 1 billion records
2) sort on a timestamp field (rounded to the hour) - for 1 year that's 8760 unique terms (negligible)
3) total memory requirement for sorting on this single field would be around 1G * 4 = 4GB

So even if I run only one sort query a day, 4GB would still be required at all times. Is there any way to tell Solr/Lucene to release the memory once the query has been run? Basically I don't want caching. I've commented out all the cache parameters in solrconfig.xml, but I still see that the very first time I run the sort query the memory jumps by 4GB and stays there. Is there any way to keep Lucene/Solr from using so much memory for sorting, so that my application can scale (i.e. the sorting memory requirement isn't a function of the number of documents)?

Thanks,
-vivek

On Thu, Jul 16, 2009 at 3:10 PM, Chris Hostetter wrote:
>
> : Does anyone know if Solr supports sorting by internal document ids,
> : i.e, like Sort.INDEXORDER in Lucene? If so, how?
>
> It does not. In Solr the decision to make "score desc" the default
> sort meant there is no way to request simple docId ordering.
>
> : Also, if anyone has any insight on whether function query loads up unique
> : terms (like field sorts) in memory or not.
>
> It uses the exact same FieldCache as sorting.
>
>
>
>
> -Hoss
>
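For illustration, a rough sketch of the FieldCache arithmetic described above (the per-term overhead constant is an assumption; everything else comes from the numbers in the message):

public class FieldCacheEstimate {
    public static void main(String[] args) {
        long numDocs = 1000000000L;        // 1 billion documents
        long ordBytes = 4L * numDocs;      // one 4-byte entry per document
        long uniqueTerms = 365L * 24;      // hourly timestamps over a year: 8760
        long termBytes = uniqueTerms * 32; // assumed per-term overhead; negligible here
        System.out.println((ordBytes + termBytes) / 1e9 + " GB"); // ~4 GB
    }
}

The first term dominates: the cache scales with document count, not query rate, which is why the 4GB stays resident after a single sorted query.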
Replication over multi-core solr
Hi, We use a multi-core setup for Solr, where new cores are added dynamically to solr.xml. Only one core is active at a time. My question is how replication can be done for multi-core - so that every core is replicated on the slave?

I went over the wiki, http://wiki.apache.org/solr/SolrReplication, and have a few questions related to that:

1) How do we replicate solr.xml, where we have the list of cores? The wiki says, "Only files in the 'conf' dir of solr instance is replicated." Since solr.xml is in the home directory, how do we replicate that?

2) Solrconfig.xml in the slave takes a static core URL, http://localhost:port/solr/corename/replication. As in our case cores are created dynamically (a new core is created after the active one reaches some capacity), how can we define the master core dynamically for replication? The only way I see is using the "fetchIndex" command and passing the new core info there - is that right? If so, the slave application would have to write code to poll the Master periodically and fire the "fetchIndex" command, but how would the Slave know the Master core name - as the cores are created dynamically on the Master?

Thanks,
-vivek
Re: Replication over multi-core solr
Licinio, Please open a separate thread - as it's a different issue - and I can respond there. -vivek

2009/8/19 Licinio Fernández Maurelo :
> Hi Vivek,
> currently we want to add cores dynamically when the active one reaches
> some capacity,
> can you give me some hints to achieve such functionality? (Just
> wondering if you have used shell scripting or you have coded some 100%
> Java-based solution)
>
> Thx
>
>
> 2009/8/19 Noble Paul നോബിള് नोब्ळ् :
>> On Wed, Aug 19, 2009 at 2:27 AM, vivek sar wrote:
>>> Hi,
>>>
>>> We use a multi-core setup for Solr, where new cores are added
>>> dynamically to solr.xml. Only one core is active at a time. My
>>> question is how replication can be done for multi-core - so that every
>>> core is replicated on the slave?
>>
>> replication does not handle new core creation. You will have to issue
>> the core creation command to each slave separately.
>>>
>>> I went over the wiki, http://wiki.apache.org/solr/SolrReplication,
>>> and have a few questions related to that:
>>>
>>> 1) How do we replicate solr.xml, where we have the list of cores? The
>>> wiki says, "Only files in the 'conf' dir of solr instance is replicated."
>>> Since solr.xml is in the home directory, how do we replicate that?
>> solr.xml cannot be replicated. Even if you did, it is not reloaded.
>>>
>>> 2) Solrconfig.xml in the slave takes a static core URL,
>>>
>>> <str name="masterUrl">http://localhost:port/solr/corename/replication</str>
>>
>> put a placeholder like
>> <str name="masterUrl">http://localhost:port/solr/${solr.core.name}/replication</str>
>> so the corename is automatically replaced
>>
>>>
>>> As in our case cores are created dynamically (a new core is created after
>>> the active one reaches some capacity), how can we define the master core
>>> dynamically for replication? The only way I see is using the "fetchIndex"
>>> command and passing the new core info there - is that right? If so, the
>>> slave application would have to write code to poll the Master periodically
>>> and fire the "fetchIndex" command, but how would the Slave know the Master
>>> corename - as the cores are created dynamically on the Master?
>>>
>>> Thanks,
>>> -vivek
>>>
>>
>>
>>
>> --
>> -
>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>
>
>
>
> --
> Lici
>
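For readers following along, a sketch of how Noble's placeholder could sit in the slave-side solrconfig.xml, per the SolrReplication wiki (host, port and poll interval here are made-up values):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8080/solr/${solr.core.name}/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>

Since ${solr.core.name} resolves per core, one shared solrconfig.xml can serve every dynamically created core, as long as the slave core names mirror the master's.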
Re: Adding cores dynamically
Lici, We're doing a similar thing with multi-core - when a core reaches capacity (in our case 200 million records) we start a new core. We do this via a web service call (the CREATE web service), http://wiki.apache.org/solr/CoreAdmin

This is all done in Java code - before writing we check the number of records in the core; if it has reached its capacity we create a new core and then index there.

-vivek

2009/8/19 Licinio Fernández Maurelo :
> Hi there,
>
> currently we want to add cores dynamically when the active one reaches
> some capacity,
> can anyone give me some hints to achieve such functionality? (Just
> wondering if you have used shell scripting or you have coded some 100%
> Java-based solution)
>
> Thx
>
>
> --
> Lici
>
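A minimal sketch of that CREATE call from Java (the class, capacity constant, and host are illustrative, not vivek's actual code):

import java.net.HttpURLConnection;
import java.net.URL;

public class CoreCreator {
    // capacity figure taken from the message above
    static final long CORE_CAPACITY = 200000000L;

    static void createCore(String coreName, String instanceDir) throws Exception {
        URL url = new URL("http://localhost:8080/solr/admin/cores?action=CREATE"
                + "&name=" + coreName + "&instanceDir=" + instanceDir);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        if (conn.getResponseCode() != 200) {
            throw new IllegalStateException("CREATE failed: HTTP " + conn.getResponseCode());
        }
        conn.disconnect();
    }
}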
Re: Adding cores dynamically
There were two main reasons we went with a multi-core solution:

1) We found the indexing speed starts dipping once the index grows to a certain size - in our case around 50G. We don't optimize, but we have to maintain a consistent indexing speed. The only way we could do that was to keep creating new cores (on the same box, though we do use multiple boxes to scale horizontally as well) once a core reaches its capacity. The old core is not written to again once it reaches its capacity.

2) Be able to drop a whole core for pruning purposes. We didn't want to delete records from the index, so the best solution was to simply delete the complete core directory (we do maintain the time period for each core), which is much faster and easier to maintain.

So far things have been working fine. I'm not sure if there is any inherent problem with this architecture given the above limitations and requirements.

-vivek

On Tue, Aug 25, 2009 at 10:57 AM, Lance Norskog wrote:
> One problem is the IT logistics of handling the file set. At 200 million
> records you have at least 20G of data in one Lucene index. It takes hours to
> optimize this, and 10s of minutes to copy the optimized index around to
> query servers.
> Another problem is that indexing speed drops off after the index reaches a
> certain size. When making multiple indexes, you want to stop indexing before
> that size.
> Lance
>
> On Tue, Aug 25, 2009 at 10:44 AM, Chris Hostetter
> wrote:
>
>>
>> : We're doing a similar thing with multi-core - when a core reaches
>> : capacity (in our case 200 million records) we start a new core. We are
>> : doing this via a web service call (Create web service),
>>
>> this whole thread perplexes me ... while I can understand not wanting to
>> let an index grow without bound because of hardware limitations, I don't
>> understand what value you are gaining by creating a new core on the same
>> box -- you're using the same physical resources to search the same number
>> of documents, so making multiple cores for this actually seems like it would
>> take up *more* resources to search the same amount of content, because the
>> individual cores will be isolated and the term dictionaries can't be
>> shared (not to mention you have to do a multi-shard query to get results
>> from all the cores)
>>
>> are you doing something special with the old cores vs the new ones? (ie:
>> create the new cores on new machines, shut down cores after a certain
>> amount of time has expired, etc...)
>>
>>
>> : > Hi there,
>> : >
>> : > currently we want to add cores dynamically when the active one reaches
>> : > some capacity,
>> : > can anyone give me some hints to achieve such functionality? (Just
>> : > wondering if you have used shell scripting or you have coded some 100%
>> : > Java-based solution)
>> : >
>> : > Thx
>> : >
>> : >
>> : > --
>> : > Lici
>> : >
>> :
>>
>>
>>
>> -Hoss
>>
>>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
How does ReplicationHandler backup work?
Hi, As one of our requirements we need to back up the Master indexes to the Slave periodically. I've been able to successfully sync the index using the "fetchIndex" command,

http://localhost:9006/solr/audit_20090828_1/replication?command=fetchindex&masterUrl=http://localhost:8080/solr/audit_20090828_1/replication

Now I'm wondering how I do the backup. Looking at the wiki, http://wiki.apache.org/solr/SolrReplication, it seems there is a backup command, but that says backup on the Master. I tried replacing the command "fetchindex" with "backup", but that didn't work. How do I do a complete index backup (for a particular core) from Master to Slave?

Thanks,
-vivek
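A hedged reading of the SolrReplication wiki: the backup command snapshots the index of whichever core it is issued against, so once fetchindex has completed, the slave could snapshot its own copy with something like

http://localhost:9006/solr/audit_20090828_1/replication?command=backup

This is an assumption based on the ReplicationHandler documentation, not a confirmed recipe; if it works, the snapshot should land in a snapshot.* directory under that core's data dir.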
Partition index by time using Solr
Hi, I've used Lucene before, but I'm new to Solr. I've gone through the mailing list, but I'm unable to find any clear idea on how to partition Solr indexes. Here is what we want:

1) Be able to partition indexes by timestamp - basically a partition per day (create a new index directory every day).

2) Be able to search partitions based on timestamp. All our queries are time based, so instead of looking into all the partitions I want to go directly to the partitions where the data might be.

3) Be able to purge any data older than 6 months without bringing down the application. Since partitions would be marked by timestamp, we would just have to delete the old partitions.

This is going to be a distributed system with 2 boxes, each running an instance of Solr. I don't want to replicate data, but each box may have the same timestamp partition with different data. We would be indexing on average 20 million documents (each document = 500 bytes) with an estimated 10G in index size - evenly distributed across machines (each machine would get roughly 5G of index every day).

My questions:

1) Is this all possible using Solr? If not, should I just do this using Lucene, or is there any other out-of-the-box alternative?
2) If it's possible in Solr, how do we do this - configuration, setup, etc.?
3) How would I optimize the partitions - would that be required when using Solr?

Thanks,
-vivek
Re: Partition index by time using Solr
Thanks Otis for the response. I'm still not clear on a few things:

1) I thought Solr can work with only one index at a time. In order to have multiple indexes you need multiple instances of Solr - isn't that right? How can we make Solr read from and write to multiple indexes?

2) What does "partitioning outside of Solr" mean? If all the data is indexed by Solr into one index, how would one partition it outside Solr such that it is still searchable by Solr when needed?

Our main problem is scaling with Solr. Our indexes grow so big (like 10G-20G every day) that it's hard to optimize them and search on large indexes. That's why we are trying to partition them by time. We do need to keep up to 6 months of data. The only way I can think of limiting the index size is by running multiple Solr instances, but even then it's not a scalable solution if the indexes keep growing.

Thanks,
-vivek

On Wed, Mar 25, 2009 at 6:59 PM, Otis Gospodnetic wrote:
>
> Hi,
>
> Yes, you can use Solr for this, but index partitioning should be done outside
> of Solr. That is, your app will need to know where to send each doc based on
> its timestamp, when and where to create a new index (new Solr core), and so on.
> Similarly, deleting data older than N days is done by you, using a delete-by-query
> with a date-based open-ended range query. The Solr setup is really
> done the same as usual, since all the partitioning-related stuff lives
> outside of Solr. Of course, you could come up with a "Solr Proxy" component
> that abstracts some/all of this and pretends to be Solr.
>
>
> Otis --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message
>> From: vivek sar
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, March 25, 2009 3:52:11 PM
>> Subject: Partition index by time using Solr
>>
>> Hi,
>>
>> I've used Lucene before, but I'm new to Solr. I've gone through the
>> mailing list, but I'm unable to find any clear idea on how to partition
>> Solr indexes. Here is what we want:
>>
>> 1) Be able to partition indexes by timestamp - basically a partition
>> per day (create a new index directory every day).
>>
>> 2) Be able to search partitions based on timestamp. All our queries
>> are time based, so instead of looking into all the partitions I want
>> to go directly to the partitions where the data might be.
>>
>> 3) Be able to purge any data older than 6 months without bringing
>> down the application. Since partitions would be marked by timestamp,
>> we would just have to delete the old partitions.
>>
>>
>> This is going to be a distributed system with 2 boxes, each running
>> an instance of Solr. I don't want to replicate data, but each box may
>> have the same timestamp partition with different data. We would be
>> indexing on average 20 million documents (each document = 500 bytes)
>> with an estimated 10G in index size - evenly distributed across
>> machines (each machine would get roughly 5G of index every day).
>>
>> My questions:
>>
>> 1) Is this all possible using Solr? If not, should I just do this
>> using Lucene, or is there any other out-of-the-box alternative?
>> 2) If it's possible in Solr, how do we do this - configuration, setup, etc.?
>> 3) How would I optimize the partitions - would that be required when using
>> Solr?
>>
>> Thanks,
>> -vivek
>
>
Re: Partition index by time using Solr
Thanks again Otis. A few more questions:

1) My app currently is a stand-alone Java app (not part of the Solr JVM) that simply calls the update web service on Solr (running in a separate web container), passing 10K documents at once. In your example you mentioned getting a list of Indexers and adding documents to them manually - do you mean I should use Lucene directly in my app to do the indexing and use Solr just for search purposes? How can I simply write to different cores (using the Solr web service) without putting Lucene code in my app?

2) The MultiCore example on the Wiki shows pre-configured cores in solr.xml. How can I create cores on the fly from my app - is there a command (or web service) to tell Solr to load a new core? For example, every day I want to create a new core for that day on the fly and index into that core only. Also, would I be able to search on cores created on the fly?

Currently, I'm using the standard out-of-the-box request and response handlers for Solr. Would using multi-core require any custom handlers?

Thanks,
-vivek

On Thu, Mar 26, 2009 at 10:38 AM, Otis Gospodnetic wrote:
>
> Hi,
>
> 1) Look for "multicore" on the Solr Wiki
>
> 2) I meant to say you would not index it all in one index (that's what you
> wanted to do, no?). So in your app you'd do something like
> ts = doc.getTimestamp();
> indexer = getIndexer(ts); // gives you a different indexer based on the ts.
> You keep track of all the indexers (e.g. all instances of the solr client you
> have in your app, each of which points to a different solr server/core/index)
> indexer.index(doc);
>
>
> If your issue is large indices and search performance, then the solution is
> not so much to have multiple solr cores/indices per machine as distributed
> indexing (multiple servers). Look at the DistributedSearch page on the Wiki.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
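A sketch of Otis's getIndexer(ts) idea using plain Solrj, so no Lucene code enters the app: one client per daily core, chosen by document timestamp (the class and core-name scheme are assumptions):

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DailyCoreRouter {
    private final Map<String, SolrServer> servers = new HashMap<String, SolrServer>();
    private final SimpleDateFormat day = new SimpleDateFormat("yyyyMMdd");

    // returns the client for the core matching the document's day
    public synchronized SolrServer getIndexer(Date ts) throws Exception {
        String core = day.format(ts); // e.g. "20090402"
        SolrServer s = servers.get(core);
        if (s == null) {
            s = new CommonsHttpSolrServer("http://localhost:8080/solr/" + core);
            servers.put(core, s);
        }
        return s;
    }
}

Creating the day's core itself would be a separate CoreAdmin CREATE call, as discussed later in this thread.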
How to optimize Index Process?
Hi, We have a distributed Solr system (2-3 boxes, each running 2 instances of Solr, and each Solr instance can write to multiple cores). Our use case is high index volume - we can get up to 100 million records (1 record = 500 bytes) per day, but very low query traffic (only administrators may need to search for data - once an hour or so). So we need very fast index time. Here are the things I'm trying to find out in order to optimize our index process:

1) What's the optimum index size? I've noticed that as the index size grows, the indexing time starts increasing. In our tests, at less than 10G of index size we could index over 2K records/sec, but as it grows over 20G the index rate drops to 1400/sec and keeps dropping as the index grows. I'm trying to see whether we can partition (create a new SolrCore) after 10G.
 - related question: is there a way to find the SolrCore size (any web service for that?) - based on that information I could create a new core and freeze the one which has reached 10G.

2) In our test we noticed that after a few hours (after 8 hours of indexing) there is a period (3-4 hours long) where the indexing is very, very slow (like 500 records/sec), and after that period indexing returns to its normal rate (1500/sec). Does Solr run any optimize command on its own? How can we find that out? I'm not issuing any optimize command - should I be doing that after a certain time?

3) Every time I add new documents (10K at once) to the index, I see the searcher closing and then re-opening/re-warming (in Catalina.out) after the commit is done. I'm not sure if this is an expensive operation. Since our search volume is very low, can I configure Solr to not do this? Would it make indexing any faster?

Mar 26, 2009 11:59:45 PM org.apache.solr.search.SolrIndexSearcher close
INFO: Closing searc...@33d9337c main
Mar 26, 2009 11:59:52 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
Mar 26, 2009 11:59:52 PM org.apache.solr.search.SolrIndexSearcher
INFO: Opening searc...@46ba6905 main
Mar 26, 2009 11:59:52 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@46ba6905 main from searc...@5c5ffecd main

4) Anything else (any other configuration in Solr - I'm currently using all default settings in solrconfig.xml and the default handlers) that could help optimize my indexing process?

Thanks,
-vivek
OOM at MultiSegmentReader.norms
Hi, I have an index of size 50G (around 100 million documents) and growing - around 2000 records (1 rec = 500 bytes) are being written every second, continuously. If I make any search on this index I get an OOM. I'm using the default cache settings (512,512,256) in solrconfig.xml. The search is through the admin interface (returning 10 rows) with no sorting, faceting or highlighting. Max heap size is 1024m.

Mar 27, 2009 9:13:41 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.MultiSegmentReader.norms(MultiSegmentReader.java:335)
at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:69)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
at org.apache.lucene.search.Searcher.search(Searcher.java:126)
at org.apache.lucene.search.Searcher.search(Searcher.java:105)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:966)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)

What could be the problem?

Thanks,
-vivek
Re: How to optimize Index Process?
Thanks Otis. This is very useful. I'll try all your suggestions and post my findings (and improvements).

Thanks,
-vivek

On Fri, Mar 27, 2009 at 7:08 PM, Otis Gospodnetic wrote:
>
> Hi,
>
> Answers inlined.
>
>
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
>> We have a distributed Solr system (2-3 boxes, each running 2
>> instances of Solr, and each Solr instance can write to multiple cores).
>
> Is this really optimal? How many CPU cores do your boxes have vs. the number
> of Solr cores?
>
>> Our use case is high index volume - we can get up to 100 million
>> records (1 record = 500 bytes) per day, but very low query traffic
>> (only administrators may need to search for data - once an hour or
>> so). So we need very fast index time. Here are the things I'm trying
>> to find out in order to optimize our index process:
>
> It's starting to sound like you might be able to batch your data and use
> http://wiki.apache.org/solr/UpdateCSV -- it's the fastest indexing method, I
> believe.
>
>> 1) What's the optimum index size? I've noticed that as the index size grows,
>> the indexing time starts increasing. In our tests, at less than 10G of index
>> size we could index over 2K records/sec, but as it grows over 20G the index
>> rate drops to 1400/sec and keeps dropping as the index grows. I'm
>> trying to see whether we can partition (create a new SolrCore) after
>> 10G.
>
> That's likely due to Lucene's segment merging. You can make mergeFactor
> bigger to make segment merging less frequent, but don't make it too high or
> you'll run into open file descriptor limits (which you could raise, of
> course).
>
>> - related question: is there a way to find the SolrCore size (any
>> web service for that?) - based on that information I could create a new
>> core and freeze the one which has reached 10G.
>
> You can see the number of docs in an index via the Admin Statistics page (the
> response is actually XML; look at the source)
>
>> 2) In our test we noticed that after a few hours (after 8 hours of
>> indexing) there is a period (3-4 hours long) where the indexing is
>> very, very slow (like 500 records/sec), and after that period indexing
>> returns to its normal rate (1500/sec). Does Solr run any optimize
>> command on its own? How can we find that out? I'm not issuing any
>> optimize command - should I be doing that after a certain time?
>
> No, it doesn't run optimize on its own. It could be running auto-commit, but
> you should comment that out anyway. Try doing a thread dump to see what's
> going on, and watch the system with top and vmstat.
> No, you shouldn't optimize until you are completely done.
>
>> 3) Every time I add new documents (10K at once) to the index, I see the
>> searcher closing and then re-opening/re-warming (in Catalina.out)
>> after the commit is done. I'm not sure if this is an expensive operation.
>> Since our search volume is very low, can I configure Solr to not do
>> this? Would it make indexing any faster?
>
> Are you running the commit command after every 10K docs? No need to do that
> if you don't need your searcher to see the changes immediately.
>
>> Mar 26, 2009 11:59:45 PM org.apache.solr.search.SolrIndexSearcher close
>> INFO: Closing searc...@33d9337c main
>> Mar 26, 2009 11:59:52 PM org.apache.solr.update.DirectUpdateHandler2 commit
>> INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
>> Mar 26, 2009 11:59:52 PM org.apache.solr.search.SolrIndexSearcher
>> INFO: Opening searc...@46ba6905 main
>> Mar 26, 2009 11:59:52 PM org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming searc...@46ba6905 main from searc...@5c5ffecd main
>>
>> 4) Anything else (any other configuration in Solr - I'm currently
>> using all default settings in solrconfig.xml and the default handlers)
>> that could help optimize my indexing process?
>
> Increase ramBufferSizeMB as much as you can afford.
> Comment out maxBufferedDocs; it's deprecated.
> Increase mergeFactor slightly.
> Consider the CSV approach.
> Index with multiple threads (match the number of CPU cores).
> If you are using Solrj, use the Streaming version of SolrServer.
> Give the JVM more memory (you'll need it if you increase ramBufferSizeMB).
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
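A sketch of where Otis's indexing knobs live in solrconfig.xml (the values are illustrative starting points, not recommendations from the thread):

<indexDefaults>
  <ramBufferSizeMB>256</ramBufferSizeMB>
  <mergeFactor>20</mergeFactor>
  <!-- maxBufferedDocs is deprecated; omit it and let ramBufferSizeMB govern flushes -->
</indexDefaults>

On the Solrj side, the "Streaming version of SolrServer" refers to StreamingUpdateSolrServer (Solr 1.4), which pipelines documents over a fixed set of connections and threads instead of one blocking POST per batch.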
Re: OOM at MultiSegmentReader.norms
Thanks Otis and Mike. I'm indexing a total of 9 fields, with 5 having norms turned on. I think I may not need them and will try using omitNorms on those fields.

How do I make use of the RAM buffer in Solr? I couldn't find anything on this on the Wiki - any pointers?

Thanks,
-vivek

On Sat, Mar 28, 2009 at 1:09 AM, Michael McCandless wrote:
> Still, 1024M ought to be enough to load one field's norms (how many
> fields have norms?). If you do things requiring FieldCache, that'll
> also consume RAM.
>
> It's also possible you're hitting this bug (false OOME) in Sun's JRE:
>
> http://issues.apache.org/jira/browse/LUCENE-1566
>
> Feel free to go vote for it!
>
> Mike
>
> On Fri, Mar 27, 2009 at 10:11 PM, Otis Gospodnetic
> wrote:
>>
>> That's a tiny heap. Part of it is used for indexing, too. And the fact
>> that your heap is so small shows you are not really making use of that nice
>> ramBufferSizeMB setting. :)
>>
>> Also, use omitNorms="true" for fields that don't need norms (if their types
>> don't already do that).
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>>
>>
>> - Original Message
>>> From: vivek sar
>>> To: solr-user@lucene.apache.org
>>> Sent: Friday, March 27, 2009 6:15:59 PM
>>> Subject: OOM at MultiSegmentReader.norms
>>>
>>> Hi,
>>>
>>> I have an index of size 50G (around 100 million documents) and growing -
>>> around 2000 records (1 rec = 500 bytes) are being written every second,
>>> continuously. If I make any search on this index I get an OOM. I'm using
>>> the default cache settings (512,512,256) in solrconfig.xml. The search
>>> is through the admin interface (returning 10 rows) with no sorting,
>>> faceting or highlighting. Max heap size is 1024m.
>>>
>>> Mar 27, 2009 9:13:41 PM org.apache.solr.common.SolrException log
>>> SEVERE: java.lang.OutOfMemoryError: Java heap space
>>> at org.apache.lucene.index.MultiSegmentReader.norms(MultiSegmentReader.java:335)
>>> at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:69)
>>> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
>>> at org.apache.lucene.search.Searcher.search(Searcher.java:126)
>>> at org.apache.lucene.search.Searcher.search(Searcher.java:105)
>>> at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:966)
>>> at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
>>> at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
>>> at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
>>> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
>>> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
>>> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>>> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>
>>> What could be the problem?
>>>
>>> Thanks,
>>> -vivek
>>
>>
>
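Two notes tied to the answers above. First, a sketch of what turning norms off looks like in schema.xml (the field names here are placeholders):

<field name="timestamp" type="date" indexed="true" stored="true" omitNorms="true"/>
<field name="host" type="string" indexed="true" stored="true" omitNorms="true"/>

Second, on the RAM buffer question: the setting is ramBufferSizeMB, which lives under indexDefaults (or mainIndex) in solrconfig.xml, as in the sketch a few messages up; there is nothing to change on the client side.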
Merging Solr Indexes
Hi, As part of speeding up the index process I'm thinking of spawning multiple threads which will write to different temporary SolrCores. Once the index process is done I want to merge all the indexes in temporary cores to a master core. For ex., if I want one SolrCore per day then every index cycle I'll spawn 4 threads which will index into some temporary index and once they are done I want to merge all these into the day core. My questions, 1) I want to use the same schema and solrconfig.xml for all cores without duplicating them - how do I do that? 2) How do I merge the temporary Solr cores into one master core programmatically? I've read the wiki on "MergingSolrIndexes", but I want to do it programmatically (like in Lucene - writer.addIndexes(..)) once the temporary indices are done. 3) Can I remove the temporary indices once the merge process is done? 4) Is this the right strategy to speed up indexing? Thanks, -vivek
Defining DataDir in Multi-Core
Hi, I'm trying to set up cores dynamically. I want to use the same schema.xml and solrconfig.xml for all the created cores, so I plan to pass the same instance directory but a different data directory. I started from the default solr.xml (I didn't want to define any core there, but it looks like we have to have at least one core defined before we start Solr).

Now I run the following URL in the browser (as described on the wiki - http://wiki.apache.org/solr/CoreAdmin):

http://localhost:8080/solr/admin/cores?action=CREATE&name=20090331_1&instanceDir=/Users/opal/temp/chat/solr&dataDir=/Users/opal/temp/chat/solr/data/20090331_1

I get a response pointing at /Users/opal/temp/chat/solr/solr.xml. When I then check solr.xml, the new core is there, but NO dataDir is specified on it. When I check the status (http://localhost:8080/solr/admin/cores?action=STATUS) I see:

core0
/Users/opal/temp/afterchat/solr/./
/Users/opal/temp/afterchat/solr/./data/
...

20090331_2
/Users/opal/temp/afterchat/solr/
/Users/opal/temp/afterchat/solr/data/

Both cores are pointing to the same data directory. My question is how can I create cores on the fly and have them point to different data directories, so that each core writes its index in a different location?

Thanks,
-vivek
Re: Defining DataDir in Multi-Core
I'm using the latest released one - Solr 1.3. The wiki says passing dataDir to the CREATE action (web service) should work, but that doesn't seem to be working.

-vivek

2009/3/31 Noble Paul നോബിള് नोब्ळ् :
> which version of Solr are you using? if you are using one from trunk,
> you can pass the dataDir as an extra parameter.
>
> On Wed, Apr 1, 2009 at 7:41 AM, vivek sar wrote:
>> Hi,
>>
>> I'm trying to set up cores dynamically. I want to use the same
>> schema.xml and solrconfig.xml for all the created cores, so I plan to
>> pass the same instance directory but a different data directory. I
>> started from the default solr.xml (I didn't want to define any core
>> there, but it looks like we have to have at least one core defined
>> before we start Solr).
>>
>> Now I run the following URL in the browser (as described on the wiki -
>> http://wiki.apache.org/solr/CoreAdmin):
>>
>> http://localhost:8080/solr/admin/cores?action=CREATE&name=20090331_1&instanceDir=/Users/opal/temp/chat/solr&dataDir=/Users/opal/temp/chat/solr/data/20090331_1
>>
>> I get a response pointing at /Users/opal/temp/chat/solr/solr.xml. When
>> I then check solr.xml, the new core is there, but NO dataDir is
>> specified on it. When I check the status
>> (http://localhost:8080/solr/admin/cores?action=STATUS) I see:
>>
>> core0
>> /Users/opal/temp/afterchat/solr/./
>> /Users/opal/temp/afterchat/solr/./data/
>> ...
>>
>> 20090331_2
>> /Users/opal/temp/afterchat/solr/
>> /Users/opal/temp/afterchat/solr/data/
>>
>> Both cores are pointing to the same data directory. My question is how
>> can I create cores on the fly and have them point to different data
>> directories, so that each core writes its index in a different location?
>>
>> Thanks,
>> -vivek
>>
>
>
>
> --
> --Noble Paul
>
Re: Merging Solr Indexes
Thanks Otis. Could you write to the same core (same index) from multiple threads at the same time? I thought each writer would lock the index so others could not write at the same time. I'll try it, though.

Another reason for putting indexes in separate cores was to limit the index size. Our index can grow up to 50G a day, so I was hoping writing to smaller indexes in separate cores would be faster, and if needed I could merge them at a later point (like end of day). I want to keep daily cores. Isn't this a good idea? How else can I limit the index size (besides multiple instances or separate boxes)?

Thanks,
-vivek

On Tue, Mar 31, 2009 at 8:28 PM, Otis Gospodnetic wrote:
>
> Let me start with 4)
> Have you tried simply using multiple threads to send your docs to a single
> Solr instance/core? You should get about the same performance as what you
> are trying with your approach below, but without the headache of managing
> multiple cores and index merging (not yet possible to do programmatically).
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message
>> From: vivek sar
>> To: solr-user@lucene.apache.org
>> Sent: Tuesday, March 31, 2009 1:59:01 PM
>> Subject: Merging Solr Indexes
>>
>> Hi,
>>
>> As part of speeding up the index process I'm thinking of spawning
>> multiple threads which will write to different temporary SolrCores.
>> Once the index process is done I want to merge all the indexes in
>> the temporary cores into a master core. For example, if I want one
>> SolrCore per day, then every index cycle I'll spawn 4 threads which
>> will index into some temporary index, and once they are done I want
>> to merge all these into the day core. My questions:
>>
>> 1) I want to use the same schema and solrconfig.xml for all cores
>> without duplicating them - how do I do that?
>> 2) How do I merge the temporary Solr cores into one master core
>> programmatically? I've read the wiki on "MergingSolrIndexes", but I
>> want to do it programmatically (like in Lucene -
>> writer.addIndexes(..)) once the temporary indices are done.
>> 3) Can I remove the temporary indices once the merge process is done?
>> 4) Is this the right strategy to speed up indexing?
>>
>> Thanks,
>> -vivek
>
>
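Since merging isn't exposed through Solr itself (as Otis notes), one hedged option is to drop to the Lucene API underneath, roughly as below; the paths are placeholders, and the target core must not have an open IndexWriter (e.g. Solr stopped or the core unloaded) while this runs:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeCores {
    public static void main(String[] args) throws Exception {
        // open the day core's index and pull the temporary cores' segments in
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/solr/day/data/index"),
                new StandardAnalyzer(), false,
                IndexWriter.MaxFieldLength.UNLIMITED);
        writer.addIndexesNoOptimize(new Directory[] {
                FSDirectory.getDirectory("/solr/tmp1/data/index"),
                FSDirectory.getDirectory("/solr/tmp2/data/index") });
        writer.close();
    }
}

After a successful merge the temporary index directories can be deleted, which answers question 3; whether the whole scheme beats multi-threaded writes to a single core is exactly the trade-off Otis raises above.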
Re: Defining DataDir in Multi-Core
Thanks Shalin. Is it available in the latest nightly build? Is there any other way I can create cores dynamically (using CREATE service) which will use the same schema.xml and solrconfig.xml, but write to different data directories? Thanks, -vivek On Wed, Apr 1, 2009 at 1:55 AM, Shalin Shekhar Mangar wrote: > On Wed, Apr 1, 2009 at 1:48 PM, vivek sar wrote: >> I'm using the latest released one - Solr 1.3. The wiki says passing >> dataDir to CREATE action (web service) should work, but that doesn't >> seem to be working. >> > > That is a Solr 1.4 feature (not released yet). > > -- > Regards, > Shalin Shekhar Mangar. >
Re: Defining DataDir in Multi-Core
Hi, I tried the latest nightly build (04-01-09) - it takes the dataDir property now, but it's creating the data dir at the wrong location. I have a dataDir set for the core in solr.xml, but it always seems to create the solr/data directory in the cwd (where I started Tomcat from). Here is the log from Catalina.out:

Apr 1, 2009 10:47:21 AM org.apache.solr.core.SolrCore
INFO: [core2] Opening new SolrCore at /Users/opal/temp/chat/solr/, dataDir=./solr/data/
..
Apr 1, 2009 10:47:21 AM org.apache.solr.core.SolrCore initIndex
WARNING: [core2] Solr index directory './solr/data/index' doesn't exist. Creating new index...

I've also tried relative paths, but to no avail. Is this a bug?

Thanks,
-vivek

On Wed, Apr 1, 2009 at 9:45 AM, vivek sar wrote:
> Thanks Shalin.
>
> Is it available in the latest nightly build?
>
> Is there any other way I can create cores dynamically (using the CREATE
> service) which will use the same schema.xml and solrconfig.xml, but
> write to different data directories?
>
> Thanks,
> -vivek
>
> On Wed, Apr 1, 2009 at 1:55 AM, Shalin Shekhar Mangar
> wrote:
>> On Wed, Apr 1, 2009 at 1:48 PM, vivek sar wrote:
>>> I'm using the latest released one - Solr 1.3. The wiki says passing
>>> dataDir to the CREATE action (web service) should work, but that doesn't
>>> seem to be working.
>>>
>>
>> That is a Solr 1.4 feature (not released yet).
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
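One hedged workaround, judging from the dataDir=./solr/data/ in the log: give each core an absolute dataDir in solr.xml so nothing gets resolved against the container's working directory. A sketch mirroring the paths above:

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core2" instanceDir="/Users/opal/temp/chat/solr/"
          dataDir="/Users/opal/temp/chat/solr/data/core2"/>
  </cores>
</solr>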
Re: Runtime exception when adding documents using solrj
Hi, I'm trying to add a list of POJO objects (using annotations) via solrj, but "server.addBeans(...)" is throwing this exception:

org.apache.solr.common.SolrException: Bad Request
Bad Request
request: http://localhost:8080/solr/core0/update?wt=javabin&version=2.2

Note, I'm using multi-core. There is no other exception in the solr log.

Related question - I'm trying to upgrade solrj from the nightly build, but I get a class-not-found exception (java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory). What are all the dependencies for Solrj 1.4 (the wiki only has information up to 1.3)?

Thanks,
-vivek

On Wed, Apr 1, 2009 at 3:30 AM, Radha C. wrote:
>
> Thanks Paul, I resolved it. I missed one field declaration in schema.xml. Now
> I've added it, and it works.
>
> -Original Message-
> From: Noble Paul നോബിള് नोब्ळ् [mailto:noble.p...@gmail.com]
> Sent: Wednesday, April 01, 2009 3:52 PM
> To: solr-user@lucene.apache.org; cra...@ceiindia.com
> Subject: Re: Runtime exception when adding documents using solrj
>
> Can you take a look at the Solr logs and see what is happening?
>
> On Wed, Apr 1, 2009 at 3:19 PM, Radha C. wrote:
>>
>> Thanks Paul,
>>
>> I changed the URL but I am getting another error - Bad Request. Any help
>> will be appreciated.
>>
>> Exception in thread "main" org.apache.solr.common.SolrException: Bad
>> Request Bad Request
>> request: http://localhost:8080/solr/update?wt=javabin
>> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:428)
>> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
>> at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:243)
>> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
>> at SolrIndexTest.main(SolrIndexTest.java:47)
>> Java Result: 1
>>
>>
>>
>>
>> -Original Message-
>> From: Noble Paul നോബിള് नोब्ळ् [mailto:noble.p...@gmail.com]
>> Sent: Wednesday, April 01, 2009 2:26 PM
>> To: solr-user@lucene.apache.org; cra...@ceiindia.com
>> Subject: Re: Runtime exception when adding documents using solrj
>>
>> the url is wrong
>> try this
>> CommonsHttpSolrServer server = new
>> CommonsHttpSolrServer("http://localhost:8080/solr/");
>>
>> On Wed, Apr 1, 2009 at 2:04 PM, Radha C. wrote:
>>>
>>> Can anyone please tell me what is the issue with the below Java code.
>>>
>>> -Original Message-
>>> From: Radha C. [mailto:cra...@ceiindia.com]
>>> Sent: Wednesday, April 01, 2009 12:28 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: RE: Runtime exception when adding documents using solrj
>>>
>>>
>>> I am using Solr 1.3 version
>>>
>>> _
>>>
>>> From: Noble Paul നോബിള് नोब्ळ् [mailto:noble.p...@gmail.com]
>>> Sent: Wednesday, April 01, 2009 12:16 PM
>>> To: solr-user@lucene.apache.org; cra...@ceiindia.com
>>> Subject: Re: Runtime exception when adding documents using solrj
>>>
>>>
>>> which version of Solr are you using?
>>>
>>>
>>> On Wed, Apr 1, 2009 at 12:01 PM, Radha C. wrote:
>>>
>>>
>>> Hi All,
>>>
>>> I am trying to index documents by using the solrj client. I have written
>>> some simple code below,
>>>
>>> {
>>> CommonsHttpSolrServer server = new
>>> CommonsHttpSolrServer("http://localhost:8080/solr/update");
>>> SolrInputDocument doc1 = new SolrInputDocument();
>>> doc1.addField( "id", "id1", 1.0f );
>>> doc1.addField( "name", "doc1", 1.0f );
>>> doc1.addField( "price", 10 );
>>> SolrInputDocument doc2 = new SolrInputDocument();
>>> doc2.addField( "id", "id2", 1.0f );
>>> doc2.addField( "name", "doc2", 1.0f );
>>> doc2.addField( "price", 20 );
>>> Collection<SolrInputDocument> docs = new
>>> ArrayList<SolrInputDocument>();
>>> docs.add( doc1 );
>>> docs.add( doc2 );
>>> server.add(docs);
>>> server.commit();
>>> }
>>>
>>> But I am getting the below error. Can anyone tell me what is
>>> wrong with the above code?
>>>
>>> Exception in thread "main" java.lang.RuntimeException: Invalid
>>> version or the data in not in 'javabin' format
>>> at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:98)
>>> at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
>>> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470)
>>> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
>>> at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:243)
>>> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
>>> at SolrIndexTest.main(SolrIndexTest.java:46)
>>> Java Result: 1
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> --Noble Paul
>>>
>
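For context on the addBeans variant in the first message: Solrj binds POJOs with the @Field annotation, and a 400 Bad Request typically means an annotated field has no matching declaration in schema.xml. A minimal sketch (the field names are placeholders):

import org.apache.solr.client.solrj.beans.Field;

public class Record {
    @Field("id")
    public String id;

    @Field("name")
    public String name;

    @Field("price")
    public float price;
}

// usage: server.addBeans(records); server.commit();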
Re: Runtime exception when adding documents using solrj
Thanks Shalin. I added that in the solrconfig.xml, but now I get this exception:

org.apache.solr.common.SolrException: Not Found
Not Found
request: http://localhost:8080/solr/core0/update?wt=javabin&version=2.2

I do have "core0" under the solr.home, and solr.xml lists it. The core0 directory also contains the conf and data directories. Am I missing anything else?

Thanks,
-vivek

On Wed, Apr 1, 2009 at 1:02 PM, Shalin Shekhar Mangar wrote:
> On Thu, Apr 2, 2009 at 1:13 AM, vivek sar wrote:
>> Hi,
>>
>> I'm trying to add a list of POJO objects (using annotations) via
>> solrj, but "server.addBeans(...)" is throwing this exception:
>>
>> org.apache.solr.common.SolrException: Bad Request
>> Bad Request
>> request: http://localhost:8080/solr/core0/update?wt=javabin&version=2.2
>>
>> Note, I'm using multi-core. There is no other exception in the solr log.
>>
>
> Can you make sure all the cores' solrconfig.xml have the following line?
>
> <requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" />
>
> The above is needed for the binary update format to work. I don't think
> the multi-core example solrconfig.xml in the solr nightly builds contains
> this line.
>
>> Related question - I'm trying to upgrade solrj from the nightly build,
>> but I get a class-not-found exception
>> (java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory). What are
>> all the dependencies for Solrj 1.4 (the wiki only has information up to
>> 1.3)?
>>
>
> I think you need slf4j-api-1.5.5.jar and slf4j-jdk14-1.5.5.jar. Both
> can be found in solr's nightly downloads in the lib directory.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
java.lang.ClassCastException: java.lang.Long using Solrj
Hi, I'm using solrj (released v 1.3) to add my POJO objects (server.addBeans(...)), but I'm getting this exception:

java.lang.ClassCastException: java.lang.Long
at org.apache.solr.common.util.NamedListCodec.unmarshal(NamedListCodec.java:89)
at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:385)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:57)

I don't have any "Long" member variable in my Java object, so I'm not sure where this is coming from. I've checked the schema.xml to make sure the data types are ok. I'm adding 15K objects at a time - I'm assuming that should be ok.

Any ideas?

Thanks,
-vivek
Re: Runtime exception when adding documents using solrj
Hello Shalin, Looks like I was using an old version of solrconfig.xml (from Solr 1.2). After I updated to the latest solrconfig.xml (from 1.4) it seems to be working fine.

Another question: how would I search across multiple cores?

1) If I want to search for a word in two different cores?
2) If I want to search for a word in all the cores?
3) How would I search on multiple cores on multiple machines?

For a single core I'm able to search like, http://localhost:8080/solr/20090402/select?q=*:*

Thanks,
-vivek

--
Just in case this might be helpful to others who are trying to use Solr multicore, here is what I tried:

1) Created this directory structure - multicore/core0 (put the conf directory - with schema.xml and solrconfig.xml - under core0) and multicore/core1. Made multicore the solr.home and put solr.xml under there.

2) Added a couple of cores in solr.xml. Here core1 is using the instanceDir of core0 (so the same schema.xml and solrconfig.xml).

3) Started Solr.

4) The data/index directory is created under both cores.

5) Tried the following URLs,
a) http://localhost:8080/solr/admin/cores - admin interface for both cores
b) http://localhost:8080/solr/core0/admin/ - I see the single-core admin page
c) http://localhost:8080/solr/admin/cores?action=STATUS - same as a
d) http://localhost:8080/solr/admin/cores?action=STATUS&core=core0 - same as b
e) http://localhost:8080/solr/core0/select?q=*:* - shows the result xml

6) I then created a core dynamically using the CREATE service (this requires Solr 1.4),
http://localhost:8080/solr/admin/cores?action=CREATE&name=20090402&instanceDir=/Users/opal/temp/chat/solr/multicore/core0&dataDir=/Users/opal/temp/chat/solr/multicore/20090402/data
- this dynamically updated solr.xml and created the directory structure (20090402/data) on the file system.

7) Then used solrj to add beans to the newly created core.

On Wed, Apr 1, 2009 at 8:26 PM, Shalin Shekhar Mangar wrote:
> On Thu, Apr 2, 2009 at 2:34 AM, vivek sar wrote:
>> Thanks Shalin.
>>
>> I added that in the solrconfig.xml, but now I get this exception:
>>
>> org.apache.solr.common.SolrException: Not Found
>> Not Found
>> request: http://localhost:8080/solr/core0/update?wt=javabin&version=2.2
>>
>> I do have "core0" under the solr.home, and solr.xml lists it. The core0
>> directory also contains the conf and data directories.
>>
>
> Are you able to see the Solr admin dashboard at
> http://localhost:8080/solr/core0/admin/ ? Are there any exceptions in
> the Solr log?
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
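On the multi-core search questions: Solr's distributed search uses the shards request parameter, where each entry is host:port/path/corename without a protocol prefix. A minimal sketch for two local cores (core names are placeholders):

http://localhost:8080/solr/core0/select?shards=localhost:8080/solr/core0,localhost:8080/solr/core1&q=word&indent=true

The same parameter spans machines by listing remote hosts too, with the usual caveats: all shards need compatible schemas, and unique keys must not collide across shards.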
Re: java.lang.ClassCastException: java.lang.Long using Solrj
Thanks Noble. That helped - it turned out there was a field name mismatch in my bean.

2009/4/1 Noble Paul നോബിള് नोब्ळ् :
> The ClassCastException is misleading. It happens because the response
> itself was some error response.
>
> Debug it by setting the XMLResponseParser:
> http://wiki.apache.org/solr/Solrj#head-12c26b2d7806432c88b26cf66e236e9bd6e91849
>
> On Thu, Apr 2, 2009 at 4:21 AM, vivek sar wrote:
>> Hi,
>>
>> I'm using solrj (released v 1.3) to add my POJO objects
>> (server.addBeans(...)), but I'm getting this exception:
>>
>> java.lang.ClassCastException: java.lang.Long
>> at org.apache.solr.common.util.NamedListCodec.unmarshal(NamedListCodec.java:89)
>> at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
>> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:385)
>> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
>> at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
>> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
>> at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:57)
>>
>> I don't have any "Long" member variable in my Java object, so I'm not
>> sure where this is coming from. I've checked the schema.xml to make
>> sure the data types are ok. I'm adding 15K objects at a time - I'm
>> assuming that should be ok.
>>
>> Any ideas?
>>
>> Thanks,
>> -vivek
>>
>
>
>
> --
> --Noble Paul
>
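A sketch of Noble's debugging suggestion in code (the URL is a placeholder): switching the client to the XML parser makes an error response surface as a readable message instead of a javabin ClassCastException.

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

public class DebugClient {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8080/solr/core0");
        // parse responses as XML while debugging; switch back to the
        // default binary parser once the real error is found
        server.setParser(new XMLResponseParser());
    }
}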
Searching on mulit-core Solr
Hi, I have a multi-core system (one core per day), so there would be around 30 cores in a month on a box running one Solr instance. We have two boxes running Solr instances, and input data is fed to them in round-robin fashion. Each box can have up to 30 cores in a month. Here are my questions:

1) How would I search for a term in multiple cores on the same box?

For a single core I'm able to search like, http://localhost:8080/solr/20090402/select?q=*:*

2) How would I search for a term in multiple cores on both boxes at the same time?

3) Is it possible to have two Solr instances on one box, with one doing the indexing and the other performing only searches on that index? The idea is to have two JVMs, each doing its own task - I'm not sure whether the indexer process needs to know about the searcher process, i.e. whether they need to have the same solr.xml (for multicore etc.). We don't want to replicate the indexes either (we get very light search traffic, but very high indexing traffic), so they would need to use the same index.

Thanks,
-vivek
Solr 1.4 (nightly build) seem hung under load
Hi, I'm using Solr 1.4 (nightly build - 03/29/09). I'm stress testing my application with Solr. My app uses Solrj to write to a remote Solr (on the same box, but in a different JVM). The stress test sends over 2 million records (1 record = 500 bytes, with each record having 10 fields) within 5 minutes. All was working fine (with 2 million records processed - 2G index size), then all of a sudden Solr stopped responding - I call server.addBeans(...) passing 15K objects and don't get any response for over an hour (usually it returns in 5 sec).

I have 3 threads writing to the same index at the same time - not sure if that could cause any problem. I was told by Otis that it should be ok to have multiple threads write to the same index, so I'm assuming it's ok, though from the thread dump I do see a couple of "update" threads waiting on a ReadWriteLock and another thread (pool-6-thread-1) holding a lock on SolrWriter.

Attached is the thread dump of the Tomcat process where Solr is running. Any ideas?

Thanks,
-vivek
Re: Solr 1.4 (nightly build) seem hung under load
Just an update on this issue: Solr did come back after 80 min - so I'm not sure where it was stuck. I do use a RAM buffer of 64MB and have a heap size of 6G.

There is no error in the Solr log, and I had it running at WARNING level, so I missed the INFO output, if there was any, during that period. I'm also not running any "optimize" command. What could cause Solr to hang for 80 min?

Thanks,
-vivek

On Fri, Apr 3, 2009 at 1:55 PM, vivek sar wrote:
> Hi,
>
> I'm using Solr 1.4 (nightly build - 03/29/09). I'm stress testing my
> application with Solr. My app uses Solrj to write to a remote Solr (on
> the same box, but in a different JVM). The stress test sends over 2 million
> records (1 record = 500 bytes, with each record having 10 fields)
> within 5 minutes. All was working fine (with 2 million records
> processed - 2G index size), then all of a sudden Solr stopped responding
> - I call server.addBeans(...) passing 15K objects and don't get any
> response for over an hour (usually it returns in 5 sec).
>
> I have 3 threads writing to the same index at the same time - not sure
> if that could cause any problem. I was told by Otis that it should be
> ok to have multiple threads write to the same index, so I'm assuming it's
> ok, though from the thread dump I do see a couple of "update" threads
> waiting on a ReadWriteLock and another thread (pool-6-thread-1) holding a
> lock on SolrWriter.
>
> Attached is the thread dump of the Tomcat process where Solr is
> running. Any ideas?
>
> Thanks,
> -vivek
>
Re: Solr 1.4 (nightly build) seem hung under load
Hi, another update. It happened again, and this time I had INFO logged in the Solr log:

INFO: {add=[330274716, 330274717, 330274718, 330274719, 330274720, 330274721, 330274722, 330274723, ...(14992 more)]} 0 6041
Apr 3, 2009 10:38:01 PM org.apache.solr.core.SolrCore execute
INFO: [20090403] webapp=/solr path=/update params={wt=javabin} status=0 QTime=6041
Apr 3, 2009 10:38:11 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)

It's still hung at the commit even after 30 min. So it looks like it takes a long time to commit the records. I'm committing the records myself, but I also have auto-commit turned on in solrconfig.xml (maxDocs 1000, maxTime 100).

In a 15 min period I'm getting approximately 6 million documents/records. I've read earlier on this mailing list that we shouldn't commit very often, and now it seems that not committing in time makes the commit process take forever. I want the records to be searchable every 30 min, basically. So 30-min-old data is ok for searching, but indexing shouldn't slow down.

1) So, what's a good commit strategy?
2) How often (after how many records) should I do this?
3) Should I do it programmatically, or can I have it in solrconfig.xml?

Thanks,
-vivek

On Fri, Apr 3, 2009 at 2:27 PM, vivek sar wrote:
> Just an update on this issue: Solr did come back after 80 min - so I'm
> not sure where it was stuck. I do use a RAM buffer of 64MB and have a heap
> size of 6G.
>
> There is no error in the Solr log, and I had it running at WARNING level,
> so I missed the INFO output, if there was any, during that period. I'm also
> not running any "optimize" command. What could cause Solr to hang for 80
> min?
>
> Thanks,
> -vivek
>
> On Fri, Apr 3, 2009 at 1:55 PM, vivek sar wrote:
>> Hi,
>>
>> I'm using Solr 1.4 (nightly build - 03/29/09). I'm stress testing my
>> application with Solr. My app uses Solrj to write to a remote Solr (on
>> the same box, but in a different JVM). The stress test sends over 2 million
>> records (1 record = 500 bytes, with each record having 10 fields)
>> within 5 minutes. All was working fine (with 2 million records
>> processed - 2G index size), then all of a sudden Solr stopped responding
>> - I call server.addBeans(...) passing 15K objects and don't get any
>> response for over an hour (usually it returns in 5 sec).
>>
>> I have 3 threads writing to the same index at the same time - not sure
>> if that could cause any problem. I was told by Otis that it should be
>> ok to have multiple threads write to the same index, so I'm assuming it's
>> ok, though from the thread dump I do see a couple of "update" threads
>> waiting on a ReadWriteLock and another thread (pool-6-thread-1) holding a
>> lock on SolrWriter.
>>
>> Attached is the thread dump of the Tomcat process where Solr is
>> running. Any ideas?
>>
>> Thanks,
>> -vivek
>>
>
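A hedged sketch of an autoCommit block matching the stated goal (data searchable within 30 minutes, no client-side commits); the value is an assumption, not taken from the thread:

<autoCommit>
  <!-- commit on time only; 30 minutes in milliseconds -->
  <maxTime>1800000</maxTime>
</autoCommit>

For scale: with a document-count trigger like maxDocs=1000 at roughly 6,600 docs/sec, Solr would be attempting several commits per second, which is consistent with the stall described above.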
httpclient.ProtocolException using Solrj
Hi, I'm sending 15K records at once using Solrj (server.addBeans(...)) and have two threads writing to the same index. One thread goes fine, but the second thread always fails with:

org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:57)
at com.apple.afterchat.indexer.solr.handler.BeanIndexHandler.indexData(BeanIndexHandler.java:44)
at com.apple.afterchat.indexer.Indexer.indexData(Indexer.java:77)
at com.apple.afterchat.indexer.Indexer.run(Indexer.java:39)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:637)
Caused by: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:487)
at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:417)

Does anyone know what could be the problem?

Thanks,
-vivek
Re: Searching on multi-core Solr
Hi, Any help on this? I've looked at DistributedSearch on the Wiki, but that doesn't seem to be working for me on multi-core and multiple Solr instances on the same box. Scenario, 1) Two boxes (localhost, 10.4.x.x) 2) Two Solr instances on each box (8080 and 8085 ports) 3) Two cores on each instance (core0, core1) I'm not sure how to construct my search on the above setup if I need to search across all the cores on all the boxes. Here is what I'm trying, http://localhost:8080/solr/core0/select?shards=localhost:8080/solr/core0,localhost:8085/solr/core0,localhost:8080/solr/core1,localhost:8085/solr/core1,10.4.x.x:8080/solr/core0,10.4.x.x:8085/solr/core0,10.4.x.x:8080/solr/core1,10.4.x.x:8085/solr/core1&indent=true&q=vivek+japan I get a 404 error. Is this the right URL construction for my setup? How else can I do this? Thanks, -vivek On Fri, Apr 3, 2009 at 1:02 PM, vivek sar wrote: > Hi, > > I have a multi-core system (one core per day), so there would be around > 30 cores in a month on a box running one Solr instance. We have two > boxes running the Solr instance and input data is fed to them in > round-robin fashion. Each box can have up to 30 cores in a month. Here > are my questions: > > 1) How would I search for a term in multiple cores on the same box? > > On a single core I'm able to search like, > http://localhost:8080/solr/20090402/select?q=*:* > > 2) How would I search for a term in multiple cores on both boxes at > the same time? > > 3) Is it possible to have two Solr instances on one box, with one doing > the indexing and the other performing only searches on that index? The idea > is to have two JVMs with each doing its own task - I'm not sure whether > the indexer process needs to know about the searcher process - like do > they need to have the same solr.xml (for multicore etc). We don't want > to replicate the indexes either (we have very light search traffic, but > very high indexing traffic) so they need to use the same index. > > > Thanks, > -vivek >
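For reference, the same shards-style query can be issued through SolrJ instead of a hand-built URL; a sketch reusing the host/port/core layout described above (shortened to four shards for readability):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ShardsQuery {
        public static void main(String[] args) throws Exception {
            // The request goes to one core; that core fans it out to each shard listed.
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8080/solr/core0");
            SolrQuery q = new SolrQuery("vivek japan");
            q.set("shards",
                  "localhost:8080/solr/core0,localhost:8085/solr/core0," +
                  "localhost:8080/solr/core1,localhost:8085/solr/core1");
            QueryResponse rsp = server.query(q);
            System.out.println("hits: " + rsp.getResults().getNumFound());
        }
    }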
Re: httpclient.ProtocolException using Solrj
Hi, Any ideas on this issue? I ran into this again - once it starts happening it keeps happening. One of the threads keeps failing. Here are my SolrServer settings, int socketTO = 0; int connectionTO = 100; int maxConnectionPerHost = 10; int maxTotalConnection = 50; boolean followRedirects = false; boolean allowCompression = true; int maxRetries = 1; Note, I'm using two threads to simultaneously write to the same index. org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated. at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48) at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:57) Thanks, -vivek On Sat, Apr 4, 2009 at 1:07 AM, vivek sar wrote: > Hi, > > I'm sending 15K records at once using Solrj (server.addBeans(...)) > and have two threads writing to the same index. One thread goes fine, but > the second thread always fails with, > > > org.apache.solr.client.solrj.SolrServerException: > org.apache.commons.httpclient.ProtocolException: Unbuffered entity > enclosing request can not be repeated. > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470) > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) > at > org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259) > at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48) > at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:57) > at > com.apple.afterchat.indexer.solr.handler.BeanIndexHandler.indexData(BeanIndexHandler.java:44) > at com.apple.afterchat.indexer.Indexer.indexData(Indexer.java:77) > at com.apple.afterchat.indexer.Indexer.run(Indexer.java:39) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) > at java.lang.Thread.run(Thread.java:637) > Caused by: org.apache.commons.httpclient.ProtocolException: Unbuffered > entity enclosing request can not be repeated. > at > org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:487) > at > org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114) > at > org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096) > at > org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) > at > org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) > at > org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) > at > org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:417) > > Does anyone know what could be the problem? > > Thanks, > -vivek >
Re: Searching on multi-core Solr
>>Hi, Any help on this? I've looked at DistributedSearch on the Wiki, but that doesn't seem to be working for me on multi-core and multiple Solr >>instances on the same box. >> >>Scenario, >> >>1) Two boxes (localhost, 10.4.x.x) >>2) Two Solr instances on each box (8080 and 8085 ports) >>3) Two cores on each instance (core0, core1) >> >>I'm not sure how to construct my search on the above setup if I need >>to search across all the cores on all the boxes. Here is what I'm >>trying, >> >>http://localhost:8080/solr/core0/select?shards=localhost:8080/solr/core0,localhost:8085/solr/core0,localhost:8080/solr/core1,localhost:8085/solr/core1,10.4.x.x:8080/solr/core0,10.4.x.x:8085/solr/core0,10.4.x.x:8080/solr/core1,10.4.x.x:8085/solr/core1&indent=true&q=vivek+japan >> >>I get a 404 error. Is this the right URL construction for my setup? How >>else can I do this? >> >>Thanks, >>-vivek >> >>On Fri, Apr 3, 2009 at 1:02 PM, vivek sar wrote: >>> Hi, >>> >>> I have a multi-core system (one core per day), so there would be around >>> 30 cores in a month on a box running one Solr instance. We have two >>> boxes running the Solr instance and input data is fed to them in >>> round-robin fashion. Each box can have up to 30 cores in a month. Here >>> are my questions: >>> >>> 1) How would I search for a term in multiple cores on the same box? >>> >>> On a single core I'm able to search like, >>> http://localhost:8080/solr/20090402/select?q=*:* >>> >>> 2) How would I search for a term in multiple cores on both boxes at >>> the same time? >>> >>> 3) Is it possible to have two Solr instances on one box, with one doing >>> the indexing and the other performing only searches on that index? The idea >>> is to have two JVMs with each doing its own task - I'm not sure whether >>> the indexer process needs to know about the searcher process - like do >>> they need to have the same solr.xml (for multicore etc). We don't want >>> to replicate the indexes either (we have very light search traffic, but >>> very high indexing traffic) so they need to use the same index. >>> >>> >>> Thanks, >>> -vivek >>> > > -- > > === > Fergus McMenemie Email:fer...@twig.me.uk > Techmore Ltd Phone:(UK) 07721 376021 > > Unix/Mac/Intranets Analyst Programmer > === >
Re: httpclient.ProtocolException using Solrj
With a single thread everything works fine. Two threads are fine too for a while, and then all of a sudden the problem starts happening. I tried indexing using REST services as well (instead of Solrj), but with that too I get the following error after a while, 2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer - indexData()-> Failed to index java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) at java.io.FilterOutputStream.write(FilterOutputStream.java:80) at org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145) at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499) at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114) at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) Note, I'm using the "simple" lock type. I'd tried the "single" type before, but that once caused index corruption, so I switched to "simple". Thanks, -vivek 2009/4/8 Noble Paul നോബിള് नोब्ळ् : > do you see the same problem when you use a single thread? > > what is the version of SolrJ that you use? > > > > On Wed, Apr 8, 2009 at 1:19 PM, vivek sar wrote: >> Hi, >> >> Any ideas on this issue? I ran into this again - once it starts >> happening it keeps happening. One of the threads keeps failing. Here >> are my SolrServer settings, >> >> int socketTO = 0; >> int connectionTO = 100; >> int maxConnectionPerHost = 10; >> int maxTotalConnection = 50; >> boolean followRedirects = false; >> boolean allowCompression = true; >> int maxRetries = 1; >> >> Note, I'm using two threads to simultaneously write to the same index. >> >> org.apache.solr.client.solrj.SolrServerException: >> org.apache.commons.httpclient.ProtocolException: Unbuffered entity >> enclosing request can not be repeated. 
>>> at >>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470) >>> at >>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) >>> at >>> org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259) >>> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48) >>> at >>> org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:57) >>> at >>> com.apple.afterchat.indexer.solr.handler.BeanIndexHandler.indexData(BeanIndexHandler.java:44) >>> at com.apple.afterchat.indexer.Indexer.indexData(Indexer.java:77) >>> at com.apple.afterchat.indexer.Indexer.run(Indexer.java:39) >>> at >>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885) >>> at >>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
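For what it's worth, the settings listed above map onto CommonsHttpSolrServer setters roughly as below; note that in commons-httpclient a socket timeout of 0 means wait forever on reads, so a hung request never times out on the client side. A sketch of the mapping, not a recommendation of these particular values:

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class ClientSettings {
        public static CommonsHttpSolrServer configure(String url) throws Exception {
            CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
            server.setSoTimeout(0);                    // 0 = infinite read timeout
            server.setConnectionTimeout(100);
            server.setDefaultMaxConnectionsPerHost(10);
            server.setMaxTotalConnections(50);
            server.setFollowRedirects(false);
            server.setAllowCompression(true);
            server.setMaxRetries(1);
            return server;
        }
    }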
Re: Searching on multi-core Solr
Any help on this issue? Would distributed search on multi-core on the same Solr instance even work? Does it have to be different Solr instances altogether (separate shards)? I'm kind of stuck at this point right now. I keep getting one of the two errors (when running distributed search - single searches work fine) as mentioned earlier in this thread. Thanks, -vivek On Wed, Apr 8, 2009 at 1:57 AM, vivek sar wrote: > Thanks Fergus. I'm still having a problem with multicore search. > > I tried the following with two cores (they both share the same schema > and solrconfig.xml) on the same box on the same Solr instance, > > 1) http://10.4.x.x:8080/solr/core0/admin/ - works fine, shows all the > cores in admin interface > 2) http://10.4.x.x:8080/solr/admin/cores - works fine, I see all the cores in > xml > 3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine, > gives me top 10 records > 4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine, > gives me top 10 records > 5) > http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan > - this FAILS. I've seen two problems with this. > > a) When indexes are being committed I see, > > SEVERE: org.apache.solr.common.SolrException: > org.apache.solr.client.solrj.SolrServerException: > java.net.SocketException: Connection reset > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) > at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) > at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) > at > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) > at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) > at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) > at > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) > at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) > at > org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) > at > org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) > at > org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) > at java.lang.Thread.run(Thread.java:637) > > b) Other times I see this, > > SEVERE: java.lang.NullPointerException > at > org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432) > at > org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) > at > 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) > at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) > at > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) > at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) > at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) > at > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) > at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) > at > org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) > at > org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
Re: httpclient.ProtocolException using Solrj
Thanks Shalin and Paul. I'm not using MultipartRequest. I do share the same SolrServer between two threads. I'm not using MultiThreadedHttpConnectionManager. I'm simply using CommonsHttpSolrServer to create the SolrServer. I've also tried StreamingUpdateSolrServer, which works much faster, but does throw a "connection reset" exception once in a while. Do I need to use MultiThreadedHttpConnectionManager? I couldn't find anything on it on the Wiki. I was also thinking of using EmbeddedSolrServer - in what case would I be able to use it? Does my application and the Solr web app need to run in the same JVM for this to work? How would I use the EmbeddedSolrServer? Thanks, -vivek On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar wrote: > Vivek, do you share the same SolrServer instance between your two threads? > If so, are you using the MultiThreadedHttpConnectionManager when creating > the HttpClient instance? > > On Wed, Apr 8, 2009 at 10:13 PM, vivek sar wrote: > >> With a single thread everything works fine. Two threads are fine too for a >> while, and then all of a sudden the problem starts happening. >> >> I tried indexing using REST services as well (instead of Solrj), but >> with that too I get the following error after a while, >> >> 2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer - >> indexData()-> Failed to index >> java.net.SocketException: Broken pipe >> at java.net.SocketOutputStream.socketWrite0(Native Method) >> at >> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) >> at java.net.SocketOutputStream.write(SocketOutputStream.java:136) >> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) >> at java.io.FilterOutputStream.write(FilterOutputStream.java:80) >> at >> org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145) >> at >> org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499) >> at >> org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114) >> at >> org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096) >> at >> org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) >> at >> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) >> at >> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) >> at >> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) >> >> >> Note, I'm using the "simple" lock type. I'd tried the "single" type before, >> but that once caused index corruption, so I switched to "simple". >> >> Thanks, >> -vivek >> >
>> >> >> >> org.apache.solr.client.solrj.SolrServerException: >> >> org.apache.commons.httpclient.ProtocolException: Unbuffered entity >> >> enclosing request can not be repeated. >> >> at >> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470) >> >> at >> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) >> >> at >> org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259) >> >> at >> org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48) >> >> at >> org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:57) >> >> >>
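On the EmbeddedSolrServer question above: it runs Solr inside the client's own JVM, with no servlet container or HTTP in between, so it only applies when the indexer and the index live in the same process. A minimal sketch, assuming a solr home directory containing a core named "core0":

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.core.CoreContainer;

    public class EmbeddedExample {
        public static void main(String[] args) throws Exception {
            System.setProperty("solr.solr.home", "/path/to/solr/home"); // assumed path
            CoreContainer.Initializer init = new CoreContainer.Initializer();
            CoreContainer container = init.initialize();
            SolrServer server = new EmbeddedSolrServer(container, "core0");
            // addBeans/commit/query work exactly as with the HTTP-based servers
            container.shutdown();
        }
    }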
Re: Searching on multi-core Solr
Hi, I've gone through the mailing archive and have read contradictory remarks on this issue. Can someone please clear this up, as I'm not able to run distributed search on multiple cores? Is there any document on how I can search across multiple cores that share the same schema? Here are the various comments I've read on this mailing list, 1) http://www.nabble.com/multi-core-vs-multi-app-td15803781.html#a15803781 Don't think you can search against multiple cores "automatically" - i.e. got to make multiple queries, one for each core and combine results yourself. Yes, this will slow things down. - Otis 2) http://www.nabble.com/Search-in-SOLR-multi-cores-in-a-single-request-td20356173.html#a20356173 The idea behind multicore is that you will use them if you have completely different type of documents (basically multiple schemas). - Shalin 3) http://www.nabble.com/Distributed-search-td22036229.html#a22036229 That should work, yes, though it may not be a wise thing to do performance-wise, if the number of CPU cores that solr server has is lower than the number of Solr cores. - Otis My only motivation behind using multi-core is to keep the index size within limits. All my cores use the same schema. My index grows to over 30G within a day, and I need to keep up to a year of data. I couldn't find any other way of scaling using Solr. I've noticed that once the index grows above 10G the indexing process starts slowing down: commits take much longer and an optimize is hard to finish. So I'm trying to create a new core after every 10 million documents (equal to 10G in my case). I don't want to start a new Solr instance every 10G - that won't scale over a year. I'm going to use 3-4 servers to hold all these cores. Now, if someone could please tell me whether this is a wrong scaling architecture, I could re-think it. I want fast indexing and, at the same time, fast enough search. If I have to search each core separately and merge the results myself, the search performance is going to be awful. Is Solr the right tool for managing billions of records (I can get up to 100 million records every day - with 1Kb per record that's 100GB of index a day)? Most of the field values are pretty distinct (like 10 million email addresses), so the index size would be huge too. I would think it's a common problem to scale a huge index while keeping both indexing and search time acceptable. I'm not sure if this can be managed on just 4 servers - we don't have 100s of boxes for this project. Is there any other tool that might be more appropriate for this kind of case - like Katta or Lucene on Hadoop, or simply using Lucene with parallel search and partitioning the indexes by size? Thanks, -vivek On Wed, Apr 8, 2009 at 11:07 AM, vivek sar wrote: > Any help on this issue? Would distributed search on multi-core on the same > Solr instance even work? Does it have to be different Solr instances > altogether (separate shards)? > > I'm kind of stuck at this point right now. I keep getting one of the two > errors (when running distributed search - single searches work fine) > as mentioned earlier in this thread. > > Thanks, > -vivek > > On Wed, Apr 8, 2009 at 1:57 AM, vivek sar wrote: >> Thanks Fergus. I'm still having a problem with multicore search. 
>> >> I tried the following with two cores (they both share the same schema >> and solrconfig.xml) on the same box on the same Solr instance, >> >> 1) http://10.4.x.x:8080/solr/core0/admin/ - works fine, shows all the >> cores in admin interface >> 2) http://10.4.x.x:8080/solr/admin/cores - works fine, I see all the cores in >> xml >> 3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine, >> gives me top 10 records >> 4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine, >> gives me top 10 records >> 5) >> http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan >> - this FAILS. I've seen two problems with this. >> >> a) When indexes are being committed I see, >> >> SEVERE: org.apache.solr.common.SolrException: >> org.apache.solr.client.solrj.SolrServerException: >> java.net.SocketException: Connection reset >> at >> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) >> at >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) >> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) >> at >> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) >> at >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) >>
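Given the core-per-day layout described above, the shards parameter for a recent window can be assembled programmatically rather than hand-typed. A sketch assuming a plain yyyyMMdd core-naming convention (the thread's actual cores carry a numeric suffix, e.g. 20090407_2, which this deliberately ignores):

    import java.text.SimpleDateFormat;
    import java.util.Calendar;

    public class ShardListBuilder {
        // Builds "host/solr/20090409,host/solr/20090408,..." for the last n days.
        public static String shardsForLastDays(String[] hosts, int days) {
            SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
            Calendar cal = Calendar.getInstance();
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < days; i++) {
                String core = fmt.format(cal.getTime());
                for (String host : hosts) {
                    if (sb.length() > 0) sb.append(',');
                    sb.append(host).append("/solr/").append(core);
                }
                cal.add(Calendar.DAY_OF_MONTH, -1); // step back one day per core
            }
            return sb.toString();
        }

        public static void main(String[] args) {
            String[] hosts = { "10.4.x.x:8080", "10.4.x.x:8085" }; // placeholder hosts
            System.out.println(shardsForLastDays(hosts, 3));
        }
    }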
Re: Searching on multi-core Solr
Erik, Here is what I'd posted in this thread earlier, I tried the following with two cores (they both share the same schema and solrconfig.xml) on the same box on the same Solr instance, 1) http://10.4.x.x:8080/solr/core0/admin/ - works fine, shows all the cores in admin interface 2) http://10.4.x.x:8080/solr/admin/cores - works fine, I see all the cores in xml 3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine, gives me top 10 records 4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine, gives me top 10 records 5) http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan - this FAILS. I've seen two problems with this. a) This is the error most of the time, SEVERE: java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:637) b) When indexes are being committed I see this during search, SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:637) Any tips on how I can search on multiple cores on the same Solr instance? Thanks, -vivek On Thu, Apr 9, 2009 at 2:56 AM, Erik Hatcher wrote: > > On Apr 9, 2009, at 3:00 AM, vivek sar wrote: >> >> Can someone please clear this up, as I'm not >> able to run distributed search on multiple cores? > > What error or problem are you encountering when trying this? How are you > trying it? > > Erik > >
Re: Searching on multi-core Solr
Attached is the solr.xml - note, the schema and solrconfig are located in core0, and all other cores point to the same core0 instance for the schema. Searches on individual cores work fine, so I'm assuming the solr.xml is correct - I also get their status correctly. From the "NullPointerException" it seems it fails at, for (int i=resultSize-1; i>=0; i--) { ShardDoc shardDoc = (ShardDoc)queue.pop(); shardDoc.positionInResponse = i; // Need the toString() for correlation with other lists that must // be strings (like keys in highlighting, explain, etc) resultIds.put(shardDoc.id.toString(), shardDoc); } I have a unique field (required) in my documents, so I'm not sure whether that can be null - could the doc itself be null - how? The same search on the same cores individually works fine. Not sure if there is a way to debug this. I'm also not sure when I would get the "Connection reset" exception - would it be when indexing is happening at the same time at a high rate - would that cause problems? Thanks, -vivek On Thu, Apr 9, 2009 at 4:07 AM, Fergus McMenemie wrote: >>Any help on this issue? Would distributed search on multi-core on the same >>Solr instance even work? Does it have to be different Solr instances >>altogether (separate shards)? > > As best I can tell this works fine for me. Multiple cores on the one > machine. Very different schema and solrconfig.xml for each of the > cores. Distributed searching using shards works fine. But I am using > the trunk version. > > Perhaps you should post your solr.xml file. > >>I'm kind of stuck at this point right now. I keep getting one of the two >>errors (when running distributed search - single searches work fine) >>as mentioned earlier in this thread. >> >>Thanks, >>-vivek >> >>On Wed, Apr 8, 2009 at 1:57 AM, vivek sar wrote: >>> Thanks Fergus. I'm still having a problem with multicore search. >>> >>> I tried the following with two cores (they both share the same schema >>> and solrconfig.xml) on the same box on the same Solr instance, >>> >>> 1) http://10.4.x.x:8080/solr/core0/admin/ - works fine, shows all the >>> cores in admin interface >>> 2) http://10.4.x.x:8080/solr/admin/cores - works fine, I see all the cores >>> in xml >>> 3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine, >>> gives me top 10 records >>> 4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine, >>> gives me top 10 records >>> 5) >>> http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan >>> - this FAILS. I've seen two problems with this. 
>>> >>> a) When indexes are being committed I see, >>> >>> SEVERE: org.apache.solr.common.SolrException: >>> org.apache.solr.client.solrj.SolrServerException: >>> java.net.SocketException: Connection reset >>> at >>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) >>> at >>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) >>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) >>> at >>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) >>> at >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) >>> at >>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) >>> at >>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) >>> at >>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) >>> at >>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) >>> at >>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) >>> at >>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) >>> at >>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) >>> at >>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) >>> at >>> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) >>> at >>> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) >>
Re: httpclient.ProtocolException using Solrj
I'm inserting 10K in a batch (using the addBeans method). I read somewhere on the wiki that it's better to use the same instance of SolrServer for better performance. Would MultiThreadedHttpConnectionManager help? How do I use it? I also wanted to know how I can use EmbeddedSolrServer - does my app need to be running in the same JVM as the Solr webapp? Thanks, -vivek 2009/4/9 Noble Paul നോബിള് नोब्ळ् : > how many documents are you inserting ? > may be you can create multiple instances of CommonshttpSolrServer and > upload in parallel > > > On Thu, Apr 9, 2009 at 11:58 AM, vivek sar wrote: >> Thanks Shalin and Paul. >> >> I'm not using MultipartRequest. I do share the same SolrServer between >> two threads. I'm not using MultiThreadedHttpConnectionManager. I'm >> simply using CommonsHttpSolrServer to create the SolrServer. I've also >> tried StreamingUpdateSolrServer, which works much faster, but does >> throw a "connection reset" exception once in a while. >> >> Do I need to use MultiThreadedHttpConnectionManager? I couldn't find >> anything on it on the Wiki. >> >> I was also thinking of using EmbeddedSolrServer - in what case would I >> be able to use it? Does my application and the Solr web app need to >> run in the same JVM for this to work? How would I use the >> EmbeddedSolrServer? >> >> Thanks, >> -vivek >> >> >> On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar >> wrote: >>> Vivek, do you share the same SolrServer instance between your two threads? >>> If so, are you using the MultiThreadedHttpConnectionManager when creating >>> the HttpClient instance? >>> >>> On Wed, Apr 8, 2009 at 10:13 PM, vivek sar wrote: >>> >>>> With a single thread everything works fine. Two threads are fine too for a >>>> while, and then all of a sudden the problem starts happening. >>>> >>>> I tried indexing using REST services as well (instead of Solrj), but >>>> with that too I get the following error after a while, >>>> >>>> 2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer - >>>> indexData()-> Failed to index >>>> java.net.SocketException: Broken pipe >>>> at java.net.SocketOutputStream.socketWrite0(Native Method) >>>> at >>>> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) >>>> at java.net.SocketOutputStream.write(SocketOutputStream.java:136) >>>> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) >>>> at java.io.FilterOutputStream.write(FilterOutputStream.java:80) >>>> at >>>> org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145) >>>> at >>>> org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499) >>>> at >>>> org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114) >>>> at >>>> org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096) >>>> at >>>> org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) >>>> at >>>> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) >>>> at >>>> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) >>>> at >>>> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) >>>> >>>> >>>> Note, I'm using the "simple" lock type. I'd tried the "single" type before, >>>> but that once caused index corruption, so I switched to "simple". >>>> >>>> Thanks, >>>> -vivek >>>> >>>> 2009/4/8 Noble Paul നോബിള് नोब्ळ् : >>>> > do you see the same problem when you use a single thread? 
>>>> > >>>> > what is the version of SolrJ that you use? >>>> > >>>> > >>>> > >>>> > On Wed, Apr 8, 2009 at 1:19 PM, vivek sar wrote: >>>> >> Hi, >>>> >> >>>> >> Any ideas on this issue? I ran into this again - once it starts >>>> >> happening it keeps happening. One of the thread keeps failing. Here >>>> >> are my SolrServer settings, >>>> >> >>>> >>
Re: httpclient.ProtocolException using Solrj
Here is what I'm doing, SolrServer server = new StreamingUpdateSolrServer(url, 1000,5); server.addBeans(dataList); //where dataList is List with 10K elements I run two threads, each using the same server object, and then each calls server.addBeans(...). I'm able to get 50K/sec inserted using that, but the commit after that (after 100k records) takes 70 sec - which messes up the avg time. There are two problems here, 1) Once in a while I get a "connection reset" error, Caused by: java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78) at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106) Note: if I use CommonsHttpSolrServer I get the buffer error. 2) The commit takes way too long for every 100k (I may commit more often if this cannot be improved) I'm trying to fix this error problem, which happens only if I run two threads both calling addBeans (10k at a time). One thread works fine. I'm not sure how I can use the MultiThreadedHttpConnectionManager to create a StreamingUpdateSolrServer, and whether it would help. Thanks, -vivek 2009/4/9 Noble Paul നോബിള് नोब्ळ् : > using a single request is the fastest > > http://wiki.apache.org/solr/Solrj#head-2046bbaba3759b6efd0e33e93f5502038c01ac65 > > I could index at the rate of 10,000 docs/sec using this and > BinaryRequestWriter > > On Thu, Apr 9, 2009 at 10:36 PM, vivek sar wrote: >> I'm inserting 10K in a batch (using the addBeans method). I read somewhere >> on the wiki that it's better to use the same instance of SolrServer >> for better performance. Would MultiThreadedHttpConnectionManager help? How >> do I use it? >> >> I also wanted to know how I can use EmbeddedSolrServer - does my app >> need to be running in the same JVM as the Solr webapp? >> >> Thanks, >> -vivek >> >> 2009/4/9 Noble Paul നോബിള് नोब्ळ् : >>> how many documents are you inserting ? >>> may be you can create multiple instances of CommonshttpSolrServer and >>> upload in parallel >>> >>> >>> On Thu, Apr 9, 2009 at 11:58 AM, vivek sar wrote: >>>> Thanks Shalin and Paul. >>>> >>>> I'm not using MultipartRequest. I do share the same SolrServer between >>>> two threads. I'm not using MultiThreadedHttpConnectionManager. I'm >>>> simply using CommonsHttpSolrServer to create the SolrServer. I've also >>>> tried StreamingUpdateSolrServer, which works much faster, but does >>>> throw a "connection reset" exception once in a while. >>>> >>>> Do I need to use MultiThreadedHttpConnectionManager? I couldn't find >>>> anything on it on the Wiki. >>>> >>>> I was also thinking of using EmbeddedSolrServer - in what case would I >>>> be able to use it? Does my application and the Solr web app need to >>>> run in the same JVM for this to work? How would I use the >>>> EmbeddedSolrServer? >>>> >>>> Thanks, >>>> -vivek >>>> >>>> >>>> On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar >>>> wrote: >>>>> Vivek, do you share the same SolrServer instance between your two threads? >>>>> If so, are you using the MultiThreadedHttpConnectionManager when creating >>>>> the HttpClient instance? >>>>> >>>>> On Wed, Apr 8, 2009 at 10:13 PM, vivek sar wrote: >>>>> >>>>>> With a single thread everything works fine. Two threads are fine too for a >>>>>> while, and then all of a sudden the problem starts happening. 
>>>>>> >> >>>>>> I tried indexing using REST services as well (instead of Solrj), but >>>>>> with that too I get the following error after a while, >>>>>> >>>>>> 2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer - >>>>>> indexData()-> Failed to index >>>>>> java.net.SocketException: Broken pipe >>>>>> at java.net.SocketOutputStream.socketWrite0(Native Method) >>>>>> at >>>>>> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) >>>>>> at java.net.SocketOutputStream.write(SocketOutputStream.java:136) >>>>>> at >>>>>> java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
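One way to act on the earlier suggestion of creating multiple client instances: give each indexing thread its own StreamingUpdateSolrServer rather than sharing one object, so the threads never contend for the same HTTP connection. A sketch with illustrative queue/thread counts, and the same placeholder MyDoc bean as in the earlier sketch:

    import java.util.List;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;

    public class PerThreadIndexer implements Runnable {
        private final SolrServer server;
        private final List<MyDoc> batch; // MyDoc: placeholder bean with @Field annotations

        public PerThreadIndexer(String url, List<MyDoc> batch) throws Exception {
            // Each thread owns its client: 1000-doc queue, 5 background writers
            this.server = new StreamingUpdateSolrServer(url, 1000, 5);
            this.batch = batch;
        }

        public void run() {
            try {
                server.addBeans(batch);
            } catch (Exception e) {
                e.printStackTrace(); // real code would log and retry
            }
        }
    }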
Question on Solr Distributed Search
Hi, I've another thread on multi-core distributed search, but just wanted to put a simple question here on distributed search to get some response. I have a search query, http://etsx19.co.com:8080/solr/20090409_9/select?q=usa - returns 10 results. Now if I add the "shards" parameter to it, http://etsx19.co.com:8080/solr/20090409_9/select?shards=etsx19.co.com:8080/solr/20090409_9&q=usa - this fails with org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) at .. at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:637) Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:473) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422) .. Caused by: java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78) at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106) at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413) at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973) at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735) Attached is my solrconfig.xml. Do I need a special RequestHandler for sharding? I haven't been able to get any distributed search to work. Any help is appreciated. Note: I'm indexing using Solrj - not sure if that makes any difference to the search part. Thanks, -vivek [attachment: solrconfig.xml - the XML markup was stripped by the list archive, leaving only bare element values (index settings, lockType "single", cache settings, and a dismax handler), so the file is not reproduced here]
Re: Question on Solr Distributed Search
I think I've found the reason behind the "connection reset". Looking at the code, it points to QueryComponent.mergeIds(): resultIds.put(shardDoc.id.toString(), shardDoc); It looks like the doc unique id is returning null. I'm not sure how that is possible, as it's a required field. Right now my unique id is not stored (only indexed) - does it have to be stored for distributed search? HTTP Status 500 - null java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:637) On Thu, Apr 9, 2009 at 5:01 PM, vivek sar wrote: > Hi, > > I've another thread on multi-core distributed search, but just > wanted to put a simple question here on distributed search to get some > response. I have a search query, > > http://etsx19.co.com:8080/solr/20090409_9/select?q=usa - > returns 10 results > > Now if I add the "shards" parameter to it, > > http://etsx19.co.com:8080/solr/20090409_9/select?shards=etsx19.co.com:8080/solr/20090409_9&q=usa > - this fails with > > org.apache.solr.client.solrj.SolrServerException: > java.net.SocketException: Connection reset > org.apache.solr.common.SolrException: > org.apache.solr.client.solrj.SolrServerException: > java.net.SocketException: Connection reset at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) > at > .. > at > org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) > at > org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) > at java.lang.Thread.run(Thread.java:637) > Caused by: org.apache.solr.client.solrj.SolrServerException: > java.net.SocketException: Connection reset > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:473) > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) > at > org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422) > .. 
> Caused by: java.net.SocketException: Connection reset > at java.net.SocketInputStream.read(SocketInputStream.java:168) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) > at java.io.BufferedInputStream.read(BufferedInputStream.java:237) > at > org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78) > at > org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106) > at > org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116) > at > org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413) > at > org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973) > at > org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735) > > Attached is my solrconfig.xml. Do I need a special RequestHandler for > sharding? I haven't been able to get any distributed search to work. > Any help is appreciated. > > Note: I'm indexing using Solrj - not sure if that makes any difference > to the search part. > > Thanks, > -vivek >
Re: Question on Solr Distributed Search
Just an update. I changed the schema to store the unique id field, but I still get the connection reset exception. I did notice that if there is no data in the core then it returns 0 results (no exception), but if there is data and I search using the "shards" parameter I get the connection reset exception. Can anyone provide some tips on where I can look for this problem? Apr 10, 2009 3:16:04 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:637) Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:473) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422) at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:395) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) ... 
1 more Caused by: java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78) at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106) at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413) at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973) at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735) at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) On Thu, Apr 9, 2009 at 6:51 PM, vivek sar wrote: > I think I've found the reason behind the "connection reset". Looking at the > code, it points to QueryComponent.mergeIds(): > > resultIds.put(shardDoc.id.toString(), shardDoc); > > It looks like the doc unique id is returning null. I'm not sure how that is > possible, as it's a required field. Right now my unique id is not stored > (only indexed) - does it have to be stored for distributed search? > > HTTP Status 500 - null java.lang.NullPointerException at > org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432) > at > org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276)
Re: Question on Solr Distributed Search
Yes - these are all new indexes. I can search them individually, but adding "shards" throws a "Connection reset" error. Is there any way I can debug this, or any other pointers? -vivek On Fri, Apr 10, 2009 at 4:49 AM, Shalin Shekhar Mangar wrote: > On Fri, Apr 10, 2009 at 7:50 AM, vivek sar wrote: > >> Just an update. I changed the schema to store the unique id field, but >> I still get the connection reset exception. I did notice that if there >> is no data in the core then it returns 0 results (no exception), >> but if there is data and I search using the "shards" parameter I get the >> connection reset exception. Can anyone provide some tips on where I can >> look for this problem? >> >> > Did you re-index after changing the field to stored? > -- > Regards, > Shalin Shekhar Mangar. >
Question on StreamingUpdateSolrServer
Hi, I was using CommonsHttpSolrServer for indexing, but having two threads writing (10K batches) at the same time was throwing, "ProtocolException: Unbuffered entity enclosing request can not be repeated." I switched to StreamingUpdateSolrServer (using addBeans) and I don't see the problem anymore. The speed is very fast - I'm getting around 25k/sec (single thread) - but I'm facing another problem. When the indexer using StreamingUpdateSolrServer is running, I'm not able to send any URL request from the browser to the Solr web app - I just get a blank page. I can't even get to the admin interface. I'm also not able to shut down the Tomcat running the Solr webapp while the Indexer is running - I have to first stop the Indexer app and then stop Tomcat. I don't have this problem when using CommonsHttpSolrServer. Here is how I'm creating it, server = new StreamingUpdateSolrServer(url, 1000,3); I simply call server.addBeans(...) on it. Is there anything else I need to do to make use of StreamingUpdateSolrServer? Why does Tomcat become unresponsive when the Indexer using StreamingUpdateSolrServer is running (though indexing happens fine)? Thanks, -vivek
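Worth noting when debugging this kind of thing: StreamingUpdateSolrServer pushes updates from background threads, so failures do not surface as exceptions in the thread that called addBeans. Overriding its error hook at least makes them visible; a sketch:

    import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;

    public class LoggingStreamingServer extends StreamingUpdateSolrServer {
        public LoggingStreamingServer(String url, int queueSize, int threads)
                throws Exception {
            super(url, queueSize, threads);
        }

        @Override
        public void handleError(Throwable ex) {
            // The default implementation only logs; hook alerting or retry here.
            System.err.println("background update failed: " + ex);
        }
    }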
Re: Question on StreamingUpdateSolrServer
I also noticed that the Solr app has over 6000 file handles open - "lsof | grep solr | wc -l" - shows 6455. I have 10 cores (using multi-core) managed by the same Solr instance. As soon as I start up Tomcat the open file count goes up to 6400. A few questions, 1) Why is Solr holding on to all the segments from all the cores - is it because of the auto-warmer? 2) How can I reduce the open file count? 3) Is there a way to stop the auto-warmer? 4) Could this be related to "Tomcat returning a blank page for every request"? Any ideas? Thanks, -vivek On Fri, Apr 10, 2009 at 1:48 PM, vivek sar wrote: > Hi, > > I was using CommonsHttpSolrServer for indexing, but having two > threads writing (10K batches) at the same time was throwing, > > "ProtocolException: Unbuffered entity enclosing request can not be repeated. > " > > I switched to StreamingUpdateSolrServer (using addBeans) and I don't > see the problem anymore. The speed is very fast - I'm getting around > 25k/sec (single thread) - but I'm facing another problem. When the > indexer using StreamingUpdateSolrServer is running, I'm not able to > send any URL request from the browser to the Solr web app - I just get a blank > page. I can't even get to the admin interface. I'm also not able to > shut down the Tomcat running the Solr webapp while the Indexer is > running - I have to first stop the Indexer app and then stop Tomcat. > I don't have this problem when using CommonsHttpSolrServer. > > Here is how I'm creating it, > > server = new StreamingUpdateSolrServer(url, 1000,3); > > I simply call server.addBeans(...) on it. Is there anything else I > need to do to make use of StreamingUpdateSolrServer? Why does Tomcat > become unresponsive when the Indexer using StreamingUpdateSolrServer is > running (though indexing happens fine)? > > Thanks, > -vivek >
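A small sketch for watching the descriptor count from inside the Solr/Tomcat JVM itself (for instance from a servlet or an MBean), instead of polling lsof; the com.sun cast below works on Sun/Unix JVMs only, which is the assumption here:

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;

    public class FdMonitor {
        public static void main(String[] args) {
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
                com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
                System.out.println("open fds: " + unix.getOpenFileDescriptorCount()
                    + " / max: " + unix.getMaxFileDescriptorCount());
            }
        }
    }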
Re: Question on StreamingUpdateSolrServer
Thanks Shalin. The problem is I don't see any error message in catalina.out. I don't even see the request coming in - I simply get a blank page in the browser. If I keep trying, the request eventually goes through and I get a response from Solr, but then it becomes unresponsive again, or sometimes throws a "connection reset" error. I'm not sure why it would work sometimes and not other times for the same query. As soon as I stop the Indexer process things start working fine. Any way I can debug this problem? -vivek On Fri, Apr 10, 2009 at 11:05 PM, Shalin Shekhar Mangar wrote: > On Sat, Apr 11, 2009 at 3:29 AM, vivek sar wrote: > >> I also noticed that the Solr app has over 6000 file handles open - >> >> "lsof | grep solr | wc -l" - shows 6455 >> >> I've 10 cores (using multi-core) managed by the same Solr instance. As >> soon as start up the Tomcat the open file count goes up to 6400. Few >> questions, >> >> 1) Why is Solr holding on to all the segments from all the cores - is >> it because of auto-warmer? > > > You have 10 cores, so Solr opens 10 indexes, each of which contains multiple > files. That is one reason. Apart from that, Tomcat will keep some file > handles for incoming connections. > > >> >> 2) How can I reduce the open file count? > > > Are they causing a problem? Tomcat will log messages when it cannot accept > incoming connections if it runs out of available file handles. But if you > experiencing issues, you can increase the file handle limit or you can set > useCompoundFile=true in solrconfig.xml. > > >> >> 3) Is there a way to stop the auto-warmer? >> 4) Could this be related to "Tomcat returning blank page for every >> request"? >> > > It could be. Check the Tomcat and Solr logs. > > -- > Regards, > Shalin Shekhar Mangar. >
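For reference, the compound-file option Shalin mentions is set in solrconfig.xml; a minimal sketch (in the stock config the same flag appears under both indexDefaults and mainIndex):

    <mainIndex>
      <!-- Pack each segment's many files into a single .cfs file: far fewer
           open file descriptors, at the cost of some indexing speed. -->
      <useCompoundFile>true</useCompoundFile>
    </mainIndex>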
Re: Question on StreamingUpdateSolrServer
Thanks Shalin. I noticed a couple more things. As I index around 100 million records a day, my Indexer is running pretty much at all times throughout the day. Whenever I run a search query I usually get a "connection reset" while the commit is happening, and a "blank page" while the auto-warming of searchers is happening. Here are my questions, 1) Is this a coincidence or a known issue? Can't we search while a commit or auto-warming is happening? 2) How do I stop auto-warming? My search traffic is very low, so I'm trying to turn off auto-warming after the commit has happened - is there anything in solrconfig.xml to do that? 3) What would be the best strategy for searching in my scenario, where commits may be happening all the time (I commit every 50K records - so every 30-60 sec there is a commit happening, followed by auto-warming that takes 40 sec)? Search frequency is pretty low for us, but we want to make sure that whenever a search happens it is fast enough and returns results (instead of an exception or a blank screen). Thanks for all the help. -vivek On Sat, Apr 11, 2009 at 1:48 PM, Shalin Shekhar Mangar wrote: > On Sun, Apr 12, 2009 at 2:15 AM, vivek sar wrote: > >> >> The problem is I don't see any error message in the catalina.out. I >> don't even see the request coming in - I simply get blank page on >> browser. If I keep trying the request goes through and I get respond >> from Solr, but then it become unresponsive again or sometimes throws >> "connection reset" error. I'm not sure why would it work sometimes and >> not the other times for the same query. As soon as I stop the Indexer >> process things start working fine. Any way I can debug this problem? >> > > I'm not sure. I've never seen this issue myself. > > Could you try using the bundled jetty instead of Tomcat or on a different > box just to make sure this is not an environment specific issue? > > -- > Regards, > Shalin Shekhar Mangar. >
Re: Question on StreamingUpdateSolrServer
I index in 10K batches and commit after 5 index cycles (after 50K). Is there any limitation that I can't search during a commit or auto-warming? I've got 8 CPU cores and only 2 were showing busy (using top) - so it's unlikely that the CPU was pegged. 2009/4/12 Noble Paul നോബിള്‍ नोब्ळ् : > If you use StreamingUpdateSolrServer it POSTs all the docs in a single > request. 10 million docs may be a bit too much for a single request. I > guess you should batch it in multiple requests of smaller chunks, > > It is likely that the CPU is really hot when the autowarming is happening. > > getting a decent search perf w/o autowarming is not easy. > > autowarmCount is an attribute of a cache. see here > http://wiki.apache.org/solr/SolrCaching > > On Mon, Apr 13, 2009 at 3:32 AM, vivek sar wrote: >> Thanks Shalin. >> >> I noticed couple more things. As I index around 100 million records a >> day, my Indexer is running pretty much at all times throughout the >> day. Whenever I run a search query I usually get "connection reset" >> when the commit is happening and get "blank page" when the >> auto-warming of searchers is happening. Here are my questions, >> >> 1) Is this coincidence or a known issue? Can't we search while commit >> or auto-warming is happening? >> 2) How do I stop auto-warming? My search traffic is very low so I'm >> trying to turn off auto-warming after commit has happened - is there >> anything in the solrconfig.xml to do that? >> 3) What would be the best strategy for searching in my scenario where >> commits may be happening all the time (I commit every 50K records - so >> every 30-60 sec there is a commit happening followed by auto-warming >> that takes 40 sec)? >> >> Search frequency is pretty low for us, but we want to make sure that >> whenever it happens it is fast enough and returns result (instead of >> exception or a blank screen). >> >> Thanks for all the help. >> >> -vivek >> >> >> >> On Sat, Apr 11, 2009 at 1:48 PM, Shalin Shekhar Mangar >> wrote: >>> On Sun, Apr 12, 2009 at 2:15 AM, vivek sar wrote: >>> >>>> >>>> The problem is I don't see any error message in the catalina.out. I >>>> don't even see the request coming in - I simply get blank page on >>>> browser. If I keep trying the request goes through and I get respond >>>> from Solr, but then it become unresponsive again or sometimes throws >>>> "connection reset" error. I'm not sure why would it work sometimes and >>>> not the other times for the same query. As soon as I stop the Indexer >>>> process things start working fine. Any way I can debug this problem? >>>> >>> >>> I'm not sure. I've never seen this issue myself. >>> >>> Could you try using the bundled jetty instead of Tomcat or on a different >>> box just to make sure this is not an environment specific issue? >>> >>> -- >>> Regards, >>> Shalin Shekhar Mangar. >>> >> > > > > -- > --Noble Paul >
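As the SolrCaching wiki page explains, warming is configured per cache; a sketch of the relevant entries in the query section of solrconfig.xml with warming turned off (the sizes shown are just the stock example values):

    <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

With autowarmCount="0", a newly opened searcher no longer pre-populates each cache from the old one after a commit.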
Re: Question on StreamingUpdateSolrServer
Here is some more information about my setup, Solr - v1.4 (nightly build 03/29/09) Servlet Container - Tomcat 6.0.18 JVM - 1.6.0 (64 bit) OS - Mac OS X Server 10.5.6 Hardware Overview: Processor Name: Quad-Core Intel Xeon Processor Speed: 3 GHz Number Of Processors: 2 Total Number Of Cores: 8 L2 Cache (per processor): 12 MB Memory: 20 GB Bus Speed: 1.6 GHz JVM Parameters (for Solr): export CATALINA_OPTS="-server -Xms6044m -Xmx6044m -DSOLR_APP -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log -Dsun.rmi.dgc.client.gcInterval=360 -Dsun.rmi.dgc.server.gcInterval=360" Other: lsof|grep solr|wc -l 2493 ulimit -a open files (-n) 9000 Tomcat connector: <Connector ... connectionTimeout="2" maxThreads="100" /> Total Solr cores on same instance - 65 useCompoundFile - true The tests I ran, While the Indexer is running: 1) Go to "http://juum19.co.com:8080/solr" - returns a blank page (no error in catalina.out) 2) Try "telnet juum19.co.com 8080" - returns with "Connection closed by foreign host" Stop the Indexer program (Tomcat is still running with Solr): 3) Go to "http://juum19.co.com:8080/solr" - works ok, shows the list of all the Solr cores 4) Try telnet - able to telnet fine 5) Commented out all the caches in solrconfig.xml and tried the same tests, but Tomcat still doesn't respond. Is there a way to stop the auto-warmer? I commented out the caches in solrconfig.xml but still see the following log, INFO: autowarming result for Searcher@3aba3830 main fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} INFO: Closing Searcher@175dc1e2 main fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} 6) Changed the Indexer frequency so it runs every 2 min (instead of all the time). I noticed that once the commit is done I'm able to run my searches; during the commit and auto-warming period I just get a blank page. 7) Changed from Solrj to XML update - I still get the blank page whenever an update/commit is happening. Apr 13, 2009 6:46:18 PM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {add=[621094001, 621094002, 621094003, 621094004, 621094005, 621094006, 621094007, 621094008, ...(6992 more)]} 0 1948 Apr 13, 2009 6:46:18 PM org.apache.solr.core.SolrCore execute INFO: [20090413_12] webapp=/solr path=/update params={} status=0 QTime=1948 So it looks like it's not just StreamingUpdateSolrServer - whenever an update/commit is happening I'm not able to search. I don't know if it's related to using multi-core. In this test I was using only a single thread updating a single core on a single Solr instance. So it's clearly related to the index process (update, commit and auto-warming). As soon as the update/commit/auto-warming completes I'm able to run my queries again. 
Is there anything that could stop searching while the update process is in progress - like a lock or something? Any other ideas? Thanks, -vivek On Mon, Apr 13, 2009 at 12:14 AM, Shalin Shekhar Mangar wrote: > On Mon, Apr 13, 2009 at 12:36 PM, vivek sar wrote: > >> I index in 10K batches and commit after 5 index cyles (after 50K). Is >> there any limitation that I can't search during commit or >> auto-warming? I got 8 CPU cores and only 2 were showing busy (using >> top) - so it's unlikely that the CPU was pegged. >> >> > No, there is no such limitation. The old searcher will continue to serve > search requests until the new one is warmed and registered. > > So, CPU does not seem to be an issue. Does this happen only when you use > StreamingUpdateSolrServer? Which OS, file system? What JVM parameters are > you using? Which servlet container and version? > > -- > Regards, > Shalin Shekhar Mangar. >
Re: Question on StreamingUpdateSolrServer
Some more updates. As I mentioned earlier, we are using multi-core Solr (up to 65 cores in one Solr instance, with each core 10G). This was opening around 3000 file descriptors (lsof). I removed some cores, and after some trial and error I found that at 25 cores the system seems to work fine (around 1400 file descriptors). Tomcat is responsive even while indexing is happening in Solr (for 25 cores). But as soon as it goes to 26 cores, Tomcat becomes unresponsive again. The puzzling thing is that if I stop indexing I can search on even 65 cores, but while indexing is happening it seems to support only up to 25 cores. 1) Is there a limit on the number of cores a Solr instance can handle? 2) Does Solr do anything to the existing cores while indexing? I'm writing to only one core at a time. We are struggling to find out why Tomcat stops responding at a high number of cores while indexing is in progress. Any help is very much appreciated. Thanks, -vivek On Mon, Apr 13, 2009 at 10:52 AM, vivek sar wrote: > Here is some more information about my setup, > > Solr - v1.4 (nightly build 03/29/09) > Servlet Container - Tomcat 6.0.18 > JVM - 1.6.0 (64 bit) > OS - Mac OS X Server 10.5.6 > > Hardware Overview: > > Processor Name: Quad-Core Intel Xeon > Processor Speed: 3 GHz > Number Of Processors: 2 > Total Number Of Cores: 8 > L2 Cache (per processor): 12 MB > Memory: 20 GB > Bus Speed: 1.6 GHz > > JVM Parameters (for Solr): > > export CATALINA_OPTS="-server -Xms6044m -Xmx6044m -DSOLR_APP > -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log > -Dsun.rmi.dgc.client.gcInterval=360 > -Dsun.rmi.dgc.server.gcInterval=360" > > Other: > > lsof|grep solr|wc -l > 2493 > > ulimit -an > open files (-n) 9000 > > Tomcat > connectionTimeout="2" > maxThreads="100" /> > > Total Solr cores on same instance - 65 > > useCompoundFile - true > > The tests I ran, > > While Indexer is running > 1) Go to "http://juum19.co.com:8080/solr" - returns blank page (no > error in the catalina.out) > > 2) Try "telnet juum19.co.com 8080" - returns with "Connection closed > by foreign host" > > Stop the Indexer Program (Tomcat is still running with Solr) > > 3) Go to "http://juum19.co.com:8080/solr" - works ok, shows the list > of all the Solr cores > > 4) Try telnet - able to Telnet fine > > 5) Now comment out all the caches in solrconfig.xml. Try same tests, > but the Tomcat still doesn't response. > > Is there a way to stop the auto-warmer. 
I commented out the caches in > the solrconfig.xml but still see the following log, > > INFO: autowarming result for searc...@3aba3830 main > fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} > > INFO: Closing searc...@175dc1e2 > main > fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} > filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} > queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} > documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} > > > 6) Change the Indexer frequency so it runs every 2 min (instead of all > the time). I noticed once the commit is done, I'm able to run my > searches. During commit and auto-warming period I just get blank page. > > 7) Changed from Solrj to XML update - I still get the blank page > whenever update/commit is happening. > > Apr 13, 2009 6:46:18 PM > org.apache.solr.update.processor.LogUpdateProcessor finish > INFO: {add=[621094001, 621094002, 621094003, 621094004, 621094005, > 621094006, 621094007, 621094008, ...(6992 more)]} 0 1948 > Apr 13, 2009 6:46:18 PM org.apache.solr.core.SolrCore execute > INFO: [20090413_12] webapp=/solr path=/update params={} status=0 QTime=1948 > > > So, looks like it's not just StreamingUpdateSolrServer, but whenever > the update/commit is happening I'm not able to search. I don't know if > it's related to using multi-core. In this test I was using only single >
Re: Question on StreamingUpdateSolrServer
The machine's ulimit is set to 9000 and the OS has upper limit of 12000 on files. What would explain this? Has anyone tried Solr with 25 cores on the same Solr instance? Thanks, -vivek 2009/4/13 Noble Paul നോബിള് नोब्ळ् : > On Tue, Apr 14, 2009 at 7:14 AM, vivek sar wrote: >> Some more update. As I mentioned earlier we are using multi-core Solr >> (up to 65 cores in one Solr instance with each core 10G). This was >> opening around 3000 file descriptors (lsof). I removed some cores and >> after some trial and error I found at 25 cores system seems to work >> fine (around 1400 file descriptors). Tomcat is responsive even when >> the indexing is happening at Solr (for 25 cores). But, as soon as it >> goes to 26 cores the Tomcat becomes unresponsive again. The puzzling >> thing is if I stop indexing I can search on even 65 cores, but while >> indexing is happening it seems to support only up to 25 cores. >> >> 1) Is there a limit on number of cores a Solr instance can handle? >> 2) Does Solr do anything to the existing cores while indexing? I'm >> writing to only one core at a time. > There is no hard limit (it is Integer.MAX_VALUE) . But inreality your > mileage depends on your hardware and no:of file handles the OS can > open >> >> We are struggling to find why Tomcat stops responding on high number >> of cores while indexing is in-progress. Any help is very much >> appreciated. >> >> Thanks, >> -vivek >> >> On Mon, Apr 13, 2009 at 10:52 AM, vivek sar wrote: >>> Here is some more information about my setup, >>> >>> Solr - v1.4 (nightly build 03/29/09) >>> Servlet Container - Tomcat 6.0.18 >>> JVM - 1.6.0 (64 bit) >>> OS - Mac OS X Server 10.5.6 >>> >>> Hardware Overview: >>> >>> Processor Name: Quad-Core Intel Xeon >>> Processor Speed: 3 GHz >>> Number Of Processors: 2 >>> Total Number Of Cores: 8 >>> L2 Cache (per processor): 12 MB >>> Memory: 20 GB >>> Bus Speed: 1.6 GHz >>> >>> JVM Parameters (for Solr): >>> >>> export CATALINA_OPTS="-server -Xms6044m -Xmx6044m -DSOLR_APP >>> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log >>> -Dsun.rmi.dgc.client.gcInterval=360 >>> -Dsun.rmi.dgc.server.gcInterval=360" >>> >>> Other: >>> >>> lsof|grep solr|wc -l >>> 2493 >>> >>> ulimit -an >>> open files (-n) 9000 >>> >>> Tomcat >>> >> connectionTimeout="2" >>> maxThreads="100" /> >>> >>> Total Solr cores on same instance - 65 >>> >>> useCompoundFile - true >>> >>> The tests I ran, >>> >>> While Indexer is running >>> 1) Go to "http://juum19.co.com:8080/solr"; - returns blank page (no >>> error in the catalina.out) >>> >>> 2) Try "telnet juum19.co.com 8080" - returns with "Connection closed >>> by foreign host" >>> >>> Stop the Indexer Program (Tomcat is still running with Solr) >>> >>> 3) Go to "http://juum19.co.com:8080/solr"; - works ok, shows the list >>> of all the Solr cores >>> >>> 4) Try telnet - able to Telnet fine >>> >>> 5) Now comment out all the caches in solrconfig.xml. Try same tests, >>> but the Tomcat still doesn't response. >>> >>> Is there a way to stop the auto-warmer. 
I commented out the caches in >>> the solrconfig.xml but still see the following log, >>> >>> INFO: autowarming result for searc...@3aba3830 main >>> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} >>> >>> INFO: Closing searc...@175dc1e2 >>> main >>> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} >>> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} >>> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evi
Using CSV for indexing ... Remote Streaming disabled
Hi, I'm trying to use CSV indexing (Solr 1.4, 03/29), following the wiki (http://wiki.apache.org/solr/UpdateCSV). I've updated solrconfig.xml to have these lines, <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="20480" /> ... <requestHandler name="/update/csv" class="solr.CSVRequestHandler" startup="lazy" /> When I try to upload the csv, curl 'http://localhost:8080/solr/20090414_1/update/csv?commit=true&separator=%09&escape=%5c&stream.file=/Users/opal/temp/afterchat/data/csv/1239759267339.csv' I get the following response, HTTP Status 400 - Remote Streaming is disabled. type: Status report. message: Remote Streaming is disabled. description: The request sent by the client was syntactically incorrect (Remote Streaming is disabled.). Apache Tomcat/6.0.18 Why is it complaining about remote streaming if it's already enabled? Is there anything I'm missing? Thanks, -vivek
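One thing worth double-checking (an assumption on my part, not something confirmed in this thread): the requestParsers element only takes effect as a child of requestDispatcher in solrconfig.xml, so if it ended up anywhere else it is silently ignored - which would produce exactly this error. A sketch of the expected placement:

    <requestDispatcher handleSelect="true">
      <!-- enableRemoteStreaming must be true for stream.file/stream.url uploads -->
      <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="20480" />
    </requestDispatcher>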
Commits taking too long
Hi, I've an index where I commit every 50K records (using Solrj). Usually this commit takes 20 sec to complete, but every now and then the commit takes way too long - from 10 min to 30 min. I see more delays as the index size continues to grow - once it gets over 5G I start seeing long commit cycles more frequently. See this for example, Apr 15, 2009 12:04:13 AM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=false,waitFlush=false,waitSearcher=false) Apr 15, 2009 12:39:58 AM org.apache.solr.core.SolrDeletionPolicy onCommit INFO: SolrDeletionPolicy.onCommit: commits:num=2 commit{dir=/Users/vivek/demo/afterchat/solr/multicore/20090414_1/data/index,segFN=segments_fq,version=1239747075391,generation=566,filenames=[_19m.cfs, _jm.cfs, _1bk.cfs, _193.cfx, _19z.cfs, _1b8.cfs, _1bf.cfs, _10g.cfs, _2s.cfs, _1bf.cfx, _18x.cfx, _19c.cfx, _193.cfs, _18x.cfs, _1b7.cfs, _1aw.cfs, _1aq.cfs, _1bi.cfx, _1a6.cfs, _19l.cfs, _1ad.cfs, _1a6.cfx, _1as.cfs, _19l.cfx, _1aa.cfs, _1an.cfs, _19d.cfs, _1a3.cfx, _1a3.cfs, _19g.cfs, _b7.cfs, _19e.cfs, _19b.cfs, _1ab.cfs, _1b3.cfx, _19j.cfs, _190.cfs, _uu.cfs, _1b3.cfs, _1ak.cfs, _19p.cfs, _195.cfs, _194.cfs, _19i.cfx, _199.cfs, _19i.cfs, _19o.cfx, _196.cfs, _199.cfx, _196.cfx, _19o.cfs, _190.cfx, _xn.cfs, _1b0.cfx, _1at.cfs, _1av.cfs, _1ao.cfs, _1a9.cfx, _1b0.cfs, _5l.cfs, _1ao.cfx, _1ap.cfs, _1b6.cfx, _19a.cfs, _139.cfs, _1a1.cfs, _s1.cfs, _1b6.cfs, _1a9.cfs, _197.cfs, _1bd.cfs, _19n.cfs, _1au.cfx, _1au.cfs, _1a5.cfs, _1be.cfs, segments_fq, _1b4.cfs, _gt.cfs, _1ag.cfs, _18z.cfs, _162.cfs, _1a4.cfs, _198.cfs, _19x.cfs, _1ah.cfs, _1ai.cfs, _19q.cfs, _1a7.cfs, _1ae.cfs, _19h.cfs, _19x.cfx, _1a2.cfs, _1bj.cfs, _1bb.cfs, _1b1.cfs, _1ai.cfx, _19r.cfs, _18y.cfs, _19u.cfx, _1a8.cfs, _19u.cfs, _1aj.cfs, _19r.cfx, _1ac.cfs, _1az.cfs, _1ac.cfx, _19y.cfs, _1bc.cfx, _19s.cfs, _1ar.cfs, _1al.cfx, _1bg.cfs, _18v.cfs, _1ar.cfx, _1bc.cfs, _1a0.cfx, _1b2.cfs, _1af.cfs, _1bi.cfs, _1af.cfx, _19f.cfs, _1a0.cfs, _1bh.cfs, _19f.cfx, _19c.cfs, _e0.cfs, _1ax.cfx, _1b5.cfs, _191.cfs, _18w.cfs, _19t.cfs, _8e.cfs, _19v.cfs, _192.cfs, _1b9.cfs, _1ay.cfs, _p8.cfs, _19k.cfs, _1b9.cfx, _1ax.cfs, _1am.cfs, _1ba.cfs, _mf.cfs, _1al.cfs, _19w.cfs] commit{dir=/Users/vivek/demo/afterchat/solr/multicore/20090414_1/data/index,segFN=segments_fr,version=1239747075392,generation=567,filenames=[_jm.cfs, _1bo.cfs, _xn.cfs, segments_fr, _8e.cfs, _gt.cfs, _18v.cfs, _uu.cfs, _10g.cfs, _2s.cfs, _5l.cfs, _162.cfs, _p8.cfs, _139.cfs, _s1.cfs, _mf.cfs, _b7.cfs, _e0.cfs] Apr 15, 2009 12:39:58 AM org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: last commit = 1239747075392 Here are my default index settings, useCompoundFile: true, mergeFactor: 100, ramBufferSizeMB: 64, maxMergeDocs: 2147483647, 1, 1000, 1, lockType: single What am I doing wrong here? What's causing these delays? Thanks, -vivek
Re: Question on StreamingUpdateSolrServer
Thanks Otis. I did increase the number of file descriptors to 22K, but I still get this problem. I've noticed the following so far, 1) As soon as I get to around 1140 index segments (this is the total over multiple cores) I start seeing this problem. 2) When the problem starts, occasionally the index request (solrserver.commit) also fails with the following error, java.net.SocketException: Connection reset 3) Whenever the commit fails, I'm able to access Solr from the browser (http://ets11.co.com/solr). If the commit is successful and ongoing, I get a blank page in Firefox. Even telnet to 8080 fails with "Connection closed by foreign host." It does seem like there is some resource issue, as it happens only once we reach a breaking point (too many index segment files) - lsof at this point usually shows around 1400, but my ulimit is much higher than that. I already use the compound format for index files. I can also run optimize occasionally (though not preferred, as it blocks the whole index cycle for a long time). I do want to find out what resource limitation is causing this; it has to do with the Indexer committing records when there are a large number of segment files. Any other ideas? Thanks, -vivek On Wed, Apr 15, 2009 at 3:10 PM, Otis Gospodnetic wrote: > > One more thing. I don't think this was mentioned, but you can: > - optimize your indices > - use compound index format > > That will lower the number of open file handles. > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: vivek sar >> To: solr-user@lucene.apache.org >> Sent: Friday, April 10, 2009 5:59:37 PM >> Subject: Re: Question on StreamingUpdateSolrServer >> >> I also noticed that the Solr app has over 6000 file handles open - >> >> "lsof | grep solr | wc -l" - shows 6455 >> >> I've 10 cores (using multi-core) managed by the same Solr instance. As >> soon as start up the Tomcat the open file count goes up to 6400. Few >> questions, >> >> 1) Why is Solr holding on to all the segments from all the cores - is >> it because of auto-warmer? >> 2) How can I reduce the open file count? >> 3) Is there a way to stop the auto-warmer? >> 4) Could this be related to "Tomcat returning blank page for every request"? >> >> Any ideas? >> >> Thanks, >> -vivek >> >> On Fri, Apr 10, 2009 at 1:48 PM, vivek sar wrote: >> > Hi, >> > >> > I was using CommonsHttpSolrServer for indexing, but having two >> > threads writing (10K batches) at the same time was throwing, >> > >> > "ProtocolException: Unbuffered entity enclosing request can not be >> > repeated. >> " >> > >> > I switched to StreamingUpdateSolrServer (using addBeans) and I don't >> > see the problem anymore. The speed is very fast - getting around >> > 25k/sec (single thread), but I'm facing another problem. When the >> > indexer using StreamingUpdateSolrServer is running I'm not able to >> > send any url request from browser to Solr web app. I just get blank >> > page. I can't even get to the admin interface. I'm also not able to >> > shutdown the Tomcat running the Solr webapp when the Indexer is >> > running. I've to first stop the Indexer app and then stop the Tomcat. >> > I don't have this problem when using CommonsHttpSolrServer. >> > >> > Here is how I'm creating it, >> > >> > server = new StreamingUpdateSolrServer(url, 1000,3); >> > >> > I simply call server.addBeans(...) on it. Is there anything else I >> > need to do to make use of StreamingUpdateSolrServer? 
Why does Tomcat >> > become unresponsive when Indexer using StreamingUpdateSolrServer is >> > running (though, indexing happens fine)? >> > >> > Thanks, >> > -vivek >> > > >
Re: Using CSV for indexing ... Remote Streaming disabled
Any help on this? Could this error be because of something else (not remote streaming issue)? Thanks. On Wed, Apr 15, 2009 at 10:04 AM, vivek sar wrote: > Hi, > > I'm trying using CSV (Solr 1.4, 03/29) for indexing following wiki > (http://wiki.apache.org/solr/UpdateCSV). I've updated the > solrconfig.xml to have this lines, > > > multipartUploadLimitInKB="20480" /> > ... > > > startup="lazy" /> > > When I try to upload the csv, > > curl > 'http://localhost:8080/solr/20090414_1/update/csv?commit=true&separator=%09&escape=%5c&stream.file=/Users/opal/temp/afterchat/data/csv/1239759267339.csv' > > I get following response, > > HTTP Status 400 - Remote Streaming is > disabled.type Status > reportmessage Remote Streaming is > disabled.description The request sent by the > client was syntactically incorrect (Remote Streaming is > disabled.).Apache > Tomcat/6.0.18 > > Why is it complaining about the remote streaming if it's already > enabled? Is there anything I'm missing? > > Thanks, > -vivek >
Re: Solr Search Error
Hi, I'm using the Solr 1.4 (03/29 nightly build) and when searching on a large index (40G) I get the same exception as in this thread, HTTP Status 500 - 13724 java.lang.ArrayIndexOutOfBoundsException: 13724 at org.apache.lucene.search.TermScorer.score(TermScorer.java:74) at org.apache.lucene.search.TermScorer.score(TermScorer.java:61) at org.apache.lucene.search.IndexSearcher.doSearch(IndexSearcher.java:262) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:250) at org.apache.lucene.search.Searcher.search(Searcher.java:126) at org.apache.lucene.search.Searcher.search(Searcher.java:105) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1072) at ... The search url is, http://think2.co.com:8080/solr/20090415_1/select/?q=japan&version=2.2&start=0&rows=10&indent=on It would have millions of records matching this term, but I guess that shouldn't throw this exception. I saw a similar jira to ArrayOutOfBoundException, https://issues.apache.org/jira/browse/SOLR-450 (it's not the same though). I also see someone reported this same problem back in 2007 so I'm not sure whether it's a real bug or some configuration issue, http://www.nabble.com/ArrayIndexOutOfBoundsException-on-TermScorer-td11750899.html#a11750899 Any ideas? Thanks, -vivek On Fri, Mar 27, 2009 at 10:11 AM, Narayanan, Karthikeyan wrote: > Hi Otis, > Thanks for the recommendation. Will try with latest > nightly build.. I did couple of full data import and got this error at > few times while searching.. > > > Thanks. > > Karthik > > > -Original Message- > From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] > Sent: Friday, March 27, 2009 12:57 PM > To: solr-user@lucene.apache.org > Subject: Re: Solr Search Error > > > Hi Karthik, > > First thing I'd do is get the latest Solr nightly build. > If that doesn't fix thing, I'd grab the latest Lucene nightly build and > use it to replace Lucene jars that are in your version of Solr. > If that doesn't work I'd email the ML with a bit more info about the > type of search that causes this (e.g. Do all searches cause this or only > some? What do those that trigger this error look like or have in > common?) > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: "Narayanan, Karthikeyan" >> To: solr-user@lucene.apache.org >> Sent: Friday, March 27, 2009 11:42:12 AM >> Subject: Solr Search Error >> >> Hi All, >> I am intermittently getting this Exception when I do the > search. >> What could be the reason?. >> >> Caused by: org.apache.solr.common.SolrException: 11938 >> java.lang.ArrayIndexOutOfBoundsException: 11938 at >> org.apache.lucene.search.TermScorer.score(TermScorer.java:74) > at >> org.apache.lucene.search.TermScorer.score(TermScorer.java:61) > at >> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:137) > at >> org.apache.lucene.search.Searcher.search(Searcher.java:126) at >> org.apache.lucene.search.Searcher.search(Searcher.java:105) at >> > org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher. > java:966) >> at >> > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.j > ava:838) >> at >> > org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:2 > 69) at >> > org.apache.solr.handler.component.QueryComponent.process(QueryComponent. 
> java:160) >> at >> > org.apache.solr.handler.component.SearchHandler.handleRequestBody(Search > Handler.java:169) >> at >> > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerB > ase.java:131) >> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) > at >> > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.ja > va:303) >> at >> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.j > ava:232) >> at >> > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applica > tionFilterChain.java:215) >> at >> > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilt > erChain.java:188) >> at >> > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValv > e.java:210) >> at >> > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValv > e.java:174) >> at >> > org.apache.catalina.authenticator.AuthenticatorBase.invoke(Authenticator > Base.java:433) >> at >> > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java > :127) >> at >> > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java > :117) >> at >> > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve. > java:108) >> at >> > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:1 > 51) at >> > org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:87 > 0) at >> > org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.p
Multiple Solr-instance share same solr.home
Hi, Is it possible to have two Solr instances share the same solr.home? I've two Solr instances running on the same box and I was wondering if I can configure them to have the same solr.home. I tried it, but it looks like the second instance overwrites the first one's values in solr.xml (I'm using multicore for both instances). This is just for convenience, so I don't have to manage multiple Solr index directory locations - I can have all the indexes written into the same location and do the cleanup from one place. If this is not supported then it's not a big deal. Thanks, -vivek
Re: Multiple Solr-instance share same solr.home
Both Solr instances will be writing to separate indexes, but can they share the same solr.home? So, here is what I want, 1) solr.home = solr/multicore 2) There is a single solr.xml under the multicore directory 3) Each instance would use the same solr.xml, which will have entries for multiple cores 4) Each instance will write to a different core at a time - so one index will be written by only one writer at a time. Not sure if this is a supported configuration. Thanks. -vivek On Sun, Apr 19, 2009 at 5:55 AM, Otis Gospodnetic wrote: > > Vivek - no, unless you want trouble - only 1 writer can write to a specific > index at a time. > > > Otis -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: vivek sar >> To: solr-user@lucene.apache.org >> Sent: Sunday, April 19, 2009 4:33:00 AM >> Subject: Multiple Solr-instance share same solr.home >> >> Hi, >> >> Is it possible to have two solr instances share the same solr.home? >> I've two Solr instances running on the same box and I was wondering if >> I can configure them to have the same solr.home. I tried it, but looks >> like the second instance overwrites the first one's value in the >> solr.xml (I'm using multicore for both instances). This is just for >> convenience so I don't have to manage multiple solr index directory >> locations - I can have all the indexes written into the same location >> and do the clean up from one place itself. If this is not supported >> then it's not a big deal. >> >> Thanks, >> -vivek > >
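For reference, the shared solr.xml being proposed would look something like this (core names are made up). With persistent="true", each running instance rewrites this file with its own view of the core list, which is the overwriting behavior reported above:

    <solr persistent="true">
      <cores adminPath="/admin/cores">
        <core name="20090510_1" instanceDir="20090510_1" />
        <core name="20090510_2" instanceDir="20090510_2" />
      </cores>
    </solr>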
Control segment size
Hi, Is there any configuration to control the segment file size in Solr? Currently I've an index (70G) with 80 segment files, and one of the files is 24G. We noticed that in some cases a commit takes over 2 hours to complete (committing 50K records), whereas usually it finishes in 20 seconds. After further investigation it turned out the system was doing a lot of paging - the file system buffer was trying to write the big segment back to disk. I've got 20G of memory on the system, with 6G assigned to the Solr instance (running 2 instances). It seems that if I can cap the segment size at 4-5 GB I'll be ok. Is there any way to do so? I've got a merge factor of 100 - does that impact the size too? Why do different segments have different sizes? Thanks, -vivek
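For context, these are the two knobs involved, as they appear in solrconfig.xml (values illustrative; note, as the replies below discuss, that maxMergeDocs caps documents per merged segment, not bytes):

    <mainIndex>
      <!-- Merge fewer segments at a time, so each merge - and each merged file - is smaller. -->
      <mergeFactor>10</mergeFactor>
      <!-- No merge may produce a segment with more than this many documents. -->
      <maxMergeDocs>10000000</maxMergeDocs>
    </mainIndex>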
Using UUID for unique key
Hi, I've a distributed Solr setup. I'm using Java's UUID (UUID.randomUUID()) to generate the unique id for my documents. Before adding the unique key I was able to commit 50K records in 15 sec (pretty constant over the growing index); after adding the unique key it's taking over 35 sec for 50K, and the time increases as the index size grows. Here is my schema setting for the unique key, <field name="id" ... required="true" omitNorms="true" compressed="false"/> Why is commit taking so long? Should I not be using a UUID for the unique key? What are the other options - timestamp, etc.? Thanks, -vivek
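For context, a sketch of the id generation being described (the bean/field layout here is hypothetical):

    import java.util.UUID;
    import org.apache.solr.common.SolrInputDocument;

    public class UuidIdExample {
        public static SolrInputDocument newDoc(String body) {
            SolrInputDocument doc = new SolrInputDocument();
            // Random UUIDs are globally unique across distributed indexers, but
            // they arrive in random term order; the uniqueKey lookup done for
            // each add then touches the term dictionary at random spots, which
            // is one plausible explanation for the slowdown reported here.
            doc.addField("id", UUID.randomUUID().toString());
            doc.addField("body", body);
            return doc;
        }
    }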
Re: Using UUID for unique key
I did clean up the indexes and restarted the index process from scratch (new index file). As another test, if I use a simple numeric counter for the unique id, the index speed is fast (50K records commit within 20 sec). I'm thinking UUID might not be the way to go for the unique id - I'll look into using a sequence # instead. Thanks, -vivek On Tue, May 5, 2009 at 11:03 AM, Otis Gospodnetic wrote: > > You really had nothing in uniqueKey element in schema.xml at first? I'm not > looking at Solr code right now, but it could be the lack of the cost of that > lookup that made things faster. Now you have a lookup + generation + more > data to pass through analyzer + write out, though I can't imagine how that > would make things 2x slower. You didn't say whether you cleared the old > index after adding UUID key did you do that? > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: vivek sar >> To: solr-user@lucene.apache.org >> Sent: Tuesday, May 5, 2009 1:49:21 PM >> Subject: Using UUID for unique key >> >> Hi, >> >> I've a distributed Solr instances. I'm using Java's UUID >> (UUID.randomUUID()) to generate the unique id for my documents. Before >> adding unique key I was able to commit 50K records in 15sec (pretty >> constant over the growing index), after adding unique key it's taking >> over 35 sec for 50k and the time is increasing as the index size >> grows. Here is my schema setting for unique key, >> >> >> required="true" omitNorms="true" compressed="false"/> >> >> Why is commit taking so long? Should I not be using UUID key for >> unique keys? What are other options - timestamp etc.? >> >> Thanks, >> -vivek > >
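A minimal sketch of the counter alternative (illustrative only - a real deployment would have to persist the counter and partition ranges so distributed indexers never collide):

    import java.util.concurrent.atomic.AtomicLong;

    public class SequenceIds {
        private final AtomicLong next;

        // Seed each indexer instance with its own range, e.g. instance 0
        // starts at 0 and instance 1 at 1000000000, so ids never overlap.
        public SequenceIds(long start) {
            this.next = new AtomicLong(start);
        }

        public String nextId() {
            // Monotonically increasing keys keep uniqueKey terms clustered,
            // which is friendlier to the term dictionary than random UUIDs.
            return Long.toString(next.getAndIncrement());
        }
    }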
Delete complete core without stopping Solr
Hi, I'm using the multi-core feature of Solr. Each Solr instance maintains multiple cores, each core of size 100G. I would like to delete older core directories completely after 2 weeks (using file.delete). Currently Solr loads all the cores that are listed in solr.xml. I was thinking of the following, 1) Call the unload service to unload the core from Solr - would this remove the entry from solr.xml as well? 2) Delete the core directory Would this work? I'm hoping I don't have to restart Solr or do any individual document deletes. Thanks, -vivek
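A sketch of step 1 through SolrJ's core-admin API (the URL and core name are placeholders). One caveat stated as an assumption: UNLOAD removes the core from the running server - and, when solr.xml is marked persistent="true", from solr.xml as well - but it does not delete the index files, so the directory still has to be removed afterwards:

    import java.io.File;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;

    public class DropOldCore {
        public static void main(String[] args) throws Exception {
            // Core-admin requests go to the container URL, not to a core URL.
            CommonsHttpSolrServer admin =
                new CommonsHttpSolrServer("http://localhost:8080/solr");
            CoreAdminRequest.unloadCore("20090401_1", admin);

            // Once unloaded, the core's directory can be deleted from disk.
            deleteRecursively(new File("solr/multicore/20090401_1"));
        }

        static void deleteRecursively(File f) {
            File[] children = f.listFiles();
            if (children != null) {
                for (File c : children) {
                    deleteRecursively(c);
                }
            }
            f.delete();
        }
    }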
Re: Control segment size
Thanks Otis. I did set maxMergeDocs to 10M, but I still see a couple of index files over 30G, which doesn't match the max number of documents. Here are some numbers, 1) My total index size = 66GB 2) Number of total documents = 200M 3) 1M docs = 300MB 4) 10M docs should be roughly around 3-4GB. Under the index I see, -rw-r--r-- 1 dssearch staff 31771545312 May 6 14:15 _2tp.cfs -rw-r--r-- 1 dssearch staff 31932190573 May 7 08:13 _5ne.cfs -rw-r--r-- 1 dssearch staff 543118747 May 7 08:32 _5p2.cfs -rw-r--r-- 1 dssearch staff 543124452 May 7 08:53 _5qr.cfs -rw-r--r-- 1 dssearch staff 543100201 May 7 09:18 _5sg.cfs .. .. As you can see, a couple of files are huge. Are those documents or index files? How can I control the file size so that no single file grows beyond 10GB? Thanks, -vivek On Thu, Apr 23, 2009 at 10:26 AM, Otis Gospodnetic wrote: > > Hi, > > You are looking for maxMergeDocs, I believe. > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: vivek sar >> To: solr-user@lucene.apache.org >> Sent: Thursday, April 23, 2009 1:08:20 PM >> Subject: Control segment size >> >> Hi, >> >> Is there any configuration to control the segments' file size in >> Solr? Currently, I've an index (70G) with 80 segment files and one of >> the file is 24G. We noticed that in some cases commit takes over 2 >> hours to complete (committing 50K records), whereas usually it >> finishes in 20 seconds. After further investigation it turns out the >> system was doing lot of paging - the file system buffer was trying to >> write back the big segment back to disk. I got 20G memory on system >> with 6 G assigned to Solr instance (running 2 instances). >> >> It seems if I can control the segment size to max of 4-5 GB I'll be >> ok. Is there any way to do so? >> >> I got merging factor of 100 - does that impacts the size too? Why >> different segments have different size? >> >> Thanks, >> -vivek > >
Re: Control segment size
Shalin, Here is what I've read about maxMergeDocs: "While merging segments, Lucene will ensure that no segment with more than maxMergeDocs is created." Wouldn't that mean that no index file should contain more than that many documents? I guess the index files could also just contain index information, which is not limited by any property - is that true? Is there any workaround to limit the file size, besides limiting the index itself? Thanks, -vivek On Fri, May 8, 2009 at 10:02 PM, Shalin Shekhar Mangar wrote: > On Fri, May 8, 2009 at 1:30 AM, vivek sar wrote: > >> >> I did set the maxMergeDocs to 10M, but I still see couple of index >> files over 30G which do not match with max number of documents. Here >> are some numbers, >> >> 1) My total index size = 66GB >> 2) Number of total documents = 200M >> 3) 1M doc = 300MB >> 4) 10M doc should be roughly around 3-4GB. >> >> As you can see couple of files are huge. Are those documents or index >> files? How can I control the file size so no single file grows more >> than 10GB. >> > > No, there is no way to limit an individual file to a specific size. > > -- > Regards, > Shalin Shekhar Mangar. >
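For what it's worth, the Lucene version underneath Solr 1.4 (2.9.x) does have a byte-based merge policy - LogByteSizeMergePolicy, whose setMaxMergeMB excludes segments above a given size from further merging. As far as I know, stock Solr 1.4 only lets you pick the merge-policy class in solrconfig.xml, not set its properties, so this is a raw-Lucene sketch only (paths are placeholders):

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.LogByteSizeMergePolicy;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class CapSegmentSize {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter(
                    FSDirectory.open(new File("data/index")),
                    new StandardAnalyzer(Version.LUCENE_29),
                    IndexWriter.MaxFieldLength.UNLIMITED);
            // In Lucene 2.9 merge policies are constructed against the writer.
            LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy(writer);
            // Segments above ~5 GB stop being merge candidates, so existing
            // multi-GB .cfs files are no longer folded into ever larger ones.
            mp.setMaxMergeMB(5 * 1024);
            writer.setMergePolicy(mp);
            writer.close();
        }
    }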
Re: Commits taking too long
Hi, This problem is still haunting us. I've reduced the merge factor to 50, but as my index get fat (anything over 20G), the commit starts taking much longer. Some info, 1) Less than 20 G index size, 5000 records commit takes around 15sec 2) Over 20G the commit starts taking 50-70sec for 5K records 3) mergefactor = 50 4) Using multicore - each core is around 70G (currently there are 5 cores maintained by single Solr instance) 5) RAM = 6G 6) OS = OS X 10.5 7) JVM Options: export JAVA_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,address=3090,suspend=n \ -server -Xms${MIN_JVM_HEAP}m -Xmx${MAX_JVM_HEAP}m \ -XX:NewRatio=2 -XX:MaxPermSize=512m \ -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${AC_ROOT}/data/pmiJavaHeapDump.hprof \ -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log -Dsun.rmi.dgc.client.gcInterval=360 -Dsun.rmi.dgc.server.gcInterval=360 \ -Droot.dir=$AC_ROOT" export CATALINA_OPTS="-server -Xms${MIN_JVM_HEAP}m -Xmx${MAX_JVM_HEAP}m \ -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=50 -XX:-UseGCOverheadLimit" I also see following from GC log to coincide with commit slowness, 40387.691: [GC 40387.691: [ParNew (promotion failed): 132131K->149120K(149120K), 186.3768727 secs]40574.068: [CMSbailing out to foreground collection 40736.670: [CMS-concurrent-mark: 168.574/356.749 secs] [Times: user=276.41 sys=1192.51, real=356.77 secs] (concurrent mode failure): 6116976K->5908559K(6121088K), 174.0819842 secs] 6229178K->5908559K(6270208K), 360.4589949 secs] [Times: user=267.90 sys=1185.49, real=360.48 secs] 40748.155: [GC [1 CMS-initial-mark: 5908559K(6121088K)] 5910029K(6270208K), 0.0014832 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 40748.156: [CMS-concurrent-mark-start] 40748.513: [GC 40748.513: [ParNew: 127872K->21248K(149120K), 0.7482810 secs] 6036431K->6050277K(6270208K), 0.7483775 secs] [Times: user=1.66 sys=0.71, real=0.75 secs] 40749.613: [GC 40749.613: [ParNew: 149120K->149120K(149120K), 0.227 secs]40749.613: [CMS40784.961: [CMS-concurrent-mark: 36.055/36.805 secs] [Times: user=20.74 sys=31.41, real=36.81 secs] (concurrent mode failure): 6029029K->4899386K(6121088K), 44.2068275 secs] 6178149K->4899386K(6270208K), 44.2069457 secs] [Times: user=26.05 sys=30.21, real=44.21 secs] Few questions, 1) Should I lower the merge factor even more? Low merge factor seems to cause more frequent commit pauses. 2) Do I need more RAM to maintain large indexes? 3) Should I not have any core bigger than 20G? 4) Any other configuration (Solr or JVM) that can help with this? 5) Does search has to wait until commit completes? Right now the search doesn't return while the commit is happening. We are using Solr 1.4 (nightly build from 3/29/09). Thanks, -vivek On Wed, Apr 15, 2009 at 11:41 AM, Mark Miller wrote: > vivek sar wrote: >> >> Hi, >> >> I've index where I commit every 50K records (using Solrj). Usually >> this commit takes 20sec to complete, but every now and then the commit >> takes way too long - from 10 min to 30 min. I see more delays as the >> index size continues to grow - once it gets over 5G I start seeing >> long commit cycles more frequently. 
See this for ex., >> >> Apr 15, 2009 12:04:13 AM org.apache.solr.update.DirectUpdateHandler2 >> commit >> INFO: start commit(optimize=false,waitFlush=false,waitSearcher=false) >> Apr 15, 2009 12:39:58 AM org.apache.solr.core.SolrDeletionPolicy onCommit >> INFO: SolrDeletionPolicy.onCommit: commits:num=2 >> >> commit{dir=/Users/vivek/demo/afterchat/solr/multicore/20090414_1/data/index,segFN=segments_fq,version=1239747075391,generation=566,filenames=[_19m.cfs, >> _jm.cfs, _1bk.cfs, _193.cfx, _19z.cfs, _1b8.cfs, _1bf.cfs, _10g.cfs, _ >> 2s.cfs, _1bf.cfx, _18x.cfx, _19c.cfx, _193.cfs, _18x.cfs, _1b7.cfs, >> _1aw.cfs, _1aq.cfs, _1bi.cfx, _1a6.cfs, _19l.cfs, _1ad.cfs, _1a6.cfx, >> _1as.cfs, _19l.cfx, _1aa.cfs, _1an.cfs, _19d.cfs, _1a3.cfx, _1a3.cfs, >> _19g.cfs, _b7.cfs, _19 >> e.cfs, _19b.cfs, _1ab.cfs, _1b3.cfx, _19j.cfs, _190.cfs, _uu.cfs, >> _1b3.cfs, _1ak.cfs, _19p.cfs, _195.cfs, _194.cfs, _19i.cfx, _199.cfs, >> _19i.cfs, _19o.cfx, _196.cfs, _199.cfx, _196.cfx, _19o.cfs, _190.cfx, >> _xn.cfs, _1b0.cfx, _1at. >> cfs, _1av.cfs, _1ao.cfs, _1a9.cfx, _1b0.cfs, _5l.cfs, _1ao.cfx, >> _1ap.cfs, _1b6.cfx, _19a.cfs, _139.cfs, _1a1.cfs, _s1.cfs, _1b6.cfs, >> _1a9.cfs, _197.cfs, _1bd.cfs, _19n.cfs, _1au.cfx, _1au.cfs, _1a5.cfs, >> _1be.cfs, segments_fq, _1b4.cfs, _gt.cfs, _1ag.cfs, _18z.cfs, >> _162.cfs, _1a4.cfs, _198.cfs, _19x.cfs, _1ah.cfs, _1ai.cfs, _19q.cfs, >> _1a7.cfs, _1ae.cfs, _19h.cfs, _19x.cfx, _1a2.cfs, _1bj.cfs, _1bb.cfs, >> _1b1.cfs, _1ai.cfx, _19r.cfs, _18y.cfs, _19u.cfx, _1a8. >> cfs, _1
Solr memory requirements?
Hi, I'm pretty sure this has been asked before, but I couldn't find a complete answer in the forum archive. Here are my questions, 1) When Solr starts up, what does it load into memory? Let's say I've 4 cores, each core 50G in size. When Solr comes up, how much of it would be loaded in memory? 2) How much memory is required during index time? If I'm committing 50K records at a time (1 record = 1KB) using solrj, how much memory do I need to give to Solr? 3) Is there a minimum memory requirement for Solr to maintain a certain index size? Is there any benchmark on this? Here are some of my configurations from solrconfig.xml, 1) ramBufferSizeMB: 64 2) All the caches (under the query tag) are commented out 3) A few others, a) enableLazyFieldLoading: true ==> would this require memory? b) queryResultWindowSize: 50 c) queryResultMaxDocsCached: 200 d) HashDocSet e) useColdSearcher: false f) maxWarmingSearchers: 2 The problem we are having is the following. I've given Solr 6G of RAM. As the total index size (all cores combined) starts growing, Solr's memory consumption goes up. With 800 million documents, I see Solr already taking up all the memory at startup. After that, commits, searches - everything becomes slow. We will have a distributed setup with multiple Solr instances (around 8) on four boxes, but our requirement is to have each Solr instance maintain at least around 1.5 billion documents. We are trying to see if we can somehow reduce the Solr memory footprint. If someone can provide a pointer on which parameters affect memory and what effect each has, we can then decide whether we want that parameter or not. I'm not sure if there is any minimum Solr requirement for it to be able to maintain large indexes. I've used Lucene before and that didn't require anything by default - it used memory only during index and search times, not otherwise. Any help is very much appreciated. Thanks, -vivek
Re: Solr memory requirements?
Thanks Otis. Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong. I got total of 25 fields (15 are indexed and stored, other 10 are just stored). All my fields are basic data type - which I thought are not sorted. My id field is unique key. Is there any field here that might be getting sorted? Thanks, -vivek On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic wrote: > > Hi, > Some answers: > 1) .tii files in the Lucene index. When you sort, all distinct values for > the field(s) used for sorting. Similarly for facet fields. Solr caches. > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume > during indexing. There is no need to commit every 50K docs unless you want > to trigger snapshot creation. > 3) see 1) above > > 1.5 billion docs per instance where each doc is cca 1KB? I doubt that's > going to fly. :) > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: vivek sar >> To: solr-user@lucene.apache.org >> Sent: Wednesday, May 13, 2009 3:04:46 PM >> Subject: Solr memory requirements? >> >> Hi, >> >> I'm pretty sure this has been asked before, but I couldn't find a >> complete answer in the forum archive. Here are my questions, >> >> 1) When solr starts up what does it loads up in the memory? Let's say >> I've 4 cores with each core 50G in size. When Solr comes up how much >> of it would be loaded in memory? >> >> 2) How much memory is required during index time? If I'm committing >> 50K records at a time (1 record = 1KB) using solrj, how much memory do >> I need to give to Solr. >> >> 3) Is there a minimum memory requirement by Solr to maintain a certain >> size index? Is there any benchmark on this? >> >> Here are some of my configuration from solrconfig.xml, >> >> 1) 64 >> 2) All the caches (under query tag) are commented out >> 3) Few others, >> a) true ==> >> would this require memory? >> b) 50 >> c) 200 >> d) >> e) false >> f) 2 >> >> The problem we are having is following, >> >> I've given Solr RAM of 6G. As the total index size (all cores >> combined) start growing the Solr memory consumption goes up. With 800 >> million documents, I see Solr already taking up all the memory at >> startup. After that the commits, searches everything become slow. We >> will be having distributed setup with multiple Solr instances (around >> 8) on four boxes, but our requirement is to have each Solr instance at >> least maintain around 1.5 billion documents. >> >> We are trying to see if we can somehow reduce the Solr memory >> footprint. If someone can provide a pointer on what parameters affect >> memory and what effects it has we can then decide whether we want that >> parameter or not. I'm not sure if there is any minimum Solr >> requirement for it to be able maintain large indexes. I've used Lucene >> before and that didn't require anything by default - it used up memory >> only during index and search times - not otherwise. >> >> Any help is very much appreciated. >> >> Thanks, >> -vivek > >
Re: Solr memory requirements?
Otis, In that case, I'm not sure why Solr is taking up so much memory as soon as we start it up. I checked for .tii file and there is only one, -rw-r--r-- 1 search staff 20306 May 11 21:47 ./20090510_1/data/index/_3au.tii I have all the cache disabled - so that shouldn't be a problem too. My ramBuffer size is only 64MB. I read note on sorting, http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see something related to FieldCache. I don't see this as parameter defined in either solrconfig.xml or schema.xml. Could this be something that can load things in memory at startup? How can we disable it? I'm trying to find out if there is a way to tell how much memory Solr would consume and way to cap it. Thanks, -vivek On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic wrote: > > Hi, > > Sorting is triggered by the sort parameter in the URL, not a characteristic > of a field. :) > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: vivek sar >> To: solr-user@lucene.apache.org >> Sent: Wednesday, May 13, 2009 4:42:16 PM >> Subject: Re: Solr memory requirements? >> >> Thanks Otis. >> >> Our use case doesn't require any sorting or faceting. I'm wondering if >> I've configured anything wrong. >> >> I got total of 25 fields (15 are indexed and stored, other 10 are just >> stored). All my fields are basic data type - which I thought are not >> sorted. My id field is unique key. >> >> Is there any field here that might be getting sorted? >> >> >> required="true" omitNorms="true" compressed="false"/> >> >> >> compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> default="NOW/HOUR" compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> compressed="false"/> >> >> compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> compressed="false"/> >> >> compressed="false"/> >> >> compressed="false"/> >> >> omitNorms="true" compressed="false"/> >> >> compressed="false"/> >> >> default="NOW/HOUR" omitNorms="true"/> >> >> >> >> >> omitNorms="true" multiValued="true"/> >> >> Thanks, >> -vivek >> >> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic >> wrote: >> > >> > Hi, >> > Some answers: >> > 1) .tii files in the Lucene index. When you sort, all distinct values for >> > the >> field(s) used for sorting. Similarly for facet fields. Solr caches. >> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will >> > consume >> during indexing. There is no need to commit every 50K docs unless you want >> to >> trigger snapshot creation. >> > 3) see 1) above >> > >> > 1.5 billion docs per instance where each doc is cca 1KB? I doubt that's >> > going >> to fly. :) >> > >> > Otis >> > -- >> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >> > >> > >> > >> > - Original Message >> >> From: vivek sar >> >> To: solr-user@lucene.apache.org >> >> Sent: Wednesday, May 13, 2009 3:04:46 PM >> >> Subject: Solr memory requirements? >> >> >> >> Hi, >> >> >> >> I'm pretty sure this has been asked before, but I couldn't find a >> >> complete answer in the forum archive. Here are my questions, >> >> >> >> 1) When solr starts up what does it loads up in the memory? 
Let's say >> >> I've 4 cores with each core 50G in size. When Solr comes up how much >> >> of it would be loaded in me
Re: Solr memory requirements?
Just an update on the memory issue - might be useful for others. I read the following,

http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)

and it looks like the first and new searcher listeners would populate the FieldCache. Commenting out these two listener entries seems to do the trick - at least the heap size is not growing as soon as Solr starts up.

I ran some searches and they all came out fine. The index rate is also pretty good. Would there be any impact of disabling these listeners?

Thanks,
-vivek

On Wed, May 13, 2009 at 2:12 PM, vivek sar wrote:
> Otis,
>
> In that case, I'm not sure why Solr is taking up so much memory as soon as we start it up. I checked for .tii files and there is only one,
>
> -rw-r--r-- 1 search staff 20306 May 11 21:47 ./20090510_1/data/index/_3au.tii
>
> I have all the caches disabled - so that shouldn't be a problem either. My ramBuffer size is only 64MB.
>
> I read the note on sorting, http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see something related to FieldCache. I don't see this as a parameter defined in either solrconfig.xml or schema.xml. Could this be something that can load things in memory at startup? How can we disable it?
>
> I'm trying to find out if there is a way to tell how much memory Solr would consume, and a way to cap it.
>
> Thanks,
> -vivek
>
> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic wrote:
>> Hi,
>>
>> Sorting is triggered by the sort parameter in the URL, not a characteristic of a field. :)
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> - Original Message
>>> From: vivek sar
>>> To: solr-user@lucene.apache.org
>>> Sent: Wednesday, May 13, 2009 4:42:16 PM
>>> Subject: Re: Solr memory requirements?
>>>
>>> Thanks Otis.
>>>
>>> Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong.
>>>
>>> I got a total of 25 fields (15 are indexed and stored, the other 10 are just stored). All my fields are basic data types - which I thought are not sorted. My id field is the unique key.
>>>
>>> Is there any field here that might be getting sorted?
>>>
>>> [~25 <field .../> definitions omitted - the opening tags were lost in archiving; all are plain string/date fields with omitNorms="true" and compressed="false"]
>>>
>>> Thanks,
>>> -vivek
>>>
>>> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic wrote:
>>> > Hi,
>>> > Some answers:
>>> > 1) The .tii files in the Lucene index. When you sort, all distinct values for the field(s) used for sorting are loaded too. Similarly for facet fields. Plus Solr's caches.
>>> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume during indexing. There is no need to commit every 50K docs unless you want to trigger snapshot creation.
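For reference, the two listener entries being commented out look like this in the stock example solrconfig.xml (a sketch based on the 1.3/1.4 example config - the warming queries shown are the example defaults, not necessarily the ones in vivek's setup):

  <!--
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst> <str name="q">solr</str> <str name="start">0</str> <str name="rows">10</str> </lst>
    </arr>
  </listener>
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst> <str name="q">fast_warm</str> <str name="start">0</str> <str name="rows">10</str> </lst>
    </arr>
  </listener>
  -->

Disabling them means the FieldCache and the Solr caches won't be primed at startup; the cost just moves to the first real query against each new searcher.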
Re: Solr memory requirements?
Disabling the first/new searchers did help with the initial load time, but after 10-15 min the heap memory starts climbing up again and reaches the max within 20 min. Now the GC is running all the time, which is slowing down the commit and search cycles.

It's still puzzling what Solr holds in memory and doesn't release. I haven't been able to profile it as the dump is too big. Would setting termIndexInterval help? I'm not sure how that can be set using Solr.

Some other query properties under solrconfig (only the values survived archiving - the enclosing element names were lost):

1024 true 50 200 false 2

Currently I've got 800 million documents and have specified an 8G heap size. Any other suggestions on what I can do to control Solr's memory consumption?

Thanks,
-vivek

On Wed, May 13, 2009 at 2:53 PM, vivek sar wrote:
> Just an update on the memory issue - might be useful for others. I read the following,
>
> http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)
>
> and it looks like the first and new searcher listeners would populate the FieldCache. Commenting out these two listener entries seems to do the trick - at least the heap size is not growing as soon as Solr starts up.
>
> I ran some searches and they all came out fine. The index rate is also pretty good. Would there be any impact of disabling these listeners?
>
> Thanks,
> -vivek
>
> On Wed, May 13, 2009 at 2:12 PM, vivek sar wrote:
>> Otis,
>>
>> In that case, I'm not sure why Solr is taking up so much memory as soon as we start it up. I checked for .tii files and there is only one,
>>
>> -rw-r--r-- 1 search staff 20306 May 11 21:47 ./20090510_1/data/index/_3au.tii
>>
>> I have all the caches disabled - so that shouldn't be a problem either. My ramBuffer size is only 64MB.
>>
>> I read the note on sorting, http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see something related to FieldCache. I don't see this as a parameter defined in either solrconfig.xml or schema.xml. Could this be something that can load things in memory at startup? How can we disable it?
>>
>> I'm trying to find out if there is a way to tell how much memory Solr would consume, and a way to cap it.
>>
>> Thanks,
>> -vivek
>>
>> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic wrote:
>>> Hi,
>>>
>>> Sorting is triggered by the sort parameter in the URL, not a characteristic of a field. :)
>>>
>>> Otis
>>> --
>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>
>>> - Original Message
>>>> From: vivek sar
>>>> To: solr-user@lucene.apache.org
>>>> Sent: Wednesday, May 13, 2009 4:42:16 PM
>>>> Subject: Re: Solr memory requirements?
>>>>
>>>> Thanks Otis.
>>>>
>>>> Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong.
>>>>
>>>> I got a total of 25 fields (15 are indexed and stored, the other 10 are just stored). All my fields are basic data types - which I thought are not sorted. My id field is the unique key.
>>>>
>>>> Is there any field here that might be getting sorted?
>>>>
>>>> [~25 <field .../> definitions omitted - the opening tags were lost in archiving; message truncated here]
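On the termIndexInterval question above - a sketch of how it can be set, assuming your Solr 1.4 build reads it from the index settings in solrconfig.xml (worth verifying against your version; the value is illustrative, not a recommendation):

  <indexDefaults>
    <!-- index every 256th term instead of the Lucene default of 128;
         a larger interval shrinks the in-memory .tii term index at the
         cost of slightly slower term lookups (illustrative value) -->
    <termIndexInterval>256</termIndexInterval>
  </indexDefaults>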
Re: Solr memory requirements?
I think maxBufferedDocs has been deprecated in Solr 1.4 - it's recommended to use ramBufferSizeMB instead. My ramBufferSizeMB=64, so this shouldn't be a problem, I think. There has to be something else that Solr is holding in memory. Anyone else?

Thanks,
-vivek

On Wed, May 13, 2009 at 4:01 PM, Jack Godwin wrote:
> Have you checked the maxBufferedDocs? I had to drop mine down to 1000 with 3 million docs.
> Jack
>
> On Wed, May 13, 2009 at 6:53 PM, vivek sar wrote:
>> Disabling the first/new searchers did help with the initial load time, but after 10-15 min the heap memory starts climbing up again and reaches the max within 20 min. Now the GC is running all the time, which is slowing down the commit and search cycles.
>>
>> It's still puzzling what Solr holds in memory and doesn't release.
>>
>> I haven't been able to profile it as the dump is too big. Would setting termIndexInterval help? I'm not sure how that can be set using Solr.
>>
>> Some other query properties under solrconfig (values only; the element names were lost in archiving):
>>
>> 1024 true 50 200 false 2
>>
>> Currently I've got 800 million documents and have specified an 8G heap size.
>>
>> Any other suggestions on what I can do to control Solr's memory consumption?
>>
>> Thanks,
>> -vivek
>>
>> On Wed, May 13, 2009 at 2:53 PM, vivek sar wrote:
>> > Just an update on the memory issue - might be useful for others. I read the following,
>> >
>> > http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)
>> >
>> > and it looks like the first and new searcher listeners would populate the FieldCache. Commenting out these two listener entries seems to do the trick - at least the heap size is not growing as soon as Solr starts up.
>> >
>> > I ran some searches and they all came out fine. The index rate is also pretty good. Would there be any impact of disabling these listeners?
>> >
>> > Thanks,
>> > -vivek
>> >
>> > On Wed, May 13, 2009 at 2:12 PM, vivek sar wrote:
>> >> Otis,
>> >>
>> >> In that case, I'm not sure why Solr is taking up so much memory as soon as we start it up. I checked for .tii files and there is only one,
>> >>
>> >> -rw-r--r-- 1 search staff 20306 May 11 21:47 ./20090510_1/data/index/_3au.tii
>> >>
>> >> I have all the caches disabled - so that shouldn't be a problem either. My ramBuffer size is only 64MB.
>> >>
>> >> I read the note on sorting, http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see something related to FieldCache. I don't see this as a parameter defined in either solrconfig.xml or schema.xml. Could this be something that can load things in memory at startup? How can we disable it?
>> >>
>> >> I'm trying to find out if there is a way to tell how much memory Solr would consume, and a way to cap it.
>> >>
>> >> Thanks,
>> >> -vivek
>> >>
>> >> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic wrote:
>> >>> Hi,
>> >>>
>> >>> Sorting is triggered by the sort parameter in the URL, not a characteristic of a field. :)
>> >>>
>> >>> Otis
>> >>> --
>> >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >>>
>> >>> - Original Message
>> >>>> From: vivek sar
>> >>>> To: solr-user@lucene.apache.org
>> >>>> Sent: Wednesday, May 13, 2009 4:42:16 PM
>> >>>> Subject: Re: Solr memory requirements?
>> >>>>
>> >>>> Thanks Otis.
>> >>>>
>> >>>> Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong.
>> >>>>
>> >>>> I got a total of 25 fields (15 are indexed and stored, the other 10 are just stored). All my fields are basic data types - which I thought are not sorted. My id field is the unique key.
>> >>>>
>> >>>> Is there any field here that might be getting sorted? [message truncated in the archive]
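A sketch of the 1.4-style setting, per the example solrconfig.xml (the 64 matches the ramBufferSizeMB mentioned above):

  <indexDefaults>
    <!-- flush the in-memory index buffer once it reaches 64 MB, instead of
         flushing after a fixed document count (the deprecated maxBufferedDocs) -->
    <ramBufferSizeMB>64</ramBufferSizeMB>
  </indexDefaults>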
Re: Solr memory requirements?
Otis,

We are not running a master-slave configuration. We get very few searches (admin only) in a day, so we didn't see the need for replication/snapshots. This problem is with one Solr instance managing 4 cores (each core 200 million records). Both indexing and searching are performed by the same Solr instance.

What are the .tii files used for? I see this file under only one core.

Still looking for what gets loaded into the heap by Solr (during load time, indexing, and searching) and stays there. I see most of these are tenured objects that aren't getting released by GC - will post profiling records tomorrow.

Thanks,
-vivek

On Wed, May 13, 2009 at 6:34 PM, Otis Gospodnetic wrote:
>
> There is constant mixing of indexing concepts and searching concepts in this thread. Are you having problems on the master (indexing) or on the slave (searching)?
>
> That .tii is only 20K and you said this is a large index? That doesn't smell right...
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
>> From: vivek sar
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, May 13, 2009 5:12:00 PM
>> Subject: Re: Solr memory requirements?
>>
>> Otis,
>>
>> In that case, I'm not sure why Solr is taking up so much memory as soon as we start it up. I checked for .tii files and there is only one,
>>
>> -rw-r--r-- 1 search staff 20306 May 11 21:47 ./20090510_1/data/index/_3au.tii
>>
>> I have all the caches disabled - so that shouldn't be a problem either. My ramBuffer size is only 64MB.
>>
>> I read the note on sorting, http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see something related to FieldCache. I don't see this as a parameter defined in either solrconfig.xml or schema.xml. Could this be something that can load things in memory at startup? How can we disable it?
>>
>> I'm trying to find out if there is a way to tell how much memory Solr would consume, and a way to cap it.
>>
>> Thanks,
>> -vivek
>>
>> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic wrote:
>> > Hi,
>> >
>> > Sorting is triggered by the sort parameter in the URL, not a characteristic of a field. :)
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> > - Original Message
>> >> From: vivek sar
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Wednesday, May 13, 2009 4:42:16 PM
>> >> Subject: Re: Solr memory requirements?
>> >>
>> >> Thanks Otis.
>> >>
>> >> Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong.
>> >>
>> >> I got a total of 25 fields (15 are indexed and stored, the other 10 are just stored). All my fields are basic data types - which I thought are not sorted. My id field is the unique key.
>> >>
>> >> Is there any field here that might be getting sorted?
>> >>
>> >> [~25 <field .../> definitions omitted - the opening tags were lost in archiving; message truncated here]
Re: Solr memory requirements?
I don't know if field type has any impact on the memory usage - does it?

Our use cases require complete matches, thus there is no need for any analysis in most cases - does it matter in terms of memory usage?

Also, is there any default caching used by Solr if I comment out all the caches under query in solrconfig.xml? I also don't have any auto-warming queries.

Thanks,
-vivek

On Wed, May 13, 2009 at 4:24 PM, Erick Erickson wrote:
> Warning: I'm way out of my competency range when I comment on SOLR, but I've seen the statement that string fields are NOT tokenized while text fields are, and I notice that almost all of your fields are string type.
>
> Would someone more knowledgeable than me care to comment on whether this is at all relevant? Offered in the spirit that sometimes there are things so basic that only an amateur can see them
>
> Best
> Erick
>
> On Wed, May 13, 2009 at 4:42 PM, vivek sar wrote:
>> Thanks Otis.
>>
>> Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong.
>>
>> I got a total of 25 fields (15 are indexed and stored, the other 10 are just stored). All my fields are basic data types - which I thought are not sorted. My id field is the unique key.
>>
>> Is there any field here that might be getting sorted?
>>
>> [~25 <field .../> definitions omitted - the opening tags were lost in archiving; all are plain string/date fields with omitNorms="true" and compressed="false"]
>>
>> Thanks,
>> -vivek
>>
>> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic wrote:
>> > Hi,
>> > Some answers:
>> > 1) The .tii files in the Lucene index. When you sort, all distinct values for the field(s) used for sorting are loaded too. Similarly for facet fields. Plus Solr's caches.
>> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume during indexing. There is no need to commit every 50K docs unless you want to trigger snapshot creation.
>> > 3) see 1) above
>> >
>> > 1.5 billion docs per instance where each doc is cca 1KB? I doubt that's going to fly. :)
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> > - Original Message
>> >> From: vivek sar
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Wednesday, May 13, 2009 3:04:46 PM
>> >> Subject: Solr memory requirements?
>> >>
>> >> Hi,
>> >>
>> >> I'm pretty sure this has been asked before, but I couldn't find a complete answer in the forum archive. Here are my questions,
>> >>
>> >> 1) When solr starts up what does it load up in memory? Let's say I've 4 cores with each core 50G in size. When Solr comes up how much of it would be loaded in memory?
>> >>
>> >> 2) How much memory is required during index time? If I'm committing 50K records at a time (1 record = 1KB) using solrj, how much memory do I need to give to Solr?
>> >>
>> >> 3) Is there a minimum memory requirement? [message truncated in the archive]
Re: Solr memory requirements?
Some update on this issue,

1) I attached jconsole to my app and monitored the memory usage. During indexing the memory usage goes up and down, which I think is normal. The memory remains around the min heap size (4G) for indexing, but as soon as I run a search the tenured heap usage jumps up to 6G and remains there. Subsequent searches increase the heap usage even more until it reaches the max (8G) - after which everything (indexing and searching) becomes slow.

The search query is a very generic one in this case which goes through all the cores (4 of them - 800 million records), finds 400 million matches and returns 100 rows.

Does the Solr searcher hold references to objects in memory? I couldn't find any setting that would tell me it does, but every search causing the heap to go up is definitely suspicious.

2) I ran the jmap histo to get the top objects (this is on a smaller instance with 2G memory; this is before running a search - after running a search I wasn't able to run jmap),

 num   #instances     #bytes  class name
 ---------------------------------------
   1:    3890855   222608992  [C
   2:    3891673   155666920  java.lang.String
   3:    3284341   131373640  org.apache.lucene.index.TermInfo
   4:    3334198   106694336  org.apache.lucene.index.Term
   5:        271    26286496  [J
   6:         16    26273936  [Lorg.apache.lucene.index.Term;
   7:         16    26273936  [Lorg.apache.lucene.index.TermInfo;
   8:     320512    15384576  org.apache.lucene.index.FreqProxTermsWriter$PostingList
   9:      10335    11554136  [I

I'm not sure what the first one ([C) is? I couldn't profile it to know what all the Strings are being allocated by - any ideas?

Any ideas on what the Searcher might be holding on to, and how can we change that behavior?

Thanks,
-vivek

On Thu, May 14, 2009 at 11:33 AM, vivek sar wrote:
> I don't know if field type has any impact on the memory usage - does it?
>
> Our use cases require complete matches, thus there is no need for any analysis in most cases - does it matter in terms of memory usage?
>
> Also, is there any default caching used by Solr if I comment out all the caches under query in solrconfig.xml? I also don't have any auto-warming queries.
>
> Thanks,
> -vivek
>
> On Wed, May 13, 2009 at 4:24 PM, Erick Erickson wrote:
>> Warning: I'm way out of my competency range when I comment on SOLR, but I've seen the statement that string fields are NOT tokenized while text fields are, and I notice that almost all of your fields are string type.
>>
>> Would someone more knowledgeable than me care to comment on whether this is at all relevant? Offered in the spirit that sometimes there are things so basic that only an amateur can see them
>>
>> Best
>> Erick
>>
>> On Wed, May 13, 2009 at 4:42 PM, vivek sar wrote:
>>> Thanks Otis.
>>>
>>> Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong.
>>>
>>> I got a total of 25 fields (15 are indexed and stored, the other 10 are just stored). All my fields are basic data types - which I thought are not sorted. My id field is the unique key.
>>>
>>> Is there any field here that might be getting sorted?
>>>
>>> [~25 <field .../> definitions omitted - the opening tags were lost in archiving; message truncated here]
Re: Solr memory requirements?
Thanks Mark.

I checked all the items you mentioned,

1) I've omitNorms=true for all my indexed fields (stored-only fields I guess don't matter)
2) I've tried commenting out all caches in the solrconfig.xml, but that doesn't help much
3) I've tried commenting out the first and new searcher listener settings in the solrconfig.xml - the only way that helps is that at startup time the memory usage doesn't spike up - that's only because there is no auto-warmer query to run. But I noticed commenting out the searchers slows down any other queries to Solr.
4) I don't have any sort or facet in my queries
5) I'm not sure how to change the "Lucene term interval" from Solr - is there a way to do that?

I've been playing around with this memory thing the whole day and have found that it's the search that's hogging the memory. Any time there is a search on all the records (800 million) the heap consumption jumps by 5G. This makes me think there has to be some configuration in Solr that's causing some terms per document to be loaded in memory.

I've posted my settings several times on this forum, but no one has been able to pinpoint what configuration might be causing this. If someone is interested I can attach the solrconfig and schema files as well. Here are the settings again under the query tag (values only; the element names were lost in archiving),

1024 true 50 200 false 2

and the schema, [field definitions lost in archiving]

Any help is greatly appreciated.

Thanks,
-vivek

On Thu, May 14, 2009 at 6:22 PM, Mark Miller wrote:
> 800 million docs is on the high side for modern hardware.
>
> If even one field has norms on, you're talking almost 800 MB right there. And then if another Searcher is brought up while the old one is serving (which happens when you update)? Doubled.
>
> Your best bet is to distribute across a couple machines.
>
> To minimize you would want to turn off or down caching, don't facet, don't sort, turn off all norms, possibly get at the Lucene term interval and raise it. Drop the on-deck searchers setting. Even then, 800 million...time to distribute I'd think.
>
> vivek sar wrote:
>>
>> Some update on this issue,
>>
>> 1) I attached jconsole to my app and monitored the memory usage. During indexing the memory usage goes up and down, which I think is normal. The memory remains around the min heap size (4G) for indexing, but as soon as I run a search the tenured heap usage jumps up to 6G and remains there. Subsequent searches increase the heap usage even more until it reaches the max (8G) - after which everything (indexing and searching) becomes slow.
>>
>> The search query is a very generic one in this case which goes through all the cores (4 of them - 800 million records), finds 400 million matches and returns 100 rows.
>>
>> Does the Solr searcher hold references to objects in memory? I couldn't find any setting that would tell me it does, but every search causing the heap to go up is definitely suspicious.
>>
>> 2) I ran the jmap histo to get the top objects (this is on a smaller instance with 2G memory; this is before running a search - after running a search I wasn't able to run jmap),
>>
>>  num   #instances     #bytes  class name
>>  ---------------------------------------
>>    1:    3890855   222608992  [C
>>    2:    3891673   155666920  java.lang.String
>>    3:    3284341   131373640  org.apache.lucene.index.TermInfo
>>    4:    3334198   106694336  org.apache.lucene.index.Term
>>    5:        271    26286496  [J
>>    6:         16    26273936  [Lorg.apache.lucene.index.Term;
>>    7:         16    26273936  [Lorg.apache.lucene.index.TermInfo;
>>    8:     320512    15384576  org.apache.lucene.index.FreqProxTermsWriter$PostingList
>>    9:      10335    11554136  [I
>>
>> I'm not sure what the first one ([C) is? I couldn't profile it to know what all the Strings are being allocated by - any ideas?
>>
>> Any ideas on what the Searcher might be holding on to, and how can we change that behavior?
>>
>> Thanks,
>> -vivek
>>
>> On Thu, May 14, 2009 at 11:33 AM, vivek sar wrote:
>>>
>>> I don't know if field type has any impact on the memory usage - does it?
>>>
>>> Our use cases require complete matches, thus there is no need for any analysis in most cases - does it matter in terms of memory usage?
>>>
>>> Also, is there any default caching used by Solr if I comment out all the caches under query in solrconfig.xml? [message truncated in the archive]
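The bare values above lost their enclosing tags in archiving. As a purely hypothetical reconstruction - the element names below are guesses that match the stock solrconfig.xml ordering, not vivek's confirmed settings - the <query> section may have read:

  <query>
    <maxBooleanClauses>1024</maxBooleanClauses>               <!-- guess for "1024" -->
    <!-- filterCache / queryResultCache / documentCache commented out -->
    <enableLazyFieldLoading>true</enableLazyFieldLoading>     <!-- guess for "true" -->
    <queryResultWindowSize>50</queryResultWindowSize>         <!-- guess for "50" -->
    <queryResultMaxDocsCached>200</queryResultMaxDocsCached>  <!-- guess for "200" -->
    <useColdSearcher>false</useColdSearcher>                  <!-- guess for "false" -->
    <maxWarmingSearchers>2</maxWarmingSearchers>              <!-- guess for "2" -->
  </query>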
Re: Solr memory requirements?
Some more info,

Profiling the heap dump shows "org.apache.lucene.index.ReadOnlySegmentReader" as the biggest object - taking up almost 80% of the total memory (6G) - see the attached screenshot for a smaller dump. There is some norms object - not sure where they are coming from, as I've omitNorms=true for all indexed fields.

I also noticed that if I run a query - let's say a generic query that hits 100 million records - and then follow up with a specific query that hits only 1 record, the second query causes the increase in heap. Looks like a few bytes are being loaded into memory for each document - I've checked the schema, all indexes have omitNorms=true, and all caches are commented out - still looking to see what else might put things in memory that don't get collected by GC.

I also saw https://issues.apache.org/jira/browse/SOLR- for Solr 1.4 (which I'm using). Not sure if that can cause any problem. I do use range queries for dates - would that have any effect?

Any other ideas?

Thanks,
-vivek

On Thu, May 14, 2009 at 8:38 PM, vivek sar wrote:
> Thanks Mark.
>
> I checked all the items you mentioned,
>
> 1) I've omitNorms=true for all my indexed fields (stored-only fields I guess don't matter)
> 2) I've tried commenting out all caches in the solrconfig.xml, but that doesn't help much
> 3) I've tried commenting out the first and new searcher listener settings in the solrconfig.xml - the only way that helps is that at startup time the memory usage doesn't spike up - that's only because there is no auto-warmer query to run. But I noticed commenting out the searchers slows down any other queries to Solr.
> 4) I don't have any sort or facet in my queries
> 5) I'm not sure how to change the "Lucene term interval" from Solr - is there a way to do that?
>
> I've been playing around with this memory thing the whole day and have found that it's the search that's hogging the memory. Any time there is a search on all the records (800 million) the heap consumption jumps by 5G. This makes me think there has to be some configuration in Solr that's causing some terms per document to be loaded in memory.
>
> I've posted my settings several times on this forum, but no one has been able to pinpoint what configuration might be causing this. If someone is interested I can attach the solrconfig and schema files as well. Here are the settings again under the query tag (values only; the element names were lost in archiving),
>
> 1024 true 50 200 false 2
>
> and the schema,
>
> [~25 <field .../> definitions omitted - the opening tags were lost in archiving; all are plain string/date fields with omitNorms="true" and compressed="false"]
>
> Any help is greatly appreciated.
>
> Thanks,
> -vivek
>
> On Thu, May 14, 2009 at 6:22 PM, Mark Miller wrote:
>> 800 million docs is on the high side for modern hardware.
>>
>> If even one field has norms on, you're talking almost 800 MB right there. And then if another Searcher is brought up while the old one is serving (which happens when you update)? Doubled.
>>
>> Your best bet is to distribute across a couple machines.
>>
>> To minimize you would want to turn off or down caching, don't facet, don't sort, turn off all norms, possibly get at the Lucene term interval and raise it. Drop the on-deck searchers setting. Even then, 800 million...time to distribute I'd think.
>>
>> vivek sar wrote:
>>>
>>> Some update on this issue, [message truncated in the archive]
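One hedged note on the norms sighting above: omitNorms only applies to documents indexed after it is set - segments written while norms were enabled keep them (and a merged segment keeps norms for a field if any of its docs had them), so norms can linger until the data is fully reindexed. The schema declaration itself looks like this (field name and type are illustrative, not from the lost schema):

  <field name="timestamp" type="date" indexed="true" stored="true"
         omitNorms="true" compressed="false"/>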
Re: Defining DataDir in Multi-Core
Yeah, it was some time back - it did work. Thanks for following up.

On Tue, May 19, 2009 at 12:34 AM, RaghavPrabhu wrote:
>
> Hi Vivek,
>
> Have you figured out the problem of creating the data dir in the wrong location?
>
> For me it's working...
>
> Just comment out the data dir (in the solrconfig.xml file) and create the core via a REST call. It should work!!!
>
> Thanks & regards
> Prabhu.K
>
> vivek sar wrote:
>>
>> Hi,
>>
>> I tried the latest nightly build (04-01-09) - it takes the dataDir property now, but it's creating the data dir at the wrong location. For ex., I've the following in solr.xml,
>>
>> [solr.xml snippet lost in archiving - a <core .../> entry with dataDir="/Users/opal/temp/afterchat/solr/data/core0"]
>>
>> but it always seems to be creating the solr/data directory in the cwd (where I started Tomcat from). Here is the log from Catalina.out,
>>
>> Apr 1, 2009 10:47:21 AM org.apache.solr.core.SolrCore
>> INFO: [core2] Opening new SolrCore at /Users/opal/temp/chat/solr/, dataDir=./solr/data/
>> ..
>> Apr 1, 2009 10:47:21 AM org.apache.solr.core.SolrCore initIndex
>> WARNING: [core2] Solr index directory './solr/data/index' doesn't exist. Creating new index...
>>
>> I've also tried relative paths, but to no avail.
>>
>> Is this a bug?
>>
>> Thanks,
>> -vivek
>>
>> On Wed, Apr 1, 2009 at 9:45 AM, vivek sar wrote:
>>> Thanks Shalin.
>>>
>>> Is it available in the latest nightly build?
>>>
>>> Is there any other way I can create cores dynamically (using the CREATE service) which will use the same schema.xml and solrconfig.xml, but write to different data directories?
>>>
>>> Thanks,
>>> -vivek
>>>
>>> On Wed, Apr 1, 2009 at 1:55 AM, Shalin Shekhar Mangar wrote:
>>>> On Wed, Apr 1, 2009 at 1:48 PM, vivek sar wrote:
>>>>> I'm using the latest released one - Solr 1.3. The wiki says passing dataDir to the CREATE action (web service) should work, but that doesn't seem to be working.
>>>>
>>>> That is a Solr 1.4 feature (not released yet).
>>>>
>>>> --
>>>> Regards,
>>>> Shalin Shekhar Mangar.
>
> --
> View this message in context: http://www.nabble.com/Defining-DataDir-in-Multi-Core-tp22818543p23611179.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Defining DataDir in Multi-Core
As for splitting the index, I simply start a new core once a core reaches a certain size - using CREATE - and then start writing to that new core. Note that Solr will maintain all the cores defined in the solr.xml.

As for reusing the same schema and solrconfig.xml - you can create a default core (say core0) and put them in the conf directory there. In the solr.xml, for every new core set the instanceDir to core0 and its dataDir to the new core's data directory.

Hope this helps.

-vivek

2009/5/19 Noble Paul നോബിള് नोब्ळ् :
> On Tue, May 19, 2009 at 2:32 PM, KK wrote:
>> I could not follow [is this mail a continuation of some old mail, part of which seems to be missing], but I want to.
>> Is it the case that CREATE is to be supported by Solr 1.4, i.e. currently Solr 1.3 does not support this? Correct me if I'm wrong.
> CREATE is supported in Solr1.3 also
>
> the dataDir attribute is a new feature in 1.4
>>
>> Vivek, could you please tell me how you fixed the problem of using a single schema and config file for all cores and having different data directories. I'm stuck at the same point as you were. Please help me out. Can you provide some specific examples that show the way you used the CREATE statement to register new cores on the fly. Thank you.
>>
>> --KK
>>
>> On Tue, May 19, 2009 at 1:17 PM, vivek sar wrote:
>>> Yeah, it was some time back - it did work. Thanks for following up.
>>>
>>> On Tue, May 19, 2009 at 12:34 AM, RaghavPrabhu wrote:
>>> > Hi Vivek,
>>> >
>>> > Have you figured out the problem of creating the data dir in the wrong location?
>>> >
>>> > For me it's working...
>>> >
>>> > Just comment out the data dir (in the solrconfig.xml file) and create the core via a REST call. It should work!!!
>>> >
>>> > Thanks & regards
>>> > Prabhu.K
>>> >
>>> > vivek sar wrote:
>>> >> Hi,
>>> >>
>>> >> I tried the latest nightly build (04-01-09) - it takes the dataDir property now, but it's creating the data dir at the wrong location. For ex., I've the following in solr.xml,
>>> >>
>>> >> [solr.xml snippet lost in archiving - a <core .../> entry with dataDir="/Users/opal/temp/afterchat/solr/data/core0"]
>>> >>
>>> >> but it always seems to be creating the solr/data directory in the cwd (where I started Tomcat from). Here is the log from Catalina.out,
>>> >>
>>> >> Apr 1, 2009 10:47:21 AM org.apache.solr.core.SolrCore
>>> >> INFO: [core2] Opening new SolrCore at /Users/opal/temp/chat/solr/, dataDir=./solr/data/
>>> >> ..
>>> >> Apr 1, 2009 10:47:21 AM org.apache.solr.core.SolrCore initIndex
>>> >> WARNING: [core2] Solr index directory './solr/data/index' doesn't exist. Creating new index...
>>> >>
>>> >> I've also tried relative paths, but to no avail.
>>> >>
>>> >> Is this a bug?
>>> >>
>>> >> Thanks,
>>> >> -vivek
>>> >>
>>> >> On Wed, Apr 1, 2009 at 9:45 AM, vivek sar wrote:
>>> >>> Thanks Shalin.
>>> >>>
>>> >>> Is it available in the latest nightly build?
>>> >>>
>>> >>> Is there any other way I can create cores dynamically (using the CREATE service) which will use the same schema.xml and solrconfig.xml, but write to different data directories?
>>> >>>
>>> >>> Thanks,
>>> >>> -vivek
>>> >>>
>>> >>> On Wed, Apr 1, 2009 at 1:55 AM, Shalin Shekhar Mangar wrote:
>>> >>>> On Wed, Apr 1, 2009 at 1:48 PM, vivek sar wrote:
>>> >>>>> I'm using the latest released one - Solr 1.3. The wiki says passing dataDir to the CREATE action (web service) should work, but that doesn't seem to be working.
>>> >>>>
>>> >>>> That is a Solr 1.4 feature (not released yet).
>>> >>>>
>>> >>>> --
>>> >>>> Regards,
>>> >>>> Shalin Shekhar Mangar.
>>> >
>>> > --
>>> > View this message in context: http://www.nabble.com/Defining-DataDir-in-Multi-Core-tp22818543p23611179.html
>>> > Sent from the Solr - User mailing list archive at Nabble.com.
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
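To make the shared-conf setup concrete, a sketch (core names and paths are illustrative; the CREATE URL relies on the Solr 1.4 dataDir support discussed above):

  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <!-- every core points at core0's instanceDir, so they share one
           schema.xml/solrconfig.xml, but each gets its own dataDir -->
      <core name="core0" instanceDir="core0" dataDir="/solr/data/core0"/>
      <core name="core1" instanceDir="core0" dataDir="/solr/data/core1"/>
    </cores>
  </solr>

and a new core can then be registered on the fly with something like:

  http://localhost:8983/solr/admin/cores?action=CREATE&name=core2&instanceDir=core0&dataDir=/solr/data/core2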
Servlet filter for Solr
Hi,

I have to intercept every request to Solr (search and update) and log some performance numbers. In order to do so I tried a servlet filter and added this to Solr's web.xml,

  <filter>
    <filter-name>IndexFilter</filter-name>
    <filter-class>com.xxx.index.filter.IndexRequestFilter</filter-class>
    <init-param>
      <param-name>test-param</param-name>
      <description>This parameter is for testing.</description>
    </init-param>
  </filter>

  <filter-mapping>
    <filter-name>IndexFilter</filter-name>
    <servlet-name>SolrUpdate</servlet-name>
    <servlet-name>SolrServer</servlet-name>
  </filter-mapping>

but this doesn't seem to be working. A couple of questions,

1) What's wrong with my web.xml setting?
2) Is there any easier way to intercept calls to Solr without changing its web.xml? Basically, can I just change the solrconfig.xml to do so (besides request handlers) so I don't have to customize the solr.war?

Thanks,
-vivek
Re: Servlet filter for Solr
I've tried both "url-pattern" (/*) and servlet-name in the filter mapping, but neither seems to intercept the call. If I put (/*), only requests up to /solr get intercepted. Since I'm using multicore, calls like /solr/core0 don't get intercepted. I want both select and update to be monitored. Any ideas?

Thanks,
-vivek

2009/6/9 Noble Paul നോബിള് नोब्ळ् :
> if you wish to intercept "read" calls, a filter is the only way.
>
> On Wed, Jun 10, 2009 at 6:35 AM, vivek sar wrote:
>> Hi,
>>
>> I have to intercept every request to Solr (search and update) and log some performance numbers. In order to do so I tried a servlet filter and added this to Solr's web.xml,
>>
>>   <filter>
>>     <filter-name>IndexFilter</filter-name>
>>     <filter-class>com.xxx.index.filter.IndexRequestFilter</filter-class>
>>     <init-param>
>>       <param-name>test-param</param-name>
>>       <description>This parameter is for testing.</description>
>>     </init-param>
>>   </filter>
>>
>>   <filter-mapping>
>>     <filter-name>IndexFilter</filter-name>
>>     <servlet-name>SolrUpdate</servlet-name>
>>     <servlet-name>SolrServer</servlet-name>
>>   </filter-mapping>
>
> I guess you cannot put servlets in the filter mapping
>>
>> but this doesn't seem to be working. A couple of questions,
>>
>> 1) What's wrong with my web.xml setting?
>> 2) Is there any easier way to intercept calls to Solr without changing its web.xml? Basically, can I just change the solrconfig.xml to do so (besides request handlers) so I don't have to customize the solr.war?
>>
>> Thanks,
>> -vivek
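For what it's worth, a sketch of the url-pattern route - servlet filters run in the order their <filter-mapping> elements appear in web.xml, so a logging filter mapped to /* would need to sit before Solr's own dispatch filter mapping (named SolrRequestFilter in the stock web.xml, which is what serves the multicore /corename/select and /corename/update paths):

  <filter-mapping>
    <filter-name>IndexFilter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>

  <!-- Solr's existing mapping, left in place after the one above -->
  <filter-mapping>
    <filter-name>SolrRequestFilter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>

The custom filter must also call chain.doFilter() so the request still reaches Solr.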
Boosting for most recent documents
Hi,

I'm trying to find a way to get the most recent entry for the searched word. For ex., say I have a document with a field named "user". If I search for user:vivek, I want to get the document that was indexed most recently. Two ways I could think of,

1) Sort by some time stamp field - but with millions of documents this becomes a huge memory problem, as we have seen OOMs with sorting before
2) Boost the most recent document - I'm not sure how to do this. Basically, we want to have the most recent document score higher than any other; then we can retrieve just 10 records and sort in the application by the time stamp field to get the most recent document matching the keyword.

Any suggestions on how this can be done?

Thanks,
-vivek
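On option 2, a sketch of an index-time document boost using Solr's XML update format (boost value and fields are illustrative). One caveat: index-time boosts are folded into the field norms, so they are lost on fields indexed with omitNorms="true":

  <add>
    <doc boost="2.0">
      <!-- newer documents get a larger boost so they score higher -->
      <field name="id">12345</field>
      <field name="user">vivek</field>
    </doc>
  </add>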
Re: Boosting for most recent documents
Thanks Otis. I've got a distributed index - using Solr multi-core. Basically, I've got 6 indexer instances running on 3 different boxes. A couple of questions,

1) Is it possible to sort on document id for multiple shards? How is that done?
2) How would I boost the most recent doc at index time?

Thanks,
-vivek

On Wed, Jul 8, 2009 at 7:47 PM, Otis Gospodnetic wrote:
>
> Sort by the internal Lucene document ID and pick the highest one. That might do the job for you.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
>> From: vivek sar
>> To: solr-user
>> Sent: Wednesday, July 8, 2009 8:34:16 PM
>> Subject: Boosting for most recent documents
>>
>> Hi,
>>
>> I'm trying to find a way to get the most recent entry for the searched word. For ex., say I have a document with a field named "user". If I search for user:vivek, I want to get the document that was indexed most recently. Two ways I could think of,
>>
>> 1) Sort by some time stamp field - but with millions of documents this becomes a huge memory problem, as we have seen OOMs with sorting before
>> 2) Boost the most recent document - I'm not sure how to do this. Basically, we want to have the most recent document score higher than any other; then we can retrieve just 10 records and sort in the application by the time stamp field to get the most recent document matching the keyword.
>>
>> Any suggestions on how this can be done?
>>
>> Thanks,
>> -vivek
Re: Boosting for most recent documents
How do we sort by internal doc id (say on one index only) using Solr? I saw a couple of threads saying it (Sort.INDEXORDER) was not supported in Solr,

http://www.nabble.com/sort-by-index-id-descending--td16124009.html#a16124009
http://www.nabble.com/Reverse-sorting-by-index-order-td1321032.html#a1321032

Has the index order support been added in Solr 1.4? How do we use it - any documentation?

Thanks,
-vivek

On Thu, Jul 9, 2009 at 2:21 PM, Otis Gospodnetic wrote:
>
> Ah, with multiple indices you can't rely on the max Lucene doc Id. I think you have to go with the timestamp approach.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
>> From: vivek sar
>> To: solr-user@lucene.apache.org
>> Sent: Thursday, July 9, 2009 1:13:54 PM
>> Subject: Re: Boosting for most recent documents
>>
>> Thanks Otis. I've got a distributed index - using Solr multi-core. Basically, I've got 6 indexer instances running on 3 different boxes. A couple of questions,
>>
>> 1) Is it possible to sort on document id for multiple shards? How is that done?
>> 2) How would I boost the most recent doc at index time?
>>
>> Thanks,
>> -vivek
>>
>> On Wed, Jul 8, 2009 at 7:47 PM, Otis Gospodnetic wrote:
>> >
>> > Sort by the internal Lucene document ID and pick the highest one. That might do the job for you.
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> > - Original Message
>> >> From: vivek sar
>> >> To: solr-user
>> >> Sent: Wednesday, July 8, 2009 8:34:16 PM
>> >> Subject: Boosting for most recent documents
>> >>
>> >> Hi,
>> >>
>> >> I'm trying to find a way to get the most recent entry for the searched word. For ex., say I have a document with a field named "user". If I search for user:vivek, I want to get the document that was indexed most recently. Two ways I could think of,
>> >>
>> >> 1) Sort by some time stamp field - but with millions of documents this becomes a huge memory problem, as we have seen OOMs with sorting before
>> >> 2) Boost the most recent document - I'm not sure how to do this. Basically, we want to have the most recent document score higher than any other; then we can retrieve just 10 records and sort in the application by the time stamp field to get the most recent document matching the keyword.
>> >>
>> >> Any suggestions on how this can be done?
>> >>
>> >> Thanks,
>> >> -vivek
Re: Boosting for most recent documents
Thanks Bill. A couple of questions,

1) Would the function query load all unique terms (for that field) in memory the way sort (FieldCache) does? If so, that wouldn't work for us, as we can have over 5 billion records spread across multiple shards (up to 10 indexer instances); that would surely kill the process if it were to load everything in memory.

2) Would the function query work on a multi-shard query? For ex., with recip(rord(creationDate),1,1000,1000), would it automatically apply the function to the combined result from all the shards, or would it run on each individual shard and get results from them?

I would still be interested in knowing if Solr supports Sort.INDEXORDER - if so, how?

Thanks,
-vivek

On Thu, Jul 9, 2009 at 8:27 PM, Bill Au wrote:
> With a time stamp you can use a function query to boost the score of newer documents:
> http://wiki.apache.org/solr/SolrRelevancyFAQ#head-b1b1cdedcb9cd9bfd9c994709b4d7e540359b1fd
>
> Bill
>
> On Thu, Jul 9, 2009 at 5:58 PM, vivek sar wrote:
>> How do we sort by internal doc id (say on one index only) using Solr? I saw a couple of threads saying it (Sort.INDEXORDER) was not supported in Solr,
>>
>> http://www.nabble.com/sort-by-index-id-descending--td16124009.html#a16124009
>> http://www.nabble.com/Reverse-sorting-by-index-order-td1321032.html#a1321032
>>
>> Has the index order support been added in Solr 1.4? How do we use it - any documentation?
>>
>> Thanks,
>> -vivek
>>
>> On Thu, Jul 9, 2009 at 2:21 PM, Otis Gospodnetic wrote:
>> >
>> > Ah, with multiple indices you can't rely on the max Lucene doc Id. I think you have to go with the timestamp approach.
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> > - Original Message
>> >> From: vivek sar
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Thursday, July 9, 2009 1:13:54 PM
>> >> Subject: Re: Boosting for most recent documents
>> >>
>> >> Thanks Otis. I've got a distributed index - using Solr multi-core. Basically, I've got 6 indexer instances running on 3 different boxes. A couple of questions,
>> >>
>> >> 1) Is it possible to sort on document id for multiple shards? How is that done?
>> >> 2) How would I boost the most recent doc at index time?
>> >>
>> >> Thanks,
>> >> -vivek
>> >>
>> >> On Wed, Jul 8, 2009 at 7:47 PM, Otis Gospodnetic wrote:
>> >> >
>> >> > Sort by the internal Lucene document ID and pick the highest one. That might do the job for you.
>> >> >
>> >> > Otis
>> >> > --
>> >> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >> >
>> >> > - Original Message
>> >> >> From: vivek sar
>> >> >> To: solr-user
>> >> >> Sent: Wednesday, July 8, 2009 8:34:16 PM
>> >> >> Subject: Boosting for most recent documents
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> I'm trying to find a way to get the most recent entry for the searched word. For ex., say I have a document with a field named "user". If I search for user:vivek, I want to get the document that was indexed most recently. Two ways I could think of,
>> >> >>
>> >> >> 1) Sort by some time stamp field - but with millions of documents this becomes a huge memory problem, as we have seen OOMs with sorting before
>> >> >> 2) Boost the most recent document - I'm not sure how to do this. Basically, we want to have the most recent document score higher than any other; then we can retrieve just 10 records and sort in the application by the time stamp field to get the most recent document matching the keyword.
>> >> >>
>> >> >> Any suggestions on how this can be done?
>> >> >>
>> >> >> Thanks,
>> >> >> -vivek
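For the function-query route from the FAQ link above, one way to wire the recency boost in is as a dismax boost function in solrconfig.xml (a sketch - the handler name and date field are illustrative):

  <requestHandler name="recent" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <!-- boost function: the newer the creationDate, the higher the score -->
      <str name="bf">recip(rord(creationDate),1,1000,1000)</str>
    </lst>
  </requestHandler>

The same expression can also be passed per-request as a bf parameter on a dismax query instead of baking it into a handler.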
Re: Boosting for most recent documents
Hi,

Does anyone know if Solr supports sorting by internal document ids, i.e., like Sort.INDEXORDER in Lucene? If so, how?

Also, does anyone have any insight on whether the function query loads up unique terms (like field sorts do) in memory or not?

Thanks,
-vivek

On Fri, Jul 10, 2009 at 10:26 AM, vivek sar wrote:
> Thanks Bill. A couple of questions,
>
> 1) Would the function query load all unique terms (for that field) in memory the way sort (FieldCache) does? If so, that wouldn't work for us, as we can have over 5 billion records spread across multiple shards (up to 10 indexer instances); that would surely kill the process if it were to load everything in memory.
>
> 2) Would the function query work on a multi-shard query? For ex., with recip(rord(creationDate),1,1000,1000), would it automatically apply the function to the combined result from all the shards, or would it run on each individual shard and get results from them?
>
> I would still be interested in knowing if Solr supports Sort.INDEXORDER - if so, how?
>
> Thanks,
> -vivek
>
> On Thu, Jul 9, 2009 at 8:27 PM, Bill Au wrote:
>> With a time stamp you can use a function query to boost the score of newer documents:
>> http://wiki.apache.org/solr/SolrRelevancyFAQ#head-b1b1cdedcb9cd9bfd9c994709b4d7e540359b1fd
>>
>> Bill
>>
>> On Thu, Jul 9, 2009 at 5:58 PM, vivek sar wrote:
>>> How do we sort by internal doc id (say on one index only) using Solr? I saw a couple of threads saying it (Sort.INDEXORDER) was not supported in Solr,
>>>
>>> http://www.nabble.com/sort-by-index-id-descending--td16124009.html#a16124009
>>> http://www.nabble.com/Reverse-sorting-by-index-order-td1321032.html#a1321032
>>>
>>> Has the index order support been added in Solr 1.4? How do we use it - any documentation?
>>>
>>> Thanks,
>>> -vivek
>>>
>>> On Thu, Jul 9, 2009 at 2:21 PM, Otis Gospodnetic wrote:
>>> >
>>> > Ah, with multiple indices you can't rely on the max Lucene doc Id. I think you have to go with the timestamp approach.
>>> >
>>> > Otis
>>> > --
>>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>> >
>>> > - Original Message
>>> >> From: vivek sar
>>> >> To: solr-user@lucene.apache.org
>>> >> Sent: Thursday, July 9, 2009 1:13:54 PM
>>> >> Subject: Re: Boosting for most recent documents
>>> >>
>>> >> Thanks Otis. I've got a distributed index - using Solr multi-core. Basically, I've got 6 indexer instances running on 3 different boxes. A couple of questions,
>>> >>
>>> >> 1) Is it possible to sort on document id for multiple shards? How is that done?
>>> >> 2) How would I boost the most recent doc at index time?
>>> >>
>>> >> Thanks,
>>> >> -vivek
>>> >>
>>> >> On Wed, Jul 8, 2009 at 7:47 PM, Otis Gospodnetic wrote:
>>> >> >
>>> >> > Sort by the internal Lucene document ID and pick the highest one. That might do the job for you.
>>> >> >
>>> >> > Otis
>>> >> > --
>>> >> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>> >> >
>>> >> > - Original Message
>>> >> >> From: vivek sar
>>> >> >> To: solr-user
>>> >> >> Sent: Wednesday, July 8, 2009 8:34:16 PM
>>> >> >> Subject: Boosting for most recent documents
>>> >> >>
>>> >> >> Hi,
>>> >> >>
>>> >> >> I'm trying to find a way to get the most recent entry for the searched word. For ex., say I have a document with a field named "user". If I search for user:vivek, I want to get the document that was indexed most recently. Two ways I could think of,
>>> >> >>
>>> >> >> 1) Sort by some time stamp field - but with millions of documents this becomes a huge memory problem, as we have seen OOMs with sorting before
>>> >> >> 2) Boost the most recent document - I'm not sure how to do this. Basically, we want to have the most recent document score higher than any other; then we can retrieve just 10 records and sort in the application by the time stamp field to get the most recent document matching the keyword.
>>> >> >>
>>> >> >> Any suggestions on how this can be done?
>>> >> >>
>>> >> >> Thanks,
>>> >> >> -vivek