long QTime for big index

2013-01-31 Thread Mou
I am running solr 3.4 on tomcat 7.

Our index is very big, two cores of 120G each. We are searching the slaves,
which are replicated every 30 min.
I am using the filterCache only, and we have more than 90% cache hits. We use
a lot of filter queries; queries are usually pretty big, with 10-20 fq
parameters. Not all filters are cached.

We are searching three shards, and the query looks like this:
shards=core1,core2,core3&q=*:*&fq=field1:some value&fq=-field2:some
value&sort=date
But some queries are taking more than 30 sec to return results, and the
behavior is intermittent. I cannot find any relation to replication. We are
using the Zing JVM, which reduced our GC pauses to milliseconds, so GC is not
the problem.

How can I improve the qtime? Is it at all possible to get a better qtime
given our index size?

Thank you for your suggestion.





Re: Fwd: advice about develop AbstractSolrEventListener.

2013-01-31 Thread Miguel

Hi

  After studying the Apache Solr documentation, I think the only way to know
which records were updated (modify, delete and insert actions) is to develop a
class extending org.apache.solr.servlet.SolrUpdateServlet.
In this class I can access the updated record information going into the
Apache Solr server.


Can somebody confirm that this is the best way, or are there other options?


thanks

On 30/01/2013 13:39, Miguel wrote:


Hi

I have to develop a function that must communicate with a webservice, and
this function must execute after each commit.
My doubt:
is it possible to get the records that have been updated in the Solr index?
My function must send information about added, updated and deleted records
from the Solr index to an external webservice, and this information must be
sent after the commit event.

I have read the Apache Solr wiki and it seems the best way is to create a
listener with event=postCommit, but I have looked at the
"solr.RunExecutableListener" example and I don't see how to know the records
associated with the commit event.

Example Solrconfig.xml:


 


Thanks.
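
For what it's worth, a minimal sketch of such a listener (assuming Solr 4.x; the
package and class names are hypothetical) would be registered in solrconfig.xml
inside <updateHandler> as
<listener event="postCommit" class="com.example.WebserviceNotifyListener"/>
and look like this:

package com.example;

import org.apache.solr.core.AbstractSolrEventListener;
import org.apache.solr.core.SolrCore;

public class WebserviceNotifyListener extends AbstractSolrEventListener {

    public WebserviceNotifyListener(SolrCore core) {
        super(core);
    }

    @Override
    public void postCommit() {
        // Fired after every hard commit. Note that the callback does not carry
        // the list of added/updated/deleted documents, so those ids have to be
        // collected elsewhere (for example while the update requests come in)
        // and only the notification to the external webservice is sent here.
    }
}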







How to use SolrCloud in multi-threaded indexing

2013-01-31 Thread andy
Hi, 

I am going to upgrade to Solr 4.1 from version 3.6, and I want to set up two
shards.
I use ConcurrentUpdateSolrServer to index the documents in Solr 3.6.
I saw the CloudSolrServer API in 4.1, but:
1: CloudSolrServer uses LBHttpSolrServer to issue requests, yet "LBHttpSolrServer
should NOT be used for indexing" is documented in the API at
http://lucene.apache.org/solr/4_1_0/solr-solrj/index.html

2: it seems CloudSolrServer does not support multi-threaded indexing.

So, how should multi-threaded indexing be done in Solr 4.1?

Thanks
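
One pattern that seems to work (a hedged sketch against SolrJ 4.1; the ZooKeeper
host string and collection name are assumptions) is to share a single
CloudSolrServer instance and feed it from several threads:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class MultiThreadedIndexer {
    public static void main(String[] args) throws Exception {
        // One CloudSolrServer shared by all indexing threads; it watches the
        // cluster state in ZooKeeper and routes update requests itself.
        final CloudSolrServer server = new CloudSolrServer("zkhost1:2181,zkhost2:2181");
        server.setDefaultCollection("collection1");

        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            final int threadId = t;
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        for (int i = 0; i < 1000; i++) {
                            SolrInputDocument doc = new SolrInputDocument();
                            doc.addField("id", threadId + "-" + i);
                            server.add(doc);
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
        server.commit();
        server.shutdown();
    }
}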





searching for an id

2013-01-31 Thread b.riez...@pixel-ink.de
Hi

I have an id which is a string like this:
tx-20130130-4599

I'm using a field without processing, which I got confirmed via the analysis tool.
But when I search for it, the value gets split up, so instead of finding the specific
entry with that unique id,
it finds all entries with "tx" in them.

Any idea how to get rid of that behavior?

Best
Ben



RE: Indexing problems

2013-01-31 Thread GASPARD Joel
Hello,

After more tests, we could identify our indexing problem (Solr 4.0.0).
Our problems are actually OutOfMemoryErrors. Thinking about ZooKeeper connection
problems was a mistake. We thought about that because OOMEs sometimes
appear in the logs after errors during ZooKeeper leader election.

Indexing fails when we define several Solr schemas in ZooKeeper.
When we define a single schema, indexing works well. This has been tested with
a single Solr node in the cluster, and with two Solr nodes.
We are facing problems when we upload several configurations to ZooKeeper: we
can create an index for a single collection, but OutOfMemoryErrors are thrown
when we try to create an index for a second collection with another schema.
Garbage collection logs show a rapid increase in memory consumption, then
OutOfMemory errors.

Can we define a distinct schema for each collection ?

Thanks !

Joel Gaspard



From: GASPARD Joel [mailto:joel.gasp...@cegedim.com]
Sent: Tuesday, 22 January 2013 16:30
To: solr-user@lucene.apache.org
Subject: Indexing problems

Hello,

We are facing some problems when indexing with Solr 4.0.0 with more than one 
server node and we can't find a way to solve them.
We have 2 nodes of Solr Cloud instances.
They are running in a Zookeeper ensemble (3.4.4 version) with 3 servers 
(another application is deployed on the third server).
We try to index a collection with 1 shard stored in the 2 nodes.
Two other collections with a single shard each have already been indexed. The logs for
that first indexing have been lost, but maybe there was only a single Solr node when
the indexing was done. Each collection contains about 3,000,000 documents
(16 GB).

When we start adding documents, failures occur very fast, after maybe 2000 
documents, and the solr servers cannot be accessed anymore.
I add to this mail an attachment containing a part of the logs.

When we use Solr Cloud with only one node in a single zookeeper ensemble, we 
don't encounter any problem.



Some details about our configuration:
We send about 400 documents per minute.
The documents are added in Solr by two threads on our application, using the 
CloudSolrServer class.
These threads don't call the commit method. We use only the solr config to 
commit. The solrconfig.xml defines for now :
<autoCommit><maxTime>15000</maxTime><openSearcher>false</openSearcher></autoCommit>
No soft commit.
We have also tried:
<autoCommit><maxTime>60</maxTime><openSearcher>false</openSearcher></autoCommit>
<autoSoftCommit><maxTime>1000</maxTime></autoSoftCommit>

The Solr servers are launched with these options :
-Xmx12G -Xms4G
-XX:MaxPermSize=256m -XX:MaxNewSize=356m
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseParNewGC
-XX:+CMSClassUnloadingEnabled
-XX:MinHeapFreeRatio=10
-XX:MaxHeapFreeRatio=25
-DzkHost=server1:2188,server2:2188,server3:2188

The solr.xml contains zkClientTimeout="6" and zoo.cfg defines a ticktime of 
3000 ms.

The Solr servers on which we are facing some problems contain old collections 
and old cores created for some tests.



Could you give me some pointers?
Is this a problem in our Solr or ZooKeeper config?
How could we detect network problems?
Is there a problem with the JVM parameters? Should we analyse some garbage
collection logs?

Thanks in advance.

Joel Gaspard


Re: long QTime for big index

2013-01-31 Thread Dmitry Kan
Does debugQuery=true tell anything useful for these? Like what is the
component taking most of the 30 seconds. Do you have evictions in your solr
caches?

Dmitry

On Thu, Jan 31, 2013 at 10:01 AM, Mou  wrote:

> I am running solr 3.4 on tomcat 7.
>
> Our index is very big , two cores each 120G. We are searching the slaves
> which are replicated every 30 min.
>  I am using filtercache only and We have more than 90% cache hits. We use
> lot of filter queries, queries are usually pretty big with 10-20 fq
> parameters. Not all filters are cached.
>
> we are searching three shards and query looks like this --
> shards=core1,core2,core3&q=*:* &fq=field1:some value&fq = -field2=some
> value&sort=date
> But some queries are taking more than 30 sec to return result and the
> behavior is intermittent. I can not find relation to replication. We are
> using Zing jvm which reduced our GC pause to milli secs, so GC is not a
> problem.
>
> How can I improve the qtime? Is it at all possible to get a better qtime
> given our index size?
>
> Thank you for your suggestion.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/long-QTime-for-big-index-tp4037635.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Question on Facet field constraints sort order

2013-01-31 Thread vijeshnair
It could be a foolish question or concern, but I have no option :-). We have an
e-commerce site where we consume the feed from our CSE partners and index it
into Solr for our search. Instead of the traditional auto-suggest, the
predictive search in the header search box recommends the categories (category
facet) in which it found matches for the given keyword. With this approach, a
search like "apple iphone" will yield more results for "cell phone accessories"
than for "cell phone", so in the drop-down "cell phone accessories" will come
first and then "cell phone". That is quite natural and works as expected, since
we use the default facet constraint sorting by "count".

Today my boss (the tech director) asked me to tweak this order: the business
team will prioritize all 1300 categories currently in my taxonomy in some
order, and my category facet constraints should then be ordered according to
the list they provide to us. He told me this is possible in Oracle Endeca,
where he showed me how to change the order of categories and do that kind of
customization, and asked me to check whether Solr supports it. Though my answer
was no, he proposed to handle it in code otherwise, i.e. change the order on
the client side. So the intention of writing this is to check whether any such
option is available in Solr. I understand the two types of sorting that are
available, i.e. count and index; is there something beyond that, where I can
alter the order using an external list or something like it? Any help will be
appreciated.





Solr4.1 changing result order FIFO to LIFO

2013-01-31 Thread Bernd Fehling
Hi list,

I noticed that the result order is FIFO if documents have the same score.
I think this is due to the fact that documents which are indexed later get a
higher internal document ID, and the output for documents with the same score
starts with the lowest internal document ID and goes up.
Is this right so far?

I would prefer LIFO output. Documents with the same score but indexed later
are "newer" (at least for my data) and should be displayed first.

Sure, I could use sorting, but sorting is always time consuming.
LIFO output, by contrast, would just start with the highest internal document
ID for documents with the same score.

Is there anything like this already available?

If not, any hint where to look at (Lucene or Solr)?
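
One thing that might be worth a try (hedged: I have not verified that the
_docid_ pseudo-field can be combined with score like this in 3.x/4.x) is adding
the internal document ID as a secondary sort:

sort=score desc,_docid_ desc

which would keep relevance ordering and only break ties by descending internal
ID, without sorting on a real field.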

Regards
Bernd


Re: searching for an id

2013-01-31 Thread Chandan Tamrakar
Which analyzer are you using to index that field? You can verify that
from the schema file.

thanks


On Thu, Jan 31, 2013 at 2:35 PM, b.riez...@pixel-ink.de <
b.riez...@pixel-ink.de> wrote:

> Hi
>
> I have an id wich is a string like this.
> tx-20130130-4599
>
> i'm using a field without processing, wich i got confirmed via the
> analyser tool
> But when i search for that it got split up, so instead of finding that
> specific entry with that unique id,
> it finds all entries with "tx" in it.
>
> Any idea how to get rid of that behavior?
>
> Best
> Ben
>
>


-- 
Chandan Tamrakar
*
*


Thoughts on production deployment?

2013-01-31 Thread Scott Stults
Part of this is a rant, part is a plea to others who've run successful 
production deployments.

Solr is a second-class citizen when it comes to production deployment. Every 
recipe I've seen (RPM, DEB, chef, or puppet) makes assumptions that in one way 
or another run afoul of best-practices when it comes to production use. And if 
you're not using one of these recipe formats to deploy Solr you're building a 
SnowflakeServer (Martin Fowler's term).

Granted, Solr _can_ be deployed into any vanilla JEE container, so the 
deployment spec responsibility may be erroneously assigned to whichever you 
choose. BUT, if you want to get the maximum out of Solr you'll want to put it 
on its own box, running in its own tuned container, and that container should 
be the one that Solr's been tested on repeatedly by an army of build bots. 
Right now that blessed container is Jetty version 8.1.2.v20120308.

So the first problem with the recipes is that they declare a generic dependency on 
Jetty or Tomcat. The assumption there is that either can be treated as a 
generic OS facility to be shared with other apps. That's not true because Solr 
is the driving force behind which version is deployed. The container can't be 
up- or downgraded without affecting Solr, and any other app running in there 
needs to be aware that Solr is taking first priority.

The next problem is that most recipes don't make a distinction between 
collections. "Solr" configuration goes in one folder, "Solr" data goes in 
another, and the logs and container stuff gets scattered likewise. In reality, 
every collection can be configured differently and there is no generic "Solr" 
data. 

Lastly, the package maintainers of all the major OS distributions have ignored 
Solr since around version 1.4. That means if you want a newer version you're 
going to download a tarball and make another snowflake. This might be 
attributable to thinking of Solr as just another web app that doesn't need 
special packaging. Regardless, the consequence is that the only people who are 
deploying Solr according to best-practices are those intimately familiar with 
Solr.

So what's the best way to fix this situation? Solr already ships with 
everything it needs except Java and a start-up script. Maybe the first step is 
to include a generic "install.sh" script that has a couple distro-specific 
support scripts. That would be fairly agnostic toward package management 
systems and it would be useful to sysadmins right away. It would also help 
package maintainers update their build specs.

What do _you_ think? 


-Scott

RE: Solr load balancer

2013-01-31 Thread Phil Hoy
Hi,

So am I correct in thinking that I add the JIRA issue myself, and if so, can I add it to 
the 4.2 release? Also, I have further questions about the scope of my patch; 
should those be left to the comments of the JIRA issue itself?

Phil

-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: 22 January 2013 17:25
To: solr-user@lucene.apache.org
Subject: Re: Solr load balancer

Hi Phil,

Have a look at http://wiki.apache.org/solr/HowToContribute and thank you in 
advance! :)

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Fri, Jan 18, 2013 at 5:41 AM, Phil Hoy  wrote:

> Hi,
>
> I would like to experiment with some custom load balancers to help 
> with query latency in the face of long gc pauses and the odd 
> time-consuming query that we need to be able to support. At the moment 
> setting the socket timeout via the HttpShardHandlerFactory does help, 
> but of course it can only be set to a length of time as long as the 
> most time consuming query we are likely to receive.
>
> For example perhaps a load balancer that sends multiple queries 
> concurrently to all/some replicas and only keeps the first response 
> might be effective. Or maybe a load balancer which takes account of 
> the frequency of timeouts would be able to recognize zombies more effectively.
>
> To use alternative load balancer implementations cleanly and without 
> having to hack solr directly, I would need to be able to make the 
> existing LBHttpSolrServer and HttpShardHandlerFactory more amenable to 
> extension, I can then override the default load balancer using solr's plugin 
> mechanism.
>
> So my question is, if I made a patch to make the load balancer more 
> pluggable, is this something that would be acceptable and if so what 
> do I do next?
>
> Phil
>
> __
> "brightsolid" is used in this email to collectively mean brightsolid 
> online innovation limited and its subsidiary companies brightsolid 
> online publishing limited and brightsolid online technology limited.
> findmypast.co.uk is a brand of brightsolid online publishing limited.
> brightsolid online innovation limited, Gateway House, Luna Place, 
> Dundee Technology Park, Dundee DD2 1TP.  Registered in Scotland No. SC274983.
> brightsolid online publishing limited, The Glebe, 6 Chapel Place, 
> Rivington Street, London EC2A 3DQ. Registered in England No. 04369607.
> brightsolid online technology limited, Gateway House, Luna Place, 
> Dundee Technology Park, Dundee DD2 1TP.  Registered in Scotland No. SC161678.
>
> Email Disclaimer
>
> This message is confidential and may contain privileged information. 
> You should not disclose its contents to any other person. If you are 
> not the intended recipient, please notify the sender named above 
> immediately. It is expressly declared that this e-mail does not 
> constitute nor form part of a contract or unilateral obligation. 
> Opinions, conclusions and other information in this message that do 
> not relate to the official business of brightsolid shall be understood as 
> neither given nor endorsed by it.
> __
> This email has been scanned by the brightsolid Email Security System.
> Powered by MessageLabs
> __



solr atomic update

2013-01-31 Thread Marcos Mendez
Is there a way to do an atomic update (inc by 1) and retrieve the updated value 
in one operation?
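
For the increment itself, Solr 4.x atomic updates use the "inc" modifier; a
minimal sketch in XML update syntax (field names are hypothetical):

<add>
  <doc>
    <field name="id">doc1</field>
    <field name="views" update="inc">1</field>
  </doc>
</add>

Reading the new value back still takes a follow-up query (e.g. q=id:doc1&fl=views);
whether both can be collapsed into a single round trip is exactly the question here.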

Re: Can I start solr with replication activated but disabled between master and slave

2013-01-31 Thread Erick Erickson
You can also do all this via HTTP commands, see:
http://wiki.apache.org/solr/SolrReplication#HTTP_API

that allows you to control _all_ replication from the master (i.e. tell the
master "don't do any replication") or just tell a slave "don't replicate
any more", as well as a lot of other stuff.

Best
Erick


On Wed, Jan 30, 2013 at 11:58 AM, Arcadius Ahouansou
wrote:

> As stated by Robi, you can through the admin UI:
>
> -disable replication on the master through the admin or
>
> -disable polling on the slave through the admin UI. Disabling polling on
> the slaves is very handy if you are doing stuff on the master that requires
> a master restart.
>
> Thanks.
>
> Arcadius.
>
>
>
>
>
> On 30 January 2013 16:35, Petersen, Robert  wrote:
>
> > Hi Jamel,
> >
> > You can start solr slaves with them pointed at a master and then turn off
> > replication in the admin replication page.
> >
> > Hope that helps,
> > -Robi
> >
> > Robert (Robi) Petersen
> > Senior Software Engineer
> > Search Department
> >
> >
> >
> >
> > -Original Message-
> > From: Jamel ESSOUSSI [mailto:jamel.essou...@gmail.com]
> > Sent: Wednesday, January 30, 2013 2:45 AM
> > To: solr-user@lucene.apache.org
> > Subject: Can I start solr with replication activated but disabled between
> > master and slave
> >
> > Hello,
> >
> > I would like to start solr with the following configuration;
> >
> > Replication between master and slave activated but not enabled.
> >
> > Regards
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Can-I-start-solr-with-replication-activated-but-disabled-between-master-and-slave-tp4037333.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
> >
>


Re: Indexing problems

2013-01-31 Thread Erick Erickson
I'm really surprised you're hitting OOM errors, I suspect you have
something else pathological in your system. So, I'd start checking things
like
- how many concurrent warming searchers you allow
- How big your indexing RAM is set to (we find very little gain over 128M
BTW).
- Other load on your Solr server. Are you, for instance, searching on it
too?
- what your autocommit characteristics are (think about autocommitting
fairly often with openSearcher=false).
- have you defined huge caches?
- .

How big are these documents anyway? With 12G of ram, they'd have to be
absolutely _huge_ to matter much.

Multiple collections should work fine in ZK. I really think you have some
innocent-looking configuration setting that's bollixing you up; this is not
expected behavior.

If at all possible, I'd also go with 4.1. I don't really think it's
relevant to your situation, but there have been a lot of improvements in
the code

Best
Erick


Indexing nouns only - UIMA vs. OpenNLP

2013-01-31 Thread Kai Gülzau
Hi,

I am stuck trying to index only the nouns of German and English texts.
(very similar to http://wiki.apache.org/solr/OpenNLP#Full_Example)


First try was to use UIMA with the HMMTagger:


  

/org/apache/uima/desc/AggregateSentenceAE.xml
false

  false
  albody


  
org.apache.uima.SentenceAnnotation

  coveredText
  albody2

  
   
  


- But how do I set the ModelFile to use the German corpus?
- What about language identification?
-- How do I use the right corpus/tagger based on the language?
-- Should this be done in UIMA (how?) or via the solr contrib/langid field mapping?
- How do I remove non-nouns from the annotated field?


My second try is to use OpenNLP and to apply the patch 
https://issues.apache.org/jira/browse/LUCENE-2899
But the patch seems to be a bit out of date.
Currently I am trying to get it to work with Solr 4.1.


Any pointers appreciated :-)

Regards,

Kai Gülzau



Re: Possible issue in edismax?

2013-01-31 Thread Felipe Lahti
So, it depends on your business requirements, right? If a document has
matches in more searchable fields then, at least for me, that document is more
important than another document that has fewer matches.

Example:
Put this in your schema:


And create a class in your classpath of your Solr:

package com.your.namespace;

import org.apache.lucene.search.similarities.DefaultSimilarity;

public class NoIDFSimilarity extends DefaultSimilarity {

    @Override
    public float idf(long docFreq, long numDocs) {
        return 1;
    }
}


It will "neutralize" the idf (which is the rarity of term).






On Thu, Jan 31, 2013 at 5:31 AM, Sandeep Mestry  wrote:

> Thanks Felipe..
> Can you point me an example please?
>
> Also forgive me but if a document has matches in more searchable fields
> then should it not rank higher?
>
> Thanks,
> Sandeep
> On 30 Jan 2013 19:30, "Felipe Lahti"  wrote:
>
> > If you compare the first and last document scores you will see that the
> > last one matches more fields than first one. So, you maybe thinking why?
> > The first doc only matches "contributions" field and the last matches a
> > bunch of fields so if you want to have it behave more like (<str
> > name="qf">series_title^500 title^100 description^15 contribution</str>) you
> > have to override the method of DefaultSimilarity.
> >
> >
> > On Wed, Jan 30, 2013 at 4:12 PM, Sandeep Mestry 
> > wrote:
> >
> > > I have pasted it below and it is slightly variant from the dismax
> > > configuration I have mentioned above as I was playing with all sorts of
> > > boost values, however it looks more lie below:
> > >
> > > 
> > > 2675.7844 = (MATCH) sum of: 2675.7844 = (MATCH) max plus 0.01 times
> > others
> > > of: 2675.7844 = (MATCH) weight(contributions:news in 63298)
> > > [DefaultSimilarity], result of: 2675.7844 = score(doc=63298,freq=1.0 =
> > > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > > 595177.7 = fieldWeight in 63298, product of: 1.0 = tf(freq=1.0), with
> > freq
> > > of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> > > 40960.0 = fieldNorm(doc=63298)
> > > 
> > > 
> > > 2317.297 = (MATCH) sum of: 2317.297 = (MATCH) max plus 0.01 times
> others
> > > of: 2317.297 = (MATCH) weight(contributions:news in 9826415)
> > > [DefaultSimilarity], result of: 2317.297 = score(doc=9826415,freq=3.0 =
> > > termFreq=3.0 ), product of: 0.004495774 = queryWeight, product of:
> > > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > > 515439.0 = fieldWeight in 9826415, product of: 1.7320508 =
> tf(freq=3.0),
> > > with freq of: 3.0 = termFreq=3.0 14.530705 = idf(docFreq=14,
> > > maxDocs=11282414) 20480.0 = fieldNorm(doc=9826415)
> > > 
> > > 
> > > 2140.6274 = (MATCH) sum of: 2140.6274 = (MATCH) max plus 0.01 times
> > others
> > > of: 2140.6274 = (MATCH) weight(contributions:news in 9882325)
> > > [DefaultSimilarity], result of: 2140.6274 = score(doc=9882325,freq=1.0
> =
> > > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > > 476142.16 = fieldWeight in 9882325, product of: 1.0 = tf(freq=1.0),
> with
> > > freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14,
> maxDocs=11282414)
> > > 32768.0 = fieldNorm(doc=9882325)
> > > 
> > > 
> > > 1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times
> > others
> > > of: 1605.4707 = (MATCH) weight(contributions:news in 220007)
> > > [DefaultSimilarity], result of: 1605.4707 = score(doc=220007,freq=1.0 =
> > > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > > 357106.62 = fieldWeight in 220007, product of: 1.0 = tf(freq=1.0), with
> > > freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14,
> maxDocs=11282414)
> > > 24576.0 = fieldNorm(doc=220007)
> > > 
> > > 
> > > 1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times
> > others
> > > of: 1605.4707 = (MATCH) weight(contributions:news in 241151)
> > > [DefaultSimilarity], result of: 1605.4707 = score(doc=241151,freq=1.0 =
> > > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > > 357106.62 = fieldWeight in 241151, product of: 1.0 = tf(freq=1.0), with
> > > freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14,
> maxDocs=11282414)
> > > 24576.0 = fieldNorm(doc=241151)
> > > 
> > > 
> > > id:c208c2b4-1b3e-27b8-e040-a8c00409063a
> > > 
> > >  
> > > 6.5742764 = (MATCH) sum of: 6.5742764 = (MATCH) max plus 0.01 times
> > others
> > > of: 3.304414 = (MATCH) weight(description:news^25.0 in 967895)
> > > [DefaultSimilarity], result of: 3.304414 = score(doc=967895,freq=1.0 =
> > > termFreq=1.0 ), product of: 0.042727955 = queryWeight, product of:
> 25.0 =
> > > boost 5.5240083 = idf(docFreq=122362, maxDocs=112

Re: setting up master and slave in same machine with diff ip's and same port

2013-01-31 Thread epnRui
Hi,

I solved the issue by setting up two different virtual network adapters in
ubuntu server.

case closed ;)


thanks for the help!!





Stopping solr

2013-01-31 Thread epnRui
Hi people,


First of all, this forum is a godsend!!!

Second:

I have a master / slave configuration, using replication.

Currently in production I have only one server, there's no backup server
(really...).
The webapplication is a public webapplication, everyone can see it.

 - How often, in your experience, and why, would solr crash?
 - If I kill solr master and slave, usually do I need to also delete the
indexes? Or everything should be fine upon restarting?
 - If I want to upgrade solr master and slave, or patch them, is there a way
that the services feeding from them will not fail? Solr in my application is
being used for indexing social network feeds, like Facebook posts... what
I'm trying to achieve is that the user keeps seeing the webpage working
normally (of course, with old index data from Solr) in case Solr crashes.
Maybe I can set up a backup Solr slave as a backup system?

I know these are "innocent" questions, but I am learning sys admin;
apparently my IT department thinks I'm the "do it all" guy and IT people
need to both develop and do sys admin. If I told you where I work you would
fall off your chair.


Best regards,
Rui





Re: Thoughts on production deployment?

2013-01-31 Thread Michael Della Bitta
On Thu, Jan 31, 2013 at 5:13 AM, Scott Stults
 wrote:
> Right now that blessed container is Jetty version 8.1.2.v20120308.

I'd really like some confirmation from the devs that there really is a
blessed status for a given container that provides advantages over
others. From what I understand, Jetty's considered one option out of
many, and isn't considered to be head and shoulders above any other.

We have a Chef regime here, and I've written Tomcat and Solr recipes
to be played against Ubuntu 12.04 Server. I chose Tomcat mostly
because I have the most experience administrating and configuring it,
and I would assume familiarity with operations would be a pretty
important factor.

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


RE: Indexing problems

2013-01-31 Thread GASPARD Joel
Hello Erick,

Thanks for your answer.

After reading previous subjects on the user list, we had already tried to 
change the parameters we mentioned.

- concurrent warming searchers: we have set the maxWarmingSearchers attribute
to 2:
<maxWarmingSearchers>2</maxWarmingSearchers>

- we have tried 32 and 64 for the ramBufferSizeMB attribute

- there is no other load on the Solr server, or search when we index

- the autocommit is defined with openSearcher=false, maxTime=60ms, 
maxDocs=6000 - the autoSoftCommit is defined with maxTime=1000
We have already tried to change the soft commit and the commit parameters in 
several ways. We have also tried to commit on the client side.
OK, I will try to commit more often.

- we have used cache sizes defined in the example : size=512

The document size is not too big, I think: 1 million documents produce a 6 GB 
index.

Thanks for your answer on multiple collections. I thought multiple collections 
had to share the same schema in ZK after reading a wiki page: 
http://wiki.apache.org/solr/NewSolrCloudDesign : "The entire cluster must have 
a single schema and solrconfig"
Maybe this page is deprecated?
I also assumed that because the OOM errors occur only when we index a second 
collection. There is no problem when indexing a single collection.

Going with 4.1 would not be easy for now... We'll think about it.

Thanks.

Joel


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, 31 January 2013 14:00
To: solr-user@lucene.apache.org
Subject: Re: Indexing problems

I'm really surprised you're hitting OOM errors, I suspect you have something 
else pathological in your system. So, I'd start checking things like
- how many concurrent warming searchers you allow
- How big your indexing RAM is set to (we find very little gain over 128M BTW).
- Other load on your Solr server. Are you, for instance, searching on it too?
- what your autocommit characterstics are (think about autocommitting fairly 
often with openSearcher=false).
- have you defined huge caches?
- .

How big are these documents anyway? With 12G of ram, they'd have to be 
absolutely _huge_ to matter much.

Multiple collections should work fine in ZK. I really think you have some 
innocent-looking configuration setting thats bollixing you up, this is not 
expected behavior.

If at all possible, I'd also go with 4.1. I don't really think it's relevant to 
your situation, but there have been a lot of improvements in the code

Best
Erick


Re: Stopping solr

2013-01-31 Thread Michael Della Bitta
>  - How often, in your experience, and why, would solr crash?

Not very often. Typically if your heap is too small, you'll end up going OOM.

>  - If I kill solr master and slave, usually do I need to also delete the
> indexes? Or everything should be fine upon restarting?

Restarts are fine. Order shouldn't matter.

>  - If I want to upgrade solr master and slave, or patch them, is there a way
> that the services feeding from them will not fail?

You'd need at least a load balanced pair of servers serving results to
your application. In theory, if you have enough RAM, you could run
them on the same machine, although you'd lose some redundancy that
way.

I guess another way is to borrow and  temporarily cut over to another
system and then cut back, but I'd really recommend having two full
time systems if you want to preserve uptime overall.

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


Re: Possible issue in edismax?

2013-01-31 Thread Sandeep Mestry
Fantastic! Thanks very much.. I will do so accordingly and will let you
know the results.

Thanks again,
Sandeep


On 31 January 2013 13:54, Felipe Lahti  wrote:

> So, it depends of your business requirement, right? If a document has
> matches in more searchable fields, at least for me, this document is more
> important than other document that has less matches.
>
> Example:
> Put this in your schema:
> 
>
> And create a class in your classpath of your Solr:
>
> package com.your.namespace;
>
> import org.apache.lucene.search.similarities.DefaultSimilarity;
>
> public class NoIDFSimilarity extends DefaultSimilarity {
>
> @Override
>
> public float idf(long docFreq, long numDocs) {
>
> return 1;
>
> }
>
> }
>
>
> It will "neutralize" the idf (which is the rarity of term).
>
>
>
>
>
>
> On Thu, Jan 31, 2013 at 5:31 AM, Sandeep Mestry 
> wrote:
>
> > Thanks Felipe..
> > Can you point me an example please?
> >
> > Also forgive me but if a document has matches in more searchable fields
> > then should it not rank higher?
> >
> > Thanks,
> > Sandeep
> > On 30 Jan 2013 19:30, "Felipe Lahti"  wrote:
> >
> > > If you compare the first and last document scores you will see that the
> > > last one matches more fields than first one. So, you maybe thinking
> why?
> > > The first doc only matches "contributions" field and the last matches a
> > > bunch of fields so if you want to  have behave more like ( > > name="qf">series_title^500 title^100 description^15 contribution)
> > you
> > > have to override the method of DefaultSimilarity.
> > >
> > >
> > > On Wed, Jan 30, 2013 at 4:12 PM, Sandeep Mestry 
> > > wrote:
> > >
> > > > I have pasted it below and it is slightly variant from the dismax
> > > > configuration I have mentioned above as I was playing with all sorts
> of
> > > > boost values, however it looks more lie below:
> > > >
> > > > 
> > > > 2675.7844 = (MATCH) sum of: 2675.7844 = (MATCH) max plus 0.01 times
> > > others
> > > > of: 2675.7844 = (MATCH) weight(contributions:news in 63298)
> > > > [DefaultSimilarity], result of: 2675.7844 = score(doc=63298,freq=1.0
> =
> > > > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > > > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > > > 595177.7 = fieldWeight in 63298, product of: 1.0 = tf(freq=1.0), with
> > > freq
> > > > of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> > > > 40960.0 = fieldNorm(doc=63298)
> > > > 
> > > > 
> > > > 2317.297 = (MATCH) sum of: 2317.297 = (MATCH) max plus 0.01 times
> > others
> > > > of: 2317.297 = (MATCH) weight(contributions:news in 9826415)
> > > > [DefaultSimilarity], result of: 2317.297 =
> score(doc=9826415,freq=3.0 =
> > > > termFreq=3.0 ), product of: 0.004495774 = queryWeight, product of:
> > > > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > > > 515439.0 = fieldWeight in 9826415, product of: 1.7320508 =
> > tf(freq=3.0),
> > > > with freq of: 3.0 = termFreq=3.0 14.530705 = idf(docFreq=14,
> > > > maxDocs=11282414) 20480.0 = fieldNorm(doc=9826415)
> > > > 
> > > > 
> > > > 2140.6274 = (MATCH) sum of: 2140.6274 = (MATCH) max plus 0.01 times
> > > others
> > > > of: 2140.6274 = (MATCH) weight(contributions:news in 9882325)
> > > > [DefaultSimilarity], result of: 2140.6274 =
> score(doc=9882325,freq=1.0
> > =
> > > > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > > > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > > > 476142.16 = fieldWeight in 9882325, product of: 1.0 = tf(freq=1.0),
> > with
> > > > freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14,
> > maxDocs=11282414)
> > > > 32768.0 = fieldNorm(doc=9882325)
> > > > 
> > > > 
> > > > 1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times
> > > others
> > > > of: 1605.4707 = (MATCH) weight(contributions:news in 220007)
> > > > [DefaultSimilarity], result of: 1605.4707 =
> score(doc=220007,freq=1.0 =
> > > > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > > > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > > > 357106.62 = fieldWeight in 220007, product of: 1.0 = tf(freq=1.0),
> with
> > > > freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14,
> > maxDocs=11282414)
> > > > 24576.0 = fieldNorm(doc=220007)
> > > > 
> > > > 
> > > > 1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times
> > > others
> > > > of: 1605.4707 = (MATCH) weight(contributions:news in 241151)
> > > > [DefaultSimilarity], result of: 1605.4707 =
> score(doc=241151,freq=1.0 =
> > > > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > > > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > > > 357106.62 = fieldWeight in 241151, product of: 1.0 = tf(freq=1.0),
> with
> > > > freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14,
> > maxDocs=11282414)
> > > > 24576.0 = fieldNorm(doc=241151)
> > > > 
> > > > 
> > > > id:c208

Re: long QTime for big index

2013-01-31 Thread Mou
Thanks for your reply.

No, there is no eviction, yet.

The time is spent mostly on org.apache.solr.handler.component.QueryComponent
to process the request.

Again, the time varies widely for same query.





Re: searching for an id

2013-01-31 Thread Alexandre Rafalovitch
Are you using eDismax? Maybe your ID field is not part of the search fields
or not a high priority. And, just maybe, you are doing a copyField * to
text and the text splits the ID into parts. Enable the debug on your query
and you should be able to figure it out.
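
A hedged sketch of the schema side when the id must stay a single token (field
and type names are assumptions):

<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<field name="id" type="string" indexed="true" stored="true" required="true"/>

Then search the field explicitly, e.g. q=id:"tx-20130130-4599", rather than
letting a catch-all text field's analyzer split the value on the hyphens.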

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Thu, Jan 31, 2013 at 3:50 AM, b.riez...@pixel-ink.de <
b.riez...@pixel-ink.de> wrote:

> Hi
>
> I have an id wich is a string like this.
> tx-20130130-4599
>
> i'm using a field without processing, wich i got confirmed via the
> analyser tool
> But when i search for that it got split up, so instead of finding that
> specific entry with that unique id,
> it finds all entries with "tx" in it.
>
> Any idea how to get rid of that behavior?
>
> Best
> Ben
>
>


Re: help to build query

2013-01-31 Thread Abhishek tiwari
Jack, thanks for your response.

We have a deals web application with free text search in it. Here "free text"
means you can type anything into it.

We have deals in different categories, tagged at different
merchant locations.
As per the requirements, I have to make some tweaks to the search.

For example, a user can search for deals like:

a) cat1 in location1, location2 (e.g. spa in Malviya Nagar, Ashok Vihar --
here spa = cat1, location1 = Malviya Nagar, location2 = Ashok Vihar)
b) cat1 and cat2 in location1
c) cat1 in location1 and location2

I hope that explains it better (see the rough query sketch below).
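
As a rough illustration (the field names category and locality are assumptions
about the schema), case (a) could end up as something like:

q=category:"spa" AND (locality:"malviya nagar" OR locality:"ashok vihar")

The hard part, as listed in the original challenges, is recognising which words
of the free text are categories and which are localities before such a query
can be built.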




On Wed, Jan 30, 2013 at 9:06 PM, Jack Krupansky wrote:

> Start by expressing the specific semantics of those queries in strict
> boolean form. I mean, what exactly do you mean by "in", and "location1,
> location 2", and "location1, loc2 and loc3? Is the latter an AND or an OR?
>
> Or at least fully express those two queries, unambiguously in plain
> English. There is too much ambiguity present to give you any solid
> direction.
>
> -- Jack Krupansky
>
> -Original Message- From: Abhishek tiwari
> Sent: Wednesday, January 30, 2013 12:55 AM
> To: solr-user@lucene.apache.org
> Subject: help to build query
>
>
> want to execute queries like :
> a)  cat in location1 , location2
> b)  cat 1 and cat2 in location1 ,loc2 and  loc3
>
> in our search .
>
> our challenges :
>
> 1)  picking right keywords(category and locality) from query entered.
> 2)  its mapping to relevant entity
>
> How should i proceed for it .
>
> we have localities and categories data indexed .
>
> thanks in advance.
>
> ~abhishek
>


Re: Thoughts on production deployment?

2013-01-31 Thread Paul Jungwirth
>
> We have a Chef regime here, and I've written Tomcat and Solr recipes
> to be played against Ubuntu 12.04 Server.


We do mostly the same: chef to install Tomcat (with configuration
appropriate to Solr), but then instead of deploying Solr via chef, we use
an ant script to package and deploy a war that includes Solr + some custom
Lucene extensions, then also deploy our {schema,solrconfig}.xml files. This
is a little easier for us than doing everything in chef, since we can more
easily push updates to our custom extensions.

I'll also note that our current process mirrors what we do for the
front-end app server (written in Ruby). We use chef to set up the box, then
Capistrano to deploy the app. We push app updates several times a week, but
rarely need to run chef after the initial setup.

But I'd love to know if there is an easier way to do it.

Paul

-- 
_
Pulchritudo splendor veritatis.


RE: Indexing nouns only - UIMA vs. OpenNLP

2013-01-31 Thread Kai Gülzau
UIMA:

I just found this issue https://issues.apache.org/jira/browse/SOLR-3013
Now I am able to use this analyzer for English texts and filter (un)wanted 
token types :-)


  


  


Open issue -> How to set the ModelFile for the Tagger to 
"german/TuebaModel.dat" ???



OpenNLP:

And a modified patch for https://issues.apache.org/jira/browse/LUCENE-2899 is 
now working
with solr 4.1. :-)


  

  
  
  
  




Any hints on which lib is more accurate on noun tagging?
Any performance or memory issues (some OOM here while testing with 1GB via 
Analyzer Admin GUI)?


Regards,

Kai Gülzau




-Original Message-
From: Kai Gülzau [mailto:kguel...@novomind.com] 
Sent: Thursday, January 31, 2013 2:19 PM
To: solr-user@lucene.apache.org
Subject: Indexing nouns only - UIMA vs. OpenNLP

Hi,

I am stuck trying to index only the nouns of german and english texts.
(very similar to http://wiki.apache.org/solr/OpenNLP#Full_Example)


First try was to use UIMA with the HMMTagger:


  

/org/apache/uima/desc/AggregateSentenceAE.xml
false

  false
  albody


  
org.apache.uima.SentenceAnnotation

  coveredText
  albody2

  
   
  


- But how do I set the ModelFile to use the german corpus?
- What about language identification?
-- How do I use the right corpus/tagger based on the language?
-- Should this be done in UIMA (how?) or via solr contrib/langid field mapping?
- How to remove non nouns in the annotated field?


Second try is to use OpenNLP and to apply the patch 
https://issues.apache.org/jira/browse/LUCENE-2899
But the patch seems to be a bit out of date.
Currently I try to get it to work with solr 4.1.


Any pointers appreciated :-)

Regards,

Kai Gülzau



RE: field space consumption - stored vs not stored

2013-01-31 Thread Petersen, Robert
Thanks Shawn.  Actually now that I think about it,  Yonik also mentioned 
something about lucene number representation once in reply to one of my 
questions.  Here it is:
Could you also tell me what these `#8;#0;#0;#0;#1; strings represent in the 
debug output?

"That's internally how a number is encoded into a string (5 bytes, the first 
being binary 8, the next 0, etc.)  This is not representable in XML as &#0; is 
illegal, hence we leave off the '&' so it's not a true character entity.  
-Yonik"

Hey I followed your link, and it had a link to this talk.  Did you see this 
example?
http://lucene.sourceforge.net/talks/pisa/

VInt Encoding Example:

Value     First byte   Second byte   Third byte
0         00000000
1         00000001
2         00000010
...
127       01111111
128       10000000     00000001
129       10000001     00000001
130       10000010     00000001
...
16,383    11111111     01111111
16,384    10000000     10000000      00000001
16,385    10000001     10000000      00000001
...



-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Wednesday, January 30, 2013 5:28 PM
Cc: solr-user@lucene.apache.org
Subject: Re: field space consumption - stored vs not stored

On 1/30/2013 6:24 PM, Shawn Heisey wrote:
> If I had to guess about the extra space required for storing an int 
> field, I would say it's in the neighborhood of 20 bytes per document, 
> perhaps less.  I am also interested in a definitive answer.

The answer is very likely less than 20 bytes per doc.  I was assuming a larger 
size for VInt than it is likely to use.  See the answer for this
question:

http://stackoverflow.com/questions/2752612/what-is-the-vint-in-lucene

Thanks,
Shawn





Search match all tokens in Query Text

2013-01-31 Thread Bing Hua
Hello,

I have a field text with type text_general here.















When I query for text:a b, solr returns results that contain only a but not
b. That is, it uses OR operator between the two tokens.

Am I right here? What should I do to force an AND operator between the two
tokens?

Thanks
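
The usual knobs for AND behaviour (a hedged sketch; pick the level where you
want the default to live):

q.op=AND                                  (per request, lucene/dismax query parsers)
<solrQueryParser defaultOperator="AND"/>  (in schema.xml, index-wide default)
mm=100%                                   (dismax/edismax minimum-should-match: require all clauses)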





Re: Search match all tokens in Query Text

2013-01-31 Thread Jack Krupansky

+text:a +b

-- Jack Krupansky

-Original Message- 
From: Bing Hua

Sent: Thursday, January 31, 2013 12:59 PM
To: solr-user@lucene.apache.org
Subject: Search match all tokens in Query Text

Hello,

I have a field text with type text_general here.















When I query for text:a b, solr returns results that contain only a but not
b. That is, it uses OR operator between the two tokens.

Am I right here? What should I do to force an AND operator between the two
tokens?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-match-all-tokens-in-Query-Text-tp4037758.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Search match all tokens in Query Text

2013-01-31 Thread Bing Hua
Thanks for the quick reply. It seems like you are suggesting explicitly adding the
AND operator. I don't think this solves my problem.

I found it  somewhere, and this
works.







Re: long QTime for big index

2013-01-31 Thread Shawn Heisey

On 1/31/2013 1:01 AM, Mou wrote:

I am running solr 3.4 on tomcat 7.

Our index is very big , two cores each 120G. We are searching the slaves
which are replicated every 30 min.
  I am using filtercache only and We have more than 90% cache hits. We use
lot of filter queries, queries are usually pretty big with 10-20 fq
parameters. Not all filters are cached.

we are searching three shards and query looks like this --
shards=core1,core2,core3&q=*:* &fq=field1:some value&fq = -field2=some
value&sort=date
But some queries are taking more than 30 sec to return result and the
behavior is intermittent. I can not find relation to replication. We are
using Zing jvm which reduced our GC pause to milli secs, so GC is not a
problem.


Complex queries, especially on a distributed search, can be very slow. 
In my experience, uncached filters make things particularly slow.


Your first paragraph says you have two cores 120GB each, but then later 
you say you are using three cores in a shards parameter.  What's the 
true core situation?


If you have a total index size for this JVM of 240GB, then you may not 
have enough RAM to let the OS disk cache work efficiently.  For that 
size of index, I would plan on a system with at least 128GB of RAM, 
256GB would be better.  You have to have enough free memory (after the 
OS and programs including tomcat/solr) in the system to cache the 
critical pieces of your index.


Tomcat would probably need a heap size between 8GB and 24GB - it's 
impossible to give you the right heap size here, you'd just have to 
test.  I could be way off on that estimate, too.


To give you an idea of how to size memory based on a production Solr 
3.5.0 system with good performance, one of my solr servers has 70GB of 
total index data, of which 24GB is stored fields and 22GB is 
termvectors.  The OS disk cache has 41.8GB of data in it at the moment. 
 The system has 64GB of memory and Solr (Jetty) has a max heap size of 
8GB.  Based on observations, I could run with a heap size of 4GB during 
normal operation, but when I am doing a full database import on my 
indexes, it requires the 8GB heap.


I looked through the mailing list history to see what else you've said 
about your setup and what other help you've gotten.  In one of your 
other messages, you said that you have a 70GB heap size.  That is 
extremely large, and probably not necessary.  If you have found that it 
is necessary, then your overall Solr architecture may need further 
adjustment.


One of your earlier messages indicated that you are using SSD storage. 
Main system memory is quite a lot faster than an SSD, so the OS cache is 
still important.


Thanks,
Shawn



Re: long QTime for big index

2013-01-31 Thread Mou
Thank you Shawn for reading all of my previous entries and for a detailed
answer.

To clarify, the third shard is used to store the recently added/updated
data. Two main big cores take very long to replicate ( when a full
replication is required) so the third one helps us to return the newly
indexed documents quickly. It gets deleted every hour after we replicate the
two other cores with last hour's of new/changed data. This third core is
very small.

As you said, with that big index and distributed queries, searches were too
slow. So we tried to use the filterCache to speed up the queries. The filterCache
was big, as we have thousands of different filters. Other caches were not
very helpful, as queries are not repetitive and there is heavy add/update activity on
the index. So we had to use a bigger heap size. Now, with that big heap size,
GC pauses were horrible, so we moved to the Zing JVM. Zing is now using 134 G
of heap and does not have those big pauses, but it also does not leave much
memory for the OS.

I am now testing with small heap, small filter cache ( just the basic
filters) and lot of memory available for OS disk cache. If that does not
work, I am thinking of breaking my index down into small pieces.






DIH and splitBy

2013-01-31 Thread Christopher Condit
I'm having an issue getting the splitBy construct from the regex
transformer to work in a very basic case (with either Solr 3.6 or
4.1).

I have a field defined like this:


The entity is defined like this:

  


Here's a POM:
http://pastie.org/5992725

A JUnit test case showing the problem:
http://pastie.org/5993437

And a stackoverflow question with the same information:
http://stackoverflow.com/questions/14512055/splitting-database-column-into-multivalued-solr-field

Anyone have any ideas?
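
For reference, a minimal DIH setup of this shape (a hedged sketch with
hypothetical table, column and field names) would be:

<entity name="test" transformer="RegexTransformer"
        query="select id, tags from test">
  <field column="tags" splitBy=","/>
</entity>

with the receiving field declared multiValued in schema.xml:

<field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>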

Thanks,
Chris


RE: DIH and splitBy

2013-01-31 Thread Dyer, James
In your unit test, you have:

"" +

And also:

runner.update("INSERT INTO test VALUES 1, 'foo,bar,baz'");

So you need to decide if you want to delimit with a pipe or a comma.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Christopher Condit [mailto:con...@sdsc.edu] 
Sent: Thursday, January 31, 2013 2:03 PM
To: solr-user@lucene.apache.org
Subject: DIH and splitBy

I'm having an issue getting the splitBy construct from the regex
transformer to work in a very basic case (with either Solr 3.6 or
4.1).

I have a field defined like this:


The entity is defined like this:

  


Here's a POM:
http://pastie.org/5992725

A JUnit test case showing the problem:
http://pastie.org/5993437

And a stackoverflow question with the same information:
http://stackoverflow.com/questions/14512055/splitting-database-column-into-multivalued-solr-field

Anyone have any ideas?

Thanks,
Chris



Re: long QTime for big index

2013-01-31 Thread Shawn Heisey

On 1/31/2013 12:47 PM, Mou wrote:

To clarify, the third shard is used to store the recently added/updated
data. Two main big cores take very long to replicate ( when a full
replication is required) so the third one helps us to return the newly
indexed documents quickly. It gets deleted every hour after we replicate the
two other cores with last hour's of new/changed data. This third core is
very small.


I use this approach.  My entire index is 74 million documents, but all 
new data is added to a shard that only contains about 400K documents. 
The other six shards have over 12 million documents each and take up 
about 22GB of disk space.  It takes two servers to house one complete 
copy of my index.


Index updates happen once a minute.  Because most delete/reinsert 
activity happens on recently added content and all new content gets 
added only to the small shard, the large shards can run for many minutes 
without seeing commits.



As you said, with that big index and distributed queries , searches were too
slow.So we tried to use the filtercache to speed up the queries. Filtercache
was big as we have thousands of different filters. other caches were not
very helpful as queries are not repetitive and there is heavy add/update to
the index. So we have to use bigger heap size. Now,with that big heap size
GC pauses was horrible, so we moved to Zing jvm. Zing jvm is now using 134 G
of heap and does not have those big pauses but it also does not leave much
memory for OS.

I am now testing with small heap, small filter cache ( just the basic
filters) and lot of memory available for OS disk cache. If that does not
work, I am thinking of breaking my index down into small pieces.


I hope it works for you!  With this approach, the first queries will 
probably still be pretty slow, but as the data gets cached, things 
should speed up.


You can pre-cache the important parts of your index with a command like 
the following in the index directory.


cat `ls | egrep -v "(\.fd|\.tv)"` > /dev/null

That command will read all the index files except for the stored fields 
(.fdx, .fdt) and termvectors (.tvx, .tvd, .tvf).  That puts them in the 
OS disk cache.  Before trying that command, you would want to find out 
how much disk space those files take to make sure they will all fit in 
RAM.  It is usually a bad idea to schedule this operation in cron.


Thanks,
Shawn



Re: DIH and splitBy

2013-01-31 Thread Christopher Condit
Sorry about that - even if I switch the splitBy to "," it still
doesn't work. Here's the corrected unit test:
http://pastie.org/5995399

On Thu, Jan 31, 2013 at 12:30 PM, Dyer, James
 wrote:
> In your unit test, you have:
>
> "" +
>
> And also:
>
> runner.update("INSERT INTO test VALUES 1, 'foo,bar,baz'");
>
> So you need to decide if you want to delimit with a pipe or a comma.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Christopher Condit [mailto:con...@sdsc.edu]
> Sent: Thursday, January 31, 2013 2:03 PM
> To: solr-user@lucene.apache.org
> Subject: DIH and splitBy
>
> I'm having an issue getting the splitBy construct from the regex
> transformer to work in a very basic case (with either Solr 3.6 or
> 4.1).
>
> I have a field defined like this:
> 
>
> The entity is defined like this:
> 
>   
> 
>
> Here's a POM:
> http://pastie.org/5992725
>
> A JUnit test case showing the problem:
> http://pastie.org/5993437
>
> And a stackoverflow question with the same information:
> http://stackoverflow.com/questions/14512055/splitting-database-column-into-multivalued-solr-field
>
> Anyone have any ideas?
>
> Thanks,
> Chris
>


RE: long QTime for big index

2013-01-31 Thread Toke Eskildsen
Shawn Heisey [s...@elyograg.org] wrote:

[...]

> If you have a total index size for this JVM of 240GB, then you may not
> have enough RAM to let the OS disk cache work efficiently.  For that
> size of index, I would plan on a system with at least 128GB of RAM,
> 256GB would be better.

[...]

> One of your earlier messages indicated that you are using SSD storage.
> Main system memory is quite a lot faster than an SSD, so the OS cache is
> still important.

While technically true, our internal tests showed little practical gain by 
using main memory over SSD. Our main search servers are equipped with paltry 
memory (16GB) and consumer grade SSDs for our 10M documents/70GB indexes. 

However, we did our comparison testing way back before MMapDirectory with 
Lucene 2.something, so our observations might not be valid anymore. Do you know 
of any recent experiments with RAM vs. SSD?

Regards,
Toke Eskildsen, State and University Library, Denmark

Re: Stopping solr

2013-01-31 Thread Michael Della Bitta
The ping handler is how we tell our load balancers that our Solr cores
are healthy. I guess if you're running more than one core behind the
same balancer, it would make sense to drop a webapp in there that ran
the ping queries for all your cores and only responded OK if they all
came back OK.
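As a rough sketch of that "check all cores at once" idea, assuming each core has an
/admin/ping handler configured, the SolrJ part of such a webapp could look like the
following (base URL and core names are made up; the servlet wrapping is left out):

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.SolrPingResponse;

public class MultiCoreHealthCheck {
    public static boolean allCoresHealthy(String baseUrl, String... cores) {
        for (String core : cores) {
            HttpSolrServer server = new HttpSolrServer(baseUrl + "/" + core);
            try {
                SolrPingResponse rsp = server.ping();   // hits the core's /admin/ping handler
                if (rsp.getStatus() != 0) {
                    return false;                       // one unhealthy core fails the whole check
                }
            } catch (SolrServerException e) {
                return false;
            } catch (java.io.IOException e) {
                return false;
            } finally {
                server.shutdown();
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allCoresHealthy("http://localhost:8983/solr", "core1", "core2"));
    }
}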

Or if you have one core that's the most important, you could only use
that ping handler.

Or you could invent your own check if you think that the criteria for
"up" should be different than what the ping handler offers.

Michael

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Thu, Jan 31, 2013 at 10:43 AM, epnRui  wrote:
> Hi Michael!
>
> Thank you for your response.
>
> Do you know how I could check the health of Solr Master?
>
> Is this the only way?
>
> http://wiki.apache.org/solr/SolrConfigXml#The_Admin.2BAC8-GUI_Section
>
> I guess checking the overall server health isn't the same as checking whether
> or not the index is responding correctly and with the correct data?
>
> Thanks!
> Rui
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Stopping-solr-tp4037715p4037728.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Thoughts on production deployment?

2013-01-31 Thread Mark Miller

On Jan 31, 2013, at 10:15 AM, Michael Della Bitta 
 wrote:

> I'd really like some confirmation from the devs that there really is a
> blessed status for a given container that provides advantages over
> others.

IMO: jetty is what all of our unit/integration tests are run in, jetty is what 
we configure to work well out of the box and add workarounds to, jetty is what 
the devs run, jetty is very likely what most of the users run simply because we 
ship with it, most of the bug reports we get around containers involve jetty 
(because of the previous most likely).

I'd say jetty doesn't get any more blessed than that. If you want to run another
container, fine, but I would pick jetty myself - specifically, the one we ship
with - unless there is a darn good reason not to.

- Mark

Re: Minimum word length for stemming

2013-01-31 Thread Jan Høydahl
Hi,

I believe each stemmer implementation decides that themselves. At least the 
MinimalNorwegianStemmer has a built-in logic which stems certain suffixes only 
if the token is >N chars.

If you want external control, you can look at 
http://wiki.apache.org/solr/LanguageAnalysis#Customizing_Stemming and the 
KeywordMarkerFilterFactory which lets you list a bunch of words you do not want 
the stemmers to touch. I guess you could easily implement your own 
TokenLengthMarkerFilterFactory which keeps words from being stemmed based on 
length.
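A minimal sketch of what such a filter could look like, assuming Lucene/Solr 4.x; the
class name and the minLength parameter are invented here, and you would still need a
thin TokenFilterFactory wrapper to reference it from schema.xml. The trick is simply
to set KeywordAttribute, which the keyword-aware stemmers already honour:

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.KeywordAttribute;

public final class TokenLengthMarkerFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final KeywordAttribute keywordAtt = addAttribute(KeywordAttribute.class);
    private final int minLength;

    public TokenLengthMarkerFilter(TokenStream input, int minLength) {
        super(input);
        this.minLength = minLength;
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        // Tokens below the threshold are flagged so downstream stemmers skip them.
        if (termAtt.length() < minLength) {
            keywordAtt.setKeyword(true);
        }
        return true;
    }
}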

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 31 Jan 2013, at 17:35, Jamie Johnson wrote:

> Is there a capability to provide a minimum word threshold that must be met
> before a word is analyzed by a stemmer or other language analyzer?



Re: Thoughts on production deployment?

2013-01-31 Thread Michael Della Bitta
That's surprising to me, mostly because a number of the Solr wiki
pages don't really make that strong of a case for it:

http://wiki.apache.org/solr/SolrInstall
http://wiki.apache.org/solr/SolrTomcat
http://wiki.apache.org/solr/SolrJetty

Would it make sense to spell that out somewhere?

I do notice that it seems like the version of Jetty that ships with
Solr isn't the preferred one according to the wiki, so that would be
an extra dependency for a config management system like Chef.

Are there any other configuration choices that are blessed like this?
JDK versions or sources (oracle vs. open), for example?

Thanks,

Michael

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Thu, Jan 31, 2013 at 5:07 PM, Mark Miller  wrote:
>
> On Jan 31, 2013, at 10:15 AM, Michael Della Bitta 
>  wrote:
>
>> I'd really like some confirmation from the devs that there really is a
>> blessed status for a given container that provides advantages over
>> others.
>
> IMO: jetty is what all of our unit/integration tests are run in, jetty is 
> what we configure to work well out of the box and add workarounds to, jetty 
> is what the devs run, jetty is very likely what most of the users run simply 
> because we ship with it, most of the bug reports we get around containers 
> involve jetty (because of the previous most likely).
>
> I'd say jetty doesn't get any more blessed than that. If you want to run
> another container, fine, but I would pick jetty myself - specifically, the
> one we ship with - unless there is a darn good reason not to.
>
> - Mark


Re: Minimum word length for stemming

2013-01-31 Thread Jamie Johnson
Thanks for confirming my suspicions, the custom
TokenLengthMarkerFilterFactory sounds like the best approach for doing this.


On Thu, Jan 31, 2013 at 5:12 PM, Jan Høydahl  wrote:

> Hi,
>
> I believe each stemmer implementation decides that themselves. At least
> the MinimalNorwegianStemmer has a built-in logic which stems certain
> suffixes only if the token is >N chars.
>
> If you want external control, you can look at
> http://wiki.apache.org/solr/LanguageAnalysis#Customizing_Stemming and the
> KeywordMarkerFilterFactory which lets you list a bunch of words you do not
> want the stemmers to touch. I guess you could easily implement your own
> TokenLengthMarkerFilterFactory which keeps words from being stemmed based
> on length.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 31 Jan 2013, at 17:35, Jamie Johnson wrote:
>
> > Is there a capability to provide a minimum word threshold that must be
> met
> > before a word is analyzed by a stemmer or other language analyzer?
>
>


Re: Thoughts on production deployment?

2013-01-31 Thread Shawn Heisey

On 1/31/2013 3:21 PM, Michael Della Bitta wrote:

I do notice that it seems like the version of Jetty that ships with
Solr isn't the preferred one according to the wiki, so that would be
an extra dependency for a config management system like Chef.


Near as I can tell, the versions of jetty that shipped with 4.0 (8.1.2) 
and 4.1 (8.1.7) are unmodified.  The config is somewhat specialized, but 
Jetty itself is not changed.  I upgraded my 4.1-SNAPSHOT install to 
8.1.7 before the committers did without any problems.


The Jetty 6 versions included with 1.x and 3.x releases were patched for 
one or more bugs - the upstream package from mortbay wouldn't be the 
right thing to use.


I have a RHEL/CentOS-friendly jetty init script with config options that 
can be overridden by a file in /etc/sysconfig.  I could probably also 
come up with one for Debian (sysvinit).  The newest Fedora releases use 
systemd, but systemd is backward compatible with RHEL/CentOS init 
scripts.  Outside of these distributions, I know very little.  Recent 
Ubuntu releases use upstart, about which I am completely clueless.


If there's interest, I can make my init script more generic, make one 
for debian, and try to come up with an installation script to go with 
it.  If someone knows upstart, they can use my work as a base.


Thanks,
Shawn



RE: Solr load balancer

2013-01-31 Thread Jeff Wartes

For what it's worth, Google has done some pretty interesting research into 
coping with the idea that particular shards might very well be busy doing 
something else when your query comes in.

Check out this slide deck: http://research.google.com/people/jeff/latency.html
Lots of interesting ideas, but in particular, around slide 39 he talks about 
"backup requests" where you wait for something like your typical response time 
and then issue a second request to a different shard. You take whichever answer 
you get first, and cancel the other. The initial wait + cancellation means your 
extra cluster load is minimal, and you still get the benefit of reducing your 
p95+ response times if the first request was high-latency due to something 
unrelated to the query. (Say, GC.)

Of course, a central principle of this approach is being able to cancel a query 
and have it stop consuming resources. I'd love to be corrected, but I don't 
think Solr allows this. You can stop waiting for a response, but even the 
timeAllowed param doesn't seem to stop resource usage after the allotted time.  
Meaning, a few exceptionally long-running queries can take out your 
high-throughput cluster by tying up entire CPUs for long periods.
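For what it's worth, the client-side half of the idea is easy to sketch with SolrJ and
a CompletionService; the replica URLs and the latency budget below are placeholders,
and, as noted above, cancel() only stops the client from waiting - Solr keeps
executing the abandoned request on the server:

import java.util.concurrent.*;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class BackupRequestDemo {
    public static void main(String[] args) throws Exception {
        final SolrQuery q = new SolrQuery("*:*");
        final HttpSolrServer primary = new HttpSolrServer("http://solr1:8983/solr/core1");
        final HttpSolrServer backup  = new HttpSolrServer("http://solr2:8983/solr/core1");
        long typicalLatencyMs = 200;   // e.g. your typical (p50) response time

        ExecutorService pool = Executors.newFixedThreadPool(2);
        CompletionService<QueryResponse> cs =
                new ExecutorCompletionService<QueryResponse>(pool);

        Future<QueryResponse> first = cs.submit(new Callable<QueryResponse>() {
            public QueryResponse call() throws Exception { return primary.query(q); }
        });
        // Wait the typical latency; if nothing came back, hedge with the backup replica.
        Future<QueryResponse> done = cs.poll(typicalLatencyMs, TimeUnit.MILLISECONDS);
        if (done == null) {
            cs.submit(new Callable<QueryResponse>() {
                public QueryResponse call() throws Exception { return backup.query(q); }
            });
            done = cs.take();           // whichever replica answers first wins
        }
        QueryResponse rsp = done.get();
        first.cancel(true);             // harmless if 'first' is the one that completed
        System.out.println("numFound=" + rsp.getResults().getNumFound());
        pool.shutdownNow();             // interrupts whichever request is still in flight
    }
}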

Let me know the JIRA number, I'd love to see work in this area.


-Original Message-
From: Phil Hoy [mailto:p...@brightsolid.com] 
Sent: Tuesday, January 29, 2013 11:33 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr load balancer

Hi Erick,

Thanks, I have read the blogs you cited and found them very interesting, and
we have tuned the JVM accordingly, but we still get the odd longish GC pause.

That said, we perhaps have an unusual setup; we index a lot of small documents
using servers with SSDs and 128 GB RAM in a sharded setup with replicas, and
our queries rely heavily on query filters and faceting, with minimal free-text
style searching. For that reason we rely heavily on the filter cache to improve
query latency, and therefore assign a large percentage of the available RAM to the
JVM hosting Solr.

Anyhow we are happy with the current configuration and performance profile, 
aside from the odd gc pause that is, and as we have index replicas it seems to 
me that we should be able to cope, hence my willingness to tweak how the load 
balancer behaves.

Thanks,
Phil



-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 20 January 2013 15:56
To: solr-user@lucene.apache.org
Subject: Re: Solr load balancer

Hmmm, the first thing I'd look at is why you are having long GC pauses. Here's 
a great place to start:

http://www.lucidimagination.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/
and:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

I've wondered about a similar approach, but by firing off the same query to 
multiple nodes in your cluster, you'll be effectively doubling (at least) the 
load on your system. Leading to more memory issues perhaps in a "non-virtuous 
cycle".

FWIW,
Erick

On Fri, Jan 18, 2013 at 5:41 AM, Phil Hoy  wrote:
> Hi,
>
> I would like to experiment with some custom load balancers to help with query 
> latency in the face of long gc pauses and the odd time-consuming query that 
> we need to be able to support. At the moment setting the socket timeout via 
> the HttpShardHandlerFactory does help, but of course it can only be set to a 
> length of time as long as the most time consuming query we are likely to 
> receive.
>
> For example perhaps a load balancer that sends multiple queries concurrently 
> to all/some replicas and only keeps the first response might be effective. Or 
> maybe a load balancer which takes account of the frequency of timeouts would 
> be able to recognize zombies more effectively.
>
> To use alternative load balancer implementations cleanly and without having 
> to hack solr directly, I would need to be able to make the existing 
> LBHttpSolrServer and HttpShardHandlerFactory more amenable to extension, I 
> can then override the default load balancer using solr's plugin mechanism.
>
> So my question is, if I made a patch to make the load balancer more 
> pluggable, is this something that would be acceptable and if so what do I do 
> next?
>
> Phil
>
> __
> "brightsolid" is used in this email to collectively mean brightsolid online 
> innovation limited and its subsidiary companies brightsolid online publishing 
> limited and brightsolid online technology limited.
> findmypast.co.uk is a brand of brightsolid online publishing limited.
> brightsolid online innovation limited, Gateway House, Luna Place, Dundee 
> Technology Park, Dundee DD2 1TP.  Registered in Scotland No. SC274983.
> brightsolid online publishing limited, The Glebe, 6 Chapel Place, Rivington 
> Street, London EC2A 3DQ. Registered in England No. 04369607.
> brightsolid online technology limited, Gateway House, Luna Place, Dundee 
> Techn

Re: Solr load balancer

2013-01-31 Thread Lance Norskog
It is possible to do this with IP Multicast. The query goes out on the 
multicast and all query servers read it. The servers wait for a random 
amount of time, then transmit the answer. Here's the trick: it's 
multicast. All of the query servers listen to each other's responses, 
and drop out when another server answers the query. The server has to 
decide whether to do the query before responding; this would take some 
tuning.


Having all participants snoop on their peers is a really powerful 
design. I worked on a telecom system that used IP Multicast to do 
shortest-path-first allocation of T1 lines.  Worked really well. It's a 
shame Enron never used it.
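A toy sketch of such a snooping responder with java.net.MulticastSocket, just to make
the idea concrete - the group address, port and the QUERY:/ANSWER: message format are
invented for illustration, and a real responder would run the Solr query where the
comment indicates:

import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.net.SocketTimeoutException;
import java.util.Random;

public class SnoopingResponder {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("230.0.0.1");
        MulticastSocket socket = new MulticastSocket(4446);
        socket.joinGroup(group);
        Random random = new Random();
        byte[] buf = new byte[1024];

        while (true) {
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            socket.receive(packet);
            String msg = new String(packet.getData(), 0, packet.getLength(), "UTF-8");
            if (!msg.startsWith("QUERY:")) {
                continue;                                   // answers seen outside a backoff are ignored
            }
            String id = msg.substring("QUERY:".length());
            long deadline = System.currentTimeMillis() + random.nextInt(50) + 1;
            boolean peerAnswered = false;
            // Keep snooping on the group during the random backoff.
            while (!peerAnswered && System.currentTimeMillis() < deadline) {
                socket.setSoTimeout((int) Math.max(1, deadline - System.currentTimeMillis()));
                try {
                    DatagramPacket p = new DatagramPacket(buf, buf.length);
                    socket.receive(p);
                    String m = new String(p.getData(), 0, p.getLength(), "UTF-8");
                    peerAnswered = m.equals("ANSWER:" + id);
                } catch (SocketTimeoutException expired) {
                    // backoff elapsed without a peer answering
                }
            }
            socket.setSoTimeout(0);                         // back to blocking reads
            if (!peerAnswered) {
                // Run the query here, then announce the answer so peers drop out.
                byte[] answer = ("ANSWER:" + id).getBytes("UTF-8");
                socket.send(new DatagramPacket(answer, answer.length, group, 4446));
            }
        }
    }
}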


On 01/24/2013 04:17 PM, Chris Hostetter wrote:

: For example perhaps a load balancer that sends multiple queries
: concurrently to all/some replicas and only keeps the first response
: might be effective. Or maybe a load balancer which takes account of the

I know of other distributed query systems that use this approach when
query speed is more important to people than load, and people who use them
seem to think it works well.

given that it synthetically multiplies the load of each end user request,
it's probably not something we'd want to turn on by default, but a
configurable option certainly seems like it might be handy.


-Hoss




Re: Indexing nouns only - UIMA vs. OpenNLP

2013-01-31 Thread Lance Norskog

Thanks, Kai!

About removing non-nouns: the OpenNLP patch includes two simple 
TokenFilters for manipulating terms with payloads. The 
FilterPayloadFilter lets you keep or remove terms with given payloads. 
In the demo schema.xml, there is an example type that keeps only 
nouns&verbs.
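The actual classes live in the patch, but the core of a "keep only these payloads"
filter is small. A hand-rolled sketch against the Lucene 4.x attribute API (not the
patch's code; the class name and keep-set are invented) could look like this:

import java.io.IOException;
import java.util.Set;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.util.BytesRef;

public final class KeepPayloadsFilter extends TokenFilter {
    private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);
    private final Set<BytesRef> keep;   // e.g. the payload bytes the tagger uses for noun/verb tags

    public KeepPayloadsFilter(TokenStream input, Set<BytesRef> keep) {
        super(input);
        this.keep = keep;
    }

    @Override
    public boolean incrementToken() throws IOException {
        // Drop tokens whose payload is missing or not in the keep set.
        // (A production filter would also fix up position increments.)
        while (input.incrementToken()) {
            BytesRef payload = payloadAtt.getPayload();
            if (payload != null && keep.contains(payload)) {
                return true;
            }
        }
        return false;
    }
}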


There is a "universal" mapping for parts-of-speech systems for different 
languages. There is no Solr/Lucene support for it.

http://code.google.com/p/universal-pos-tags/

On 01/31/2013 09:47 AM, Kai Gülzau wrote:

UIMA:

I just found this issue: https://issues.apache.org/jira/browse/SOLR-3013
Now I am able to use this analyzer for English texts and filter (un)wanted
token types :-)


   
 
 
   


Open issue -> How to set the ModelFile for the Tagger to 
"german/TuebaModel.dat" ???



OpenNLP:

And a modified patch for https://issues.apache.org/jira/browse/LUCENE-2899 is 
now working
with solr 4.1. :-)


   
 
   
   
   
   




Any hints on which lib is more accurate on noun tagging?
Any performance or memory issues (some OOM here while testing with 1GB via 
Analyzer Admin GUI)?


Regards,

Kai Gülzau




-Original Message-
From: Kai Gülzau [mailto:kguel...@novomind.com]
Sent: Thursday, January 31, 2013 2:19 PM
To: solr-user@lucene.apache.org
Subject: Indexing nouns only - UIMA vs. OpenNLP

Hi,

I am stuck trying to index only the nouns of German and English texts.
(very similar to http://wiki.apache.org/solr/OpenNLP#Full_Example)


First try was to use UIMA with the HMMTagger:


   
 
 /org/apache/uima/desc/AggregateSentenceAE.xml
 false
 
   false
   albody
 
 
   
 org.apache.uima.SentenceAnnotation
 
   coveredText
   albody2
 
   

   


- But how do I set the ModelFile to use the German corpus?
- What about language identification?
-- How do I use the right corpus/tagger based on the language?
-- Should this be done in UIMA (how?) or via solr contrib/langid field mapping?
- How to remove non nouns in the annotated field?


Second try is to use OpenNLP and to apply the patch 
https://issues.apache.org/jira/browse/LUCENE-2899
But the patch seems to be a bit out of date.
Currently I am trying to get it to work with Solr 4.1.


Any pointers appreciated :-)

Regards,

Kai Gülzau





Re: long QTime for big index

2013-01-31 Thread Mou
Thank you again.

Unfortunately the index files will not fit in RAM. I have to try using the
document cache. I am also moving my index to SSD again; we took our index
off when the Fusion-io cards failed twice during indexing and the index was
corrupted. Now, with the BIOS upgrade and a new driver, it is supposed to be
more reliable.

Also I am going to look into the client app to verify that it is making
proper query requests.

Surprisingly, when I used a much lower value than the default for
defaultconnectionperhost and maxconnectionperhost in solrmeter, it performs
very well; the same queries return in less than one sec. I am not sure yet;
I need to run solrmeter with different heap sizes, with cache and without
cache, etc.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/long-QTime-for-big-index-tp4037635p4037870.html
Sent from the Solr - User mailing list archive at Nabble.com.