Re: SolrJ Socket Leak

2014-02-17 Thread Kiran Chitturi
Jared,

I faced a similar issue when using CloudSolrServer with Solr. As Shawn
pointed out, the 'TIME_WAIT' status happens when the connection is closed
by the HTTP client. The HTTP client closes a connection whenever it thinks the
connection is stale
(https://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html#d5e405).
Even the docs point out that stale connection checking cannot be fully reliable.

I see two ways to get around this:

1. Enable 'SO_REUSEADDR'
2. Disable stale connection checks.

Also, by default, when we create a CloudSolrServer (CSS) it does not explicitly
configure any HTTP client parameters
(https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrServer.java#L124),
so the default configuration parameters (max connections, max connections per
host) are used for the HTTP connection. You can explicitly configure these
params when creating the CSS using HttpClientUtil:

ModifiableSolrParams params = new ModifiableSolrParams();
params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 128);
params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 32);
params.set(HttpClientUtil.PROP_FOLLOW_REDIRECTS, false);
params.set(HttpClientUtil.PROP_CONNECTION_TIMEOUT, 3);

final HttpClient client = HttpClientUtil.createClient(params);
LBHttpSolrServer lb = new LBHttpSolrServer(client);
CloudSolrServer server = new CloudSolrServer(zkConnect, lb);


Currently, I am using HTTP client 4.3.2 and building the client when
creating the CSS. I also enable the 'SO_REUSEADDR' option and I haven't seen
'TIME_WAIT' connections after this (maybe because of better handling of stale
connections in 4.3.2, or because the 'SO_REUSEADDR' option is enabled). My
current HTTP client code looks like this (works only with HTTP client 4.3.2):

HttpClientBuilder httpBuilder = HttpClientBuilder.create();

SocketConfig.Builder socketConfig = SocketConfig.custom();
socketConfig.setSoReuseAddress(true);
socketConfig.setSoTimeout(1);
httpBuilder.setDefaultSocketConfig(socketConfig.build());
httpBuilder.setMaxConnTotal(300);
httpBuilder.setMaxConnPerRoute(100);

httpBuilder.disableRedirectHandling();
httpBuilder.useSystemProperties();

CloseableHttpClient httpClient = httpBuilder.build();  // build the configured 4.3 client
LBHttpSolrServer lb = new LBHttpSolrServer(httpClient, parser);
CloudSolrServer server = new CloudSolrServer(zkConnect, lb);


There should be a way to configure socket reuse with 4.2.3 too; you can
try different configurations. I am surprised you still see 'TIME_WAIT'
connections even after 30 minutes, because a 'TIME_WAIT' connection should be
closed by the OS within about 2 minutes by default, I think.
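For the old 4.2.x line that SolrJ bundles, the (now deprecated) parameter-based
API should allow roughly the same two things. A rough, untested sketch -- the
timeout value and zkConnect are placeholders:

import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.conn.PoolingClientConnectionManager;
import org.apache.http.params.HttpConnectionParams;
import org.apache.http.params.HttpParams;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer;

DefaultHttpClient httpClient = new DefaultHttpClient(new PoolingClientConnectionManager());
HttpParams httpParams = httpClient.getParams();
HttpConnectionParams.setSoReuseaddr(httpParams, true);            // option 1: SO_REUSEADDR
HttpConnectionParams.setStaleCheckingEnabled(httpParams, false);  // option 2: skip stale checks
HttpConnectionParams.setSoTimeout(httpParams, 10000);             // placeholder value

LBHttpSolrServer lb = new LBHttpSolrServer(httpClient);
CloudSolrServer server = new CloudSolrServer(zkConnect, lb);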


HTH,

-- 
Kiran Chitturi,


On 2/13/14 12:38 PM, "Jared Rodriguez"  wrote:

>I am using solr/solrj 4.6.1 along with the apache httpclient 4.3.2 as part
>of a web application which connects to the solr server via solrj
>using CloudSolrServer();  The web application is wired up with Guice, and
>there is a single instance of the CloudSolrServer class used by all
>inbound
>requests.  All this is running on Amazon.
>
>Basically, everything looks and runs fine for a while, but even with
>moderate concurrency, solrj starts leaving sockets open.  We are handling
>only about 250 connections to the web app per minute and each of these
>issues from 3 - 7 requests to solr.  Over a 30 minute period of this type
>of use, we end up with many 1000s of lingering sockets.  I can see these
>when running netstat:
>
>tcp0  0 ip-10-80-14-26.ec2.in:41098
>ip-10-99-145-47.ec2.i:glrpc
>TIME_WAIT
>
>All to the same target host, which is my solr server. There are no other
>pieces of infrastructure on that box, just solr.  Eventually, the server
>just dies as no further sockets can be opened and the opened ones are not
>reused.
>
>The solr server itself is unfazed and running like a champ.  Average
>time per request of 0.126, as seen in the solr web app admin UI query handler
>stats.
>
>Apache httpclient had a bunch of leakage from version 4.2.x that they
>cleaned up and refactored in 4.3.x, which is why I upgraded.  Currently,
>solrj makes use of the old leaky 4.2 classes for establishing connections
>and using a connection pool.
>
>http://www.apache.org/dist/httpcomponents/httpclient/RELEASE_NOTES-4.3.x.t
>xt
>
>
>
>-- 
>Jared Rodriguez



Re: DIH

2014-02-17 Thread Mikhail Khludnev
On Sat, Feb 15, 2014 at 1:07 PM, Shawn Heisey  wrote:

> On 2/14/2014 10:45 PM, William Bell wrote:
> > On virtual cores the DIH handler is really slow. On a 12 core box it only
> > uses 1 core while indexing.
> >
> > Does anyone know how to do Java threading from a SQL query into Solr?
> > Examples?
> >
> > I can use SolrJ to do it, or I might be able to modify DIH to enable
> > threading.
> >
> > At some point in 3.x threading was enabled in DIH, but it was removed
> since
> > people were having issues with it (we never did).
>
> If you know how to fix DIH so it can do multiple indexing threads
> safely, please open an issue and upload a patch.
>
Please! Don't do it. Never again!
https://issues.apache.org/jira/browse/SOLR-3011

As far as I understand, the general idea is to find a DIH successor:
https://issues.apache.org/jira/browse/SOLR-4799?focusedCommentId=13738424&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13738424


>
> I'm still using DIH for full rebuilds, but I'd actually like to replace
> it with a rebuild routine written in SolrJ.  I currently achieve decent
> speed by running DIH on all my shards at the same time.
>
> I do use SolrJ for once-a-minute index maintenance, but the code that
> I've written to pull data out of SQL and write it to Solr is not able to
> index millions of documents in a single thread as fast as DIH does.  I
> have been building a multithreaded design in my head, but I haven't had
> a chance to write real code and see whether it's actually a good design.
>
> For me, the bottleneck is definitely Solr, not the database.  I recently
> wrote a test program that uses my current SolrJ indexing method.  If I
> skip the "server.add(docs)" line, it can read all 91 million docs from
> the database and build SolrInputDocument objects for them in 2.5 hours
> or less, all with a single thread.  When I do a real rebuild with DIH,
> it takes a little more than 4.5 hours -- and that is inherently
> multithreaded, because it's doing all the shards simultaneously.  I have
> no idea how long it would take with a single-threaded SolrJ program.
>
> Thanks,
> Shawn
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: Solr index filename doesn't match with solr version

2014-02-17 Thread Nguyen Manh Tien
Thanks Shawn, Tri for your infos, explanation.
Tien


On Mon, Feb 17, 2014 at 1:36 PM, Tri Cao  wrote:

> Lucene main file formats actually don't change a lot in 4.x (or even 5.x),
> and the newer codecs just delegate to previous versions for most file
> types. The newer file types don't typically include Lucene's version in
> file names.
>
> For example, Lucene 4.6 codecs basically delegate stored fields and term
> vector file format to 4.1, doc format to 4.0, etc. and only implement the
> new segment info/fields info formats (the .si and .fnm files).
>
>
> https://github.com/apache/lucene-solr/blob/lucene_solr_4_6/lucene/core/src/java/org/apache/lucene/codecs/lucene46/Lucene46Codec.java#L50
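>
> (Aside: a rough, untested sketch with the Lucene 4.x API that prints which codec
> each segment of an index was written with; the index path is a placeholder.)
>
> import java.io.File;
> import org.apache.lucene.index.SegmentCommitInfo;
> import org.apache.lucene.index.SegmentInfos;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.FSDirectory;
>
> Directory dir = FSDirectory.open(new File("/path/to/core/data/index"));
> SegmentInfos infos = new SegmentInfos();
> infos.read(dir); // reads the latest segments_N file
> for (SegmentCommitInfo sci : infos) {
>   System.out.println(sci.info.name + " -> " + sci.info.getCodec().getName());
> }
> dir.close();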
>
> Hope this helps,
> Tri
>
>
> On Feb 16, 2014, at 08:52 PM, Shawn Heisey  wrote:
>
> On 2/16/2014 7:25 PM, Nguyen Manh Tien wrote:
>
> I upgraded recently from solr 4.0 to solr 4.6,
>
> I check solr index folder and found there file
>
> _aars_*Lucene41*_0.doc
>
> _aars_*Lucene41*_0.pos
>
> _aars_*Lucene41*_0.tim
>
> _aars_*Lucene41*_0.tip
>
> I don't know why they don't have *Lucene46* in the file names.
>
>
> This is an indication that this part of the index is using a file format
> introduced in Lucene 4.1.
>
> Here's what I have for one of my index segments on a Solr 4.6.1 server:
>
> _5s7_2h.del
> _5s7.fdt
> _5s7.fdx
> _5s7.fnm
> _5s7_Lucene41_0.doc
> _5s7_Lucene41_0.pos
> _5s7_Lucene41_0.tim
> _5s7_Lucene41_0.tip
> _5s7_Lucene45_0.dvd
> _5s7_Lucene45_0.dvm
> _5s7.nvd
> _5s7.nvm
> _5s7.si
> _5s7.tvd
> _5s7.tvx
>
> It shows the same pieces as your list, but I am also using docValues in
> my index, and those files indicate that they are using the format from
> Lucene 4.5. I'm not sure why there are not version numbers in *all* of
> the file extensions -- that happens in the Lucene layer, which is a bit
> of a mystery to me.
>
> Thanks,
> Shawn
>
>


Re: DIH

2014-02-17 Thread Alexandre Rafalovitch
There have been a couple of discussions about finding a DIH successor
(including on the HelioSearch list), but no real momentum as far as I can
tell.

I think somebody will have to really pitch in and do the same couple
of scenarios DIH does in several different frameworks (TodoMVC style).
That should get it going.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Mon, Feb 17, 2014 at 7:40 PM, Mikhail Khludnev
 wrote:
> On Sat, Feb 15, 2014 at 1:07 PM, Shawn Heisey  wrote:
>
>> On 2/14/2014 10:45 PM, William Bell wrote:
>> > On virtual cores the DIH handler is really slow. On a 12 core box it only
>> > uses 1 core while indexing.
>> >
>> > Does anyone know how to do Java threading from a SQL query into Solr?
>> > Examples?
>> >
>> > I can use SolrJ to do it, or I might be able to modify DIH to enable
>> > threading.
>> >
>> > At some point in 3.x threading was enabled in DIH, but it was removed
>> since
>> > people were having issues with it (we never did).
>>
>> If you know how to fix DIH so it can do multiple indexing threads
>> safely, please open an issue and upload a patch.
>>
> Please! Don't do it. Never again!
> https://issues.apache.org/jira/browse/SOLR-3011
>
> As far as I understand the general idea is to find the DIH successor
> https://issues.apache.org/jira/browse/SOLR-4799?focusedCommentId=13738424&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13738424
>
>
>>
>> I'm still using DIH for full rebuilds, but I'd actually like to replace
>> it with a rebuild routine written in SolrJ.  I currently achieve decent
>> speed by running DIH on all my shards at the same time.
>>
>> I do use SolrJ for once-a-minute index maintenance, but the code that
>> I've written to pull data out of SQL and write it to Solr is not able to
>> index millions of documents in a single thread as fast as DIH does.  I
>> have been building a multithreaded design in my head, but I haven't had
>> a chance to write real code and see whether it's actually a good design.
>>
>> For me, the bottleneck is definitely Solr, not the database.  I recently
>> wrote a test program that uses my current SolrJ indexing method.  If I
>> skip the "server.add(docs)" line, it can read all 91 million docs from
>> the database and build SolrInputDocument objects for them in 2.5 hours
>> or less, all with a single thread.  When I do a real rebuild with DIH,
>> it takes a little more than 4.5 hours -- and that is inherently
>> multithreaded, because it's doing all the shards simultaneously.  I have
>> no idea how long it would take with a single-threaded SolrJ program.
>>
>> Thanks,
>> Shawn
>>
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>


Re: DIH

2014-02-17 Thread Ahmet Arslan
Hi Mikhail,

Can you please elaborate on what you mean?
My understanding is that there is no multi-threading support in DIH, and for some
reason it won't get it. Am I correct?

Regarding Apache Flume, how can it be a DIH replacement? Can I index rich
documents on my disk using Flume? Can I fetch documents from
Wikipedia, JIRA, Twitter, Dropbox, RDBMS, RSS, or the file system with it?

Ahmet



On Monday, February 17, 2014 10:41 AM, Mikhail Khludnev 
 wrote:
On Sat, Feb 15, 2014 at 1:07 PM, Shawn Heisey  wrote:

> On 2/14/2014 10:45 PM, William Bell wrote:
> > On virtual cores the DIH handler is really slow. On a 12 core box it only
> > uses 1 core while indexing.
> >
> > Does anyone know how to do Java threading from a SQL query into Solr?
> > Examples?
> >
> > I can use SolrJ to do it, or I might be able to modify DIH to enable
> > threading.
> >
> > At some point in 3.x threading was enabled in DIH, but it was removed
> since
> > people were having issues with it (we never did).
>
> If you know how to fix DIH so it can do multiple indexing threads
> safely, please open an issue and upload a patch.
>
Please! Don't do it. Never again!
https://issues.apache.org/jira/browse/SOLR-3011

As far as I understand the general idea is to find the DIH successor
https://issues.apache.org/jira/browse/SOLR-4799?focusedCommentId=13738424&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13738424



>
> I'm still using DIH for full rebuilds, but I'd actually like to replace
> it with a rebuild routine written in SolrJ.  I currently achieve decent
> speed by running DIH on all my shards at the same time.
>
> I do use SolrJ for once-a-minute index maintenance, but the code that
> I've written to pull data out of SQL and write it to Solr is not able to
> index millions of documents in a single thread as fast as DIH does.  I
> have been building a multithreaded design in my head, but I haven't had
> a chance to write real code and see whether it's actually a good design.
>
> For me, the bottleneck is definitely Solr, not the database.  I recently
> wrote a test program that uses my current SolrJ indexing method.  If I
> skip the "server.add(docs)" line, it can read all 91 million docs from
> the database and build SolrInputDocument objects for them in 2.5 hours
> or less, all with a single thread.  When I do a real rebuild with DIH,
> it takes a little more than 4.5 hours -- and that is inherently
> multithreaded, because it's doing all the shards simultaneously.  I have
> no idea how long it would take with a single-threaded SolrJ program.
>
> Thanks,
> Shawn
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics






Re: DIH

2014-02-17 Thread Alexandre Rafalovitch
I haven't tried Apache Flume but the manual seems to suggest 'yes' to
a large number of your checklist items:
http://flume.apache.org/FlumeUserGuide.html

When you say 'rich document' indexing, the keyword you are looking for
is (Apache) Tika, as that's what is actually doing the job under the
covers.

Whether it can replicate your specific requirements is a question
only you can answer for yourself, of course. When you do, maybe let us
know, so we can learn too. :-)

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Mon, Feb 17, 2014 at 8:11 PM, Ahmet Arslan  wrote:
> Hi Mikhail,
>
> Can you please elaborate what do you mean?
> My understanding is that there is no multi-threading support in DIH. For some 
> reasons, it won't have. Am I correct?
>
> Regarding apache flume, how it can be dih replacement? Can I index rich 
> documents on my disk using flume? Can I fetch documents from 
> wikipedia,jira,twitter,dropbox,rdbms,rss,file system by using it?
>
> Ahmet
>
>
>
> On Monday, February 17, 2014 10:41 AM, Mikhail Khludnev 
>  wrote:
> On Sat, Feb 15, 2014 at 1:07 PM, Shawn Heisey  wrote:
>
>> On 2/14/2014 10:45 PM, William Bell wrote:
>> > On virtual cores the DIH handler is really slow. On a 12 core box it only
>> > uses 1 core while indexing.
>> >
>> > Does anyone know how to do Java threading from a SQL query into Solr?
>> > Examples?
>> >
>> > I can use SolrJ to do it, or I might be able to modify DIH to enable
>> > threading.
>> >
>> > At some point in 3.x threading was enabled in DIH, but it was removed
>> since
>> > people were having issues with it (we never did).
>>
>> If you know how to fix DIH so it can do multiple indexing threads
>> safely, please open an issue and upload a patch.
>>
> Please! Don't do it. Never again!
> https://issues.apache.org/jira/browse/SOLR-3011
>
> As far as I understand the general idea is to find the DIH successor
> https://issues.apache.org/jira/browse/SOLR-4799?focusedCommentId=13738424&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13738424
>
>
>
>>
>> I'm still using DIH for full rebuilds, but I'd actually like to replace
>> it with a rebuild routine written in SolrJ.  I currently achieve decent
>> speed by running DIH on all my shards at the same time.
>>
>> I do use SolrJ for once-a-minute index maintenance, but the code that
>> I've written to pull data out of SQL and write it to Solr is not able to
>> index millions of documents in a single thread as fast as DIH does.  I
>> have been building a multithreaded design in my head, but I haven't had
>> a chance to write real code and see whether it's actually a good design.
>>
>> For me, the bottleneck is definitely Solr, not the database.  I recently
>> wrote a test program that uses my current SolrJ indexing method.  If I
>> skip the "server.add(docs)" line, it can read all 91 million docs from
>> the database and build SolrInputDocument objects for them in 2.5 hours
>> or less, all with a single thread.  When I do a real rebuild with DIH,
>> it takes a little more than 4.5 hours -- and that is inherently
>> multithreaded, because it's doing all the shards simultaneously.  I have
>> no idea how long it would take with a single-threaded SolrJ program.
>>
>> Thanks,
>> Shawn
>>
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>


Re: Solr Load Testing Issues

2014-02-17 Thread Annette Newton
Sorry I didn't make myself clear.  I have 20 machines in the configuration;
each shard/replica is on its own machine.


On 14 February 2014 19:44, Shawn Heisey  wrote:

> On 2/14/2014 5:28 AM, Annette Newton wrote:
> > Solr Version: 4.3.1
> > Number Shards: 10
> > Replicas: 1
> > Heap size: 15GB
> > Machine RAM: 30GB
> > Zookeeper timeout: 45 seconds
> >
> > We are continuing the fight to keep our solr setup functioning.  As a
> > result of this we have made significant changes to our schema to reduce
> the
> > amount of data we write.  I setup a new cluster to reindex our data,
> > initially I ran the import with no replicas, and achieved quite
> impressive
> > results.  Our peak was 60,000 new documents per minute, no shard loses,
> no
> > outages due to garbage collection (which is an issue we see in
> production),
> > at the end of the load the index stood at 97,000,000 documents and 20GB
> per
> > shard.  During the highest insertion rate I would say that querying
> > suffered, but that is not of concern right now.
>
> Solr 4.3.1 has a number of problems when it comes to large clouds.
> Upgrading to 4.6.1 would be strongly advisable, but that's only
> something to try after looking into the rest of what I have to say.
>
> If I read what you've written correctly, you are running all this on one
> machine.  To put it bluntly, this isn't going to work well unless you
> put a LOT more memory into that machine.
>
> For good performance, Solr relies on the OS disk cache, because reading
> from the disk is VERY expensive in terms of time.  The OS will
> automatically use RAM that's not being used for other purposes for the
> disk cache, so that it can avoid reading off the disk as much as possible.
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Below is a summary of what that Wiki page says, with your numbers as I
> understand them.  If I am misunderstanding your numbers, then this
> advice may need adjustment.  Note that when I see "one replica" I take
> that to mean replicationFactor=1, so there is only one copy of the
> index.  If you actually mean that you have *two* copies, then you have
> twice as much data as I've indicated below, and your requirements will
> be even larger:
>
> With ten shards that are each 20GB in size, your total index size is
> 200GB.  With 15 GB of heap, your ideal memory size for that server would
> be 215GB -- the 15GB heap plus enough extra to fit the entire 200GB
> index into RAM.
>
> In reality you probably don't need that much, but it's likely that you
> would need at least half the index to fit into RAM at any one moment,
> which adds up to 115GB.  If you're prepared to deal with
> moderate-to-severe performance problems, you **MIGHT** be able to get
> away with only 25% of the index fitting into RAM, which still requires
> 65GB of RAM, but with SolrCloud, such performance problems usually mean
> that the cloud won't be stable, so it's not advisable to even try it.
>
> One of the bits of advice on the wiki page is to split your index into
> shards and put it on more machines, which drops the memory requirements
> for each machine.  You're already using a multi-shard SolrCloud, so you
> probably just need more hardware.  If you had one 20GB shard on a
> machine with 30GB of RAM, you could probably use a heap size of 4-8GB
> per machine and have plenty of RAM left over to cache the index very
> well.  You could most likely add another 50% to the index size and still
> be OK.
>
> Thanks,
> Shawn
>
>


-- 

Annette Newton

Database Administrator

ServiceTick Ltd



T:+44(0)1603 618326



Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ

www.servicetick.com

www.sessioncam.com

-- 
*This message is confidential and is intended to be read solely by the 
addressee. The contents should not be disclosed to any other person or 
copies taken unless authorised to do so. If you are not the intended 
recipient, please notify the sender and permanently delete this message. As 
Internet communications are not secure ServiceTick accepts neither legal 
responsibility for the contents of this message nor responsibility for any 
change made to this message after it was forwarded by the original author.*


Solrcloud: no registered leader found and new searcher error

2014-02-17 Thread sweety
I have configured solrcloud as follows,
 

Solr.xml:

<solr>
  <cores ... zkClientTimeout="${zkClientTimeout:15000}"
         hostPort="${jetty.port:}" hostContext="solr">
    <core ... name="document"/>
    <core ... name="contract"/>
  </cores>
</solr>

I have added all the required config for SolrCloud, as referred to here:
http://wiki.apache.org/solr/SolrCloud#Required_Config

I am adding data to core:document.
Now when I try to index using SolrNet (solr.Add(doc)), I get this error:
SEVERE: org.apache.solr.common.SolrException: *No registered leader was
found, collection:document* slice:shard2
at
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:481)

and this error also:
SEVERE: null:java.lang.RuntimeException: *SolrCoreState already closed*
at
org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:84)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:520)

I guess it is because the leader is from core:contract and I am trying to
index into core:document?
Is there a way to change the leader, and how?
How can I change the state of shards from "gone" to "active"?

Also, when I try to query q=*:*, this is shown:
org.apache.solr.common.SolrException: *Error opening new searcher at*
org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1415) at 

I read that this searcher error comes when the number of commits (and thus
warming searchers) is exceeded, but I did not issue any commit command, so how
can that be exceeded? It also seems to require some warming settings, so I added
this to solrconfig.xml, but I still get the same error:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">solr</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
    <lst>
      <str name="q">rocks</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>

<maxWarmingSearchers>2</maxWarmingSearchers>


I have just started with SolrCloud; please tell me if I am doing anything wrong
in my SolrCloud configuration.
Also, I did not find good material on SolrCloud for Windows 7 with Apache Tomcat,
so please suggest some for that too.
Thanks a lot.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrcloud-no-registered-leader-found-and-new-searcher-error-tp4117724.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud how to spread out to multiple nodes

2014-02-17 Thread soodyogesh
Thanks, I'm going to give this a try.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-how-to-spread-out-to-multiple-nodes-tp4116326p4117728.html
Sent from the Solr - User mailing list archive at Nabble.com.


Facet cache issue when deleting documents from the index

2014-02-17 Thread Marius Dumitru Florea
Hi guys,

I'm using Solr 4.6.1 (embedded) and for some reason the facet cache is
not invalidated when documents are deleted from the index. Sadly, for
me, I cannot reproduce this issue with an integration test like this:

--8<--
SolrInstance server = getSolrInstance();

SolrInputDocument document = new SolrInputDocument();
document.setField("id", "foo");
document.setField("locale", "en");
server.add(document);

server.commit();

document = new SolrInputDocument();
document.setField("id", "bar");
document.setField("locale", "en");
server.add(document);

server.commit();

SolrQuery query = new SolrQuery("*:*");
query.set("facet", "on");
query.set("facet.field", "locale");
QueryResponse response = server.query(query);

Assert.assertEquals(2, response.getResults().size());
FacetField localeFacet = response.getFacetField("locale");
Assert.assertEquals(1, localeFacet.getValues().size());
Count en = localeFacet.getValues().get(0);
Assert.assertEquals("en", en.getName());
Assert.assertEquals(2, en.getCount());

server.delete("foo");
server.commit();

response = server.query(query);

Assert.assertEquals(1, response.getResults().size());
localeFacet = response.getFacetField("locale");
Assert.assertEquals(1, localeFacet.getValues().size());
en = localeFacet.getValues().get(0);
Assert.assertEquals("en", en.getName());
Assert.assertEquals(1, en.getCount());
-->8--

Nevertheless, when I do the 'same' on my real environment, the count
for the locale facet remains 2 after one of the documents is deleted.
The search result count is fine, so that's why I think it's a facet
cache issue. Note that the facet count remains 2 even after I restart
the server, so the cache is persisted on the file system.

Strangely, the facet count is updated correctly if I modify the
document instead of deleting it (i.e. removing a keyword from the
content so that it isn't matched by the search query any more). So it
looks like only delete triggers the issue.

Now, an interesting fact is that if, on my real environment, I delete
one of the documents and then add a new one, the facet count becomes
3. So the last commit to the index, which inserts a new document,
doesn't trigger a re-computation of the facet cache. The previous
facet cache is simply incremented, so the error is perpetuated. At
this point I don't even know how to fix the facet cache without
deleting the Solr data folder so that the full index is rebuilt.

I'm still trying to figure out what is the difference between the
integration test and my real environment (as I used the same schema
and configuration). Do you know what might be wrong?

Thanks,
Marius


Solr Suggester not working in sharding (distributed search)

2014-02-17 Thread aniket potdar
I have two solr server (solr 4.5.1) which is running in shard..

I have implemented solr suggester using spellcheckComponent for
auto-suggester.

When I execute the suggest URL on an individual core, the suggestions come
back properly:

http://localhost:8986/solr/core1/suggest?spellcheck.q=city%20of and 
http://localhost:8987/solr/core1/suggest?spellcheck.q=city%20of

When I fire the URL following the Solr wiki
(https://wiki.apache.org/solr/SpellCheckComponent#Distributed_Search_Support),
no result comes back and the exception below occurs.


URL :-
http://localhost:8986/solr/core1/select?shards=localhost:8986/solr/core1,localhost:8987/solr/core1&spellcheck.q=city%20of&shards.qt=%2Fsuggest&qt=suggest

java.lang.NullPointerException at
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:843)
at
org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:649)
at
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:628)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368) at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:619)

 
For reference, below are my schema.xml and solrconfig.xml entries:



<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
    <str name="field">sugg</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

I have a unique field, id, which has stored=true in schema.xml.

Can anyone please suggest a solution?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Suggester-not-working-in-sharding-distributed-search-tp4117732.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH

2014-02-17 Thread Mikhail Khludnev
On Mon, Feb 17, 2014 at 1:11 PM, Ahmet Arslan  wrote:

> My understanding is that there is no multi-threading support in DIH. For
> some reasons, it won't have. Am I correct?


The 'threads' parameter worked in 3.6 or so, but it was removed in 4.x because it
caused a lot of instability.

Regarding apache flume, how it can be dih replacement? Can I index rich
> documents on my disk using flume? Can I fetch documents from
> wikipedia,jira,twitter,


I don't know Flume, and I'm not even ready to propose a DIH replacement
candidate.
Personally, I'd consider an old-school ETL tool, because I'm mostly interested in
joining RDBMS tables.
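
For what it's worth, a rough, untested sketch of the pure SolrJ threading that was
asked about earlier in the thread: read rows over JDBC in one thread and let
ConcurrentUpdateSolrServer fan the adds out over several background threads. The
JDBC URL, query, and field names below are made up:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

// queue up to 10000 docs, send them with 4 background threads
ConcurrentUpdateSolrServer solr =
    new ConcurrentUpdateSolrServer("http://localhost:8983/solr/collection1", 10000, 4);

Connection conn = DriverManager.getConnection("jdbc:mysql://dbhost/db", "user", "pass");
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT id, title FROM docs");
while (rs.next()) {
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", rs.getString("id"));
  doc.addField("title", rs.getString("title"));
  solr.add(doc);              // queued; the background threads send the batches
}
rs.close(); stmt.close(); conn.close();
solr.blockUntilFinished();    // wait until all queued updates have been sent
solr.commit();
solr.shutdown();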


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: Facet cache issue when deleting documents from the index

2014-02-17 Thread Ahmet Arslan
Hi Marius,

Facets are computed from indexed terms. Can you commit with expungeDeletes=true 
flag?
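
In case it helps, plain SolrServer.commit() has no expungeDeletes argument, so from
SolrJ it would go through an UpdateRequest. A rough sketch, with 'server' being your
SolrServer instance:

import org.apache.solr.client.solrj.request.AbstractUpdateRequest.ACTION;
import org.apache.solr.client.solrj.request.UpdateRequest;

UpdateRequest commit = new UpdateRequest();
commit.setAction(ACTION.COMMIT, true, true);   // waitFlush, waitSearcher
commit.setParam("expungeDeletes", "true");     // rewrite segments that contain deletes
commit.process(server);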

Ahmet



On Monday, February 17, 2014 12:17 PM, Marius Dumitru Florea 
 wrote:
Hi guys,

I'm using Solr 4.6.1 (embedded) and for some reason the facet cache is
not invalidated when documents are deleted from the index. Sadly, for
me, I cannot reproduce this issue with an integration test like this:

--8<--
SolrInstance server = getSolrInstance();

SolrInputDocument document = new SolrInputDocument();
document.setField("id", "foo");
document.setField("locale", "en");
server.add(document);

server.commit();

document = new SolrInputDocument();
document.setField("id", "bar");
document.setField("locale", "en");
server.add(document);

server.commit();

SolrQuery query = new SolrQuery("*:*");
query.set("facet", "on");
query.set("facet.field", "locale");
QueryResponse response = server.query(query);

Assert.assertEquals(2, response.getResults().size());
FacetField localeFacet = response.getFacetField("locale");
Assert.assertEquals(1, localeFacet.getValues().size());
Count en = localeFacet.getValues().get(0);
Assert.assertEquals("en", en.getName());
Assert.assertEquals(2, en.getCount());

server.delete("foo");
server.commit();

response = server.query(query);

Assert.assertEquals(1, response.getResults().size());
localeFacet = response.getFacetField("locale");
Assert.assertEquals(1, localeFacet.getValues().size());
en = localeFacet.getValues().get(0);
Assert.assertEquals("en", en.getName());
Assert.assertEquals(1, en.getCount());
-->8--

Nevertheless, when I do the 'same' on my real environment, the count
for the locale facet remains 2 after one of the documents is deleted.
The search result count is fine, so that's why I think it's a facet
cache issue. Note that the facet count remains 2 even after I restart
the server, so the cache is persisted on the file system.

Strangely, the facet count is updated correctly if I modify the
document instead of deleting it (i.e. removing a keyword from the
content so that it isn't matched by the search query any more). So it
looks like only delete triggers the issue.

Now, an interesting fact is that if, on my real environment, I delete
one of the documents and then add a new one, the facet count becomes
3. So the last commit to the index, which inserts a new document,
doesn't trigger a re-computation of the facet cache. The previous
facet cache is simply incremented, so the error is perpetuated. At
this point I don't even know how to fix the facet cache without
deleting the Solr data folder so that the full index is rebuild.

I'm still trying to figure out what is the difference between the
integration test and my real environment (as I used the same schema
and configuration). Do you know what might be wrong?

Thanks,
Marius



Solr Suggester not working in sharding (distributed search)

2014-02-17 Thread Aniket Potdar

I have two solr server (solr 4.5.1) which is running in shard..

I have implemented solr suggester using spellcheckComponent for 
auto-suggester.


When I execute the suggest URL on an individual core, the suggestions come
back properly:


mysolr.com:8986/solr/core1/suggest?spellcheck.q=city%20of and 
mysolr.com:8987/solr/core1/suggest?spellcheck.q=city%20of


When I fire the URL following the Solr wiki
(wiki.apache.org/solr/SpellCheckComponent#Distributed_Search_Support),
no result comes back and the exception below occurs.


URL :- 
mysolr.com:8986/solr/core1/select?shards=mysolr.com:8986/solr/core1,mysolr.com:8987/solr/core1&spellcheck.q=city%20of&shards.qt=%2Fsuggest&qt=suggest


java.lang.NullPointerException at 
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:843) 
at 
org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:649) 
at 
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:628) 
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311) 
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) 
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859) at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703) 
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406) 
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195) 
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) 
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) 
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) 
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) 
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) 
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) 
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) 
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) 
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) 
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) 
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) 
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) 
at org.eclipse.jetty.server.Server.handle(Server.java:368) at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) 
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) 
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) 
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) 
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at 
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) 
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) 
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) 
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) 
at java.lang.Thread.run(Thread.java:619)


For reference, below are my schema.xml and solrconfig.xml entries:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
    <str name="field">sugg</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

...
<tokenizer class="solr.KeywordTokenizerFactory"/>
...

I have a unique field, id, which has stored=true in schema.xml.

Can anyone please suggest a solution?

--
Thanks & Regards,
Aniket Potdar
| Java Developer | ANMsoft Technologies (P) Ltd. |
| 218-220 Building 2 Sector 1, MBP, Mahape, Navi Mumbai 400710 | 
Maharashtra | India |

| Email - aniket.pot...@anmsoft.com |


Re: Facet cache issue when deleting documents from the index

2014-02-17 Thread Ahmet Arslan
Hi,

Also, I noticed that in your code snippet you have server.delete("foo"), which 
does not exist; deleteById and deleteByQuery are the methods defined in the SolrServer 
implementation.



On Monday, February 17, 2014 1:42 PM, Ahmet Arslan  wrote:
Hi Marius,

Facets are computed from indexed terms. Can you commit with expungeDeletes=true 
flag?

Ahmet




On Monday, February 17, 2014 12:17 PM, Marius Dumitru Florea 
 wrote:
Hi guys,

I'm using Solr 4.6.1 (embedded) and for some reason the facet cache is
not invalidated when documents are deleted from the index. Sadly, for
me, I cannot reproduce this issue with an integration test like this:

--8<--
SolrInstance server = getSolrInstance();

SolrInputDocument document = new SolrInputDocument();
document.setField("id", "foo");
document.setField("locale", "en");
server.add(document);

server.commit();

document = new SolrInputDocument();
document.setField("id", "bar");
document.setField("locale", "en");
server.add(document);

server.commit();

SolrQuery query = new SolrQuery("*:*");
query.set("facet", "on");
query.set("facet.field", "locale");
QueryResponse response = server.query(query);

Assert.assertEquals(2, response.getResults().size());
FacetField localeFacet = response.getFacetField("locale");
Assert.assertEquals(1, localeFacet.getValues().size());
Count en = localeFacet.getValues().get(0);
Assert.assertEquals("en", en.getName());
Assert.assertEquals(2, en.getCount());

server.delete("foo");
server.commit();

response = server.query(query);

Assert.assertEquals(1, response.getResults().size());
localeFacet = response.getFacetField("locale");
Assert.assertEquals(1, localeFacet.getValues().size());
en = localeFacet.getValues().get(0);
Assert.assertEquals("en", en.getName());
Assert.assertEquals(1, en.getCount());
-->8--

Nevertheless, when I do the 'same' on my real environment, the count
for the locale facet remains 2 after one of the documents is deleted.
The search result count is fine, so that's why I think it's a facet
cache issue. Note that the facet count remains 2 even after I restart
the server, so the cache is persisted on the file system.

Strangely, the facet count is updated correctly if I modify the
document instead of deleting it (i.e. removing a keyword from the
content so that it isn't matched by the search query any more). So it
looks like only delete triggers the issue.

Now, an interesting fact is that if, on my real environment, I delete
one of the documents and then add a new one, the facet count becomes
3. So the last commit to the index, which inserts a new document,
doesn't trigger a re-computation of the facet cache. The previous
facet cache is simply incremented, so the error is perpetuated. At
this point I don't even know how to fix the facet cache without
deleting the Solr data folder so that the full index is rebuild.

I'm still trying to figure out what is the difference between the
integration test and my real environment (as I used the same schema
and configuration). Do you know what might be wrong?

Thanks,
Marius



Solr cloud hangs

2014-02-17 Thread Pawel Rog
Hi,
I have a quite annoying problem with SolrCloud. I have a cluster with 8
shards and 2 replicas of each (Solr 4.6.1).
After some time the cluster doesn't respond to any update requests, and restarting
the cluster nodes doesn't help.

There are a lot of such stack traces (waiting for very long time):


   - sun.misc.Unsafe.park(Native Method)
   - java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
   -
   
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
   -
   org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
   -
   
org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
   -
   
org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
   -
   
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
   - java.lang.Thread.run(Thread.java:722)


Do you have any idea where I can look?

--
Pawel


Re: Facet cache issue when deleting documents from the index

2014-02-17 Thread Marius Dumitru Florea
On Mon, Feb 17, 2014 at 2:00 PM, Ahmet Arslan  wrote:
> Hi,
>

> Also I noticed that in your code snippet you have server.delete("foo"); which 
> does not exists. deleteById and deleteByQuery methods are defined in 
> SolrServer implementation.

Yes, sorry, I have a wrapper over the SolrInstance that doesn't do
much. In the case of delete it just forwards the call to deleteById.
I'll check the expungeDeletes=true flag and post back the results.

Thanks,
Marius

>
>
>
> On Monday, February 17, 2014 1:42 PM, Ahmet Arslan  wrote:
> Hi Marius,
>
> Facets are computed from indexed terms. Can you commit with 
> expungeDeletes=true flag?
>
> Ahmet
>
>
>
>
> On Monday, February 17, 2014 12:17 PM, Marius Dumitru Florea 
>  wrote:
> Hi guys,
>
> I'm using Solr 4.6.1 (embedded) and for some reason the facet cache is
> not invalidated when documents are deleted from the index. Sadly, for
> me, I cannot reproduce this issue with an integration test like this:
>
> --8<--
> SolrInstance server = getSolrInstance();
>
> SolrInputDocument document = new SolrInputDocument();
> document.setField("id", "foo");
> document.setField("locale", "en");
> server.add(document);
>
> server.commit();
>
> document = new SolrInputDocument();
> document.setField("id", "bar");
> document.setField("locale", "en");
> server.add(document);
>
> server.commit();
>
> SolrQuery query = new SolrQuery("*:*");
> query.set("facet", "on");
> query.set("facet.field", "locale");
> QueryResponse response = server.query(query);
>
> Assert.assertEquals(2, response.getResults().size());
> FacetField localeFacet = response.getFacetField("locale");
> Assert.assertEquals(1, localeFacet.getValues().size());
> Count en = localeFacet.getValues().get(0);
> Assert.assertEquals("en", en.getName());
> Assert.assertEquals(2, en.getCount());
>
> server.delete("foo");
> server.commit();
>
> response = server.query(query);
>
> Assert.assertEquals(1, response.getResults().size());
> localeFacet = response.getFacetField("locale");
> Assert.assertEquals(1, localeFacet.getValues().size());
> en = localeFacet.getValues().get(0);
> Assert.assertEquals("en", en.getName());
> Assert.assertEquals(1, en.getCount());
> -->8--
>
> Nevertheless, when I do the 'same' on my real environment, the count
> for the locale facet remains 2 after one of the documents is deleted.
> The search result count is fine, so that's why I think it's a facet
> cache issue. Note that the facet count remains 2 even after I restart
> the server, so the cache is persisted on the file system.
>
> Strangely, the facet count is updated correctly if I modify the
> document instead of deleting it (i.e. removing a keyword from the
> content so that it isn't matched by the search query any more). So it
> looks like only delete triggers the issue.
>
> Now, an interesting fact is that if, on my real environment, I delete
> one of the documents and then add a new one, the facet count becomes
> 3. So the last commit to the index, which inserts a new document,
> doesn't trigger a re-computation of the facet cache. The previous
> facet cache is simply incremented, so the error is perpetuated. At
> this point I don't even know how to fix the facet cache without
> deleting the Solr data folder so that the full index is rebuild.
>
> I'm still trying to figure out what is the difference between the
> integration test and my real environment (as I used the same schema
> and configuration). Do you know what might be wrong?
>
> Thanks,
> Marius
>


Best way to copy data from SolrCloud to standalone Solr?

2014-02-17 Thread Daniel Bryant

Hi all,

I have a production SolrCloud server which has multiple sharded indexes, 
and I need to copy all of the indexes to a (non-cloud) Solr server 
within our QA environment.


Can I ask for advice on the best way to do this please?

I've searched the web and found solr2solr 
(https://github.com/dbashford/solr2solr), but the author states that 
this is best for small indexes, and ours are rather large at ~20Gb each. 
I've also looked at replication, but can't find a definite reference on 
how this should be done between SolrCloud and Solr?


Any guidance is very much appreciated.

Best wishes,

Daniel



--
Daniel Bryant  |  Software Development Consultant  |  www.tai-dev.co.uk
daniel.bry...@tai-dev.co.uk  |  +44 (0) 7799406399  |  Twitter: @taidevcouk


Re: Solr cloud hangs

2014-02-17 Thread Mark Miller
Can you share the full stack trace dump?

- Mark

http://about.me/markrmiller

On Feb 17, 2014, at 7:07 AM, Pawel Rog  wrote:

> Hi,
> I have quite annoying problem with Solr cloud. I have a cluster with 8
> shards and with 2 replicas in each. (Solr 4.6.1)
> After some time cluster doesn't respond to any update requests. Restarting
> the cluster nodes doesn't help.
> 
> There are a lot of such stack traces (waiting for very long time):
> 
> 
>   - sun.misc.Unsafe.park(Native Method)
>   - java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   -
>   
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
>   -
>   org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
>   -
>   
> org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
>   -
>   
> org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
>   -
>   
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
>   - java.lang.Thread.run(Thread.java:722)
> 
> 
> Do you have any idea where can I look for?
> 
> --
> Pawel



Re: Solrcloud: no registered leader found and new searcher error

2014-02-17 Thread Erick Erickson
I think commits are not really the issue here. It _looks_ like
at least one node in your "document" collection is failing to
start, in fact your shard 2. On the Solr admin screen, the
"cloud" section on the left should show you the states of all
your nodes, make sure they're all green.

My guess is that if you look at your Solr logs on the nodes that
aren't coming up, you'll have a better idea of what's happening.

You need to get all the nodes running first before worrying about
messages like you're showing.

Best,
Erick


On Mon, Feb 17, 2014 at 1:28 AM, sweety  wrote:

> I have configured solrcloud as follows,
> 
>
> Solr.xml:
>
> <solr>
>   <cores ... zkClientTimeout="${zkClientTimeout:15000}"
>          hostPort="${jetty.port:}" hostContext="solr">
>     <core ... name="document"/>
>     <core ... name="contract"/>
>   </cores>
> </solr>
>
> I  have added all the required config for solrcloud, referred this :
> http://wiki.apache.org/solr/SolrCloud#Required_Config
>
> I am adding data to core:document.
> Now when i try to index using solrnet, (solr.Add(doc)) , i get this error :
> SEVERE: org.apache.solr.common.SolrException: *No registered leader was
> found, collection:document* slice:shard2
> at
>
> org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:481)
>
> and this error also:
> SEVERE: null:java.lang.RuntimeException: *SolrCoreState already closed*
> at
>
> org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:84)
> at
>
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:520)
>
> I guess, it is because the leader is from core:contract and i am trying to
> index in core:document?
> Is there a way to change the leader, and how ?
> How can i change the state of shards from gone to active?
>
> Also when i try to query : q=*:* , this is shown
> org.apache.solr.common.SolrException: *Error opening new searcher at*
> org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1415) at
>
> I read that, if number of commits exceed then this searcher error comes,
> but
> i did not issue commit command,then how will the commit exceed. Also it
> requires some warming setting, so i added this to solrconfig.xml, but still
> i get the same error,
>
> <listener event="newSearcher" class="solr.QuerySenderListener">
>   <arr name="queries">
>     <lst>
>       <str name="q">solr</str>
>       <str name="start">0</str>
>       <str name="rows">10</str>
>     </lst>
>     <lst>
>       <str name="q">rocks</str>
>       <str name="start">0</str>
>       <str name="rows">10</str>
>     </lst>
>   </arr>
> </listener>
>
> <maxWarmingSearchers>2</maxWarmingSearchers>
>
> I have just started with solrcloud, please tell if I am doing anything
> wrong
> in solrcloud configurations.
> Also i did not good material for solrcloud in windows 7 with apache tomcat
> ,
> please suggest for that too.
> Thanks a lot.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solrcloud-no-registered-leader-found-and-new-searcher-error-tp4117724.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: block join and atomic updates

2014-02-17 Thread Mikhail Khludnev
Hello,

It sounds like you need to switch to query time join.
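
For illustration only, a rough sketch of how that could look for the idea quoted
below (block-joining inside the AF + AC_* block, then joining the AF hits back to
the real parent A via the stored foreign key). Every field name here is made up,
and 'server' is a SolrServer instance:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;

SolrQuery q = new SolrQuery();
// outer query-time join: AF docs matched by $blockq are joined back to A on a_id -> id
q.setQuery("{!join from=a_id to=id v=$blockq}");
// inner block join: parents of type AF whose children match the child query
q.set("blockq", "{!parent which=\"doc_type:AF\"}child_text:foo");
QueryResponse rsp = server.query(q);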
On 15.02.2014 at 21:57,  wrote:

> Any suggestions?
>
>
> Zitat von m...@preselect-media.com:
>
>  Yonik Seeley :
>>
>>> On Thu, Feb 13, 2014 at 8:25 AM,   wrote:
>>>
 Is there any workaround to perform atomic updates on blocks or do I
 have to
 re-index the parent document and all its children always again if I
 want to
 update a field?

>>>
>>> The latter, unfortunately.
>>>
>>
>> Is there any plan to change this behavior in near future?
>>
>> So, I'm thinking of alternatives without losing the benefit of block
>> join.
>> I try to explain an idea I just thought about:
>>
>> Let's say I have a parent document A with a number of fields I want to
>> update regularly and a number of child documents AC_1 ... AC_n which are
>> only indexed once and aren't going to change anymore.
>> So, if I index A and AC_* in a block and I update A, the block is gone.
>> But if I create an additional document AF which only contains something
>> like an foreign key to A and indexing AF + AC_* as a block (not A + AC_*
>> anymore), could I perform a {!parent ... } query on AF + AC_* and make an
>> join from the results to get A?
>> Does this make any sense and is it even possible? ;-)
>> And if it's possible, how can I do it?
>>
>> Thanks,
>> - Moritz
>>
>
>
>
>


Re: Solrcloud: no registered leader found and new searcher error

2014-02-17 Thread sweety
How do I get them running?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrcloud-no-registered-leader-found-and-new-searcher-error-tp4117724p4117830.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr cloud hangs

2014-02-17 Thread Pawel Rog
Hi,
Here is the whole stack trace: https://gist.github.com/anonymous/9056783

--
Pawel

On Mon, Feb 17, 2014 at 4:53 PM, Mark Miller  wrote:

> Can you share the full stack trace dump?
>
> - Mark
>
> http://about.me/markrmiller
>
> On Feb 17, 2014, at 7:07 AM, Pawel Rog  wrote:
>
> > Hi,
> > I have quite annoying problem with Solr cloud. I have a cluster with 8
> > shards and with 2 replicas in each. (Solr 4.6.1)
> > After some time cluster doesn't respond to any update requests.
> Restarting
> > the cluster nodes doesn't help.
> >
> > There are a lot of such stack traces (waiting for very long time):
> >
> >
> >   - sun.misc.Unsafe.park(Native Method)
> >   -
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> >   -
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
> >   -
> >
> org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
> >   -
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
> >   -
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
> >   -
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
> >   - java.lang.Thread.run(Thread.java:722)
> >
> >
> > Do you have any idea where can I look for?
> >
> > --
> > Pawel
>
>


Re: Best way to copy data from SolrCloud to standalone Solr?

2014-02-17 Thread Shawn Heisey
On 2/17/2014 8:32 AM, Daniel Bryant wrote:
> I have a production SolrCloud server which has multiple sharded indexes,
> and I need to copy all of the indexes to a (non-cloud) Solr server
> within our QA environment.
> 
> Can I ask for advice on the best way to do this please?
> 
> I've searched the web and found solr2solr
> (https://github.com/dbashford/solr2solr), but the author states that
> this is best for small indexes, and ours are rather large at ~20Gb each.
> I've also looked at replication, but can't find a definite reference on
> how this should be done between SolrCloud and Solr?
> 
> Any guidance is very much appreciated.

If the master index isn't changing at the time of the copy, and you're
on a non-Windows platform, you should be able to copy the index
directory directly.  On a Windows platform, whether you can copy the
index while Solr is using it would depend on how Solr/Lucene opens the
files.  A typical Windows file open will prevent anything else from
opening them, and I do not know whether Lucene is smarter than that.

SolrCloud requires the replication handler to be enabled on all configs,
but during normal operation, it does not actually use replication.  This
is a confusing thing for some users.

I *think* you can configure the replication handler on slave cores with
a non-cloud config that points at the master cores, and it should
replicate the main Lucene index, but not the config files.  I have no
idea whether things will work right if you configure other master
options like replicateAfter and config files, and I also don't know if
those options might cause problems for SolrCloud itself.  Those options
shouldn't be necessary for just getting the data into a dev environment,
though.
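
If copying the files around is awkward, another option that sometimes gets used
is a one-time pull through the replication handler itself: call fetchindex on
the QA core and point masterUrl at the matching SolrCloud core. An untested
SolrJ sketch, with made-up hosts and core names (the same thing can be done as
a plain HTTP request):

HttpSolrServer qaCore = new HttpSolrServer("http://qa-host:8983/solr/collection1");

ModifiableSolrParams p = new ModifiableSolrParams();
p.set("command", "fetchindex");
p.set("masterUrl",
    "http://prod-host:8983/solr/collection1_shard1_replica1/replication");

QueryRequest pull = new QueryRequest(p);
pull.setPath("/replication");   // address the replication handler, not /select
pull.process(qaCore);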

Thanks,
Shawn



Re: Solr cloud hangs

2014-02-17 Thread Pawel Rog
There are also many errors in solr log like that one:

org.apache.solr.update.StreamingSolrServers$1; error
org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for
connection from pool
at
org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:232)
at
org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:199)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:456)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:232)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)


--
Pawel


On Mon, Feb 17, 2014 at 8:01 PM, Pawel Rog  wrote:

> Hi,
> Here is the whole stack trace: https://gist.github.com/anonymous/9056783
>
> --
> Pawel
>
>
> On Mon, Feb 17, 2014 at 4:53 PM, Mark Miller wrote:
>
>> Can you share the full stack trace dump?
>>
>> - Mark
>>
>> http://about.me/markrmiller
>>
>> On Feb 17, 2014, at 7:07 AM, Pawel Rog  wrote:
>>
>> > Hi,
>> > I have quite an annoying problem with Solr cloud. I have a cluster with 8
>> > shards and with 2 replicas in each. (Solr 4.6.1)
>> > After some time cluster doesn't respond to any update requests.
>> Restarting
>> > the cluster nodes doesn't help.
>> >
>> > There are a lot of such stack traces (waiting for very long time):
>> >
>> >
>> >   - sun.misc.Unsafe.park(Native Method)
>> >   -
>> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>> >   -
>> >
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
>> >   -
>> >
>> org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
>> >   -
>> >
>> org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
>> >   -
>> >
>> org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
>> >   -
>> >
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
>> >   - java.lang.Thread.run(Thread.java:722)
>> >
>> >
>> > Do you have any idea where I could look?
>> >
>> > --
>> > Pawel
>>
>>
>


Re: Solrcloud: no registered leader found and new searcher error

2014-02-17 Thread Erick Erickson
Well, first determine whether they are running or not.

Then look at the Solr log for that node when you try to start it up.

Then post the results if you're still puzzled.

You've given us no information about what the error (if any) is,
so I'm speculating here.

You might want to review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick


On Mon, Feb 17, 2014 at 10:27 AM, sweety  wrote:

> How do i get them running?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solrcloud-no-registered-leader-found-and-new-searcher-error-tp4117724p4117830.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr Autosuggest - Strange issue with leading numbers in query

2014-02-17 Thread Developer
Hi Erik,

Thanks a lot for your reply.

I expect it to return zero suggestions since the suggested keyword doesn't
actually start with numbers.

Expected results
Searching for ga -> returns galaxy
Searching for gal -> returns galaxy
Searching for 12321312321312ga -> should not return any suggestion, since no
such keyword (or combination) exists in the index.

Thanks




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Autosuggest-Strange-issue-with-leading-numbers-in-query-tp4116751p4117846.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Could not connect or ping a core after import a big data into it...

2014-02-17 Thread Eric_Peng
Sir, after some experimenting: if there are more than 1000 (roughly) documents
in the core, the problem shows up.

Then when I make a query from the command window it shows:

Exception in thread "main"
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error
opening new searcher. exceeded limit of maxWarmingSearchers=2, try again
later.
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at ExampleSolrJClient.handler(ExampleSolrJClient.java:107)
at ExampleSolrJClient.main(ExampleSolrJClient.java:53)




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Could-not-connect-or-ping-a-core-after-import-a-big-data-into-it-tp4117416p4117848.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ Socket Leak

2014-02-17 Thread Jared Rodriguez
Kiran & Shawn,

Thank you both for the info and you are both absolutely correct.  The issue
was not that sockets were leaked, but that wait time thing is a killer.  I
ended up fixing the problem by changing the system property of
"http.maxConnections" which is used internally to Apache httpclient to
setup the PoolingClientConnectionManager.  Previously, this had no value,
and was defaulting to 5.  That meant that any time there were more than 50
(maxConnections * maxPerRoute) concurrent connections to the Solr server,
non-reusable connections were being opened and closed, and thus sitting in that
idle state: too many sockets.

The fix was simply tuning the pool and setting "http.maxConnections" to a
higher value representing the number of concurrent users that I expect.
 Problem fixed, and a modest speed improvement simply by higher socket
reuse.
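
Roughly, the change looks like this (a sketch rather than the exact code from
our app; the value is only an example, and zkConnect is assumed to be defined
elsewhere):

// Set before any SolrJ / HttpClient objects are created, or pass
// -Dhttp.maxConnections=100 on the JVM command line instead.
System.setProperty("http.maxConnections", "100");

// The http client that SolrJ builds internally reads the property when it is
// constructed, so the CloudSolrServer itself is created as before.
CloudSolrServer server = new CloudSolrServer(zkConnect);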

Thank you both for the help!

Jared




On Mon, Feb 17, 2014 at 3:03 AM, Kiran Chitturi <
kiran.chitt...@lucidworks.com> wrote:

> Jared,
>
> I faced a similar issue when using CloudSolrServer with Solr. As Shawn
> pointed out the 'TIME_WAIT' status happens when the connection is closed
> by the http client. HTTP client closes connection whenever it thinks the
> connection is stale
> (
> https://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html
> #d5e405). Even the docs point out the stale connection checking cannot be
> all reliable.
>
> I see two ways to get around this:
>
> 1. Enable 'SO_REUSEADDR'
> 2. Disable stale connection checks.
>
> Also by default, when we create CSS it does not explicitly configure any
> http client parameters
> (
> https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/a
> pache/solr/client/solrj/impl/CloudSolrServer.java#L124). In this case, the
> default configuration parameters (max connections, max connections per
> host) are used for a http connection. You can explicitly configure these
> params when creating CSS using HttpClientUtil:
>
> ModifiableSolrParams params = new ModifiableSolrParams();
> params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 128);
> params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 32);
> params.set(HttpClientUtil.PROP_FOLLOW_REDIRECTS, false);
> params.set(HttpClientUtil.PROP_CONNECTION_TIMEOUT, 3);
> httpClient = HttpClientUtil.createClient(params);
>
> final HttpClient client = HttpClientUtil.createClient(params);
> LBHttpSolrServer lb = new LBHttpSolrServer(client);
> CloudSolrServer server = new CloudSolrServer(zkConnect, lb);
>
>
> Currently, I am using http client 4.3.2 and building the client when
> creating the CSS. I also use 'SO_REUSEADDR' option and I haven't seen the
> 'TIME_WAIT'  after this (may be because of better handling of stale
> connections in 4.3.2 or because of 'SO_REUSEADDR' param enabled). My
> current http client code looks like this: (works only with http client
> 4.3.2)
>
> HttpClientBuilder httpBuilder = HttpClientBuilder.create();
>
> Builder socketConfig =  SocketConfig.custom();
> socketConfig.setSoReuseAddress(true);
> socketConfig.setSoTimeout(1);
> httpBuilder.setDefaultSocketConfig(socketConfig.build());
> httpBuilder.setMaxConnTotal(300);
> httpBuilder.setMaxConnPerRoute(100);
>
> httpBuilder.disableRedirectHandling();
> httpBuilder.useSystemProperties();
> LBHttpSolrServer lb = new LBHttpSolrServer(httpClient, parser)
> CloudSolrServer server = new CloudSolrServer(zkConnect, lb);
>
>
> There should be a way to configure socket reuse with 4.2.3 too. You can
> try different configurations. I am surprised you have 'TIME_WAIT'
> connections even after 30 minutes because 'TIME_WAIT' connection should be
> closed by default in 2 mins by O.S I think.
>
>
> HTH,
>
> --
> Kiran Chitturi,
>
>
> On 2/13/14 12:38 PM, "Jared Rodriguez"  wrote:
>
> >I am using solr/solrj 4.6.1 along with the apache httpclient 4.3.2 as part
> >of a web application which connects to the solr server via solrj
> >using CloudSolrServer();  The web application is wired up with Guice, and
> >there is a single instance of the CloudSolrServer class used by all
> >inbound
> >requests.  All this is running on Amazon.
> >
> >Basically, everything looks and runs fine for a while, but even with
> >moderate concurrency, solrj starts leaving sockets open.  We are handling
> >only about 250 connections to the web app per minute and each of these
> >issues from 3 - 7 requests to solr.  Over a 30 minute period of this type
> >of use, we end up with many 1000s of lingering sockets.  I can see these
> >when running netstats
> >
> >tcp0  0 ip-10-80-14-26.ec2.in:41098
> >ip-10-99-145-47.ec2.i:glrpc
> >TIME_WAIT
> >
> >All to the same target host, which is my solr server. There are no other
> >pieces of infrastructure on that box, just solr.  Eventually, the server
> >just dies as no further so

Re: Could not connect or ping a core after import a big data into it...

2014-02-17 Thread Eric_Peng
I found out that in this strange situation I could still import, update, or
delete data (using DIH or SolrJ), but any query would wait forever.

So I deleted all the documents (or just reduced the document count), restarted
the server, and the problem disappeared.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Could-not-connect-or-ping-a-core-after-import-a-big-data-into-it-tp4117416p4117852.html
Sent from the Solr - User mailing list archive at Nabble.com.


Is it possible to load new elevate.xml on the fly?

2014-02-17 Thread Developer
Hi,

I am trying to figure out a way to switch between multiple elevate.xml files on
the fly, using query parameters.

We have a scenario where we need to elevate documents based on
authentication (same core) without creating a new search handler.
For authenticated customers: elevate documents based on elevate1.xml.

For non-authenticated customers: elevate documents based on elevate2.xml.

I am not sure if there is a way to implement this using any other method. 

Any help in this regard is appreciated.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-possible-to-load-new-elevate-xml-on-the-fly-tp4117856.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Could not connect or ping a core after import a big data into it...

2014-02-17 Thread Eric_Peng
I solved it; my mistake. I was using Solr 4.6.1 jars, but in my solrconfig.xml
I had luceneMatchVersion set to 4.5. I just copied it from my last project and
didn't check it. A really stupid mistake on my part.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Could-not-connect-or-ping-a-core-after-import-a-big-data-into-it-tp4117416p4117859.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Best way to copy data from SolrCloud to standalone Solr?

2014-02-17 Thread Michael Della Bitta
I do know for certain that the backup command on a cloud core still works.
We have a script like this running on a cron to snapshot indexes:

curl -s '
http://localhost:8080/solr/#{core}/replication?command=backup&numberToKeep=4&location=/tmp
'

(not really using /tmp for this, parameters changed to protect the guilty)

The admin handler for replication doesn't seem to be there, but the actual
API seems to work normally.

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

"The Science of Influence Marketing"

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Mon, Feb 17, 2014 at 2:02 PM, Shawn Heisey  wrote:

> On 2/17/2014 8:32 AM, Daniel Bryant wrote:
> > I have a production SolrCloud server which has multiple sharded indexes,
> > and I need to copy all of the indexes to a (non-cloud) Solr server
> > within our QA environment.
> >
> > Can I ask for advice on the best way to do this please?
> >
> > I've searched the web and found solr2solr
> > (https://github.com/dbashford/solr2solr), but the author states that
> > this is best for small indexes, and ours are rather large at ~20Gb each.
> > I've also looked at replication, but can't find a definite reference on
> > how this should be done between SolrCloud and Solr?
> >
> > Any guidance is very much appreciated.
>
> If the master index isn't changing at the time of the copy, and you're
> on a non-Windows platform, you should be able to copy the index
> directory directly.  On a Windows platform, whether you can copy the
> index while Solr is using it would depend on how Solr/Lucene opens the
> files.  A typical Windows file open will prevent anything else from
> opening them, and I do not know whether Lucene is smarter than that.
>
> SolrCloud requires the replication handler to be enabled on all configs,
> but during normal operation, it does not actually use replication.  This
> is a confusing thing for some users.
>
> I *think* you can configure the replication handler on slave cores with
> a non-cloud config that point at the master cores, and it should
> replicate the main Lucene index, but not the config files.  I have no
> idea whether things will work right if you configure other master
> options like replicateAfter and config files, and I also don't know if
> those options might cause problems for SolrCloud itself.  Those options
> shouldn't be necessary for just getting the data into a dev environment,
> though.
>
> Thanks,
> Shawn
>
>


Boost Query Example

2014-02-17 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)

Hi, can someone help me with a Boost & Sort query example?

http://localhost:8983/solr/ProductCollection/select?q=*%3A*&wt=json&indent=true&fq=SKU:223-CL10V3^100 OR SKU:223-CL1^90

There is no difference in the result order. Let me know if I am missing
something. Also, I would like the exact match SKU:223-CL10V3^100 to be ordered
first.

Thanks

Ravi


Re: Facet cache issue when deleting documents from the index

2014-02-17 Thread Marius Dumitru Florea
I tried to set the expungeDeletes flag but it didn't fix the problem.
The SolrServer doesn't expose a way to set this flag so I had to use:

new UpdateRequest().setAction(UpdateRequest.ACTION.COMMIT, true, true,
1, true).process(solrServer);

Any other hints?

Note that I managed to run my test in my real environment at runtime
and it passed, so it seems the behaviour depends on the size of the
documents that are committed (added to or deleted from the index).

Thanks,
Marius

On Mon, Feb 17, 2014 at 2:32 PM, Marius Dumitru Florea
 wrote:
> On Mon, Feb 17, 2014 at 2:00 PM, Ahmet Arslan  wrote:
>> Hi,
>>
>
>> Also I noticed that in your code snippet you have server.delete("foo"); 
>> which does not exist. deleteById and deleteByQuery methods are defined in
>> the SolrServer implementation.
>
> Yes, sorry, I have a wrapper over the SolrInstance that doesn't do
> much. In the case of delete it just forwards the call to deleteById.
> I'll check the expungeDeletes=true flag and post back the results.
>
> Thanks,
> Marius
>
>>
>>
>>
>> On Monday, February 17, 2014 1:42 PM, Ahmet Arslan  wrote:
>> Hi Marius,
>>
>> Facets are computed from indexed terms. Can you commit with 
>> expungeDeletes=true flag?
>>
>> Ahmet
>>
>>
>>
>>
>> On Monday, February 17, 2014 12:17 PM, Marius Dumitru Florea 
>>  wrote:
>> Hi guys,
>>
>> I'm using Solr 4.6.1 (embedded) and for some reason the facet cache is
>> not invalidated when documents are deleted from the index. Sadly, for
>> me, I cannot reproduce this issue with an integration test like this:
>>
>> --8<--
>> SolrInstance server = getSolrInstance();
>>
>> SolrInputDocument document = new SolrInputDocument();
>> document.setField("id", "foo");
>> document.setField("locale", "en");
>> server.add(document);
>>
>> server.commit();
>>
>> document = new SolrInputDocument();
>> document.setField("id", "bar");
>> document.setField("locale", "en");
>> server.add(document);
>>
>> server.commit();
>>
>> SolrQuery query = new SolrQuery("*:*");
>> query.set("facet", "on");
>> query.set("facet.field", "locale");
>> QueryResponse response = server.query(query);
>>
>> Assert.assertEquals(2, response.getResults().size());
>> FacetField localeFacet = response.getFacetField("locale");
>> Assert.assertEquals(1, localeFacet.getValues().size());
>> Count en = localeFacet.getValues().get(0);
>> Assert.assertEquals("en", en.getName());
>> Assert.assertEquals(2, en.getCount());
>>
>> server.delete("foo");
>> server.commit();
>>
>> response = server.query(query);
>>
>> Assert.assertEquals(1, response.getResults().size());
>> localeFacet = response.getFacetField("locale");
>> Assert.assertEquals(1, localeFacet.getValues().size());
>> en = localeFacet.getValues().get(0);
>> Assert.assertEquals("en", en.getName());
>> Assert.assertEquals(1, en.getCount());
>> -->8--
>>
>> Nevertheless, when I do the 'same' on my real environment, the count
>> for the locale facet remains 2 after one of the documents is deleted.
>> The search result count is fine, so that's why I think it's a facet
>> cache issue. Note that the facet count remains 2 even after I restart
>> the server, so the cache is persisted on the file system.
>>
>> Strangely, the facet count is updated correctly if I modify the
>> document instead of deleting it (i.e. removing a keyword from the
>> content so that it isn't matched by the search query any more). So it
>> looks like only delete triggers the issue.
>>
>> Now, an interesting fact is that if, on my real environment, I delete
>> one of the documents and then add a new one, the facet count becomes
>> 3. So the last commit to the index, which inserts a new document,
>> doesn't trigger a re-computation of the facet cache. The previous
>> facet cache is simply incremented, so the error is perpetuated. At
>> this point I don't even know how to fix the facet cache without
>> deleting the Solr data folder so that the full index is rebuild.
>>
>> I'm still trying to figure out what is the difference between the
>> integration test and my real environment (as I used the same schema
>> and configuration). Do you know what might be wrong?
>>
>> Thanks,
>> Marius
>>


Re: Boost Query Example

2014-02-17 Thread Michael Della Bitta
Hi,

Filter queries don't affect score, so boosting won't have an effect there.
If you want those query terms to get boosted, move them into the q
parameter.

http://wiki.apache.org/solr/CommonQueryParameters#fq
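
For example, something along these lines in SolrJ (only the SKU clauses come
from your mail; the rest is assumed):

// Boosts only influence ranking when the clauses are part of q, not fq.
SolrQuery query = new SolrQuery("SKU:223-CL10V3^100 OR SKU:223-CL1^90");
query.setRows(10);
// Anything that should only restrict the result set, without affecting
// ranking, can stay in a filter query via query.addFilterQuery(...).
QueryResponse rsp = server.query(query);   // 'server' is an existing SolrServer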

Hope that helps!

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

"The Science of Influence Marketing"

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Mon, Feb 17, 2014 at 3:49 PM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions)  wrote:

>
> Hi can some one help me on the Boost & Sort query example.
>
> http://localhost:8983/solr/ProductCollection/select?q=*%3A*&wt=json&indent=true&fq=SKU:223-CL10V3^100
> OR SKU:223-CL1^90
>
> There is not different in the query Order, Let me know if I am missing
> something. Also I like to Order with the exact match for SKU:223-CL10V3^100
>
> Thanks
>
> Ravi
>


DIH and Tika

2014-02-17 Thread Teague James
Is there a way to specify the document types that Tika parses? In my DIH I
index the content of a SQL database which has a field that points to the SQL
record's binary file (which could be Word, PDF, JPG, MOV, etc.). Tika then
uses the document URL to index that document's content. However there are a
lot of document types that Tika cannot parse. I'd like to limit Tika to just
parsing Word and PDF documents so that I don't have to wait for Tika to
determine the document type and whether or not it can parse it. I suspect
that the number of exceptions being thrown over documents that Tika cannot
read is increasing my indexing time significantly. Any guidance is
appreciated.

-Teague



Escape \\n from getting highlighted - highlighter component

2014-02-17 Thread Developer
Hi,

When searching for text like 'talk n text', the highlighter component also
adds the <em> tags to special characters like \n. Is there a way to
avoid highlighting these special characters?

\\r\\n Family Messaging

 is getting replaced as 

<em>\\r\\n</em> Family Messaging



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Escape-n-from-getting-highlighted-highlighter-component-tp4117895.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Autosuggest - Strange issue with leading numbers in query

2014-02-17 Thread Erick Erickson
Ah, OK, I thought you were indexing things like 123412335ga, but not so.

Afraid I'm fresh out of ideas. Although I might try using TermsComponent
to examine the index and see if, somehow, there _are_ terms with leading
numbers in the output.
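
From SolrJ that check could look roughly like this (the field name is a
placeholder for whatever field backs the suggester, and it assumes the stock
/terms handler is enabled):

SolrQuery check = new SolrQuery();
check.setRequestHandler("/terms");
check.set("terms", true);
check.set("terms.fl", "suggest_field");   // placeholder field name
check.set("terms.regex", "[0-9].*");      // any indexed terms starting with a digit?
check.set("terms.limit", 50);
QueryResponse rsp = server.query(check);
System.out.println(rsp.getTermsResponse().getTerms("suggest_field"));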

It's also possible that numbers are stripped when building the FST that
is used, but I don't know one way or the other.

Best,
Erick


On Mon, Feb 17, 2014 at 11:30 AM, Developer  wrote:

> Hi Erik,
>
> Thanks a lot for your reply.
>
> I expect it to return zero suggestions since the suggested keyword doesnt
> actually start with numbers.
>
> Expected results
> Searching for ga -> returns galaxy
> Searching for gal -> returns galaxy
> Searching for 12321312321312ga -> should not return any suggestion since
> there is no keyword (combination) exists in the index.
>
> Thanks
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Autosuggest-Strange-issue-with-leading-numbers-in-query-tp4116751p4117846.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Could not connect or ping a core after import a big data into it...

2014-02-17 Thread Erick Erickson
Glad it's resolved, thanks for letting us know, it
removes some uncertainty.

Erick


On Mon, Feb 17, 2014 at 12:23 PM, Eric_Peng wrote:

> I solved it , my mistake
> I was using Solr4.6.1 jars,  but in my solrconfig.xml I used
> LucenMatcheVersion 4.5
> I just coped from last project and didn't check it.
> My really stupid mistake
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Could-not-connect-or-ping-a-core-after-import-a-big-data-into-it-tp4117416p4117859.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: SOLR suggester component - Get suggestion dump

2014-02-17 Thread bbi123
I started using terms component to view the terms and the counts...

terms?terms.fl=autocomplete_phrase&terms.regex=a.*&terms.limit=1000



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-suggester-component-Get-suggestion-dump-tp4110026p4117913.html
Sent from the Solr - User mailing list archive at Nabble.com.


Preventing multiple on-deck searchers without causing failed commits

2014-02-17 Thread Colin Bartolome
We're using Solr version 4.2.1, in case new functionality has helped with 
this issue.


We have our Solr servers doing automatic soft commits with maxTime=1000. 
We also have a scheduled job that triggers a hard commit every fifteen 
minutes. When one of these hard commits happens while a soft commit is 
already in progress, we get that ubiquitous warning:


PERFORMANCE WARNING: Overlapping onDeckSearchers=2

Recently, we had an occasion to have a second scheduled job also issue a 
hard commit every now and then. Since our maxWarmingSearchers value was 
set to the default, 2, we occasionally had a hard commit trigger when two 
other searchers were already warming up, which led to this:


org.apache.solr.client.solrj.SolrServerException: No live SolrServers 
available to handle this request


as the servers started responding with a 503 HTTP response.

It seems like automatic soft commits wait until the hard commits are out 
of the way before they proceed. Is there a way to do the same for hard 
commits? Since we're passing waitSearcher=true in the update request that 
triggers the hard commits, I would expect the request to block until the 
server had enough headroom to service the commit. I did not expect that 
we'd start getting 503 responses.


Is there a way to pull this off, either via some extra request parameters 
or via some server-side configuration?


Slow 95th-percentile

2014-02-17 Thread Allan Carroll
Hi all,

I'm having trouble getting consistent performance out of my Solr setup. Average 
select latency is great, but 95% is dismal (10x average). It's probably 
something slightly misconfigured. I’ve seen it have nice, low variance 
latencies for a few hours here and there, but can’t figure out what’s different 
during those times.


* I’m running 4.1.0 using SolrCloud. 3 replicas of 1 shard on 3 EC2 boxes 
(8proc, 30GB RAM, SSDs). Load peaks around 30 selects per second and about 150 
updates per second. 

* The index has about 11GB of data in 14M docs, the other 10MB of data in 3K 
docs. Stays around 30 segments.

* Soft commits after 10 seconds, hard commits after 120 seconds. Though, 
turning off the update traffic doesn't seem to have any effect on the select 
latencies.

* I think GC latency is low. Running 3GB heaps with 1G new size. GC time is 
around 3ms per second.
 

Here’s a typical select query:

fl=*,sortScore:textScore&sort=textScore desc&start=0&q=text:(("soccer" OR "MLS" 
OR "premier league" OR "FIFA" OR "world cup") OR ("sorority" OR "fraternity" OR 
"greek life" OR "dorm" OR "campus"))&wt=json&fq=startTime:[139265640 TO 
139271754]&fq={!frange l=2 u=3}timeflag(startTime)&fq={!frange 
l=139265640 u=139269594 
cache=false}timefix(startTime,-2160)&fq=privacy:OPEN&defType=edismax&rows=131


Anyone have any suggestions on where to look next? Or, if you know someone in 
the bay area that would consult for an hour or two and help me track it down, 
that’d be great too.

Thanks!

-Allan

Re: Preventing multiple on-deck searchers without causing failed commits

2014-02-17 Thread Shawn Heisey
On 2/17/2014 6:06 PM, Colin Bartolome wrote:
> We're using Solr version 4.2.1, in case new functionality has helped
> with this issue.
> 
> We have our Solr servers doing automatic soft commits with maxTime=1000.
> We also have a scheduled job that triggers a hard commit every fifteen
> minutes. When one of these hard commits happens while a soft commit is
> already in progress, we get that ubiquitous warning:
> 
> PERFORMANCE WARNING: Overlapping onDeckSearchers=2
> 
> Recently, we had an occasion to have a second scheduled job also issue a
> hard commit every now and then. Since our maxWarmingSearchers value was
> set to the default, 2, we occasionally had a hard commit trigger when
> two other searchers were already warming up, which led to this:
> 
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request
> 
> as the servers started responded with a 503 HTTP response.
> 
> It seems like automatic soft commits wait until the hard commits are out
> of the way before they proceed. Is there a way to do the same for hard
> commits? Since we're passing waitSearcher=true in the update request
> that triggers the hard commits, I would expect the request to block
> until the server had enough headroom to service the commit. I did not
> expect that we'd start getting 503 responses.

Remember this mantra: Hard commits are about durability, soft commits
are about visibility.  You might already know this, but it is the key to
figuring out how to handle commits, whether they are user-triggered or
done automatically by the server.

With Solr 4.x, it's best to *always* configure autoCommit with
openSearcher=false.  This does a hard commit but does not open a new
searcher.  The result: Data is flushed to disk and the current
transaction log is closed.  New documents will not be searchable after
this kind of commit.  For maxTime and maxDocs, pick values that won't
result in huge transaction logs, which increase Solr startup time.

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup

For document visibility, you can rely on autoSoftCommit, and you
indicated that you already have it configured. Decide how long you can
wait for new content that has just been indexed.  Do you *really* need
new data to be searchable within one second?  If so, you're good.  If
not, increase the maxTime value here.  Be sure to make the value at
least a little bit longer than the amount of time it takes for a soft
commit to finish, including cache warmup time.
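
In solrconfig.xml (inside the updateHandler section) the shape is roughly the
following; the maxTime values are only placeholders, so pick what fits your
indexing rate:

<autoCommit>
  <maxTime>60000</maxTime>            <!-- hard commit: durability, keeps tlogs small -->
  <openSearcher>false</openSearcher>  <!-- never opens a new searcher -->
</autoCommit>

<autoSoftCommit>
  <maxTime>5000</maxTime>             <!-- soft commit: visibility -->
</autoSoftCommit>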

Thanks,
Shawn



Re: Slow 95th-percentile

2014-02-17 Thread Shawn Heisey
On 2/17/2014 6:12 PM, Allan Carroll wrote:
> I'm having trouble getting my Solr setup to get consistent performance. 
> Average select latency is great, but 95% is dismal (10x average). It's 
> probably something slightly misconfigured. I’ve seen it have nice, low 
> variance latencies for a few hours here and there, but can’t figure out 
> what’s different during those times.
> 
> 
> * I’m running 4.1.0 using SolrCloud. 3 replicas of 1 shard on 3 EC2 boxes 
> (8proc, 30GB RAM, SSDs). Load peaks around 30 selects per second and about 
> 150 updates per second. 
> 
> * The index has about 11GB of data in 14M docs, the other 10MB of data in 3K 
> docs. Stays around 30 segments.
> 
> * Soft commits after 10 seconds, hard commits after 120 seconds. Though, 
> turning off the update traffic doesn’t seem to have any affect on the select 
> latencies.
> 
> * I think GC latency is low. Running 3GB heaps with 1G new size. GC time is 
> around 3ms per second.
>  
> 
> Here’s a typical select query:
> 
> fl=*,sortScore:textScore&sort=textScore desc&start=0&q=text:(("soccer" OR 
> "MLS" OR "premier league" OR "FIFA" OR "world cup") OR ("sorority" OR 
> "fraternity" OR "greek life" OR "dorm" OR 
> "campus"))&wt=json&fq=startTime:[139265640 TO 139271754]&fq={!frange 
> l=2 u=3}timeflag(startTime)&fq={!frange l=139265640 u=139269594 
> cache=false}timefix(startTime,-2160)&fq=privacy:OPEN&defType=edismax&rows=131

The first thing to say is that it's fairly normal for the 95th and 99th
percentile values to be quite a lot higher than the median and average
values.  I don't have actual values so I don't know if it's bad or not.

You're good on the most important performance-related resource, which is
memory for the OS disk cache.  The only thing that stands out as a
possible problem from what I know so far is garbage collection.  It
might be a case of full garbage collections happening too frequently, or
it might be a case of garbage collection pauses taking too long.  It
might even be a combination of both.

To fix frequent full collections, increase the heap size.  To fix the
other problem, use the CMS collector and tune it.
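
As a very rough starting point (not a recommendation tuned to your workload;
every value here is a placeholder), CMS setups for heaps in this range often
look something like:

-Xms4g -Xmx4g -XX:NewSize=1g -XX:MaxNewSize=1g
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly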

Two bits of information will help with recommendations: Your java
startup options, and your solrconfig.xml.

You're using an option in your query that I've never seen before.  I
don't know if frange is slow or not.

One last thing that might cause problems is super-frequent commits.

I could also be completely wrong!

Thanks,
Shawn



Re: Preventing multiple on-deck searchers without causing failed commits

2014-02-17 Thread Colin Bartolome

On 02/17/2014 05:38 PM, Shawn Heisey wrote:

On 2/17/2014 6:06 PM, Colin Bartolome wrote:

We're using Solr version 4.2.1, in case new functionality has helped
with this issue.

We have our Solr servers doing automatic soft commits with maxTime=1000.
We also have a scheduled job that triggers a hard commit every fifteen
minutes. When one of these hard commits happens while a soft commit is
already in progress, we get that ubiquitous warning:

PERFORMANCE WARNING: Overlapping onDeckSearchers=2

Recently, we had an occasion to have a second scheduled job also issue a
hard commit every now and then. Since our maxWarmingSearchers value was
set to the default, 2, we occasionally had a hard commit trigger when
two other searchers were already warming up, which led to this:

org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this request

as the servers started responded with a 503 HTTP response.

It seems like automatic soft commits wait until the hard commits are out
of the way before they proceed. Is there a way to do the same for hard
commits? Since we're passing waitSearcher=true in the update request
that triggers the hard commits, I would expect the request to block
until the server had enough headroom to service the commit. I did not
expect that we'd start getting 503 responses.


Remember this mantra: Hard commits are about durability, soft commits
are about visibility.  You might already know this, but it is the key to
figuring out how to handle commits, whether they are user-triggered or
done automatically by the server.

With Solr 4.x, it's best to *always* configure autoCommit with
openSearcher=false.  This does a hard commit but does not open a new
searcher.  The result: Data is flushed to disk and the current
transaction log is closed.  New documents will not be searchable after
this kind of commit.  For maxTime and maxDocs, pick values that won't
result in huge transaction logs, which increase Solr startup time.

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup

For document visibility, you can rely on autoSoftCommit, and you
indicated that you already have it configured. Decide how long you can
wait for new content that has just been indexed.  Do you *really* need
new data to be searchable within one second?  If so, you're good.  If
not, increase the maxTime value here.  Be sure to make the value at
least a little bit longer than the amount of time it takes for a soft
commit to finish, including cache warmup time.

Thanks,
Shawn



Increasing the maxTime value doesn't actually solve the problem, though; 
it just makes it a little less likely. Really, the soft commits aren't 
the problem here, as far as we can tell. It's that a request that 
triggers a hard commit simply fails when the server is already at 
maxWarmingSearchers. I would expect the request to queue up and wait 
until the server could handle it.


Re: Preventing multiple on-deck searchers without causing failed commits

2014-02-17 Thread Shawn Heisey
On 2/17/2014 7:06 PM, Colin Bartolome wrote:
> Increasing the maxTime value doesn't actually solve the problem, though;
> it just makes it a little less likely. Really, the soft commits aren't
> the problem here, as far as we can tell. It's that a request that
> triggers a hard commit simply fails when the server is already at
> maxWarmingSearchers. I would expect the request to queue up and wait
> until the server could handle it.

I think I put too much information in my reply.  Apologies.  Here's the
most important information to deal with first:

Don't send hard commits at all.  Configure autoCommit in your server
config, with the all-important openSearcher parameter set to false.
That will take care of all your hard commit needs, but those commits
will never open a new searcher, so they cannot cause an overlap with the
soft commits that DO open a new searcher.

Thanks,
Shawn



Re: Limit amount of search result

2014-02-17 Thread rachun
Hi Samee,

Thank you very much for your suggestion.
Now I got it working ;)

Chun.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Limit-amount-of-search-result-tp4117062p4117952.html
Sent from the Solr - User mailing list archive at Nabble.com.