Solr 4.7.2 not recovering - "ClusterState says we are the leader, but locally we don't think so"

2015-03-19 Thread Guy Moshkowich
Hi, one morning my Solr server broke with the message below and it didn't 
recover on its own - I had to restart it. Is this a known 4.7.2 issue?

My topology is very simple: a single Solr node with a single shard and 
replica, and an embedded ZK (-zkrun).
Could it be related to the 4.8 fix SOLR-5799: "When registering as the 
leader, if an existing ephemeral registration exists, wait a short time to 
see if it goes away." (Mark Miller)?

ERROR - 2015-03-18 04:48:15.326; org.apache.solr.update.processor.DistributedUpdateProcessor; ClusterState says we are the leader, but locally we don't think so
INFO  - 2015-03-18 04:48:15.327; org.apache.solr.update.processor.LogUpdateProcessor; [quick-results-collection] webapp=/solr path=/update params={wt=javabin&version=2} {} 0 1
ERROR - 2015-03-18 04:48:15.328; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: ClusterState says we are the leader (http://9.70.210.149:8983/solr/quick-results-collection), but locally we don't think so. Request came from null
  at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:503)
  at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:267)
  at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:550)
  at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
  at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:96)
  at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:166)
  at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:136)
  at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225)
  at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
  at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:190)
  at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116)
  at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:173)
  at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:106)
  at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
  at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:768)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:205)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)

IP Address assigned to solr instance during the Cloud mode start

2015-03-19 Thread davidphilip cherian
Hi,

When I started Solr in cloud mode (interactive) and chose 2 nodes, it
started, and in the cloud view screen it showed a different IP in the URL,
169.254.5.207:7574; when I clicked on that, it said page not found. When I
modified the URL to localhost (http://localhost:7574/solr/#/~cloud) it
worked (loaded the Solr admin page).
My question is: where is this IP address picked from? How do I edit it?


Re: index duplicate records from data source into 1 document

2015-03-19 Thread Derek Poh

Hi Erick

Am I right to say we need to do the combining of duplicate records into 1 
before feeding them to Solr to index?


I am coming from Endeca, which supports combining duplicate records 
into 1 record during indexing. Was wondering if Solr supports this.


-Derek

On 3/18/2015 11:21 PM, Erick Erickson wrote:

I'd use SolrJ, pull the docs by productId order and combine records
with the same product ID into a single doc.

Here's a starter set for indexing from a DB with SolrJ. It has Tika
processing in it as well, but you can pull that out pretty easily.

https://lucidworks.com/blog/indexing-with-solrj/

Best,
Erick
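
As an illustration of the approach Erick describes, here is a rough SolrJ
sketch (not his code: the JDBC URL, table and field names are invented, and
business_type is assumed to be a multiValued field in the schema). Rows are
pulled ordered by product_id so that duplicates arrive next to each other:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CombineDuplicatesIndexer {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    try (Connection con = DriverManager.getConnection(
             "jdbc:mysql://localhost/catalog", "user", "password");
         Statement st = con.createStatement();
         ResultSet rs = st.executeQuery(
             "SELECT product_id, business_type FROM product ORDER BY product_id")) {
      SolrInputDocument doc = null;
      String currentId = null;
      while (rs.next()) {
        String id = rs.getString("product_id");
        if (!id.equals(currentId)) {
          // first row for a new product id: start a new Solr document
          if (doc != null) batch.add(doc);
          doc = new SolrInputDocument();
          doc.addField("id", id);
          currentId = id;
        }
        // every duplicate row just contributes another business_type value
        doc.addField("business_type", rs.getString("business_type"));
      }
      if (doc != null) batch.add(doc);
    }
    solr.add(batch);
    solr.commit();
    solr.shutdown();
  }
}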

On Wed, Mar 18, 2015 at 2:52 AM, Derek Poh  wrote:

Hi

If I have duplicate records in my source data (DB or delimited files). For
simplicity's sake they are of the following nature:

Product Id   Business Type
----------   -------------
12345        Exporter
12345        Agent
12366        Manufacturer
12377        Exporter
12377        Distributor

There are other fields with multiple values as well.

How do I index the duplicate records into 1 document? E.g. Product Id 12345
will be 1 document, 12366 as 1 document and 12377 as 1 document.

-Derek






Start stop solr started in solr cloud mode

2015-03-19 Thread davidphilip cherian
Hi,
I started Solr in cloud mode (interactive setup): 3 nodes, 3 shards, 1
replica, and a collection. I stopped it using ./solr stop -all. How do I
get the same cloud mode setup to start again? "./solr -c start" started a
new Solr cloud instance altogether, whereas I was looking to start the
previously set up instance. I am going through the reference guide and
did not find any command for this. Please help.


Re: IP Address assigned to solr instance during the Cloud mode start

2015-03-19 Thread davidphilip cherian
I think this is because of change in network ip address. I got it. Thanks.


On Thu, Mar 19, 2015 at 1:32 PM, davidphilip cherian <
davidphilipcher...@gmail.com> wrote:

> Hi,
>
> When I started solr in cloud mode(interactive) and chose 2 nodes, it
> started and in the cloud-view screen it showed some different ip with url
> 169.254.5.207:7574, when clicked on that, it says page not found. When I
> modified url to localhost(http://localhost:7574/solr/#/~cloud) it
> worked(loaded solr admin page)
> Query is, Where is this ip address picked from? How to edit them?
>


Re: Whole RAM consumed while Indexing.

2015-03-19 Thread Nitin Solanki
Hi Alexandre,
The segment counts are different, but the document
counts are the same.
With (soft commit - 300 and hardcommit - 6000) = no. of segments - 43
AND
With (soft commit - 60000 and hardcommit - 60000) = no. of segments - 31

I don't have any idea about segment counts. What are they? How do I deal
with them? Any ideas?
Or is it fine not to worry about segments?
Just want to ask - if there are more segments, will searching be slower?

On Wed, Mar 18, 2015 at 10:14 PM, Alexandre Rafalovitch 
wrote:

> Probably merged somewhat differently with some terms indexes repeating
> between segments. Check the number of segments in data directory.And
> do search for *:* and make sure both do have the same document counts.
>
> Also, In all these discussions, you still haven't answered about how
> fast after indexing you want to _search_? Because, if you are not
> actually searching while committing, you could even index on a
> completely separate server (e.g. a faster one) and swap (or alias)
> index in afterwards. Unless, of course, I missed it, it's a lot of
> emails in a very short window of time.
>
> Regards,
>Alex.
>
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 18 March 2015 at 12:09, Nitin Solanki  wrote:
> > When I kept my configuration to 300 for soft commit and 3000 for hard
> > commit and indexed some amount of data, I got the data size of the whole
> > index to be 6GB after completing the indexing.
> >
> > When I changed the configuration to 6 for soft commit and 6 for
> > hard commit and indexed same data then I got the data size of the whole
> > index to be 5GB after completing the indexing.
> >
> > But the number of documents in the both scenario were same. I am
> wondering
> > how that can be possible?
> >
> > On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki 
> wrote:
> >
> >> Hi Erick,
> >>  I am just saying. I want to be sure on commits difference..
> >> What if I do frequent commits or not? And why I am saying that I need to
> >> commit things so very quickly because I have to index 28GB of data which
> >> takes 7-8 hours(frequent commits).
> >> As you said, do commits after 6 seconds then it will be more
> expensive.
> >> If I don't encounter with **"overlapping searchers" warning messages**
> >> then I feel it seems to be okay. Is it?
> >>
> >>
> >>
> >>
> >> On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson <
> erickerick...@gmail.com>
> >> wrote:
> >>
> >>> Don't do it. Really, why do you want to do this? This seems like
> >>> an "XY" problem, you haven't explained why you need to commit
> >>> things so very quickly.
> >>>
> >>> I suspect you haven't tried _searching_ while committing at such
> >>> a rate, and you might as well turn all your top-level caches off
> >>> in solrconfig.xml since they won't be useful at all.
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki 
> >>> wrote:
> >>> > Hi,
> >>> >If I do very very fast indexing(softcommit = 300 and
> hardcommit =
> >>> > 3000) v/s slow indexing (softcommit = 6 and hardcommit = 6)
> as
> >>> you
> >>> > both said. Will fast indexing fail to index some data?
> >>> > Any suggestion on this ?
> >>> >
> >>> > On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar <
> >>> > andyetitmo...@gmail.com> wrote:
> >>> >
> >>> >> Yes, and doing so is painful and takes lots of people and hardware
> >>> >> resources to get there for large amounts of data and queries :)
> >>> >>
> >>> >> As Erick says, work backwards from 60s and first establish how high
> the
> >>> >> commit interval can be to satisfy your use case..
> >>> >> On 16 Mar 2015 16:04, "Erick Erickson" 
> >>> wrote:
> >>> >>
> >>> >> > First start by lengthening your soft and hard commit intervals
> >>> >> > substantially. Start with 6 and work backwards I'd say.
> >>> >> >
> >>> >> > Ramkumar has tuned the heck out of his installation to get the
> commit
> >>> >> > intervals to be that short ;).
> >>> >> >
> >>> >> > I'm betting that you'll see your RAM usage go way down, but that'
> s a
> >>> >> > guess until you test.
> >>> >> >
> >>> >> > Best,
> >>> >> > Erick
> >>> >> >
> >>> >> > On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki <
> >>> nitinml...@gmail.com>
> >>> >> > wrote:
> >>> >> > > Hi Erick,
> >>> >> > > You are saying correct. Something, **"overlapping
> >>> >> searchers"
> >>> >> > > warning messages** are coming in logs.
> >>> >> > > **numDocs numbers** are changing when documents are adding at
> the
> >>> time
> >>> >> of
> >>> >> > > indexing.
> >>> >> > > Any help?
> >>> >> > >
> >>> >> > > On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson <
> >>> >> > erickerick...@gmail.com>
> >>> >> > > wrote:
> >>> >> > >
> >>> >> > >> First, the soft commit interval is very short. Very, very,
> very,
> >>> very
> >>> >> > >> short. 300ms is
> >>> >> > >> just short of insane unless it's a typo ;).
> >>> >> >

Documents cannot be searched immediately when indexed using REST API with Solr Cloud

2015-03-19 Thread Zheng Lin Edwin Yeo
Hi,

I'm using Solr Cloud now, with 2 shards known as shard1 and shard2, and
when I try to index rich-text documents using REST API or the default
Documents module in Solr Admin UI, the documents that are indexed do not
appear immediately when I do a search. It only appears after I restarted
the Solr services (both shard1 and shard2).

However, the same issue does not happen when I index the same documents using
post.jar, and I can search for the indexed documents immediately.

Here's my ExtractingRequestHandler in solrconfig.xml.

  <requestHandler name="/update/extract"
                  class="solr.extraction.ExtractingRequestHandler" >
    <lst name="defaults">
      <str name="lowernames">true</str>
      <str name="uprefix">ignored_</str>

      <!-- capture link hrefs but display only text -->
      <str name="captureAttr">true</str>
      <str name="fmap.a">links</str>
      <str name="fmap.div">ignored_</str>
    </lst>
  </requestHandler>
What could be the reason this is happening, and is there any solution for
it?

Regards,
Edwin


Re: Whole RAM consumed while Indexing.

2015-03-19 Thread Nitin Solanki
Hi Erick,
  I read your article - really nice.
In it you said that for bulk indexing you should set soft commit = 10 mins
and hard commit = 15 sec. Is that also okay for my scenario?

On Thu, Mar 19, 2015 at 1:53 AM, Erick Erickson 
wrote:

> bq: As you said, do commits after 60000 seconds
>
> No, No, No. I'm NOT saying 60000 seconds! That time is in _milliseconds_
> as Shawn said. So setting it to 60000 is every minute.
>
> From solrconfig.xml, conveniently located immediately above the
> <autoCommit> tag:
>
> maxTime - Maximum amount of time in ms that is allowed to pass since a
> document was added before automatically triggering a new commit.
>
> Also, a lot of answers to soft and hard commits is here as I pointed
> out before, did you read it?
>
>
> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Best
> Erick
>
> On Wed, Mar 18, 2015 at 9:44 AM, Alexandre Rafalovitch
>  wrote:
> > Probably merged somewhat differently with some terms indexes repeating
> > between segments. Check the number of segments in data directory.And
> > do search for *:* and make sure both do have the same document counts.
> >
> > Also, In all these discussions, you still haven't answered about how
> > fast after indexing you want to _search_? Because, if you are not
> > actually searching while committing, you could even index on a
> > completely separate server (e.g. a faster one) and swap (or alias)
> > index in afterwards. Unless, of course, I missed it, it's a lot of
> > emails in a very short window of time.
> >
> > Regards,
> >Alex.
> >
> > 
> > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > http://www.solr-start.com/
> >
> >
> > On 18 March 2015 at 12:09, Nitin Solanki  wrote:
> >> When I kept my configuration to 300 for soft commit and 3000 for hard
> >> commit and indexed some amount of data, I got the data size of the whole
> >> index to be 6GB after completing the indexing.
> >>
> >> When I changed the configuration to 6 for soft commit and 6 for
> >> hard commit and indexed same data then I got the data size of the whole
> >> index to be 5GB after completing the indexing.
> >>
> >> But the number of documents in the both scenario were same. I am
> wondering
> >> how that can be possible?
> >>
> >> On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki 
> wrote:
> >>
> >>> Hi Erick,
> >>>  I am just saying. I want to be sure on commits
> difference..
> >>> What if I do frequent commits or not? And why I am saying that I need
> to
> >>> commit things so very quickly because I have to index 28GB of data
> which
> >>> takes 7-8 hours(frequent commits).
> >>> As you said, do commits after 6 seconds then it will be more
> expensive.
> >>> If I don't encounter with **"overlapping searchers" warning messages**
> >>> then I feel it seems to be okay. Is it?
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson <
> erickerick...@gmail.com>
> >>> wrote:
> >>>
>  Don't do it. Really, why do you want to do this? This seems like
>  an "XY" problem, you haven't explained why you need to commit
>  things so very quickly.
> 
>  I suspect you haven't tried _searching_ while committing at such
>  a rate, and you might as well turn all your top-level caches off
>  in solrconfig.xml since they won't be useful at all.
> 
>  Best,
>  Erick
> 
>  On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki 
>  wrote:
>  > Hi,
>  >If I do very very fast indexing(softcommit = 300 and
> hardcommit =
>  > 3000) v/s slow indexing (softcommit = 6 and hardcommit = 6)
> as
>  you
>  > both said. Will fast indexing fail to index some data?
>  > Any suggestion on this ?
>  >
>  > On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar <
>  > andyetitmo...@gmail.com> wrote:
>  >
>  >> Yes, and doing so is painful and takes lots of people and hardware
>  >> resources to get there for large amounts of data and queries :)
>  >>
>  >> As Erick says, work backwards from 60s and first establish how
> high the
>  >> commit interval can be to satisfy your use case..
>  >> On 16 Mar 2015 16:04, "Erick Erickson" 
>  wrote:
>  >>
>  >> > First start by lengthening your soft and hard commit intervals
>  >> > substantially. Start with 6 and work backwards I'd say.
>  >> >
>  >> > Ramkumar has tuned the heck out of his installation to get the
> commit
>  >> > intervals to be that short ;).
>  >> >
>  >> > I'm betting that you'll see your RAM usage go way down, but
> that' s a
>  >> > guess until you test.
>  >> >
>  >> > Best,
>  >> > Erick
>  >> >
>  >> > On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki <
>  nitinml...@gmail.com>
>  >> > wrote:
>  >> > > Hi Erick,
>  >> > > You are saying correct. Something, **"overlapping
>  >> searchers"
> >

Re: Documents cannot be searched immediately when indexed using REST API with Solr Cloud

2015-03-19 Thread Liu Bo
Hi Edwin

Please review your commit/soft-commit configuration;
"soft commits are about visibility, hard commits are about durability"
  by a wise man. :)

If you are doing NRT indexing and searching, you probably need a short soft
commit interval, or to commit explicitly in your request handler. Be advised
that these strategies and configurations need to be tested and adjusted
according to your data size, searching and index-updating frequency.
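
For instance, a short soft-commit interval in solrconfig.xml is one way to
get that visibility (a sketch only; the 10-second value is a placeholder to
tune against your own load):

  <autoSoftCommit>
    <maxTime>10000</maxTime>  <!-- make new documents searchable within ~10s of being added -->
  </autoSoftCommit>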

You should be able to find the answer yourself here:
http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

All the best

Liu Bo

On 19 March 2015 at 17:54, Zheng Lin Edwin Yeo  wrote:

> Hi,
>
> I'm using Solr Cloud now, with 2 shards known as shard1 and shard2, and
> when I try to index rich-text documents using REST API or the default
> Documents module in Solr Admin UI, the documents that are indexed do not
> appear immediately when I do a search. It only appears after I restarted
> the Solr services (both shard1 and shard2).
>
> However, the same issue do not happen when I index the same documents using
> post.jar, and I can search for the indexed documents immediately.
>
> Here's my ExtractingRequestHandler in solrconfig.xml.
>
>  class="solr.extraction.ExtractingRequestHandler" >
> 
>   true
>   ignored_
>
>   
>   true
>   links
>   ignored_
> 
>   
>
> What could be the reason why this is happening, and any solutions to solve
> it?
>
> Regards,
> Edwin
>


Re: how to store _text field

2015-03-19 Thread Mirko Torrisi

Hi Erick,

I'm sorry for this delay but I've just seen this reply.

I'm using the latest version of Solr, and the default setting is to use the 
new kind of indexing; it doesn't use schema.xml, and because of that I have 
no idea how to set "stored" for this field.
The content is grabbed, because I've obtained results using the search 
function, but it is not shown because it is not set to "stored".


I hope that is clear.
Thanks very much.

All the best,

Mirko

On 14/03/15 17:58, Erick Erickson wrote:

Right, your schema.xml file will define, perhaps, some "dynamic
fields". First insure that stored="true" is specified. If you change
this, you have to re-index the docs.

Second, insure that your "fl" parameter with the field is specified on
the requests, something like q=*:*&fl=eoe_txt.

Third, insure that you are actually sending content to that field when
you index docs.

If none of this helps, show us the definition from schema.xml and a
sample input document and a query that illustrate the problem please.

Best,
Erick

On Fri, Mar 13, 2015 at 1:20 AM, Mirko Torrisi
 wrote:

Hi Alexandre,

I need to visualize the content of _txt. For some reasons, actual it is not
showed in the results (the "response").
I guess that it doesn't happen because it isn't stored (for some default
setting that I'd like to change).

Thanks for your help,

Mirko


On 13/03/15 00:27, Alexandre Rafalovitch wrote:

Wait, step back. This is confusing. What's your real problem you are
trying to solve?

Regards,
 Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 12 March 2015 at 19:50, Mirko Torrisi 
wrote:

Hi folks,

I googled and tried without success so I ask you: how can I modify the
setting of a field to store it ?

It is interesting to note that I did not add _text field so I guess it is
a
default one. Maybe it is normal that it is not showed on the result but
actually this is my real problem. It could be grand also to copy it in a
new
field but I do not know how to do it with the last Solr (5) and the new
kind
of schema. I know that I have to use curl but I do not know how to use it
to
copy a field.

Thank you in advance!
Cheers,

   Mirko






Connection pool shutdown error

2015-03-19 Thread phiroc
Hello,

I am trying to use the 4.9.1 SOLR Core API and the 1.3.2.RELEASE version of
the Spring Data SOLR API to connect to a SOLR server, but to no avail.

When I run my Java application, I get the following errors:

---

Exception in thread "main" 
org.springframework.data.solr.UncategorizedSolrException: Error executing 
query; nested exception is org.apache.solr.client.solrj.SolrServerException: 
Error executing query
...
Caused by: java.lang.IllegalStateException: Connection pool shut down

-

I have tried changing Core API version (4.3.0, 4.4.0, ...) but to no avail.

Any help would be much appreciated.

Cheers,

Philippe




Here's my Solr Context:



package com.myco.archives.SolrGuiMain;



@Configuration
@EnableSolrRepositories(basePackages = { "com.myco.archives" }, 
multicoreSupport = false)
@ComponentScan
public class SolrContext {

private final String HTTP_SEARCHARCHIVES = 
"http://mysolr.com:8990/solr/collection3";

@Bean
public SolrServer solrServer() {
SolrServer server = new HttpSolrServer(HTTP_SEARCHARCHIVES);
return server;
}

@Bean
public SolrOperations solrTemplate() {
return new SolrTemplate(solrServer());
}

}

-

Here's my Repository Class:

import org.springframework.data.repository.CrudRepository;

public interface ArchiveDocumentRepository extends CrudRepository {

List findByText(String text);

List findByYmd(Date ymd);

}




And here's my App:

import 
org.springframework.context.annotation.AnnotationConfigApplicationContext;

public class App
{

private ArchiveDocumentRepository   archiveDocumentRepository;

public App() {

setContext();
processDocs();


}
public static void main(String[] args) {

new App();

}

public void setContext() throws RuntimeException {

AnnotationConfigApplicationContext context = new 
AnnotationConfigApplicationContext(SolrContext.class);

if (context != null) {

setArchiveDocumentRepository(context.getBean(ArchiveDocumentRepository.class));
}
context.close();
}

public final ArchiveDocumentRepository getArchiveDocumentRepository() {
return archiveDocumentRepository;
}

public final void 
setArchiveDocumentRepository(ArchiveDocumentRepository 
archiveDocumentRepository) {
this.archiveDocumentRepository = archiveDocumentRepository;
}

public void processDocs() {


Iterable docs = 
getArchiveDocumentRepository().findAll();

for (Document doc : docs) {
System.out.println("doc count = " + doc.getYmd());
}

}

}


---










Re: Connection pool shutdown error

2015-03-19 Thread Andrea Gazzarini
I bet the problem is how the SolrServer instance is used within the Spring 
repository. I think somewhere you should alternatively:


- explicitly close the client each time, or
- reuse the same instance (and finally close that).

But being a Spring newbie I cannot give you further information.

Best,
Andrea
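
For what it's worth, the posted App calls context.close() inside
setContext(), before processDocs() runs; closing the context also disposes
the HttpSolrServer bean, which would explain the "Connection pool shut down"
message. A minimal sketch of the "reuse one instance and close it at the
end" option might look like this (Document and ArchiveDocumentRepository are
the poster's own classes, so this is only a guess at the intent):

import org.springframework.context.annotation.AnnotationConfigApplicationContext;

public class App {

    public static void main(String[] args) {
        AnnotationConfigApplicationContext context =
                new AnnotationConfigApplicationContext(SolrContext.class);
        try {
            ArchiveDocumentRepository repository =
                    context.getBean(ArchiveDocumentRepository.class);
            // the Solr client stays open while the repository is in use
            for (Document doc : repository.findAll()) {
                System.out.println("doc ymd = " + doc.getYmd());
            }
        } finally {
            context.close(); // close only after all Solr calls have finished
        }
    }
}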

On 03/19/2015 02:18 PM, phi...@free.fr wrote:

Hello,

I am trying to use the 4.9.1 SOLR Core API and the 1.3.2.RELEASE version of the 
Spring Data SOLR API, to connect to a SOLR server, but to no avail.

When I run Java application, I get the following errors:

---

Exception in thread "main" 
org.springframework.data.solr.UncategorizedSolrException: Error executing query; nested 
exception is org.apache.solr.client.solrj.SolrServerException: Error executing query
...
Caused by: java.lang.IllegalStateException: Connection pool shut down

-

I have tried changing Core API version (4.3.0, 4.4.0, ...) but to no avail.

Any help would be much appreciated.

Cheers,

Philippe




Here's my Solr Context:



package com.myco.archives.SolrGuiMain;



@Configuration
@EnableSolrRepositories(basePackages = { "com.myco.archives" }, 
multicoreSupport = false)
@ComponentScan
public class SolrContext {

private final StringHTTP_SEARCHARCHIVES = 
"http://mysolr.com:8990/solr/collection3";;

@Bean
public SolrServer solrServer() {
SolrServer server = new HttpSolrServer(HTTP_SEARCHARCHIVES);
return server;
}

@Bean
public SolrOperations solrTemplate() {
return new SolrTemplate(solrServer());
}

}

-

Here's my Repository Class:

import org.springframework.data.repository.CrudRepository;

public interface ArchiveDocumentRepository extends CrudRepository {

List findByText(String text);

List findByYmd(Date ymd);

}




And here's my App:

import 
org.springframework.context.annotation.AnnotationConfigApplicationContext;

public class App
{

private ArchiveDocumentRepository   archiveDocumentRepository;

public App() {

setContext();
processDocs();


}
public static void main(String[] args) {

new App();

}

public void setContext() throws RuntimeException {

AnnotationConfigApplicationContext context = new 
AnnotationConfigApplicationContext(SolrContext.class);

if (context != null) {

setArchiveDocumentRepository(context.getBean(ArchiveDocumentRepository.class));
}
context.close();
}

public final ArchiveDocumentRepository getArchiveDocumentRepository() {
return archiveDocumentRepository;
}

public final void 
setArchiveDocumentRepository(ArchiveDocumentRepository 
archiveDocumentRepository) {
this.archiveDocumentRepository = archiveDocumentRepository;
}

public void processDocs() {


Iterable docs = 
getArchiveDocumentRepository().findAll();

for (Document doc : docs) {
System.out.println("doc count = " + doc.getYmd());
}

}

}


---












Re: IP Address assigned to solr instance during the Cloud mode start

2015-03-19 Thread Shawn Heisey
On 3/19/2015 2:02 AM, davidphilip cherian wrote:
> When I started solr in cloud mode(interactive) and chose 2 nodes, it
> started and in the cloud-view screen it showed some different ip with url
> 169.254.5.207:7574, when clicked on that, it says page not found. When I
> modified url to localhost(http://localhost:7574/solr/#/~cloud) it
> worked(loaded solr admin page)
> Query is, Where is this ip address picked from? How to edit them?

An IP address of 169.254.x.x is what Windows will assign to a machine
when a network card configured for DHCP comes up and no DHCP response is
received.

http://packetlife.net/blog/2008/sep/24/169-254-0-0-addresses-explained/

When Solr starts in SolrCloud mode and you do not provide a "host"
property, Solr asks Java to ask the operating system "what is my IP
address?"  Whatever the response is to that question becomes the default
hostname that Solr uses when it registers itself in zookeeper.

http://wiki.apache.org/solr/SolrCloud#SolrCloud_Instance_Params

In a nutshell, your operating system networking is misconfigured.  Once
you fix that (or provide a host property to Solr to override the bad
choice), you will need to manually edit your zookeeper data to remove
the bad node entry.  You will probably need to use the zkCli that comes
with zookeeper itself, or perhaps something like the zookeeper plugin
for eclipse.
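
For example, assuming the stock solr.xml (which reads the "host" system
property), the override can be passed at startup; the hostname below is only
a placeholder:

  bin/solr start -cloud -p 7574 -z localhost:9983 -h solr1.example.com

or pinned permanently in solr.xml:

  <solrcloud>
    <str name="host">solr1.example.com</str>
    ...
  </solrcloud>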

Thanks,
Shawn



Re: How to configure Solr to use ZooKeeper ACLs in order to protect it's content

2015-03-19 Thread Dmitry Karanfilov
Looks like it is still broken.
The fixed names of the system properties zkCredentialsProvider and
zkACLProvider only affect the zkcli.sh script (org.apache.solr.cloud.ZkCLI).
So using the commands below, I'm able to *bootstrap* and *upconfig* to
ZooKeeper with the appropriate credentials and ACLs:

export
SOLR_ZK_PROVIDERS="-DzkCredentialsProvider=org.apache.solr.common.cloud.VMParamsSingleSetCredentialsDigestZkCredentialsProvider
-DzkACLProvider=org.apache.solr.common.cloud.VMParamsAllAndReadonlyDigestZkACLProvider"
export SOLR_ZK_CREDS_AND_ACLS="-DzkDigestUsername=admin-user
-DzkDigestPassword=admin-password -DzkDigestReadonlyUsername=readonly-user
-DzkDigestReadonlyPassword=readonly-password"

java $SOLR_ZK_PROVIDERS $SOLR_ZK_CREDS_AND_ACLS -classpath
"server/solr-webapp/webapp/WEB-INF/lib/*:server/lib/ext/*"
org.apache.solr.cloud.ZkCLI -cmd bootstrap -zkhost 10.0.1.112:2181/solr
-solrhome /opt/solr/example/cloud/node1/solr/
java $SOLR_ZK_PROVIDERS $SOLR_ZK_CREDS_AND_ACLS -classpath
"server/solr-webapp/webapp/WEB-INF/lib/*:server/lib/ext/*"
org.apache.solr.cloud.ZkCLI -zkhost 10.0.1.112:2181/solr -cmd upconfig
-confdir /opt/solr/server/solr/configsets/data_driven_schema_configs/conf
-confname gettingstarted_shard1_replica1


But when I start Solr, it is not able to connect to ZooKeeper:

java $SOLR_ZK_PROVIDERS $SOLR_ZK_CREDS_AND_ACLS
-Dsolr.solr.home=/opt/solr/example/cloud/node1/solr
-Dsolr.data.dir=/opt/solr/example/cloud/node1/solr/gettingstarted_shard1_replica1
-Dsolr.log=/opt/solr/example/cloud/node1/logs -DzkHost=10.0.1.112:2181/solr
-Djetty.port=8983 -jar start.jar

Here is logs:
0[main] INFO  org.eclipse.jetty.server.Server  ? jetty-8.1.10.v20130312
156  [main] INFO  org.eclipse.jetty.deploy.providers.ScanningAppProvider  ?
Deployment monitor /opt/solr-5.0.0/server/contexts at interval 0
205  [main] INFO  org.eclipse.jetty.deploy.DeploymentManager  ? Deployable
added: /opt/solr-5.0.0/server/contexts/solr-jetty-context.xml
4253 [main] INFO  org.eclipse.jetty.webapp.StandardDescriptorProcessor  ?
NO JSP Support for /solr, did not find org.apache.jasper.servlet.JspServlet
4600 [main] INFO  org.apache.solr.servlet.SolrDispatchFilter  ?
SolrDispatchFilter.init()WebAppClassLoader=2048834776@7a1ebcd8
4650 [main] INFO  org.apache.solr.core.SolrResourceLoader  ? JNDI not
configured for solr (NoInitialContextEx)
4651 [main] INFO  org.apache.solr.core.SolrResourceLoader  ? using system
property solr.solr.home: /opt/solr/example/cloud/node1/solr
4657 [main] INFO  org.apache.solr.core.SolrResourceLoader  ? new
SolrResourceLoader for directory: '/opt/solr/example/cloud/node1/solr/'
5305 [main] INFO  org.apache.solr.core.ConfigSolr  ? Loading container
configuration from /opt/solr/example/cloud/node1/solr/solr.xml
5646 [main] INFO  org.apache.solr.core.CoresLocator  ? Config-defined core
root directory: /opt/solr/example/cloud/node1/solr
5677 [main] INFO  org.apache.solr.core.CoreContainer  ? New CoreContainer
510147134
5682 [main] INFO  org.apache.solr.core.CoreContainer  ? Loading cores into
CoreContainer [instanceDir=/opt/solr/example/cloud/node1/solr/]
5749 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory
 ? Setting socketTimeout to: 60
5750 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory
 ? Setting urlScheme to: null
5760 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory
 ? Setting connTimeout to: 6
5761 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory
 ? Setting maxConnectionsPerHost to: 20
5771 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory
 ? Setting maxConnections to: 1
5771 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory
 ? Setting corePoolSize to: 0
5772 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory
 ? Setting maximumPoolSize to: 2147483647
5772 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory
 ? Setting maxThreadIdleTime to: 5
5778 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory
 ? Setting sizeOfQueue to: -1
5779 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory
 ? Setting fairnessPolicy to: false
5779 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory
 ? Setting useRetries to: false
6336 [main] INFO  org.apache.solr.update.UpdateShardHandler  ? Creating
UpdateShardHandler HTTP client with params:
socketTimeout=60&connTimeout=6&retry=true
6339 [main] INFO  org.apache.solr.logging.LogWatcher  ? SLF4J impl is
org.slf4j.impl.Log4jLoggerFactory
6340 [main] INFO  org.apache.solr.logging.LogWatcher  ? Registering Log
Listener [Log4j (org.slf4j.impl.Log4jLoggerFactory)]
6346 [main] INFO  org.apache.solr.core.CoreContainer  ? Host Name:
6347 [main] INFO  org.apache.solr.core.ZkContainer  ? Zookeeper client=
10.0.1.112:2181/solr7
6428 [main] INFO  org.apache.solr.cloud.ZkController  ? zkHost includes
chroot
*6430 [main] INFO  org.apache.

Re: Solr Deleted Docs Issue

2015-03-19 Thread Shawn Heisey
On 3/19/2015 12:24 AM, vicky desai wrote:
> I fail to understand why this deleted docs are not removed from index on
> merging. Is there a good documentation which explains how exactly is merging
> done?
>
> What can I do to solve this problem other than optimization?

Deleted docs *are* removed by automatic merging -- but only from the
specific segments that are merged, and only docs deleted before the
merge starts.  Deleted docs residing in other index segments are unaffected.

If you are replacing/updating/deleting documents in your index on a
regular basis, then there will always be deleted documents in the index,
unless you optimize.  As long as you don't do it frequently, there is
nothing wrong with optimizing your index, you just need to be aware of
the cost -- optimizing causes a large amount of I/O, which can affect
Solr performance while the optimize is happening and for a short time
afterwards.
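
If you do decide to optimize once in a while, it can be triggered with a
plain update request, for example (host and core name are placeholders):

  curl 'http://localhost:8983/solr/mycore/update?optimize=true'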

What actual problem are you trying to solve by getting rid of your
deleted documents?  With 2-3 million total docs and about half a million
deleted docs, as long as you have enough memory in the system for
effective disk caching, I don't think performance will be a major
factor.  If you are finding that it does cause much lower performance,
you probably need more RAM in the server.

http://wiki.apache.org/solr/SolrPerformanceProblems

The only other thing that deleted documents might do to your search
results is affect the order of documents returned when you do not
explicitly sort them and rely on relevancy ranking, because the terms in
the deleted documents will affect the similarity calculation.

The most accessible information we have on how merging happens is the
visualization blog post that Erick already shared with you.  The third
video shows how the default merge policy works in recent Solr versions,
with a mergeFactor of 10 ... if you count the number of segments, you
will see that there are quite a lot more than 10 segments in the index
at all times.

http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Each of the bars in the graph shows deleted documents with a dark gray
color, and you'll notice that it continually changes while the video
plays ... and the index never reaches a state with minimal deleted
documents.

Thanks,
Shawn



Re: index duplicate records from data source into 1 document

2015-03-19 Thread Shawn Heisey
On 3/19/2015 2:09 AM, Derek Poh wrote:
> Am I right to say we need to do the combining of duplicate records into 1
> before feeding them to Solr to index?
>
> I am coming from Endeca, which supports combining duplicate records
> into 1 record during indexing. Was wondering if Solr supports this.

If you index multiple documents with the same uniqueId field value, Solr
will delete the previous document and index the new one.  The data in
the previous document is never seen.

You could in theory write a custom UpdateRequestProcessor that looks for
the previous document and merges it in whatever way you desire, so the
combined information is what will be indexed, and configure Solr to use
that update processor ...but this capability is not available out of the
box.
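
To make the shape of such a processor concrete, here is a very rough sketch.
Every class name and the merge rule are invented, it assumes the updateLog
is enabled so RealTimeGetComponent can fetch the previous version, and it is
not an out-of-the-box Solr component:

import java.io.IOException;
import java.util.Collection;

import org.apache.lucene.util.BytesRef;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.handler.component.RealTimeGetComponent;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class MergeDuplicatesProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(final SolrQueryRequest req,
                                            SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument incoming = cmd.getSolrInputDocument();
        BytesRef id = cmd.getIndexedId();
        // Look up any previously indexed document with the same uniqueKey.
        SolrInputDocument previous =
            RealTimeGetComponent.getInputDocument(req.getCore(), id);
        if (previous != null) {
          // Invented merge rule: carry the old multiValued business_type values
          // over into the new document instead of silently dropping them.
          Collection<Object> oldValues = previous.getFieldValues("business_type");
          if (oldValues != null) {
            for (Object value : oldValues) {
              incoming.addField("business_type", value);
            }
          }
        }
        super.processAdd(cmd); // hand the merged document to the rest of the chain
      }
    };
  }
}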

An update processor that does this should probably be included with
Solr, but it would either need to be highly configurable, or everyone
would need to agree on exactly what rules should be followed when
combining duplicate records.

Thanks,
Shawn



Re: IP Address assigned to solr instance during the Cloud mode start

2015-03-19 Thread davidphilip cherian
Hi Shawn,

Thank you for the detailed explanation.

On Thu, Mar 19, 2015 at 7:31 PM, Shawn Heisey  wrote:

> On 3/19/2015 2:02 AM, davidphilip cherian wrote:
> > When I started solr in cloud mode(interactive) and chose 2 nodes, it
> > started and in the cloud-view screen it showed some different ip with url
> > 169.254.5.207:7574, when clicked on that, it says page not found. When I
> > modified url to localhost(http://localhost:7574/solr/#/~cloud) it
> > worked(loaded solr admin page)
> > Query is, Where is this ip address picked from? How to edit them?
>
> An IP address of 169.254.x.x is what Windows will assign to a machine
> when a network card configured for DHCP comes up and no DHCP response is
> received.
>
> http://packetlife.net/blog/2008/sep/24/169-254-0-0-addresses-explained/
>
> When Solr starts in SolrCloud mode and you do not provide a "host"
> property, Solr (when it is in SolrCloud mode) asks Java to ask the
> operating system "what is my IP address?"  Whatever the response is to
> that question is the default hostname that Solr will use when it
> registers itself in zookeeper.
>
> http://wiki.apache.org/solr/SolrCloud#SolrCloud_Instance_Params
>
> In a nutshell, your operating system networking is misconfigured.  Once
> you fix that (or provide a host property to Solr to override the bad
> choice), you will need to manually edit your zookeeper data to remove
> the bad node entry.  You will probably need to use the zkCli that comes
> with zookeeper itself, or perhaps something like the zookeeper plugin
> for eclipse.
>
> Thanks,
> Shawn
>
>


Re: CloudSolrServer : Could not find collection : gettingstarted

2015-03-19 Thread Adnan Yaqoob
Erick

Does the Solr admin UI>>cloud view show the gettingstarted collection?
The "graph" view might help. It _sounds_ like somehow you didn't
actually create the collection.
[Adnan]- Yes

What steps did you follow to create the collection in SolrCloud? It's
possible you have the wrong ZK root somehow I suppose.
[Adnan] - I followed the steps from reference guide -
https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble

The collection exists and active - I verified from Solr Admin - Cloud graph
as well as zkCli

Adnan

On Wed, Mar 18, 2015 at 8:36 PM, Erick Erickson 
wrote:

> Does the Solr admin UI>>cloud view show the gettingstarted collection?
> The "graph" view might help. It _sounds_ like somehow you didn't
> actually create the collection.
>
> What steps did you follow to create the collection in SolrCloud? It's
> possible you have the wrong ZK root somehow I suppose.
>
> Best,
> Erick
>
> On Wed, Mar 18, 2015 at 12:32 PM, Adnan Yaqoob  wrote:
> > I'm getting following exception while trying to upload document on
> > SolrCloud using CloudSolrServer.
> >
> > Exception in thread "main" org.apache.solr.common.SolrException:
> > *Could not find collection :* gettingstarted
> > at
> org.apache.solr.common.cloud.ClusterState.getCollection(ClusterState.java:162)
> > at
> org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:305)
> > at
> org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:533)
> > at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
> > at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
> > at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
> > at Test.addDocumentSolrCloud(Test.java:265)
> > at Test.main(Test.java:284)
> >
> > I can query through Solr admin, able to upload document using
> > HttpSolrServer (single instance - non cloud mode) but CloudSolrServer.
> I've
> > also verified the collection exists on zookeeper using zkCli command.
> >
> > Following is the code snippet
> >
> > CloudSolrServer server = new CloudSolrServer("localhost:2181");
> > server.setDefaultCollection("gettingstarted");
> > SolrInputDocument doc = new SolrInputDocument();
> > doc.addField("id", id);
> > doc.addField("name", name);
> >
> > server.add(doc);
> >
> > server.commit();
> >
> > Not sure what I'm missing. My Zookeeper is running externally with two
> solr
> > nodes on same mac
> >
> > --
> > Regards,
> > *Adnan Yaqoob*
>



-- 
Regards,
*Adnan Yaqoob*


Re: Start stop solr started in solr cloud mode

2015-03-19 Thread Adnan Yaqoob
David

starting 1st node
-----------------
bin\solr start -cloud -p 8983 -s C:\Java\solr-5.0.0\example\cloud\node1\solr


starting 2nd node
-----------------
bin\solr start -cloud -p 7574 -s C:\Java\solr-5.0.0\example\cloud\node2\solr -z localhost:9983


The third would be similar to 2nd. Just modify the ports and path according
to your env

Adnan


On Thu, Mar 19, 2015 at 4:01 AM, davidphilip cherian <
davidphilipcher...@gmail.com> wrote:

> Hi,
> I started solr in cloud mode (interactive set up). 3 nodes, 3 shards and 1
> replica and a collection.  I stopped it using ./solr stop -all. How do I
> get the same above cloud mode setup to start? "./solr -c start"  started
> the new solr cloud instance all together where as I was looking for the
> previously set up instance to start?.  I am going through reference guide.
> I did not find any command for this. Please help.
>



-- 
Regards,
*Adnan Yaqoob*


Re: Solr returns incorrect results after sorting

2015-03-19 Thread kumarraj
*if the number of documents in one group is more than one then you cannot
ensure that this document reflects the main sort 

Is there a way the top record which comes up in the group is the one
considered for sorting?
We need to show the record from 212 (even though its price is low) in both
cases, high to low and low to high, and the main sorting should still work?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-returns-incorrect-results-after-sorting-tp4193266p4194008.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr returns incorrect results after sorting

2015-03-19 Thread jim ferenczi
Then you just have to remove the group.sort especially if your group limit
is set to 1.
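
For example (the field names are placeholders, since the schema isn't shown
in this thread), a request along these lines returns one document per group
and lets the main sort alone decide the order of the groups:

  http://localhost:8983/solr/collection1/select?q=*:*&group=true&group.field=productCode&group.limit=1&sort=price+desc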
On 19 March 2015 at 16:57, "kumarraj"  wrote:

> *if the number of documents in one group is more than one then you cannot
> ensure that this document reflects the main sort
>
> Is there a way the top record which is coming up in the group is considered
> for sorting?
> We require to show the record from 212(even though price is low) in both
> the
> cases of high to low or low to high..and still the main sorting should
> work?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-returns-incorrect-results-after-sorting-tp4193266p4194008.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: CloudSolrServer : Could not find collection : gettingstarted

2015-03-19 Thread Chris Hostetter

: Does the Solr admin UI>>cloud view show the gettingstarted collection?
: The "graph" view might help. It _sounds_ like somehow you didn't
: actually create the collection.
: [Adnan]- Yes

if you can see the collection in the admin ui, can you please use the 
"Dump" menu option in the "Cloud" section to get the full JSON details of 
your cloud setup and include all of that verbatim in an email?

(I can't explain your problem, but those details might help folks spot 
where something went wrong)



: What steps did you follow to create the collection in SolrCloud? It's
: possible you have the wrong ZK root somehow I suppose.
: [Adnan] - I followed the steps from reference guide -
: 
https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
: 
https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble
: 
: The collection exists and active - I verified from Solr Admin - Cloud graph
: as well as zkCli
: 
: Adnan
: 
: On Wed, Mar 18, 2015 at 8:36 PM, Erick Erickson 
: wrote:
: 
: > Does the Solr admin UI>>cloud view show the gettingstarted collection?
: > The "graph" view might help. It _sounds_ like somehow you didn't
: > actually create the collection.
: >
: > What steps did you follow to create the collection in SolrCloud? It's
: > possible you have the wrong ZK root somehow I suppose.
: >
: > Best,
: > Erick
: >
: > On Wed, Mar 18, 2015 at 12:32 PM, Adnan Yaqoob  wrote:
: > > I'm getting following exception while trying to upload document on
: > > SolrCloud using CloudSolrServer.
: > >
: > > Exception in thread "main" org.apache.solr.common.SolrException:
: > > *Could not find collection :* gettingstarted
: > > at
: > 
org.apache.solr.common.cloud.ClusterState.getCollection(ClusterState.java:162)
: > > at
: > 
org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:305)
: > > at
: > 
org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:533)
: > > at
: > 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
: > > at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
: > > at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
: > > at Test.addDocumentSolrCloud(Test.java:265)
: > > at Test.main(Test.java:284)
: > >
: > > I can query through Solr admin, able to upload document using
: > > HttpSolrServer (single instance - non cloud mode) but CloudSolrServer.
: > I've
: > > also verified the collection exists on zookeeper using zkCli command.
: > >
: > > Following is the code snippet
: > >
: > > CloudSolrServer server = new CloudSolrServer("localhost:2181");
: > > server.setDefaultCollection("gettingstarted");
: > > SolrInputDocument doc = new SolrInputDocument();
: > > doc.addField("id", id);
: > > doc.addField("name", name);
: > >
: > > server.add(doc);
: > >
: > > server.commit();
: > >
: > > Not sure what I'm missing. My Zookeeper is running externally with two
: > solr
: > > nodes on same mac
: > >
: > > --
: > > Regards,
: > > *Adnan Yaqoob*
: >
: 
: 
: 
: -- 
: Regards,
: *Adnan Yaqoob*
: 

-Hoss
http://www.lucidworks.com/


Re: data import

2015-03-19 Thread abhishek tiwari
Hi ,

- architecture : master (1) - slave(3)
solrconfig:

 500 

 15000 false 

schema :
<field name="selling_price" type="tfloat" indexed="true" stored="true" />
<field name="third_price" type="tfloat" indexed="true" stored="true" />
<field name="discount_percentage" type="tfloat" indexed="true" stored="true" />
<field name="sort_2" type="tint" indexed="true" stored="true" />
<field name="show_metacategory" type="variantFacet" indexed="true" stored="true" />
<field name="products" type="tint" indexed="true" stored="true" />
<field name="by_drive_supported" type="text_path_new" indexed="true" stored="true" multiValued="true"/>
<field name="by_primary_camera" type="text_path_new" indexed="true" stored="true" multiValued="true"/>
<field name="by_dial_shape" type="text_path_new" indexed="true" stored="true" multiValued="true"/>
<field name="by_features" type="text_path_new" indexed="true" stored="true" multiValued="true"/>
<field name="speaker_configuration" type="text_path_new" indexed="true" stored="true" multiValued="true"/>

<uniqueKey>id</uniqueKey>

<copyField source="product" dest="product_keyword"/>
<copyField source="list_price" dest="text"/>
<copyField source="seo_name" dest="text"/>

On Fri, Mar 13, 2015 at 2:25 PM, Antonio Jesús Sánchez Padial <
antonio.sanc...@inia.es> wrote:

> Maybe you should add some info about:
>
> - your architecture, number of servers, etc
> - your schema.xml
> - and the data (ammount, type, ...) you are indexing
>
> Best.
>
> El 13/03/2015 a las 9:37, abhishek tiwari escribió:
>
>  solr indexing taking too much time .
>>
>> What should i do to reduce time . working on solr 4.0.
>>
>>
> --
> Antonio Jesús Sánchez Padial
> Jefe del Servicio de Biometría
> antonio.sanc...@inia.es
> Tlfno: +34 91 347 6831
> Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria
> Ctra.m de La Coruña, km.7
> 28040 Madrid
>
>


Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2015-03-19 Thread James Strassburg
Sorry, I've been a bit unfocused from this list for a bit. When I was
working with the APTF code I rewrote a big chunk of it and didn't include
the inclusion of the original tokens as I didn't need it at the time. That
feature could easily be added back in. I will see if I can find a bit of
time for that.

As for the other part of your message, are you suggesting that the token
indexes are not correct? There is a bit of a formatting issue with the text
and I'm not sure what you're getting at. Can you explain further please?

On Sun, Feb 8, 2015 at 3:04 PM, trhodesg  wrote:

> Thanks to everyone for the thought, time and effort put into
> AutoPhrasingTokenFilter(APTF)! It's a real lifesaver.
> While trying to add APTF to my indexing, I discovered that the original
> (TS) version throws an exception while indexing a 100MB PDF. The error is
> "Exception writing document to the index; possible analysis error". The
> modified (JS) version runs without error, but it removes the tokens used to
> create the phrase. They are needed.
> Before looking into this I have a question: Solr would normally tokenize
> the phrase "the peoples republic of china is" as
> the(1) peoples(2) republic(3) of(4) china(5) is(6).
> With the APTF phrase file defined, the Solr admin analysis page reports
> that the APTF indexer tokenizes the phrase with discontinuous token
> numbering. Would it be possible for someone to explain the reasoning
> behind the discontinuous token numbering? As it is now, phrase queries
> such as "republic of china" will fail. And I can't get proximity queries
> like "republic of"~10 to work either (though it seems they should).
> Wouldn't it be more flexible to return a tokenization that also keeps the
> original tokens? This allows spurious matches such as "peoples
> peoplesrepublic", but it seems like this type of event would be very rare.
> It has the advantage of allowing phrase queries to continue working the
> way most users think.
> Thank you for supporting more than one entity definition per phrase (i.e.
> peoplesrepublic and peoplesrepublicofchina). This type of contraction is
> common in longer documents, especially when the first used phrase ends
> with a preposition. It helps support robust matching.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Have-anyone-used-Automatic-Phrase-Tokenization-AutoPhrasingTokenFilterFactory-tp4173808p4184888.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Whole RAM consumed while Indexing.

2015-03-19 Thread Erick Erickson
That or even hard commit to 60 seconds. It's strictly a matter of how often
you want to close old segments and open new ones.
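
As a concrete sketch of the intervals discussed in this thread (soft commit
every 10 minutes for visibility, hard commit every 60 seconds for
durability; tune both against your own indexing load):

  <autoCommit>
    <maxTime>60000</maxTime>        <!-- hard commit: flush and close segments every 60s -->
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <maxTime>600000</maxTime>       <!-- soft commit: new docs become searchable every 10 min -->
  </autoSoftCommit>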

On Thu, Mar 19, 2015 at 3:12 AM, Nitin Solanki  wrote:
> Hi Erick..
>   I read your Article. Really nice...
> Inside that you said that for bulk indexing. Set soft commit = 10 mins and
> hard commit = 15sec. Is it also okay for my scenario?
>
> On Thu, Mar 19, 2015 at 1:53 AM, Erick Erickson 
> wrote:
>
>> bq: As you said, do commits after 6 seconds
>>
>> No, No, No. I'm NOT saying 6 seconds! That time is in _milliseconds_
>> as Shawn said. So setting it to 6 is every minute.
>>
>> From solrconfig.xml, conveniently located immediately above the
>>  tag:
>>
>> maxTime - Maximum amount of time in ms that is allowed to pass since a
>> document was added before automatically triggering a new commit.
>>
>> Also, a lot of answers to soft and hard commits is here as I pointed
>> out before, did you read it?
>>
>>
>> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>
>> Best
>> Erick
>>
>> On Wed, Mar 18, 2015 at 9:44 AM, Alexandre Rafalovitch
>>  wrote:
>> > Probably merged somewhat differently with some terms indexes repeating
>> > between segments. Check the number of segments in data directory.And
>> > do search for *:* and make sure both do have the same document counts.
>> >
>> > Also, In all these discussions, you still haven't answered about how
>> > fast after indexing you want to _search_? Because, if you are not
>> > actually searching while committing, you could even index on a
>> > completely separate server (e.g. a faster one) and swap (or alias)
>> > index in afterwards. Unless, of course, I missed it, it's a lot of
>> > emails in a very short window of time.
>> >
>> > Regards,
>> >Alex.
>> >
>> > 
>> > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> > http://www.solr-start.com/
>> >
>> >
>> > On 18 March 2015 at 12:09, Nitin Solanki  wrote:
>> >> When I kept my configuration to 300 for soft commit and 3000 for hard
>> >> commit and indexed some amount of data, I got the data size of the whole
>> >> index to be 6GB after completing the indexing.
>> >>
>> >> When I changed the configuration to 6 for soft commit and 6 for
>> >> hard commit and indexed same data then I got the data size of the whole
>> >> index to be 5GB after completing the indexing.
>> >>
>> >> But the number of documents in the both scenario were same. I am
>> wondering
>> >> how that can be possible?
>> >>
>> >> On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki 
>> wrote:
>> >>
>> >>> Hi Erick,
>> >>>  I am just saying. I want to be sure on commits
>> difference..
>> >>> What if I do frequent commits or not? And why I am saying that I need
>> to
>> >>> commit things so very quickly because I have to index 28GB of data
>> which
>> >>> takes 7-8 hours(frequent commits).
>> >>> As you said, do commits after 6 seconds then it will be more
>> expensive.
>> >>> If I don't encounter with **"overlapping searchers" warning messages**
>> >>> then I feel it seems to be okay. Is it?
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson <
>> erickerick...@gmail.com>
>> >>> wrote:
>> >>>
>>  Don't do it. Really, why do you want to do this? This seems like
>>  an "XY" problem, you haven't explained why you need to commit
>>  things so very quickly.
>> 
>>  I suspect you haven't tried _searching_ while committing at such
>>  a rate, and you might as well turn all your top-level caches off
>>  in solrconfig.xml since they won't be useful at all.
>> 
>>  Best,
>>  Erick
>> 
>>  On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki 
>>  wrote:
>>  > Hi,
>>  >If I do very very fast indexing(softcommit = 300 and
>> hardcommit =
>>  > 3000) v/s slow indexing (softcommit = 6 and hardcommit = 6)
>> as
>>  you
>>  > both said. Will fast indexing fail to index some data?
>>  > Any suggestion on this ?
>>  >
>>  > On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar <
>>  > andyetitmo...@gmail.com> wrote:
>>  >
>>  >> Yes, and doing so is painful and takes lots of people and hardware
>>  >> resources to get there for large amounts of data and queries :)
>>  >>
>>  >> As Erick says, work backwards from 60s and first establish how
>> high the
>>  >> commit interval can be to satisfy your use case..
>>  >> On 16 Mar 2015 16:04, "Erick Erickson" 
>>  wrote:
>>  >>
>>  >> > First start by lengthening your soft and hard commit intervals
>>  >> > substantially. Start with 6 and work backwards I'd say.
>>  >> >
>>  >> > Ramkumar has tuned the heck out of his installation to get the
>> commit
>>  >> > intervals to be that short ;).
>>  >> >
>>  >> > I'm betting that you'll see your RAM usage go way down, but
>> that' s a
>>  >> >

Re: Documents cannot be searched immediately when indexed using REST API with Solr Cloud

2015-03-19 Thread Erick Erickson
The post jar issues a hard commit (openSearcher=true) as part of the
operation. As Liu says, you are probably not committing the changes
after ingestion.

You can issue this from a browser:
http://<host>:<port>/solr/<collection>/update?commit=true
to force a commit manually.

Best,
Erick

On Thu, Mar 19, 2015 at 3:54 AM, Liu Bo  wrote:
> Hi Edvin
>
> Please review your commit/soft-commit configuration,
> "soft commits are about visibility, hard commits are about durability"
>   by a wise man. :)
>
> If you are doing NRT index and searching, your probably need a short soft
> commit interval or commit explicitly in your request handler. Be advised
> that these strategies and configurations need to be tested and adjusted
> according to your data size, searching and index updating frequency.
>
> You should be able to find the answer yourself here:
> http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> All the best
>
> Liu Bo
>
> On 19 March 2015 at 17:54, Zheng Lin Edwin Yeo  wrote:
>
>> Hi,
>>
>> I'm using Solr Cloud now, with 2 shards known as shard1 and shard2, and
>> when I try to index rich-text documents using REST API or the default
>> Documents module in Solr Admin UI, the documents that are indexed do not
>> appear immediately when I do a search. It only appears after I restarted
>> the Solr services (both shard1 and shard2).
>>
>> However, the same issue does not happen when I index the same documents using
>> post.jar, and I can search for the indexed documents immediately.
>>
>> Here's my ExtractingRequestHandler in solrconfig.xml.
>>
>> <requestHandler name="/update/extract"
>>   class="solr.extraction.ExtractingRequestHandler" >
>>   <lst name="defaults">
>>     <str name="lowernames">true</str>
>>     <str name="uprefix">ignored_</str>
>>
>>     <str name="captureAttr">true</str>
>>     <str name="fmap.a">links</str>
>>     <str name="fmap.div">ignored_</str>
>>   </lst>
>> </requestHandler>
>>
>> What could be the reason why this is happening, and any solutions to solve
>> it?
>>
>> Regards,
>> Edwin
>>
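
For reference, a minimal SolrJ sketch of the workflow Erick describes above: send a
rich-text file to the /update/extract handler and request a commit in the same call so
the document becomes searchable right away. The URL, collection, file path, and id
below are illustrative assumptions, not values from this thread.

import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractAndCommit {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Send the file to the extracting handler (Tika parses it on the server side).
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("/tmp/example.pdf"), "application/pdf");
        req.setParam("literal.id", "doc1");

        // Commit as part of the same request; without a commit (or a short enough
        // autoSoftCommit interval) the document is not yet visible to searches.
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        server.request(req);

        server.shutdown();
    }
}

Alternatively, leave the explicit commit out and rely on a reasonable autoSoftCommit
interval in solrconfig.xml, as discussed elsewhere in this digest.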


Re: how to store _text field

2015-03-19 Thread Erick Erickson
Hmm, not all that sure. That's one thing about schemaless indexing, it
has to guess. It does the best it can, but it's quite possible that it
guesses wrong.

If this is a "mananged schema", you can use the REST API commands to
make whatever field you want. Or you can start over with a concrete
schema.xml and use _that_. Otherwise, I'm not sure what to say without
actually being on your system.

Wish I could help more.
Erick

On Thu, Mar 19, 2015 at 5:39 AM, Mirko Torrisi
 wrote:
> Hi Erick,
>
> I'm sorry for this delay but I've just seen this reply.
>
> I'm using the latest version of Solr, and the default setting is to use the new
> kind of indexing; it doesn't use schema.xml, so I have no idea
> how to set "stored" for this field.
> The content is grabbed, because I've obtained results using the search
> function, but it is not shown because it is not set to "stored".
>
> I hope to be clear.
> Thanks very much.
>
> All the best,
>
> Mirko
>
>
> On 14/03/15 17:58, Erick Erickson wrote:
>>
>> Right, your schema.xml file will define, perhaps, some "dynamic
>> fields". First insure that stored="true" is specified. If you change
>> this, you have to re-index the docs.
>>
>> Second, insure that your "fl" parameter with the field is specified on
>> the requests, something like q=*:*&fl=eoe_txt.
>>
>> Third, insure that you are actually sending content to that field when
>> you index docs.
>>
>> If none of this helps, show us the definition from schema.xml and a
>> sample input document and a query that illustrate the problem please.
>>
>> Best,
>> Erick
>>
>> On Fri, Mar 13, 2015 at 1:20 AM, Mirko Torrisi
>>  wrote:
>>>
>>> Hi Alexandre,
>>>
>>> I need to visualize the content of _txt. For some reason, it is currently not
>>> shown in the results (the "response").
>>> I guess that happens because it isn't stored (due to some default
>>> setting that I'd like to change).
>>>
>>> Thanks for your help,
>>>
>>> Mirko
>>>
>>>
>>> On 13/03/15 00:27, Alexandre Rafalovitch wrote:

 Wait, step back. This is confusing. What's your real problem you are
 trying to solve?

 Regards,
  Alex.
 
 Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
 http://www.solr-start.com/


 On 12 March 2015 at 19:50, Mirko Torrisi 
 wrote:
>
> Hi folks,
>
> I googled and tried without success so I ask you: how can I modify the
> setting of a field to store it ?
>
> It is interesting to note that I did not add the _text field, so I guess it is
> a default one. Maybe it is normal that it is not shown in the result, but
> actually this is my real problem. It would be grand also to copy it into a new
> field, but I do not know how to do it with the latest Solr (5) and the new
> kind of schema. I know that I have to use curl but I do not know how to use
> it to copy a field.
>
> Thank you in advance!
> Cheers,
>
>Mirko
>>>
>>>
>


Re: index duplicate records from data source into 1 document

2015-03-19 Thread Erick Erickson
bq: Am I right to say we need to do the combining of duplicate records
into 1 before feeding it to Solr to index?

That's what I'd do. As Shawn says, if you simply fire them both at
Solr the more recent one will replace the older one.

Best,
Erick

On Thu, Mar 19, 2015 at 7:44 AM, Shawn Heisey  wrote:
> On 3/19/2015 2:09 AM, Derek Poh wrote:
>> Am I right to say we need to do the combining of duplicate records into 1
>> before feeding it to Solr to index?
>>
>> I am coming from Endeca, which supports the combining of duplicate records
>> into 1 record during indexing. Was wondering if Solr supports this.
>
> If you index multiple documents with the same uniqueId field value, Solr
> will delete the previous document and index the new one.  The data in
> the previous document is never seen.
>
> You could in theory write a custom UpdateRequestProcessor that looks for
> the previous document and merges it in whatever way you desire, so the
> combined information is what will be indexed, and configure Solr to use
> that update processor ...but this capability is not available out of the
> box.
>
> An update processor that does this should probably be included with
> Solr, but it would either need to be highly configurable, or everyone
> would need to agree on exactly what rules should be followed when
> combining duplicate records.
>
> Thanks,
> Shawn
>
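
To make the shape of Shawn's suggestion concrete, here is a rough, untested sketch of
such a custom UpdateRequestProcessor. It is an illustration only: it assumes the
uniqueKey is a string field named "id", it only sees documents that have already been
committed, and it only copies stored string values; a real implementation would need
to handle other field types and uncommitted updates.

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
import org.apache.solr.util.RefCounted;

public class MergeDuplicatesProcessorFactory extends UpdateRequestProcessorFactory {

    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                              UpdateRequestProcessor next) {
        return new MergeDuplicatesProcessor(req, next);
    }

    static class MergeDuplicatesProcessor extends UpdateRequestProcessor {
        private final SolrQueryRequest req;

        MergeDuplicatesProcessor(SolrQueryRequest req, UpdateRequestProcessor next) {
            super(next);
            this.req = req;
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument incoming = cmd.getSolrInputDocument();
            Object id = incoming.getFieldValue("id"); // assumes uniqueKey is a string field "id"
            if (id != null) {
                RefCounted<SolrIndexSearcher> ref = req.getCore().getSearcher();
                try {
                    SolrIndexSearcher searcher = ref.get();
                    // Look up the already-indexed (committed) version of this document, if any.
                    TopDocs hits = searcher.search(new TermQuery(new Term("id", id.toString())), 1);
                    if (hits.totalHits > 0) {
                        Document old = searcher.doc(hits.scoreDocs[0].doc);
                        for (IndexableField f : old.getFields()) {
                            // Carry over stored string values that the new document lacks.
                            if (!incoming.containsKey(f.name()) && f.stringValue() != null) {
                                incoming.addField(f.name(), f.stringValue());
                            }
                        }
                    }
                } finally {
                    ref.decref();
                }
            }
            super.processAdd(cmd);
        }
    }
}

Such a processor would be wired into an updateRequestProcessorChain in solrconfig.xml
ahead of RunUpdateProcessorFactory.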


Re: data import

2015-03-19 Thread Shawn Heisey
On 3/19/2015 11:47 AM, abhishek tiwari wrote:
>  500 

You're doing soft commits as often as twice a second.  You have
configured 500 milliseconds here.  This might have something to do with
your slow indexing speed.  A soft commit is less expensive than a full
hard commit, but soft commits are *NOT* free, and they aren't even cheap.

I doubt that you *need* your documents to be visible within half a
second of indexing them ... and there's a good chance that even with
this config they won't be visible that soon, because each commit is
probably going to take longer than half a second to complete.  With a
500 millisecond autoSoftCommit configuration, your server may be doing
commit operations close to 100% of the time while indexing is happening.

http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Also, the dataimport handler is single threaded, so if you are only
using one handler definition in solrconfig.xml, there is no parallel
indexing.  You'll need to write your own multi-threaded indexing program
if you want parallel indexing.

Thanks,
Shawn
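
As a starting point for the "write your own multi-threaded indexing program" suggestion
above, SolrJ's ConcurrentUpdateSolrServer queues documents and sends them on background
threads. The URL, queue size, thread count, and field names below are assumptions for
illustration, not values from this thread.

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
    public static void main(String[] args) throws Exception {
        // Buffers up to 10000 documents and sends them with 4 parallel threads.
        ConcurrentUpdateSolrServer server =
            new ConcurrentUpdateSolrServer("http://localhost:8983/solr/collection1", 10000, 4);

        for (int i = 0; i < 1000000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("name", "document number " + i);
            server.add(doc); // queued; sent asynchronously by the background threads
        }

        server.blockUntilFinished(); // wait for all queued updates to be sent
        server.commit();             // single hard commit at the end of the run
        server.shutdown();
    }
}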



Spatial Search killing Solr process

2015-03-19 Thread Henrique O. Santos
Hello all,

I have a Solr 4.10.3 collection with ~55 million documents (index size about 
6GB) with a LatLonType field and a dynamic field for storing the coordinates, 
like stated here 
https://wiki.apache.org/solr/SpatialSearch#Schema_Configuration 


I am trying to use geofilt to filter query results, but it is triggering the
"OOM Killer script", killing the Solr process, after some seconds of processing.
Other queries run fine. I have a machine with 64GB RAM, but just about 10GB 
free. Is that enough to handle a query like this?

Thanks,
Henrique.
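
For reference, a geofilt filter query of the kind described here looks like this from
SolrJ; the field name, point, and distance are made-up values. This only illustrates
the query shape, it does not by itself address the memory problem discussed in the
replies below.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GeofiltExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("*:*");
        // Keep only documents whose "coordinates" field lies within 10 km of the point.
        q.addFilterQuery("{!geofilt sfield=coordinates pt=45.15,-93.85 d=10}");
        q.setRows(10);

        QueryResponse rsp = server.query(q);
        System.out.println("Matches: " + rsp.getResults().getNumFound());

        server.shutdown();
    }
}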

Re: CloudSolrServer : Could not find collection : gettingstarted

2015-03-19 Thread Chris Hostetter

: Chris,
: Please find attached Dump

nothing jumps out at me as looking odd, but i'm not the expert on this 
stuff either -- hopefully someone else can take a look.

can you provide us with some more details on what exactly you've done?
you said ...

: > : What steps did you follow to create the collection in SolrCloud? It's
: > : possible you have the wrong ZK root somehow I suppose.
: > : [Adnan] - I followed the steps from reference guide -
: > :
: > 
https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
: > :
: > 
https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble

...but can you be more explicit please?  how exactly did you start up your 
solr nodes?  what exact commands did you run?

my best guess: maybe you've specified a chroot on your zk server when 
running Solr, but maybe you don't have that same chroot path when 
constructing the client?  knowing exactly how you started the solr 
processes would help (and/or: what "-DzkHost" option do you see in the 
"JVM" "Args" section of the "Dashboard" screen in the Solr UI; and/or 
what command line args do you specify when running zkcli to see the 
collection?) 





-Hoss
http://www.lucidworks.com/
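
To illustrate the chroot guess: if the Solr nodes were started with a ZooKeeper connect
string that includes a chroot (for example localhost:2181/solr), the SolrJ client must
be built with the same suffix, otherwise it looks at the ZooKeeper root and finds no
collections. A sketch, with the chroot path as an assumption:

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ChrootExample {
    public static void main(String[] args) throws Exception {
        // Must match the -DzkHost/-z value the Solr nodes were started with,
        // including any chroot suffix such as "/solr".
        CloudSolrServer server = new CloudSolrServer("localhost:2181/solr");
        server.setDefaultCollection("gettingstarted");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        server.add(doc);
        server.commit();

        server.shutdown();
    }
}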


Re: Spatial Search killing Solr process

2015-03-19 Thread david.w.smi...@gmail.com
Hi Henrique,

Please see the Solr reference guide instead of the “community wiki” you
referenced:
https://cwiki.apache.org/confluence/display/solr/Spatial+Search  (you can
download one for 4.10; the online link is always for the latest).

For spatial filtering, *especially* at-scale, you really should be using
RPT instead of LatLonType.  It requires no memory for filtering.  RPT is
poor at distance sorting but you didn’t mention that.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Thu, Mar 19, 2015 at 4:30 PM, Henrique O. Santos 
wrote:

> Hello all,
>
> I have a Solr 4.10.3 collection with ~55 million documents (index size
> about 6GB) with a LatLonType field and a dynamic field for storing the
> coordinates, like stated here
> https://wiki.apache.org/solr/SpatialSearch#Schema_Configuration <
> https://wiki.apache.org/solr/SpatialSearch#Schema_Configuration>
>
> I am trying to use geofilt to filter query results, but it is triggering
> the "OOM Killer script”, killing Solr process, after some seconds of
> processing. Other queries run fine. I have a machine with 64GB RAM, but
> just about 10GB free. Is that enough to handle a query like this?
>
> Thanks,
> Henrique.


ApacheCon NA 2015 in Austin, Texas

2015-03-19 Thread Uwe Schindler
Dear Apache Lucene/Solr enthusiast,

In just a few weeks, we'll be holding ApacheCon in Austin, Texas, and we'd love 
to have you in attendance. You can save $300 on admission by registering NOW, 
since the early bird price ends on the 21st.

Register at http://s.apache.org/acna2015-reg

ApacheCon this year celebrates the 20th birthday of the Apache HTTP Server, and 
we'll have Brian Behlendorf, who started this whole thing, keynoting for us, 
and you'll have a chance to meet some of the original Apache Group, who will be 
there to celebrate with us.

We also have talks about Apache Lucene and Apache Solr in 7 tracks of great 
talks, as well as BOFs, the Apache BarCamp, project-specific hack events, and 
evening events where you can deepen your connection with the larger Apache 
community. See the full schedule at http://apacheconna2015.sched.org/

And if you have any questions, comments, or just want to hang out with us 
before and during the event, follow us on Twitter - @apachecon - or drop by 
#apachecon on the Freenode IRC network.

Hope to see you in Austin!

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/




Re: Spatial Search killing Solr process

2015-03-19 Thread Henrique O. Santos
Thanks, David. I’m looking at it now.

> On Mar 19, 2015, at 4:51 PM, david.w.smi...@gmail.com wrote:
> 
> Hi Henrique,
> 
> Please see the Solr reference guide instead of the “community wiki” you
> referenced:
> https://cwiki.apache.org/confluence/display/solr/Spatial+Search  (you can
> download one for 4.10; the online link is always for the latest).
> 
> For spatial filtering, *especially* at-scale, you really should be using
> RPT instead of LatLonType.  It requires no memory for filtering.  RPT is
> poor at distance sorting but you didn’t mention that.
> 
> ~ David Smiley
> Freelance Apache Lucene/Solr Search Consultant/Developer
> http://www.linkedin.com/in/davidwsmiley
> 
> On Thu, Mar 19, 2015 at 4:30 PM, Henrique O. Santos 
> wrote:
> 
>> Hello all,
>> 
>> I have a Solr 4.10.3 collection with ~55 million documents (index size
>> about 6GB) with a LatLonType field and a dynamic field for storing the
>> coordinates, like stated here
>> https://wiki.apache.org/solr/SpatialSearch#Schema_Configuration <
>> https://wiki.apache.org/solr/SpatialSearch#Schema_Configuration>
>> 
>> I am trying to use geofilt to filter query results, but it is triggering
>> the "OOM Killer script”, killing Solr process, after some seconds of
>> processing. Other queries run fine. I have a machine with 64GB RAM, but
>> just about 10GB free. Is that enough to handle a query like this?
>> 
>> Thanks,
>> Henrique.



Re: Facet pivot sorting while combining Stats Component With Pivots in Solr 5

2015-03-19 Thread Yonik Seeley
On Fri, Mar 13, 2015 at 1:43 PM, Dominique Bejean
 wrote:
> Thank you for the response
>
> This is something Heliosearch can do. Ionic Seeley, created a JIRA ticket
> to back port this feature to Solr 5.

Oh, I'm charged now, am I?  ;-)

It's been committed, and will be in Solr 5.1.

Here's an example of sorting the buckets by something other than count:

$ curl http://localhost:8983/solr/query -d 'q=*:*&
 json.facet={
   categories:{
 terms:{
   field : cat,
   sort : "x desc",   // can also use sort:{x:desc}
   facet:{
 x : "avg(price)",
 y : "sum(price)"
   }
 }
   }
 }
'

-Yonik


Re: CloudSolrServer : Could not find collection : gettingstarted

2015-03-19 Thread Timothy Potter
Are you using a SolrJ client from 4.x to connect to a Solr 5 cluster?

On Wed, Mar 18, 2015 at 1:32 PM, Adnan Yaqoob  wrote:

> I'm getting following exception while trying to upload document on
> SolrCloud using CloudSolrServer.
>
> Exception in thread "main" org.apache.solr.common.SolrException:
> *Could not find collection :* gettingstarted
> at
> org.apache.solr.common.cloud.ClusterState.getCollection(ClusterState.java:162)
> at
> org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:305)
> at
> org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:533)
> at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
> at Test.addDocumentSolrCloud(Test.java:265)
> at Test.main(Test.java:284)
>
> I can query through the Solr admin UI and am able to upload a document using
> HttpSolrServer (single instance - non cloud mode), but not with CloudSolrServer. I've
> also verified the collection exists in ZooKeeper using the zkCli command.
>
> Following is the code snippet
>
> CloudSolrServer server = new CloudSolrServer("localhost:2181");
> server.setDefaultCollection("gettingstarted");
> SolrInputDocument doc = new SolrInputDocument();
> doc.addField("id", id);
> doc.addField("name", name);
>
> server.add(doc);
>
> server.commit();
>
> Not sure what I'm missing. My Zookeeper is running externally with two solr
> nodes on same mac
>
> --
> Regards,
> *Adnan Yaqoob*
>


Solr hangs / LRU operations are heavy on cpu

2015-03-19 Thread Sergey Shvets
Hi,

we have quite a problem with Solr. We are running it in a 6x3 config, and
suddenly Solr started to hang, taking all the available CPU on the nodes.

In the thread dumps I noticed that things like this can eat a lot of CPU time:


   - org.apache.solr.search.LRUCache.put​(LRUCache.java:116)
   -
   org.apache.solr.search.SolrIndexSearcher.doc​(SolrIndexSearcher.java:705)
   -
   
org.apache.solr.response.BinaryResponseWriter$Resolver.writeResultsBody​(BinaryResponseWriter.java:155)
   -
   
org.apache.solr.response.BinaryResponseWriter$Resolver.writeResults​(BinaryResponseWriter.java:183)
   -
   
org.apache.solr.response.BinaryResponseWriter$Resolver.resolve​(BinaryResponseWriter.java:88)
   -
   org.apache.solr.common.util.JavaBinCodec.writeVal​(JavaBinCodec.java:158)
   -
   
org.apache.solr.common.util.JavaBinCodec.writeNamedList​(JavaBinCodec.java:148)
   -
   
org.apache.solr.common.util.JavaBinCodec.writeKnownType​(JavaBinCodec.java:242)
   -
   org.apache.solr.common.util.JavaBinCodec.writeVal​(JavaBinCodec.java:153)
   - org.apache.solr.common.util.JavaBinCodec.marshal​(JavaBinCodec.java:96)
   -
   
org.apache.solr.response.BinaryResponseWriter.write​(BinaryResponseWriter.java:52)
   -
   
org.apache.solr.servlet.SolrDispatchFilter.writeResponse​(SolrDispatchFilter.java:758)
   -
   
org.apache.solr.servlet.SolrDispatchFilter.doFilter​(SolrDispatchFilter.java:426)
   -
   
org.apache.solr.servlet.SolrDispatchFilter.doFilter​(SolrDispatchFilter.java:207)
   -
   
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter​(ApplicationFilterChain.java:241)
   -
   
org.apache.catalina.core.ApplicationFilterChain.doFilter​(ApplicationFilterChain.java:208)
   -
   
org.apache.catalina.core.StandardWrapperValve.invoke​(StandardWrapperValve.java:220)
   -
   
org.apache.catalina.core.StandardContextValve.invoke​(StandardContextValve.java:122)
   -
   
org.apache.catalina.core.StandardHostValve.invoke​(StandardHostValve.java:170)
   -
   
org.apache.catalina.valves.ErrorReportValve.invoke​(ErrorReportValve.java:103)
   -
   org.apache.catalina.valves.AccessLogValve.invoke​(AccessLogValve.java:950)
   -
   
org.apache.catalina.core.StandardEngineValve.invoke​(StandardEngineValve.java:116)


The cache itself is very minimalistic


  




true
20
200

Solr version is 4.10.3

Any of help is appreciated!

sergey


Re: CloudSolrServer : Could not find collection : gettingstarted

2015-03-19 Thread Adnan Yaqoob
Yes. Just before your email I was able to figure it out. My project was set to
use SolrJ 4.10.3; everything was working fine except cloud mode, so I didn't
notice.
After I switched to SolrJ 5 it's working now.

Thanks everyone for supporting


Re: Documents cannot be searched immediately when indexed using REST API with Solr Cloud

2015-03-19 Thread Zheng Lin Edwin Yeo
Thank you for the information.

Yes, the program is working correctly now and I can search for the
documents immediately after issuing commit=true.

Regards,
Edwin


On 20 March 2015 at 04:07, Erick Erickson  wrote:

> The post jar issues a hard commit (openSearcher=true) as part of the
> operation. As Liu says, you are probably not committing the changes
> after ingestion.
>
> You can issue this from a browser:
> .solr/collection/update?commit=true
> to force a commit manually.
>
> Best,
> Erick
>
> On Thu, Mar 19, 2015 at 3:54 AM, Liu Bo  wrote:
> > Hi Edwin
> >
> > Please review your commit/soft-commit configuration,
> > "soft commits are about visibility, hard commits are about durability"
> >   by a wise man. :)
> >
> > If you are doing NRT index and searching, you probably need a short soft
> > commit interval or commit explicitly in your request handler. Be advised
> > that these strategies and configurations need to be tested and adjusted
> > according to your data size, searching and index updating frequency.
> >
> > You should be able to find the answer yourself here:
> >
> http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> >
> > All the best
> >
> > Liu Bo
> >
> > On 19 March 2015 at 17:54, Zheng Lin Edwin Yeo 
> wrote:
> >
> >> Hi,
> >>
> >> I'm using Solr Cloud now, with 2 shards known as shard1 and shard2, and
> >> when I try to index rich-text documents using REST API or the default
> >> Documents module in Solr Admin UI, the documents that are indexed do not
> >> appear immediately when I do a search. It only appears after I restarted
> >> the Solr services (both shard1 and shard2).
> >>
> >> However, the same issue does not happen when I index the same documents
> using
> >> post.jar, and I can search for the indexed documents immediately.
> >>
> >> Here's my ExtractingRequestHandler in solrconfig.xml.
> >>
> >> <requestHandler name="/update/extract"
> >>   class="solr.extraction.ExtractingRequestHandler" >
> >>   <lst name="defaults">
> >>     <str name="lowernames">true</str>
> >>     <str name="uprefix">ignored_</str>
> >>
> >>     <str name="captureAttr">true</str>
> >>     <str name="fmap.a">links</str>
> >>     <str name="fmap.div">ignored_</str>
> >>   </lst>
> >> </requestHandler>
> >>
> >> What could be the reason why this is happening, and any solutions to
> solve
> >> it?
> >>
> >> Regards,
> >> Edwin
> >>
>


Re: Unable to index rich-text documents in Solr Cloud

2015-03-19 Thread Zheng Lin Edwin Yeo
Hi Shawn,

Yes, I'm using the /update/extract handler. I'm not sure about the
shards.qt parameter either.

Regards,
Edwin


On 19 March 2015 at 13:18, Shawn Heisey  wrote:

> On 3/18/2015 1:22 AM, Zheng Lin Edwin Yeo wrote:
> > I'm having some issues with indexing rich-text documents from the Solr
> > Cloud. When I tried to index a pdf or word document, I get the following
> > error:
> >
> >
> > org.apache.solr.common.SolrException: Bad Request
> >
> >
> >
> > request:
> http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
>
> This request appears to be one of the requests that SolrCloud makes
> between its different nodes, but it is using the /update handler.  I
> assume that when you sent the request, you sent it to the
> /update/extract handler because it's a rich text document?  The /update
> handler can't do rich text documents, it's only for documents in json,
> xml, csv, javabin, etc that are formatted in specific ways.
>
> One thing I'm wondering is whether the Extracting handler requires a
> shards.qt parameter, also set to /update/extract, to work right with
> SolrCloud.  I have never used that handler myself, so I've got no idea
> what is required to make it work right.
>
> Thanks,
> Shawn
>
>


Re: Solr hangs / LRU operations are heavy on cpu

2015-03-19 Thread Umesh Prasad
It might be because LRUCache by default will try to evict its entries on
each call to put and putAll. LRUCache is built on top of java's
LinkedHashMap. Check the javadoc of removeEldestEntry



Try using LFUCache and a separate cleanup thread .. We have been using that
for over 2 yrs now without any issues ..

For a comparison of the cache implementations in Solr you can check this link:


On 20 March 2015 at 04:05, Sergey Shvets  wrote:

> LRUCache


It


-- 
Thanks & Regards
Umesh Prasad
Tech Lead @ flipkart.com

 in.linkedin.com/pub/umesh-prasad/6/5bb/580/


Re: data import

2015-03-19 Thread Midas A
Hi Shawn ,

Thanks for replying. I need clarity on the following points:
a) Will setting stored="false" in the schema for a few fields improve indexing time?
b) Does the soft commit and hard commit configuration depend on the hardware?
c) Should I tune the mergeFactor and ramBufferSizeMB settings, and how should
I decide these values?


We are doing full indexing and it takes around 4.5 hrs (20M documents).

Regards,
MA

On Fri, Mar 20, 2015 at 1:57 AM, Shawn Heisey  wrote:

> On 3/19/2015 11:47 AM, abhishek tiwari wrote:
> >  500 
>
> You're doing soft commits as often as twice a second.  You have
> configured 500 milliseconds here.  This might have something to do with
> your slow indexing speed.  A soft commit is less expensive than a full
> hard commit, but soft commits are *NOT* free, and they aren't even cheap.
>
> I doubt that you *need* your documents to be visible within half a
> second of indexing them ... and there's a good chance that even with
> this config they won't be visible that soon, because each commit is
> probably going to take longer than half a second to complete.  With a
> 500 millisecond autoSoftCommit configuration, your server may be doing
> commit operations close to 100% of the time while indexing is happening.
>
>
> http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Also, the dataimport handler is single threaded, so if you are only
> using one handler definition in solrconfig.xml, there is no parallel
> indexing.  You'll need to write your own multi-threaded indexing program
> if you want parallel indexing.
>
> Thanks,
> Shawn
>
>


Re: Whole RAM consumed while Indexing.

2015-03-19 Thread Nitin Solanki
On Fri, Mar 20, 2015 at 1:35 AM, Erick Erickson 
wrote:

> That or even hard commit to 60 seconds. It's strictly a matter of how often
> you want to close old segments and open new ones.
>
> On Thu, Mar 19, 2015 at 3:12 AM, Nitin Solanki 
> wrote:
> > Hi Erick..
> >   I read your Article. Really nice...
> > Inside that you said that for bulk indexing. Set soft commit = 10 mins
> and
> > hard commit = 15sec. Is it also okay for my scenario?
> >
> > On Thu, Mar 19, 2015 at 1:53 AM, Erick Erickson  >
> > wrote:
> >
> >> bq: As you said, do commits after 60000 seconds
> >>
> >> No, No, No. I'm NOT saying 60000 seconds! That time is in _milliseconds_
> >> as Shawn said. So setting it to 60000 is every minute.
> >>
> >> From solrconfig.xml, conveniently located immediately above the
> >>  tag:
> >>
> >> maxTime - Maximum amount of time in ms that is allowed to pass since a
> >> document was added before automatically triggering a new commit.
> >>
> >> Also, a lot of answers to soft and hard commits is here as I pointed
> >> out before, did you read it?
> >>
> >>
> >>
> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> >>
> >> Best
> >> Erick
> >>
> >> On Wed, Mar 18, 2015 at 9:44 AM, Alexandre Rafalovitch
> >>  wrote:
> >> > Probably merged somewhat differently with some terms indexes repeating
> >> > between segments. Check the number of segments in data directory.And
> >> > do search for *:* and make sure both do have the same document counts.
> >> >
> >> > Also, In all these discussions, you still haven't answered about how
> >> > fast after indexing you want to _search_? Because, if you are not
> >> > actually searching while committing, you could even index on a
> >> > completely separate server (e.g. a faster one) and swap (or alias)
> >> > index in afterwards. Unless, of course, I missed it, it's a lot of
> >> > emails in a very short window of time.
> >> >
> >> > Regards,
> >> >Alex.
> >> >
> >> > 
> >> > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> >> > http://www.solr-start.com/
> >> >
> >> >
> >> > On 18 March 2015 at 12:09, Nitin Solanki 
> wrote:
> >> >> When I kept my configuration to 300 for soft commit and 3000 for hard
> >> >> commit and indexed some amount of data, I got the data size of the
> whole
> >> >> index to be 6GB after completing the indexing.
> >> >>
> >> >> When I changed the configuration to 60000 for soft commit and 60000
> for
> >> >> hard commit and indexed same data then I got the data size of the
> whole
> >> >> index to be 5GB after completing the indexing.
> >> >>
> >> >> But the number of documents in the both scenario were same. I am
> >> wondering
> >> >> how that can be possible?
> >> >>
> >> >> On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki  >
> >> wrote:
> >> >>
> >> >>> Hi Erick,
> >> >>>  I am just saying. I want to be sure on commits
> >> difference..
> >> >>> What if I do frequent commits or not? And why I am saying that I
> need
> >> to
> >> >>> commit things so very quickly because I have to index 28GB of data
> >> which
> >> >>> takes 7-8 hours(frequent commits).
> >> >>> As you said, do commits after 60000 seconds then it will be more
> >> expensive.
> >> >>> If I don't encounter with **"overlapping searchers" warning
> messages**
> >> >>> then I feel it seems to be okay. Is it?
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson <
> >> erickerick...@gmail.com>
> >> >>> wrote:
> >> >>>
> >>  Don't do it. Really, why do you want to do this? This seems like
> >>  an "XY" problem, you haven't explained why you need to commit
> >>  things so very quickly.
> >> 
> >>  I suspect you haven't tried _searching_ while committing at such
> >>  a rate, and you might as well turn all your top-level caches off
> >>  in solrconfig.xml since they won't be useful at all.
> >> 
> >>  Best,
> >>  Erick
> >> 
> >>  On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki <
> nitinml...@gmail.com>
> >>  wrote:
> >>  > Hi,
> >>  >If I do very very fast indexing(softcommit = 300 and
> >> hardcommit =
> >>  > 3000) v/s slow indexing (softcommit = 60000 and hardcommit =
> 60000)
> >> as
> >>  you
> >>  > both said. Will fast indexing fail to index some data?
> >>  > Any suggestion on this ?
> >>  >
> >>  > On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar <
> >>  > andyetitmo...@gmail.com> wrote:
> >>  >
> >>  >> Yes, and doing so is painful and takes lots of people and
> hardware
> >>  >> resources to get there for large amounts of data and queries :)
> >>  >>
> >>  >> As Erick says, work backwards from 60s and first establish how
> >> high the
> >>  >> commit interval can be to satisfy your use case..
> >>  >> On 16 Mar 2015 16:04, "Erick Erickson"  >
> >>  wrote:
> >>  >>
> >>  >> > First start by lengthening your s

Re: index duplicate records from data source into 1 document

2015-03-19 Thread Derek Poh

Oh that is how Solr works...

On 3/19/2015 10:44 PM, Shawn Heisey wrote:

On 3/19/2015 2:09 AM, Derek Poh wrote:

Am I right to say we need to do the combining of duplicate records into 1
before feeding it to Solr to index?

I am coming from Endeca, which supports the combining of duplicate records
into 1 record during indexing. Was wondering if Solr supports this.

If you index multiple documents with the same uniqueId field value, Solr
will delete the previous document and index the new one.  The data in
the previous document is never seen.

You could in theory write a custom UpdateRequestProcessor that looks for
the previous document and merges it in whatever way you desire, so the
combined information is what will be indexed, and configure Solr to use
that update processor ...but this capability is not available out of the
box.

An update processor that does this should probably be included with
Solr, but it would either need to be highly configurable, or everyone
would need to agree on exactly what rules should be followed when
combining duplicate records.

Thanks,
Shawn






Re: Whole RAM consumed while Indexing.

2015-03-19 Thread Nitin Solanki
Hi Erick,
   I read about the mergeFactor policy for indexing. By default, mergeFactor
is 10. As the documentation says:

High value merge factor (e.g., 25):

   - Pro: Generally improves indexing speed
   - Con: Less frequent merges, resulting in a collection with more index
   files which may slow searching

Low value merge factor (e.g., 2):

   - Pro: Smaller number of index files, which speeds up searching.
   - Con: More segment merges slow down indexing.

So, my main purpose is **searching**. Searching must be fast. Therefore, if
I set the value of **mergeFactor = 2** then indexing will be slow but
searching may be fast, right?

Once again, to restate: I am indexing (total data size - 28GB) 2
documents at a time, which triggers commits after 15 seconds (hard commit) and
10 mins (soft commit).

Will searching be fast if I set **mergeFactor = 2**, and what should the
values be for ramBufferSizeMB, maxBufferedDocs and maxIndexingThreads?

Right now, all values are set to the defaults.

On Fri, Mar 20, 2015 at 11:42 AM, Nitin Solanki 
wrote:

>
>
> On Fri, Mar 20, 2015 at 1:35 AM, Erick Erickson 
> wrote:
>
>> That or even hard commit to 60 seconds. It's strictly a matter of how
>> often
>> you want to close old segments and open new ones.
>>
>> On Thu, Mar 19, 2015 at 3:12 AM, Nitin Solanki 
>> wrote:
>> > Hi Erick..
>> >   I read your Article. Really nice...
>> > Inside that you said that for bulk indexing. Set soft commit = 10 mins
>> and
>> > hard commit = 15sec. Is it also okay for my scenario?
>> >
>> > On Thu, Mar 19, 2015 at 1:53 AM, Erick Erickson <
>> erickerick...@gmail.com>
>> > wrote:
>> >
>> >> bq: As you said, do commits after 60000 seconds
>> >>
>> >> No, No, No. I'm NOT saying 60000 seconds! That time is in
>> _milliseconds_
>> >> as Shawn said. So setting it to 60000 is every minute.
>> >>
>> >> From solrconfig.xml, conveniently located immediately above the
>> >>  tag:
>> >>
>> >> maxTime - Maximum amount of time in ms that is allowed to pass since a
>> >> document was added before automatically triggering a new commit.
>> >>
>> >> Also, a lot of answers to soft and hard commits is here as I pointed
>> >> out before, did you read it?
>> >>
>> >>
>> >>
>> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Wed, Mar 18, 2015 at 9:44 AM, Alexandre Rafalovitch
>> >>  wrote:
>> >> > Probably merged somewhat differently with some terms indexes
>> repeating
>> >> > between segments. Check the number of segments in data directory.And
>> >> > do search for *:* and make sure both do have the same document
>> counts.
>> >> >
>> >> > Also, In all these discussions, you still haven't answered about how
>> >> > fast after indexing you want to _search_? Because, if you are not
>> >> > actually searching while committing, you could even index on a
>> >> > completely separate server (e.g. a faster one) and swap (or alias)
>> >> > index in afterwards. Unless, of course, I missed it, it's a lot of
>> >> > emails in a very short window of time.
>> >> >
>> >> > Regards,
>> >> >Alex.
>> >> >
>> >> > 
>> >> > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> >> > http://www.solr-start.com/
>> >> >
>> >> >
>> >> > On 18 March 2015 at 12:09, Nitin Solanki 
>> wrote:
>> >> >> When I kept my configuration to 300 for soft commit and 3000 for
>> hard
>> >> >> commit and indexed some amount of data, I got the data size of the
>> whole
>> >> >> index to be 6GB after completing the indexing.
>> >> >>
>> >> >> When I changed the configuration to 60000 for soft commit and 60000
>> for
>> >> >> hard commit and indexed same data then I got the data size of the
>> whole
>> >> >> index to be 5GB after completing the indexing.
>> >> >>
>> >> >> But the number of documents in the both scenario were same. I am
>> >> wondering
>> >> >> how that can be possible?
>> >> >>
>> >> >> On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki <
>> nitinml...@gmail.com>
>> >> wrote:
>> >> >>
>> >> >>> Hi Erick,
>> >> >>>  I am just saying. I want to be sure on commits
>> >> difference..
>> >> >>> What if I do frequent commits or not? And why I am saying that I
>> need
>> >> to
>> >> >>> commit things so very quickly because I have to index 28GB of data
>> >> which
>> >> >>> takes 7-8 hours(frequent commits).
>> >> >>> As you said, do commits after 60000 seconds then it will be more
>> >> expensive.
>> >> >>> If I don't encounter with **"overlapping searchers" warning
>> messages**
>> >> >>> then I feel it seems to be okay. Is it?
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson <
>> >> erickerick...@gmail.com>
>> >> >>> wrote:
>> >> >>>
>> >>  Don't do it. Really, why do you want to do this? This seems like
>> >>  an "XY" problem, you haven't explained why you need to commit
>> >>  things so very quickly.
>> >> 
>> >>  I suspect you haven't tried _searching_ whi